
I have been using buffers for quite a long time whenever I need to copy a stream or read a file.

Every time, I set my buffer size to 1024 or 2048; but from my point of view, a buffer is like a "bucket" that carries my "sand" (the stream) from one part of my land (memory) to another.

So, increasing my bucket's capacity should, in theory, let me make fewer trips. Is this a good thing to do in programming?

Cyrbil

4 Answers

There is an optimal size to a buffer. Too small a buffer can trigger more system calls than necessary, while too big a buffer can trigger unnecessary reloads of the CPU cache. The best way to answer this question for your specific situation is to use a profiler.
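For illustration, here is a minimal Java sketch of that kind of measurement (the language, file names, class name, and buffer sizes are all just this example's choices, not anything from the question). It times the same file copy at several buffer sizes; a real profiler will tell you more, but even crude timing shows the trend:

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class BufferSizeBenchmark {
    // Copy src to dst through a buffer of the given size and return elapsed milliseconds.
    static long timeCopy(String src, String dst, int bufferSize) throws IOException {
        long start = System.nanoTime();
        try (InputStream in = new FileInputStream(src);
             OutputStream out = new FileOutputStream(dst)) {
            byte[] buffer = new byte[bufferSize];
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws IOException {
        // "input.dat" is a placeholder; point it at a reasonably large test file.
        for (int size : new int[] {1_024, 8_192, 65_536, 1_048_576}) {
            long ms = timeCopy("input.dat", "output.dat", size);
            System.out.println(size + " bytes: " + ms + " ms");
        }
    }
}
```

Run it against a file of realistic size for your workload; the sweet spot is usually well above 1 KB and well below "all of RAM".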

The answer is: it depends. Unfortunately, there is no single answer to your question. Many variables affect it, including the speed of the hardware, the source of the stream, the type of disk the file is being read from, the amount of memory available, the OS's file-caching algorithm, and so on.

For particular situations, I advise measuring performance to see whether a bigger buffer helps.

Michael Kohne

Let's pretend you are copying a data structure from one file to another, and you use a buffer to store the data between the time you read it and the time you write it.

There is overhead whenever you read or write data. On disk, the head has to find the right sector and then read or write as the track passes under it. In memory, it takes a processor instruction to move each chunk of memory (usually 1-8 bytes at a time), plus a bus operation to move data from one part of memory to another, or between memory and the processor or disk. Each chunk you read is processed in a loop somewhere, and the smaller the chunks, the more times that loop has to execute.

If your buffer is a single byte, you will incur this overhead every time you read or write a byte of data. In our example, the disk can't read and write simultaneously, so the write may have to wait until the read is finished. For a one-byte file, this is the best you can do, but for a 1MB file, this will be extremely slow.
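To make that loop-count point concrete, here is a sketch of the degenerate one-byte case in Java (the class and method names are made up for this example): every pass through the loop is a separate read() and write() call, so a 1 MB file costs roughly a million iterations:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

class OneByteCopy {
    // One-byte "buffer": one read() call and one write() call per byte copied.
    // For a 1 MB file, this loop body runs roughly a million times.
    static void copy(InputStream in, OutputStream out) throws IOException {
        int b;
        while ((b = in.read()) != -1) {
            out.write(b);
        }
    }
}
```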

If you have a 10MB buffer and want to copy a 10MB file, you can read the whole thing into your buffer, then write it all out again in one step.
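Where that whole-file approach applies, it can be just a few lines. A sketch using the standard java.nio.file API (the path names are placeholders), sensible only when you know the file fits comfortably in memory:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class WholeFileCopy {
    public static void main(String[] args) throws IOException {
        // Read the entire file into one in-memory buffer, then write it out in one step.
        byte[] data = Files.readAllBytes(Paths.get("input.dat"));
        Files.write(Paths.get("output.dat"), data);
    }
}
```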

Now, if you want to copy a 20GB file, you probably don't have that much memory. Even if you do, if every program allocated 20GB of memory for buffers, there wouldn't be anything left! When you allocate memory, you have to release it, and both the allocation and release can take time.

If a client of some kind is waiting on chunks of data, sometimes smaller chunks are better. If the client gets a few chunks and realizes it doesn't want the rest, it can abort; or it can display what it already has while waiting for more, so that a human user can see that something is going on.

If you know how much data you are copying before you have to allocate your buffer, you can make the buffer the ideal size for that data: either the exact size of all of it, or big enough that the data is copied in a reasonable number of chunks. If you have to guess, a size around 1MB is reasonable for an unknown purpose.
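As a sketch of that sizing rule (the helper name and the 1 MB cap are just this example's choices, assuming Java's standard file API): use the exact data size when it is small, and fall back to a capped default when it is large:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

class BufferSizing {
    static final int DEFAULT_BUFFER = 1 << 20; // 1 MB cap / fallback

    // Use the exact file size when it is small; otherwise cap the buffer at 1 MB
    // so a huge file is still copied in a reasonable number of chunks.
    // Math.max(1, ...) guards against a zero-length file producing a zero-length buffer.
    static int chooseBufferSize(Path file) throws IOException {
        long dataSize = Files.size(file);
        return (int) Math.max(1, Math.min(dataSize, DEFAULT_BUFFER));
    }
}
```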

To pick the perfectly sized buffer, you need to study the data you are going to use it for. If you are copying files, how big are most of the files people copy? Guess at a good buffer size and time it; tweak the size and time it again. Your total available memory may limit the maximum size. Eventually you will arrive at the ideal buffer size for your specific goal.

GlenPeterson

It all depends on what you are doing and with what machinery and so forth. Try different numbers and see what happens.

However, I have found that the bigger the buffer, the faster the reads and writes. I mention this because you talk about 1024 and 2048; try some really big buffers instead. In one case I found I was reading 8 times as fast after switching from an 8 KB buffer to a 100 KB one, and I got noticeable improvements up to 1 MB.

I'm no hardware expert, but I've found that computers generally do sequential byte copies many times faster than individual byte copies. Maybe they do things in parallel, maybe the data moves through the caches more efficiently, maybe it's magic. Either way, using big buffers and array copies (or loops that an optimizer can turn into array copies) can save a lot of time.
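To illustrate the difference (a hedged sketch; the 1 MB size and class name are arbitrary, and actual speedups vary by JVM and hardware), compare an element-by-element loop with a bulk System.arraycopy, which the runtime can implement as a fast block move:

```java
class BulkCopyDemo {
    public static void main(String[] args) {
        byte[] src = new byte[1_048_576]; // 1 MB of source data
        byte[] dst = new byte[src.length];

        // Element-by-element: one loop iteration per byte copied.
        for (int i = 0; i < src.length; i++) {
            dst[i] = src[i];
        }

        // Bulk copy: the runtime can move the whole block at once.
        System.arraycopy(src, 0, dst, 0, src.length);
    }
}
```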

RalphChapin