8

I read in the book "Game Engine Architecture" by Jason Gregory that:

"It’s possible to access data items that are narrower than the width of a machine’s data bus, but it’s typically more costly than accessing items whose widths match that of the data bus. For example, when reading a 16-bit value on a 64-bit machine, a full 64 bits worth of data must still be read from memory. The desired 16-bit field then has to be masked off and possibly shifted into place within the destination register."

So, if I just use long long (64-bit integers), will my code be faster than if I use narrower integers (short, int)? If so, why still use shorts and ints if we can get larger values (when needed) and faster code with just long longs?

5 Answers

11

Robert Harvey's answer is fully correct, but since you mentioned Game Engine Architecture, I think it is worth adding a few words about the case where speed actually could matter.

The described effect of "more CPU work" when "reading a 16-bit value on a 64-bit machine" occurs on some CPUs and not on others. But when everything that happens in a bottleneck section of a program is 32-bit integer arithmetic, replacing 32-bit variables blindly with 64-bit variables will seldom speed the program up. Quite the opposite: it introduces the risk of the program exhausting the CPU caches or the available RAM earlier, which can easily slow it down.

In my experience, to gain notable speed from more bits, one needs to use the extra bits actively. This can be achieved, for example,

  • by using adapted algorithms (like copying memory 64 bits at a time)

  • by using certain SIMD instructions which make use of the wider data bus.

I have never seen any notable performance gain in a real-world program from replacing the 32-bit int data type with a 64-bit long long without changing anything else.

Doc Brown
  • 218,378
10

In practice, it's not going to matter most of the time.

  1. It won't matter in most programs.
  2. For some programs where it might potentially matter, it might still not matter because there's no significant, measurable performance difference.
  3. For those programs where there is a measurable performance difference, it might still not matter because the program performs adequately anyway.

And so, in order for this to matter, there must be:

  1. A performance difference,
  2. That can be measured,
  3. Where that measurable performance difference materially impacts your application.

"Materially" means that:

  1. The measured performance difference is significant enough to cause one of your performance requirements to fail, and
  2. Using the higher-performing technique, either alone or in concert with other improvement techniques, causes the performance requirement to succeed, as measured by your performance tests.

In short, measure. Profile your code using both techniques, and then use those measurements to determine what to do. It's the only way to be sure.


Incidentally, this is the reason many programming languages use a 32-bit signed integer as the default numeric type. It's usually the best compromise between speed, flexibility, storage space and numeric range.

Robert Harvey
  • 200,592
3

So, if I just use long long (64 bit integers), will my code be faster than if I use less bits integers

Not automatically, no. Robert has already covered most of it. But there is one thing I thought was worth mentioning: if you pack four 16-bit integers into one 64-bit integer, you will of course have some overhead separating them. But the code can become much more cache friendly, which in some cases can affect performance to quite a large degree.

I once had an array of one million 32-bit ints that I treated as Boolean variables. When I changed it so that each individual bit was a Boolean, the array shrank to 1/32 of its original size, which made the code MUCH faster.

It is correct that using the native word size may require fewer CPU instructions to do what you want, but this is not certain. The CPU may have special instructions for dealing with word fractions. And even if it really does yield fewer CPU instructions, that still doesn't automatically translate into better performance.

klutt
  • 1,438
0

How many integers are we talking about? 1, 10, or 10 million?

For significant amounts of data, cache usage is important. If 20,000 16-bit ints fit into the L1 cache and 20,000 64-bit ints don't, then 16-bit will be a lot faster. If you look at individual variables, the opposite is true: the native size will be fastest. And there can be a huge difference on SIMD processors: with 256-bit registers you can handle sixteen 16-bit values in one instruction, just as fast as four 64-bit values.

gnasher729
  • 49,096
0

Addressing a specific part of your quote:

For example, when reading a 16-bit value on a 64-bit machine, a full 64 bits worth of data must still be read from memory. The desired 16-bit field then has to be masked off and possibly shifted into place within the destination register.

This is true for some architectures, but not all of them. In particular, the x86-64 architecture has instructions for working directly with 16-bit values, so doing so on that architecture will not require mask and shift operations. Furthermore, while 64 bits[1] will still be fetched from memory, if subsequent instructions need the data in the other 48 bits, they will be able to use it without accessing main memory again, thanks to the processor's cache. For this architecture specifically, therefore, this advice is wrong.

[1]: or, more likely, an entire cache line (typically 64 bytes), since most modern processors fetch memory from RAM in cache-line units

occipita
  • 209