1

I realize that this question is similar to Low level programming - what's in it for me, but the answers didn't really address my question well. Apart from just an understanding, how exactly does your low-level knowledge translate into faster and better programs?

There's the obvious lack of stop-the-world pauses from garbage collection, but what else is an advantage? Do you really outperform your optimizing compiler? Do you pack your data structures as tightly as possible and worry about alignment? There's extra freedom, naturally, but does that really translate into a faster program?

Peter Smith
  • 2,577

5 Answers

10

It's not just about faster. Extra CPU cycles from various pipeline-stall penalties, and extra memory accesses due to cache misses and the like, can add up to measurable extra energy in sufficiently repetitive performance-critical code. A big cost in data centers these days is energy and cooling, and a big cost in mobile devices is battery weight. The transistors have become nearly free, but the nanowatts have not.

An optimizing compiler can help, but it might not know all the possible algorithmic transformations that still can produce a useful result. (e.g. humans sometimes know how to cheat and get away with it.) The short vector/SIMD math units in many CPUs these days (even on mobile device processors) can often only be fully utilized with some low-level knowledge about them.
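For instance, most compilers won't vectorize a floating-point sum on their own, because reordering the additions changes the rounding; a programmer who knows the reordering is harmless for their data can use the SIMD unit directly. A minimal sketch, assuming an x86 CPU with SSE and a compiler that provides <immintrin.h> (sum_sse is a hypothetical helper, not a library function):

#include <immintrin.h>
#include <stddef.h>

/* Sum an array four floats at a time. The compiler usually won't do this
   itself, since it changes the order (and thus the rounding) of the adds. */
float sum_sse(const float *a, size_t n)
{
    __m128 acc = _mm_setzero_ps();
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        acc = _mm_add_ps(acc, _mm_loadu_ps(a + i));

    float lanes[4];
    _mm_storeu_ps(lanes, acc);
    float sum = lanes[0] + lanes[1] + lanes[2] + lanes[3];

    for (; i < n; ++i)   /* scalar tail for the leftover elements */
        sum += a[i];
    return sum;
}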

Embedded and signal-processing programmers use this low-level knowledge all the time, sometimes just to meet spec. If you can pack your code to run on a processor that's 10 cents cheaper, and end up shipping millions of units per annum embedded in some toy, you've more than paid for a software engineer's salary or three.

hotpaw2
  • 7,988
5

You're right that some low-level knowledge doesn't matter. For example, I think you'd be nuts to routinely re-jigger arithmetic to replace multiplication and division with equivalent bit-shift operations to save a cycle or two. The speed-up is likely to be microscopic, the compiler might do it anyway, and I suspect doing things like triangle_area = (base >> 1) * height is just asking for trouble when someone looks at your code in six months. Similarly, a lot of older code jumps through incredible hoops to reduce the number of floating-point operations; that's also not worth the hassle, mental overhead, and potential for bugs these days.

On the other hand, there are other low-level concepts that can give you relatively large gains for minimal work. Consider caching, for example. You could iterate over an NxN matrix like this:

for (int i = 0; i < N; ++i)
    for (int j = 0; j < N; ++j)
        do_something(matrix[i][j]);

or like this:

for (int j = 0; j < N; ++j)
    for (int i = 0; i < N; ++i)
        do_something(matrix[i][j]);

If the matrix is suitably large, one of these (depending on how the language lays out the array) will cause a cache miss on nearly every array access, while the other will make near-optimal use of each cache line it loads. Assuming you've got to iterate over the matrix somehow, why not prefer the potentially faster one? They're equally readable, equally writable, etc.
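If you want to see the effect for yourself, here's a minimal sketch of a benchmark (N, the int payload, and the checksum are arbitrary choices; timing uses clock() from <time.h>):

#include <stdio.h>
#include <time.h>

#define N 4096

static int m[N][N];   /* C stores this row-major: m[i][j] and m[i][j+1] are adjacent */

int main(void)
{
    long sum = 0;
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j)
            m[i][j] = i ^ j;       /* fill, so the reads can't be optimized away */

    clock_t t0 = clock();
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j)
            sum += m[i][j];        /* sequential: one miss per cache line */
    clock_t t1 = clock();

    for (int j = 0; j < N; ++j)
        for (int i = 0; i < N; ++i)
            sum += m[i][j];        /* strided by 16 KB: a miss on nearly every access */
    clock_t t2 = clock();

    printf("row-major:    %.3fs\ncolumn-major: %.3fs\n(checksum %ld)\n",
           (double)(t1 - t0) / CLOCKS_PER_SEC,
           (double)(t2 - t1) / CLOCKS_PER_SEC, sum);
    return 0;
}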

In general, I think appreciating the things going on "under the hood" can help inform your decisions when programming in higher-level languages (e.g., Can I code this in a way that doesn't trash the cache over and over again?). If you're unfamiliar with the relationship between an array index and a memory address, you might not appreciate how important it is to check array indices and buffer sizes. I think you could glean a lot of this information from the higher-level language's documentation, but it'll definitely be more in-your-face in C.
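To make that relationship concrete, a small sketch: in C, indexing is literally pointer arithmetic, which is why an out-of-range index quietly computes an address into whatever memory happens to come next.

#include <stdio.h>

int main(void)
{
    int arr[4] = {10, 20, 30, 40};
    /* arr[2] is defined as *(arr + 2): the base address plus 2 * sizeof(int). */
    printf("%d == %d\n", arr[2], *(arr + 2));
    /* Nothing in the language stops arr[4] or arr[-1]; they simply compute
       an address outside the array, so the bounds checks are on you. */
    return 0;
}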

Obviously, the trick is not to go overboard. There is a theoretical speed difference between i++ and ++i, but if copying a single four-byte variable makes or breaks your web app, the app is already in pretty dire straits.

3

Understanding how memory behaves is IMO among the most fruitful things. Particularly:

  • What CPU cache is and why it matters.
  • Why primitives, and particularly arrays of primitives, are faster than equivalent (arrays of) objects.
  • Why in multidimensional arrays rows and columns aren't equivalent at all (memory layout and cache locality).
  • Memory barriers, and locking vs. CAS operations (see the sketch after this list).
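On that last point, a minimal sketch using C11's <stdatomic.h>: an increment built from compare-and-swap, which retries instead of blocking the way a lock would.

#include <stdatomic.h>

void increment(atomic_int *counter)
{
    int old = atomic_load(counter);
    /* The CAS succeeds only if *counter still equals old; on failure it
       stores the current value back into old and we try again. */
    while (!atomic_compare_exchange_weak(counter, &old, old + 1))
        ;
}

(In practice atomic_fetch_add does this in one call; the explicit loop is just to show the CAS-retry shape.)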

You can also help the optimizing compiler: for example, writing a simple for loop over an array with explicit start and end indices may allow the compiler to optimize away bounds checking (which is about the only thing that makes Java arrays slower than native arrays).

As with all optimization, it only matters where it does :)

2

Performance isn't the only benefit of understanding the low-level details of your program. For example, if an application I write crashes, I often get one or more of these things:

  • a crash log containing a backtrace in the form of addresses on the call stack, and the value of the program counter and other registers
  • a core dump, containing the state of the process's virtual memory at the time of the crash.

In order to work out what's going on there, you need some knowledge of the CPU architecture's memory layout, the calling conventions of the platform your code is running on, and often the ability to read a disassembly of the code that appears in the stack trace.
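As a minimal sketch (binary and core file names hypothetical), a program like the one below produces exactly that kind of artifact; a debugger such as gdb can then map it back for you, e.g. gdb ./a.out core, then bt for the backtrace and info registers for the register state.

/* Deliberate crash: dereferencing a null pointer raises SIGSEGV and,
   with core dumps enabled (e.g. `ulimit -c unlimited`), leaves a core
   file capturing the stack and registers at the moment of death. */
static int crash(void)
{
    int *p = 0;
    return *p;   /* the saved program counter will point here */
}

int main(void)
{
    return crash();
}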

0

  • IO (file or any other output device)
  • Networking (IO, but important enough that it deserves its own bullet)
  • Memory hierarchies (understand why each fallback from L1 cache -> L2 cache -> main memory -> disk costs dramatically more)

Of course all of the low-level stuff is valuable at some level, but these three will pay pretty good dividends for an application-level programmer to understand. Edit: these are also places where low-level techniques are frequently applied.
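On the IO point, a minimal sketch (path and buffer arbitrary, error handling omitted): the two functions below write the same bytes, but one crosses into the kernel once per byte while stdio's buffering batches the work into a handful of system calls.

#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>

void write_per_byte(const char *path, const char *buf, size_t n)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    for (size_t i = 0; i < n; ++i)
        write(fd, buf + i, 1);   /* one system call per byte */
    close(fd);
}

void write_buffered(const char *path, const char *buf, size_t n)
{
    FILE *f = fopen(path, "w");
    fwrite(buf, 1, n, f);        /* stdio batches these into large writes */
    fclose(f);
}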

anon
  • 1,494