7

In my company we have an obligatory practice: before a review, the binary size impact of the code change has to be measured and provided.

We must use -Os for this metric to avoid unpredictable inlining. There are no customer- or product-driven arguments for it. It is a big server installed as a singleton on a dedicated machine, where the executable size is completely negligible compared to all the other resources involved in the system.
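
For reference, the measurement is of roughly this shape; this is only my minimal sketch with made-up file names and commands, not our actual build tooling:

    // inlining_probe.cpp -- a toy translation unit, not our real code, to show
    // how the number is taken and why -Os is pinned: at -O2 the compiler is
    // free to inline helper() into every call site, so the measured size moves
    // with inlining heuristics; -Os asks it to optimize for size instead.
    // Exact numbers depend on the compiler and version:
    //   g++ -O2 -c inlining_probe.cpp && size inlining_probe.o
    //   g++ -Os -c inlining_probe.cpp && size inlining_probe.o
    #include <cstdint>
    #include <cstdio>

    namespace {

    // Mid-sized helper: small enough to be an inlining candidate at -O2,
    // large enough that duplicating it at several call sites shows up in size.
    std::uint64_t helper(std::uint64_t x) {
        std::uint64_t acc = 1469598103934665603ULL;  // arbitrary FNV-style mixing
        for (int i = 0; i < 32; ++i) {
            acc ^= (x >> i) & 0xffu;
            acc *= 1099511628211ULL;
        }
        return acc;
    }

    }  // namespace

    int main(int argc, char**) {
        const std::uint64_t base = static_cast<std::uint64_t>(argc);  // not a compile-time constant
        std::uint64_t total = 0;
        // Several distinct call sites, so full inlining duplicates helper's body.
        total += helper(base + 1);
        total += helper(base + 2);
        total += helper(base + 3);
        total += helper(base + 4);
        total += helper(base + 5);
        total += helper(base + 6);
        std::printf("%llu\n", static_cast<unsigned long long>(total));
        return 0;
    }

The absolute numbers differ between compilers; the intent is only that the -O2 figure moves with inlining heuristics while the -Os figure is supposed to be more stable.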

The main argument for this practice is that -Os binary size correlates with code complexity.

Is measuring the binary size a reliable metric to use when evaluating code complexity? Or for evaluating anything else?

Rachel
  • 24,037
gsf
  • 274

5 Answers

21

As noted in the comments, the size of the binary could be very important for some embedded systems - especially old ones.

However, as you've noted in the update to the question

There are no customer- or product-driven arguments for it. It is a big server installed as a singleton on a dedicated machine, where the executable size is completely negligible compared to all the other resources involved in the system.

This is one of the most pointy-haired schemes I've heard in a long time. You'll be penalized for including a library that's well tested and solves a lot of problems, but they'll let a Bubble sort get past?

Seriously, it would be useful to see some justification for their main argument that the binary size correlates with other qualities of the code. It's entirely possible that I'm dead wrong and there is such a correlation, but I kind of doubt it.

Robert Harvey
  • 200,592
Dan Pichelman
  • 13,853
6

There have been a few changes and updates, so I'm going with rev4 of your question.

The main argument for this practice is that -Os binary size correlates with code complexity.

You're asking size-optimized binary size to act as a proxy for code complexity; from version to version that number can go up or down. If you said that observed "up" measurements generally correlate to increases in code complexity I'd say that's ... reasonable.

But that alone is hardly actionable.

You haven't said why your bosses are asking you to do it, or how they use the information. If they are rejecting a change solely because the -Os version of the binary went up 3K, that sounds mighty foolish. If they are just interested in seeing/charting/graphing changes to the code over time, that sounds okay, especially because it should/could/might be easy to compute this number during an automated build process. If nothing else, metrics can start conversations and/or act as sanity checks.
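
If the goal really is trending, the collection step is cheap. Here's a minimal sketch of the kind of helper a build step could run; the tool name, the command-line arguments, and the 64 KiB threshold are all made up for illustration:

    // size_delta.cpp -- hypothetical helper (my example, not anything from the
    // question) that a CI step could run after the -Os build: it compares the
    // fresh binary against a recorded baseline and prints the delta so the
    // number can be charted over time.
    #include <cstdint>
    #include <filesystem>
    #include <iostream>
    #include <system_error>

    int main(int argc, char** argv) {
        if (argc != 3) {
            std::cerr << "usage: size_delta <baseline-binary> <new-binary>\n";
            return 2;
        }
        namespace fs = std::filesystem;

        std::error_code ec;
        const std::uintmax_t baseline = fs::file_size(argv[1], ec);
        if (ec) {
            std::cerr << "cannot stat " << argv[1] << ": " << ec.message() << '\n';
            return 2;
        }
        const std::uintmax_t current = fs::file_size(argv[2], ec);
        if (ec) {
            std::cerr << "cannot stat " << argv[2] << ": " << ec.message() << '\n';
            return 2;
        }

        const std::int64_t delta = static_cast<std::int64_t>(current) -
                                   static_cast<std::int64_t>(baseline);
        std::cout << "baseline: " << baseline << " bytes\n"
                  << "current:  " << current << " bytes\n"
                  << "delta:    " << delta << " bytes\n";

        // Example policy: a big jump starts a conversation; it does not reject
        // the change by itself.
        constexpr std::int64_t kReviewThreshold = 64 * 1024;  // arbitrary
        return delta > kReviewThreshold ? 1 : 0;
    }

Whether raw file size or the text segment reported by the size(1) tool is the better number to chart is a separate question; the point is just that collecting it costs almost nothing.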

Scenario 1

Boss: I see the size is down, how'd that happen?

Coder: Marketing finally gave us permission to completely remove that set of deprecated features.

Boss: Oh, nice.

Scenario 2

Boss: A 30% increase in size? I thought the new feature involved a minor tweak!

Coder: That's what we thought too, but once we understood what the client was asking for, we had to blah blah blah....


Final note, in case someone thinks otherwise, I'll state explicitly: complexity != maintainability != quality
brian_o
  • 160
1

The size of the binary is pretty much only a good indication of the size of the binary. Code complexity fundamentally stems from "choice" (be that the choice to terminate a loop, a condition in an if statement, a switch or cond block, duck-typing, having multiple classes implementing an interface, ...). For any given binary size, there's a binary of comparable size that is composed entirely of linear, non-branching code.
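
As a contrived sketch of that point (my own toy example; here the bulk comes from a constant table rather than a long run of straight-line instructions, but the effect on the binary is the same: many bytes, few decisions):

    // complexity_vs_size.cpp -- contrived example: big_but_simple() drags a
    // 16 KB constant table into the binary yet has essentially no decision
    // points, while small_but_complex() compiles to very little code but is
    // full of the "choices" that make code hard to reason about.
    #include <array>
    #include <cstddef>
    #include <cstdint>
    #include <cstdio>

    // Large constant table: lots of bytes in the binary, zero complexity.
    constexpr std::array<std::uint32_t, 4096> kTable = [] {
        std::array<std::uint32_t, 4096> t{};
        for (std::size_t i = 0; i < t.size(); ++i) {
            t[i] = static_cast<std::uint32_t>(i * 2654435761u);
        }
        return t;
    }();

    std::uint32_t big_but_simple(std::uint32_t x) {
        // One lookup; the only "branch" is the implicit power-of-two mask.
        return kTable[x & (kTable.size() - 1)];
    }

    std::uint32_t small_but_complex(std::uint32_t x, bool a, bool b, int mode) {
        // Tiny amount of machine code, many interacting decisions.
        if (a && !b) return x ^ 0x5au;
        if (!a && b) return mode > 0 ? x + static_cast<std::uint32_t>(mode) : x - 1u;
        switch (mode & 3) {
            case 0:  return b ? x << 1 : x >> 1;
            case 1:  return a ? ~x : x;
            case 2:  return x % 7u == 0 ? 0u : x;
            default: return x;
        }
    }

    int main() {
        // Call both functions so the example links and runs as-is.
        std::printf("%u %u\n",
                    static_cast<unsigned>(big_but_simple(42u)),
                    static_cast<unsigned>(small_but_complex(42u, true, false, 2)));
        return 0;
    }

Any complexity metric worth the name would rank small_but_complex() well above big_but_simple(); a pure size metric ranks them the other way around.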

Change in binary size (which is what you actually seem to be asking about) may or may not correlate to changes in complexity. Is it "more complex" to pull in a well-tested library to do one (or more) things? Probably not in any useful sense, as long as the library interfaces are clean and sufficiently "black box". Would it increase the binary size? Most probably.

Vatine
  • 4,269
1

What does code complexity mean in this context? Are we talking in terms of machine code or the C++ source code that humans have to maintain? I would say "not necessarily" even if the former, but especially "no" for the latter.

In a past company I worked at, we shipped over 90 megabytes just in terms of binaries, and that was an optimized release build stripped of debugging info and so forth. And it was a pretty complex codebase, millions of LOC, but it would be very misleading to deduce the code complexity from that binary size...

... because one weekend I fooled around with the build system and got the whole thing down to less than 10 megabytes without changing any code or any optimization settings. All I did was dynamically link the CRT (C runtime library) and a couple of other big libraries in our third-party section. Presto; that eliminated all sorts of code that was being statically linked into every one of our binaries (which numbered in the hundreds with all the plugins we had). I didn't check those changes in since I was just goofing around (I wasn't sure about the impact of depending on dylibs in that case, and C++ can get hairy across module boundaries if the code wasn't written up front to deal with it, e.g., to avoid throwing exceptions across boundaries), but it at least satisfied my curiosity that the binaries didn't have to be so massive.
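
If anyone wants to see the same effect on a toy scale, something like this is enough; the flags below are my assumption of a typical GCC/Clang setup, not the build system from the story:

    // link_size_demo.cpp -- trivial program to compare static vs dynamic
    // linking of the runtime. A fully static build needs a static libc on
    // your system; -static-libstdc++ -static-libgcc gives a milder version
    // of the same effect if it does not.
    //
    //   g++ -O2 -o demo_dynamic link_size_demo.cpp
    //   g++ -O2 -static -o demo_static link_size_demo.cpp
    //   ls -l demo_dynamic demo_static    # compare file sizes
    //   ldd demo_dynamic                  # the runtime libs it leans on instead
    #include <iostream>
    #include <string>
    #include <vector>

    int main() {
        const std::vector<std::string> words{
            "hello", "statically", "or", "dynamically", "linked", "world"};
        for (const auto& w : words) {
            std::cout << w << ' ';
        }
        std::cout << '\n';
        return 0;
    }

Multiply whatever difference you see by hundreds of binaries and plugins and you get the kind of bloat I'm describing, with no change in source complexity at all.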

When I see other products that have massive binaries, I tend to wonder if they might benefit from the same thing. On the flip side, it might not be worth caring about this stuff so much outside of embedded devices with our hard drives being epic and bandwidth being plentiful. I still like the aesthetics of small binaries though and things that install and download very quickly and so forth. It's one of the places where I'm still very impractical and would prefer to spend time making binaries smaller, starting out from days where a 10MB hard drive was considered awesome, but again I don't consider binary size to be a reflection of the complexity of a codebase since many binaries are probably statically linking redundant code over and over.

My memory is a bit fuzzy, but I somewhat recall a time around 25 years ago when you could take a C++ "Hello World" program and statically link everything and end up with a binary that was a few hundred KB, and that got a lot of criticism from C programmers. My memories might be exaggerating that a bit (maybe it wasn't that huge), but I sorta recall a time when some C++ compilers, now faded into obscurity, would tend to generate massive binaries for the most trivial programs even without any debugging info or anything like that. That was back before I/O streams were standardized in C++, I think (sorry, my memory is so hazy).

0

The binary size is the most fundamental metric of any build because it will hit the bottom line of the deliverable, for better or worse.

For large programs, as a general rule, binary size will reflect complexity. This seems counterintuitive, because redundant or duplicated code would seem to be a path to greater size. In practice, however, for non-trivial programs redundant code tends to be hard to maintain, so it is self-limiting. In other words, the larger a program gets, the more essential it becomes to abstract it to support further growth. For this reason, binary size tends to approximate complexity.

Note that it is only a very rough metric, however. In some cases you can find bloatware from large organizations, like IBM, which have succeeded in creating large binaries using very large, somewhat redundant teams of developers. However, even in these cases the law of maintainability applies, and even IBM is not immune to it; eventually even the most massive man-hour efforts will succumb to inertia if they are not abstracted.

Thus, as a rule, and especially within a given corporate context, binary size will provide a very rough approximation of complexity.