9

If reading uninitialized memory is undefined behavior anyway, why has the C++ standard not been changed so that objects of primitive type (int, unsigned, float, double, bool, char) get default-initialized to 0? Wouldn't this prevent an entire class of typical beginner mistakes? From my understanding, it should be possible to define any new behavior in this way that has previously been undefined.

Is there a specific reason, such as a possible ABI break, that prevents this change?

4 Answers

13

The guiding principle there is still "You don't pay for what you don't use". It has not been changed to "Let's try to play it safe, and hope the compiler fixes our performance".

It's only "try to play it safe", as zero/null while a valid value for all fundamental types, is not always sensible. Put another way, not all values are always allowed. Actually, debug-modes which initialize to something distinctly non-zero to help debug errors the compiler fails to diagnose are quite common. 0xDEADBEEF for data and OXDEADC0DE for code are long-standing favorites for their distinctiveness in a hex-viewer, 0xAA is not uncommon for a single-byte pattern.

Proving that a just-in-case zero-initialization is actually useless, so it can be optimized away, is an impossible task in the general case, especially in the face of partial information (what can that function return, given what I know about the arguments?) and limited time (yes, AOT compilers have far more of it than JIT compilers, but combinatorial explosion sets in astoundingly fast).
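A minimal sketch of why that proof is hard (the names opaque_init and use are hypothetical): as soon as a variable escapes to code the compiler cannot see, a mandatory just-in-case store would have to stay.

// Defined in another translation unit; the compiler cannot see its body.
extern void opaque_init(int& x);

int use() {
    int x = 0;          // hypothetical mandatory zero-initialization
    opaque_init(x);     // may or may not overwrite x before it is read
    return x;           // so the "x = 0" store cannot be proven dead
}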

As a matter of Quality of Implementation, compilers are free to warn about use-before-assignment, if they can be reasonably sure it happens, so the programmer can fix their code. And most do, if you politely ask for that help.
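For instance (a sketch; whether and how the warning fires depends on the compiler, its version, and the optimization level), GCC and Clang will flag a straightforward case like this once warnings are enabled, e.g. with -Wall, which includes -Wuninitialized:

int triple() {
    int n;          // never assigned
    return n * 3;   // e.g. "warning: 'n' is used uninitialized" under -Wall
}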

Deduplicator
  • 9,209
2

[...] so that objects of primitive type (int, unsigned, float, double, bool, char) get default-initialized to 0? Wouldn't this prevent an entire class of typical beginner mistakes?

I don’t think so. Usually the actual mistake is that the programmer either assumed the variable would be initialized to something useful after the declaration, or simply forgot to init it right there.

In many cases forcing zero initialization would only mask the problem. 0 is a valid value for all of the fundamental types but that doesn’t imply it’s useful and semantically valid for the use case in question. Consider:

/* Just for demonstration! This is BAD code! */

int result; // uninitialized

// The function is supposed to initialize result to >= 1 using an out parameter.
// But it's buggy and doesn't touch result at all.
fill_result(result);

// result is read here in some way.

Currently reading result at the end is undefined behaviour. Forced zero init wouldn’t improve the situation because the variable still does not contain a semantically valid value. It’s expected to hold a value greater than 0, after all. The error is just masked and likely to lead to problems later in the execution of the program. That’s very similar to undefined behaviour.

I can’t speak for the C++ committee, of course. And I cannot look into the mind of the C++ community as a whole. But I guess this is at least one important part of the reason why forced zero init isn’t considered as a change to the language – “you don’t pay for what you don’t use” being another important part.

Imo a better, although ultimately just as hopeless, idea would be to disallow uninitialized variables altogether. That would solve the problem, but would also be a backwards incompatible change to a commonly used feature. A proposal that’s guaranteed to break a vast amount of existing code is highly unlikely to make it into the standard.

What’s left is static and dynamic code analysis to detect such bugs.
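As one concrete example of dynamic analysis (a sketch, assuming the buggy fill_result from above is defined in the same, fully instrumented program), Clang's MemorySanitizer reports a use-of-uninitialized-value at run time once the uninitialized result influences a branch:

// demo.cpp -- compile with: clang++ -g -fsanitize=memory demo.cpp
#include <cstdio>

void fill_result(int& /*out*/) { /* bug: forgets to assign */ }

int main() {
    int result;          // uninitialized
    fill_result(result);
    if (result >= 1)     // MSan flags this read of an uninitialized value
        std::puts("looks plausible");
    return 0;
}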

besc
  • 1,163
  • 6
  • 7
2

If reading uninitialized memory is undefined behavior anyway, why has the C++ standard not been changed so that objects of primitive type (int, unsigned, float, double, bool, char) get default-initialized to 0?

I see two important reasons:

  • compatibility with past specifications of C++ (e.g. N3337) and with existing implementations (compilers like GCC)

  • performance: if the compiler had to generate an implicit zeroing of every automatic variable, extra machine code would be emitted even where the value is immediately overwritten or never needed, and that costs performance (see the sketch just below)
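A minimal sketch of that cost (the function names are hypothetical, and the actual code generation varies with compiler and optimization level): mandatory zero-initialization of a large automatic buffer would behave like the commented-out line below, i.e. roughly a memset over the whole buffer, even when it is completely overwritten right afterwards.

#include <cstddef>

// Assume this fills the whole buffer; defined elsewhere.
extern void read_block(char* buf, std::size_t n);

void process() {
    char buf[64 * 1024];         // today: no code is emitted for this line
    // char buf[64 * 1024] = {}; // forced zero-init would act like this,
                                 // i.e. roughly a 64 KiB memset before use
    read_block(buf, sizeof buf);
    // ... use buf ...
}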

Notice that in 2021 the GCC compiler is free software (you can improve it) and accepts plugins: you are allowed to extend GCC with a plugin providing the behavior you want (you might use the Bismon static source code analyzer, which sits on top of GCC, as a starting point; contact me by email at basile.starynkevitch@cea.fr).

And the Clang compiler is open source, so you are allowed to improve it.

If you use GCC, invoke it with all warnings and debug info, e.g. g++ -Wall -Wextra -g and learn to use the GDB debugger.

Once you have a proof-of-concept implementation of your desired behavior, you could propose an improvement for a future C++ standard. I am sure the C++ standards committee would be happy to discuss that item with you; they would ask for experimentation and benchmarks.

There are corner cases (e.g. seeding a PRNG, or implementing ASLR inside some operating-system kernel?) where a non-initialized, essentially random value might perhaps be desired.

-1

It’s allowed because C and C++ are old, and enforcing such a rule is difficult. Besides, C++ doesn’t really allow the use of uninitialized variables: reading them is undefined behaviour. A modern language like Swift enforces the rule that a variable may not be read unless the compiler can determine that it has been initialised on every path; it still allows variables to be declared without an initialiser.

Forcing initialisation has the huge disadvantage that you cannot force initialisation to a reasonable value. And blindly initialising everything means the compiler can no longer tell you when a meaningful initialisation is missing. For example: int yearOfBirth = 0; now it is “initialised”, but the value is nonsense.
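A small sketch of that trade-off (the function names and the year 2021 are illustrative): leaving the variable uninitialised at least gives the toolchain a chance to complain, while a blind = 0 silences any diagnostic without making the value meaningful.

int age_this_year_buggy() {
    int yearOfBirth;                // compilers can warn: used uninitialised
    // ... forgot to ask the user ...
    return 2021 - yearOfBirth;
}

int age_this_year_masked() {
    int yearOfBirth = 0;            // "initialised", so no warning any more
    // ... forgot to ask the user ...
    return 2021 - yearOfBirth;      // silently yields 2021, which is nonsense
}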

gnasher729
  • 49,096