88

I'm sure the designers of languages like Java or C# knew about the issues related to the existence of null references (see Are null references really a bad thing?). Also, implementing an option type isn't really much more complex than null references.

Why did they decide to include it anyway? I'm sure the lack of null references would encourage (or even force) better-quality code (especially better library design) from both language creators and users.

Is it simply because of conservatism - "other languages have it, we have to have it too..."?

zduny
  • 2,633

10 Answers

127

I'm sure the designers of languages like Java or C# knew about the issues related to the existence of null references

Of course.

Also, implementing an option type isn't really much more complex than null references.

I beg to differ! The design considerations that went into nullable value types in C# 2 were complex, controversial and difficult. They took the design teams of both the languages and the runtime many months of debate, implementation of prototypes, and so on, and in fact the semantics of nullable boxing were changed very very close to shipping C# 2.0, which was very controversial.

Why did they decide to include it anyway?

All design is a process of choosing amongst many subtly and grossly incompatible goals; I can only give a brief sketch of just a few of the factors that would be considered:

  • Orthogonality of language features is generally considered a good thing. C# has nullable value types, non-nullable value types, and nullable reference types. Non-nullable reference types don't exist, which makes the type system non-orthogonal.

  • Familiarity to existing users of C, C++ and Java is important.

  • Easy interoperability with COM is important.

  • Easy interoperability with all other .NET languages is important.

  • Easy interoperability with databases is important.

  • Consistency of semantics is important; if we have a reference TheKingOfFrance equal to null, does that always mean "there is no King of France right now", or can it also mean "there definitely is a King of France; I just don't know who it is right now"? Or can it mean "the very notion of having a King in France is nonsensical, so don't even ask the question!"? Null can mean all of these things and more in C#, and all these concepts are useful.

  • Performance cost is important.

  • Being amenable to static analysis is important.

  • Consistency of the type system is important; can we always know that a non-nullable reference is never under any circumstances observed to be invalid? What about in the constructor of an object with a non-nullable field of reference type? What about in the finalizer of such an object, where the object is finalized because the code that was supposed to fill in the reference threw an exception? A type system that lies to you about its guarantees is dangerous.

  • And what about consistency of semantics? Null values propagate when used, but null references throw exceptions when used. That's inconsistent; is that inconsistency justified by some benefit?

  • Can we implement the feature without breaking other features? What other possible future features does the feature preclude?

  • You go to war with the army you have, not the one you'd like. Remember, C# 1.0 did not have generics, so talking about Maybe<T> as an alternative is a complete non-starter. Should .NET have slipped for two years while the runtime team added generics, solely to eliminate null references?

  • What about consistency of the type system? You can say Nullable<T> for any value type -- no, wait, that's a lie. You can't say Nullable<Nullable<T>>. Should you be able to? If so, what are its desired semantics? Is it worthwhile making the entire type system have a special case in it just for this feature?
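To make that last point concrete, here is a minimal C# sketch (the variable names are mine, not from the answer) of how nesting is ruled out: the struct constraint on Nullable<T> specifically excludes nullable value types, so the nested form is rejected at compile time.

int? maybe = null;                         // shorthand for Nullable<int>; compiles fine

// Nullable<Nullable<int>> nested = null;  // compile-time error: Nullable<int> does not
//                                         // satisfy the 'struct' constraint on Nullable<T>
// int?? alsoNested = null;                // rejected as well; the syntax doesn't exist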

And so on. These decisions are complex.

Eric Lippert
  • 46,558
100

Disclaimer: Since I don't know any language designers personally, any answer I give you will be speculative.

From Tony Hoare himself:

I call it my billion-dollar mistake. It was the invention of the null reference in 1965. At that time, I was designing the first comprehensive type system for references in an object oriented language (ALGOL W). My goal was to ensure that all use of references should be absolutely safe, with checking performed automatically by the compiler. But I couldn't resist the temptation to put in a null reference, simply because it was so easy to implement. This has led to innumerable errors, vulnerabilities, and system crashes, which have probably caused a billion dollars of pain and damage in the last forty years.

Emphasis mine.

Naturally it didn't seem like a bad idea to him at the time. It's likely that it's been perpetuated in part for that same reason - if it seemed like a good idea to the Turing Award-winning inventor of quicksort, it's not surprising that many people still don't understand why it's evil. It's also likely in part because it's convenient for new languages to be similar to older languages, both for marketing and learning curve reasons. Case in point:

"We were after the C++ programmers. We managed to drag a lot of them about halfway to Lisp." -Guy Steele, co-author of the Java spec

(Source: http://www.paulgraham.com/icad.html)

And, of course, C++ has null because C has null, and there's no need to go into C's historical impact. C# kind of superseded J++, which was Microsoft's implementation of Java, and it's also superseded C++ as the language of choice for Windows development, so it could've gotten it from either one.

EDIT Here's another quote from Hoare worth considering:

Programming languages on the whole are very much more complicated than they used to be: object orientation, inheritance, and other features are still not really being thought through from the point of view of a coherent and scientifically well-based discipline or a theory of correctness. My original postulate, which I have been pursuing as a scientist all my life, is that one uses the criteria of correctness as a means of converging on a decent programming language design—one which doesn’t set traps for its users, and ones in which the different components of the program correspond clearly to different components of its specification, so you can reason compositionally about it. [...] The tools, including the compiler, have to be based on some theory of what it means to write a correct program. -Oral history interview by Philip L. Frana, 17 July 2002, Cambridge, England; Charles Babbage Institute, University of Minnesota.[ http://www.cbi.umn.edu/oh/display.phtml?id=343]

Again, emphasis mine. Sun/Oracle and Microsoft are companies, and the bottom line of any company is money. The benefits to them of having null may have outweighed the cons, or they may have simply had too tight a deadline to fully consider the issue. As an example of a different language blunder that probably occurred because of deadlines:

It's a shame that Cloneable is broken, but it happens. The original Java APIs were done very quickly under a tight deadline to meet a closing market window. The original Java team did an incredible job, but not all of the APIs are perfect. Cloneable is a weak spot, and I think people should be aware of its limitations. -Josh Bloch

(Source: http://www.artima.com/intv/bloch13.html)

Doval
  • 15,487
30

Null serves a very valid purpose of representing a lack of value.

I will say I'm the most vocal person I know about the abuses of null and all the headaches and suffering they can cause, especially when used liberally.

My personal stance is that people may use nulls only when they can justify that they are necessary and appropriate.

Example justifying nulls:

Date of Death is typically a nullable field. There are three possible situations with date of death: the person has died and the date is known, the person has died and the date is unknown, or the person is not dead and therefore a date of death does not exist.

Date of Death is also a DateTime field and doesn't have an "unknown" or "empty" value. It does have a default date that comes up when you create a new datetime (which varies based on the language used), but there is technically a chance that a person did in fact die at that exact time, and they would then be flagged as your "empty value" if you used the default date.

The data would need to represent the situation properly.

  • Person is dead, date of death is known (3/9/1984): simple, '3/9/1984'.

  • Person is dead, date of death is unknown: so what's best? Null, '0/0/0000', or '01/01/1869' (or whatever your default value is)?

  • Person is not dead, date of death is not applicable: so what's best? Null, '0/0/0000', or '01/01/1869' (or whatever your default value is)?

So let's think each value over...

  • Null: it has implications and concerns you need to be wary of; accidentally trying to manipulate it without first confirming it's not null, for example, would throw an exception. But it also best represents the actual situation... If the person isn't dead, the date of death doesn't exist... it's nothing... it's null...
  • '0/0/0000': this could be okay in some languages, and could even be an appropriate representation of no date. Unfortunately, some languages and validation routines will reject it as an invalid datetime, which makes it a no-go in many cases.
  • '1/1/1869' (or whatever your default datetime value is): the problem here is that it gets tricky to handle. You could use it as your "lack of value" value, except what happens when I want to filter for all the records I don't have a date of death for? I could easily pick up people who actually died on that date, which could cause data integrity issues.
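For illustration, here is a minimal C# sketch of the null option above (the Person class and its property names are mine, not from the answer): a nullable DateTime models "no known date of death" without a magic sentinel date, and filtering on null can't accidentally catch someone who really died on the default date.

using System;
using System.Collections.Generic;
using System.Linq;

public class Person
{
    public string Name { get; set; }

    // null means "not dead" or "date of death unknown"; a value means a known date.
    public DateTime? DateOfDeath { get; set; }
}

public static class Example
{
    public static void Main()
    {
        var people = new List<Person>
        {
            new Person { Name = "A", DateOfDeath = new DateTime(1984, 3, 9) },
            new Person { Name = "B", DateOfDeath = null }
        };

        // No risk of filtering out someone who actually died on 01/01/0001.
        var noKnownDateOfDeath = people.Where(p => p.DateOfDeath == null).ToList();
        Console.WriteLine(noKnownDateOfDeath.Count); // 1
    }
}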

The fact is, sometimes you do need to represent nothing, and sure, sometimes a special value of the variable's type works well for that, but often variable types need to be able to represent nothing.

If I have no apples I have 0 apples, but what if I don't know how many apples I have?

By all means, null is abused and potentially dangerous, but it's necessary at times. It's only the default in many cases because, until I provide a value, there is a lack of a value, and something needs to represent it (null).

10

I wouldn't go so far as "other languages have it, we have to have it too..." like it's some sort of keeping up with the Joneses. A key feature of any new language is the ability to interoperate with existing libraries in other languages (read: C). Since C has null pointers, the interoperability layer necessarily needs the concept of null (or some other "does not exist" equivalent that blows up when you use it).

The language designer could've chosen to use Option Types and force you to handle the null path everywhere that things could be null. And that almost certainly would lead to fewer bugs.

But (especially for Java and C# due to the timing of their introduction and their target audience) using option types for this interoperability layer would likely have harmed if not torpedoed their adoption. Either the option type is passed all the way up, annoying the hell out of C++ programmers of the mid to late 90's - or the interoperability layer would throw exceptions when encountering nulls, annoying the hell out of C++ programmers of the mid to late 90's...

Telastyn
  • 110,259
9

First of all, I think we can all agree that a concept of nullity is necessary. There are some situations where we need to represent the absence of information.

Allowing null references (and pointers) is only one implementation of this concept, and possibly the most popular, although it is known to have issues: C, Java, Python, Ruby, PHP, JavaScript, ... all use a similar null.

Why? Well, what's the alternative?

In functional languages such as Haskell you have the Option or Maybe type; however those are built upon:

  • parametric types
  • algebraic data types

Now, did the original C, Java, Python, Ruby or PHP support either of those features? No. Java's flawed generics are recent in the history of the language and I somehow doubt the others even implement them at all.

There you have it. null is easy, parametric algebraic data types are harder. People went for the simplest alternative.
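As a rough sketch of what such a type requires (this is my own C# rendering, not anything from the answer): even a bare-bones Option needs parametric types (generics) and something playing the role of an algebraic data type, which is exactly what the early versions of those languages lacked.

using System;

// A bare-bones Option type: a closed class hierarchy standing in for an algebraic data type.
public abstract class Option<T>
{
    public sealed class Some : Option<T>
    {
        public T Value { get; }
        public Some(T value) { Value = value; }
    }

    public sealed class None : Option<T> { }

    // Forces the caller to handle both cases, unlike a null reference.
    public TResult Match<TResult>(Func<T, TResult> ifSome, Func<TResult> ifNone) =>
        this is Some some ? ifSome(some.Value) : ifNone();
}

// Usage: new Option<int>.Some(3).Match(v => v * 2, () => 0) evaluates to 6.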

Matthieu M.
  • 15,214
6

Null/nil/none itself is not evil.

If you watch his misleadingly named famous speech, "The Billion Dollar Mistake", Tony Hoare talks about how allowing any variable to be able to hold null was a huge mistake. The alternative - using Options - does not in fact get rid of null references. Instead it allows you to specify which variables are allowed to hold null, and which aren't.

As a matter of fact, with modern languages that implement proper exception handling, null dereference errors aren't any different from any other exception - you find it, you fix it. Some alternatives to null references (the Null Object pattern, for example) hide errors, causing things to silently fail until much later. In my opinion, it's much better to fail fast.

So the question is then, why do languages fail to implement Options? As a matter of fact, the arguably most popular language of all time, C++, has the ability to define object variables that cannot be assigned NULL. This is a solution to the "null problem" Tony Hoare mentioned in his speech. Why does the next most popular typed language, Java, not have it? One might ask why it has so many flaws in general, especially in its type system. I don't think you can really say that languages systematically make this mistake. Some do, some don't.

B T
  • 350
4

Because programming languages are generally designed to be practically useful rather than technically correct. The fact is that null states are a common occurrence due to either bad or missing data or a state that has not yet been decided. The technically superior solutions are all more unwieldy than simply allowing null states and sucking up the fact that programmers make mistakes.

For example, if I want to write a simple script that works with a file, I can write pseudocode like:

file = openfile("joebloggs.txt")

for line in file
{
  print(line)
}

and it will simply fail if joebloggs.txt doesn't exist. The thing is, for simple scripts that's probably okay, and in many situations in more complex code I know the file exists and the failure won't happen, so forcing me to check wastes my time. The safer alternatives achieve their safety by forcing me to deal correctly with the potential failure state, but often I don't want to do that; I just want to get on.

4

There are clear, practical uses of the NULL (or nil, or Nil, or null, or Nothing or whatever it is called in your preferred language) pointer.

For those languages that do not have an exception system (e.g. C), a null pointer can be used as a marker of error when a pointer should be returned. For example:

#include <stdio.h>  /* for perror */
#include <stdlib.h> /* for malloc and exit */

char *buf = malloc(20);
if (!buf)
{
    perror("memory allocation failed");
    exit(1);
}

Here a NULL returned from malloc(3) is used as a marker of failure.

When used in method/function arguments, it can indicate "use the default for this argument" or "ignore this output argument". See the example below.

Even for those languages with an exception mechanism, a null pointer can be used as an indication of a soft error (that is, an error that is recoverable), especially when exception handling is expensive (e.g. Objective-C):

NSError *err = nil;
NSString *content = [NSString stringWithContentsOfURL:sourceFile
                                         usedEncoding:NULL // This output is ignored
                                                error:&err];
if (!content) // If the object is null, we have a soft error to recover from
{
    fprintf(stderr, "error: %s\n", [[err localizedDescription] UTF8String]);
    if (error) // Only write the error back if the caller supplied an error out-parameter
        *error = err;
    return nil; // Go back to parent layer, with another soft error.
}

Here, the soft error does not cause the program to crash if not caught. This eliminates the crazy try-catch nesting that Java has and gives better control over program flow, since soft errors are not interrupting (and the few remaining hard exceptions are usually not recoverable and are left uncaught).

4

There are two related, but slightly different issues:

  1. Should null exist at all? Or should you always use Maybe<T> where null is useful?
  2. Should all references be nullable? If not, which should be the default?

    Having to explicitly declare nullable reference types as string? or similar would avoid most (but not all) of the problems null causes, without being too different from what programmers are used to.
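For reference, this is roughly what that looks like in modern C# (8.0 and later), which later added nullable reference type annotations; the enforcement is via compiler warnings rather than hard type-system guarantees, and the class below is only an illustrative sketch with names of my choosing.

#nullable enable   // opt in to nullable reference type checking (C# 8.0+)

public class Greeter
{
    public string Name { get; set; } = "anonymous"; // non-nullable: must be initialized
    public string? Nickname { get; set; }           // explicitly nullable

    public int NameLength() => Name.Length;         // fine: Name is declared never-null
    public int NicknameLength() => Nickname.Length; // warning: possible null dereference
}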

I at least agree with you that not all references should be nullable. But avoiding null is not without its complexities:

.NET initializes all fields to default(T) before they can first be accessed by managed code. This means that for reference types you need null or something equivalent, and that value types can be initialized to some kind of zero without running code. While both of these have severe downsides, the simplicity of default initialization may have outweighed those downsides.

  • For instance fields you can work around this by requiring initialization of fields before exposing the this pointer to managed code. Spec# went this route, using a constructor-chaining syntax that differs from C#'s.

  • For static fields, ensuring this is harder unless you impose strong restrictions on what kind of code may run in a field initializer, since you can't simply hide the this pointer.

  • How to initialize arrays of reference types? Consider a List<T> which is backed by an array with a capacity larger than the length. The remaining elements need to have some value.
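A small C# sketch of the array point (the names are mine): the runtime zero-initializes the backing storage, so every slot of a reference-type array starts out as null, and a List<T> with spare capacity has the same thing sitting in the unused slots of its internal array.

using System;
using System.Collections.Generic;

class ArrayDefaults
{
    static void Main()
    {
        // Newly allocated reference-type arrays are filled with null by the runtime.
        string[] names = new string[4];
        Console.WriteLine(names[0] == null); // True

        // A List<string> with capacity 16 but only one element: the other 15 slots
        // of its backing array still have to hold *something* -- they hold null.
        var list = new List<string>(capacity: 16);
        list.Add("only element");
    }
}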

Another problem is that it doesn't allow methods like bool TryGetValue<T>(key, out T value), which return default(T) as the value if they don't find anything. Though in this case it's easy to argue that the out parameter is bad design in the first place and that this method should return a discriminated union or a maybe instead.
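A short sketch of the pattern being discussed (the Maybe-returning alternative at the end is hypothetical, not an existing .NET API): TryGetValue has to put something into the out parameter even on failure, and that something is default(TValue), i.e. null for reference types.

using System.Collections.Generic;

class Lookup
{
    static void Demo(Dictionary<string, string> settings)
    {
        // On a miss, 'value' is set to default(string) -- that is, null -- and
        // the method returns false. The caller is trusted to check the bool.
        if (settings.TryGetValue("missing-key", out string value))
        {
            System.Console.WriteLine(value.Length);
        }

        // A hypothetical Maybe-returning alternative would never materialize a
        // null, but it needs a discriminated-union-like type to return:
        // Option<string> result = settings.TryGetValueMaybe("missing-key");
    }
}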

All of these problems can be solved, but it's not as easy as "forbid null and all is well".

CodesInChaos
  • 5,847
3

Most useful programming languages allow data items to be written and read in arbitrary sequences, such that it will often not be possible to statically determine the order in which reads and writes will occur before a program is run. There are many cases where code will in fact store useful data into every slot before reading it, but where proving that would be difficult. Thus, it will often be necessary to run programs where it would be at least theoretically possible for code to attempt to read something which has not yet been written with a useful value. Whether or not it is legal for code to do so, there's no general way to stop code from making the attempt. The only question is what should happen when that occurs.

Different languages and systems take different approaches.

  • One approach would be to say that any attempt to read something that has not been written will trigger an immediate error.

  • A second approach is to require code to supply some value in every location before it would be possible to read it, even if there would be no way for the stored value to be semantically useful.

  • A third approach is to simply ignore the problem and let whatever would happen "naturally" just happen.

  • A fourth approach is to say that every type must have a default value, and any slot which has not been written with anything else will default to that value.

Approach #4 is vastly safer than approach #3, and is in general cheaper than approaches #1 and #2. That then leaves the question of what the default value should be for a reference type. For immutable reference types, it would in many cases make sense to define a default instance, and say that the default for any variable of that type should be a reference to that instance. For mutable reference types, however, that wouldn't be very helpful. If an attempt is made to use a mutable reference type before it has been written, there generally isn't any safe course of action except to trap at the point of attempted use.

Semantically speaking, if one has an array customers of type Customer[20], and one attempts customers[4].GiveMoney(23) without having stored anything to customers[4], execution is going to have to trap. One could argue that an attempt to read customers[4] should trap immediately, rather than waiting until code attempts to GiveMoney, but there are enough cases where it's useful to read a slot, find out that it doesn't hold a value, and then make use of that information, that having the read attempt itself fail would often be a major nuisance.
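In C# terms (assuming a Customer class with a GiveMoney method, as in the answer's hypothetical example), the behaviour described looks like this:

Customer[] customers = new Customer[20]; // every slot defaults to null

Customer fourth = customers[4];          // reading the empty slot succeeds and yields null
if (fourth == null)
{
    // Useful: we can detect "no customer stored here yet" and act on it.
}

customers[4].GiveMoney(23);              // only the attempted *use* traps, throwing a
                                         // NullReferenceException at this point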

Some languages allow one to specify that certain variables should never contain null, and any attempt to store a null should trigger an immediate trap. That is a useful feature. In general, though, any language which allows programmers to create arrays of references will either have to allow for the possibility of null array elements, or else force the initialization of array elements to data which cannot possibly be meaningful.

supercat
  • 8,629