8

In SQL, NULL means "unknown value". Thus, every comparison with NULL yields NULL (unknown) rather than TRUE or FALSE.

From a conceptional point of view, this three-valued logic makes sense. From a practical point of view, every learner of SQL has, one time or another, made the classic WHERE myField = NULL mistake or learned the hard way that NOT IN does not do what one would expect when NULL values are present.

It is my impression (please correct me if I am wrong) that the cases where this three-valued logic helps (e.g. WHERE myField IS NOT NULL AND myField <> 2 can be shortened to WHERE myField <> 2) are rare and, in those cases, people tend to use the longer version anyway for clarity, just like you would add a comment when using a clever, non-obvious hack.

Is there some obvious advantage that I am missing? Or is there a general consensus among the development community that this has been a mistake?

Heinzi
  • 9,868

5 Answers5

3

I think the crux of the problem is that as well as "UNKNOWN" it is also used to mean "NOT APPLICABLE" or "ABSENT" e.g. You have a PERSONS table with a SPOUSE_ID. What do you put in their for a single person? In most cases a designer will make this field NULLable to be filled with the partners ID when available and left blank for sad singles and happy bachelors.

In my experience this is actually the most common use for NULLs. So while a comparison of two UNKNOWN values should result in another UNKNOWN; a comparison between two ABSENT values should result in equality -- but SQL does not allow for this.

It would have been trivial to add another extra operator (say "==") to the mass of SQL keywords and operators which would indicate you want 2 nulls to be considered equal.

While I think the relational model is sound and has a long future ahead, I think the mess that is SQL is due for a total rethink. It would be nice if we could start again from the very beginning and have an API based on Codd's original relational algebra.

1

The problem here is that NULL isn't a value--it's a nebulous set of values, and you don't know which one it is. Setting equality here is meaningless, because then the values wouldn't be NULL. It enforces the mathematical underpinnings of relational databases. In many ways, it's like asking why infinity = infinity isn't valid.

mgw854
  • 1,828
  • 10
  • 11
1

In your particular example, it doesn't make a big difference, but in general, this behavior is necessary for queries to give valid results.

Three-valued logic is just an instance of the more general principle that any expression which contains a NULL evaluates to NULL. E.g. NULL - 17 yields NULL.

This happens because NULL means unknown or not-applicable. Let's say you don't know the balance on a given bank account. Then you withdraw $17. What is the balance now? The only reasonable answer is "I still don't know". Giving any number would be wrong.

Even if a database does not contain any NULL's in the stored data, SQL queries can easily return unknown results. A valid query can return an empty set, and a lot of operations, like taking the MAX of a given column, will not have a meaningful result for an empty set. So NULL's are unavoidable, and you need some way to handle them which does not give you misleading output.

Eliminating three-valued logic would mean that any boolean expression would either be true or false, even if the answer is unknowable or meaningless. So you would not be able to distinguish between a valid result and an invalid result, which would make the database pretty useless (if not dangerous).

JacquesB
  • 61,955
  • 21
  • 135
  • 189
0

This seems to be a duplicate question... https://stackoverflow.com/questions/7078837/why-doesnt-sql-support-null-instead-of-is-null

I don't think it's a mistake - this behavior is described in the ANSI standard.

Most databases allow you to change behavior of the equality operator.

set ansi_nulls on
if null = null
   print 'this will not print' 
set ansi_nulls off
if null = null  
   print 'this should print'

Also, you can note that many programming languages expect similar semantics when comparing objects against null.

0

TRUE and FALSE make statements about the content of a value. NULL indicates the complete absence of any value at all. If NULL behaved the same way as FALSE, then you would be unable to account correctly for the absence of data in a query, something that is very important in a database. I think that alone makes NULL different enough from TRUE/FALSE that it merits being handled specially.

Evicatos
  • 672