In SQL Server, NULL = NULL does not evaluate to true. By that logic, no NULL is equal to any other NULL, and all NULLs are distinct. Shouldn't this imply that all NULLs are unique, and that a unique index should allow any number of NULLs?
5 Answers
Why does it work this way? Because way back when, someone made a design decision without knowing or caring about what the standard says (after all, we do have all kinds of weird behaviors with NULLs, and can coerce different behavior at will). That decision dictated that, in this case, NULL = NULL.
It wasn't a very smart decision. What they should have done is have the default behavior adhere to the ANSI standard, and if they really wanted this peculiar behavior, allow it through a DDL option like WITH CONSIDER_NULLS_EQUAL or WITH ALLOW_ONLY_ONE_NULL.
Of course, hindsight is 20/20.
And we have a workaround now anyway, even if it isn't the cleanest or most intuitive.
You can get the proper ANSI behavior in SQL Server 2008 and above by creating a unique, filtered index.
CREATE UNIQUE INDEX foo ON dbo.bar([key]) WHERE [key] IS NOT NULL;
This allows more than one NULL value because those rows are left out of the duplicate checking entirely. As an added bonus, the index ends up smaller than one covering the entire table (especially when the filtered column isn't the only column in the index, or the index has INCLUDE columns). However, you may want to be aware of some of the other limitations of filtered indexes.
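A quick sketch of how the filtered index behaves (the table and column names are hypothetical, matching the example above; `key` is bracketed because KEY is a reserved word in T-SQL):

```sql
-- Sketch: a filtered unique index ignores NULL rows entirely.
CREATE TABLE dbo.bar ([key] int NULL);
CREATE UNIQUE INDEX foo ON dbo.bar([key]) WHERE [key] IS NOT NULL;

INSERT dbo.bar([key]) VALUES (1);     -- succeeds
INSERT dbo.bar([key]) VALUES (NULL);  -- succeeds
INSERT dbo.bar([key]) VALUES (NULL);  -- also succeeds: NULL rows fall outside the index
INSERT dbo.bar([key]) VALUES (1);     -- fails: duplicate key within the filtered index
```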
Correct. The implementation of a unique constraint or index in SQL Server allows one and only one NULL. It's also correct that this doesn't strictly fit the definition of NULL, but it's one of those things they did to make the feature more useful, even if it isn't "technically" correct. Note that a PRIMARY KEY (also a unique index) does not allow NULLs at all, of course.
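A minimal sketch of that default behavior (hypothetical table name; assumes a plain, unfiltered unique index):

```sql
-- Sketch: by default, SQL Server treats a second NULL as a duplicate.
CREATE TABLE dbo.demo (val int NULL);
CREATE UNIQUE INDEX ux_demo ON dbo.demo(val);

INSERT dbo.demo(val) VALUES (NULL);  -- succeeds: the first NULL is accepted
INSERT dbo.demo(val) VALUES (NULL);  -- fails: duplicate key error
```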
This may not be technically accurate, but philosophically it helps me sleep at night...
Like several others have said or alluded to, if you think of NULL as unknown, then you cannot determine whether one NULL value is in fact equal to another NULL value. Thinking of it this way, the expression NULL = NULL should evaluate to NULL, meaning unknown.
A unique constraint needs a definitive value for the comparison of column values. In other words, when comparing one column value against any other using the equality operator, the result must evaluate to false for the row to be valid. Unknown isn't really false, even though it is often treated as falsy. Two NULL values could be equal, or not... it simply can't be definitively determined.
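This three-valued logic is easy to observe directly. Under the default ANSI_NULLS ON setting, NULL = NULL is neither true nor false, so neither branch that tests it matches:

```sql
-- NULL = NULL yields unknown, and NOT (unknown) is still unknown,
-- so the ELSE branch is the one that runs.
SELECT CASE WHEN NULL = NULL       THEN 'equal'
            WHEN NOT (NULL = NULL) THEN 'not equal'
            ELSE 'unknown'
       END;
```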
It helps to think of a unique constraint as restricting values that can be determined to be distinct from one another. What I mean by this is if you run a SELECT that looks something like this:
SELECT * FROM dbo.table1 WHERE ColumnWithUniqueConstraint = 'some value'
Most people would expect one result, given that there is a unique constraint. If you allowed multiple NULL values in ColumnWithUniqueConstraint, then it would be impossible to select a single distinct row from the table using NULL as the compared value.
Given that, I believe that regardless of whether or not it's implemented accurately with respect to the definition of NULL, it's definitely a lot more practical in most situations than allowing multiple NULL values.
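A small illustration of why this matters when querying (dbo.table1 and the column name are taken from the example above; the data is assumed):

```sql
-- Equality against NULL never matches a row, even rows that store NULL;
-- you have to use IS NULL instead.
SELECT * FROM dbo.table1 WHERE ColumnWithUniqueConstraint = NULL;   -- matches no rows
SELECT * FROM dbo.table1 WHERE ColumnWithUniqueConstraint IS NULL;  -- matches the NULL rows
```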
First - stop using the phrase "Null value", it will just lead you astray. Instead, use the phrase "null marker" - a marker in a column indicating that the actual value in this column is either missing or inapplicable (but note that the marker does not say which of those options is actually the case¹).
Now, imagine the following (where the database does not have complete knowledge of the modeled situation).
Situation             Database
ID   Code             ID   Code
--   ----             --   ----
1    A                1    A
2    B                2    (null)
3    C                3    C
4    B                4    (null)
The integrity rule we are modelling is "the Code must be unique". The real-world situation violates this, so the database shouldn't allow both items 2 and 4 to be in the table at the same time.
The safest, and least-flexible, approach would be to disallow null markers in the Code field, so there is no possibility of inconsistent data. The most flexible approach would be to allow multiple null markers and worry about uniqueness when values are entered.
The Sybase programmers went with the somewhat-safe, not-very-flexible approach of only allowing one null marker in the table - something commentators have been complaining about ever since. Microsoft have continued this behaviour, I guess for backwards compatibility.
¹ I am sure I read somewhere that Codd considered implementing two null markers - one for unknown, one for inapplicable - but rejected it, but I can't find the reference. Am I remembering correctly?
P.S. My favourite quote about null: Louis Davidson, "Professional SQL Server 2000 Database Design", Wrox Press, 2001, page 52. "Boiled down to a single sentence: NULL is evil."
One of the major purposes of a UNIQUE constraint is to prevent duplicate records. If one needs to have a table in which there can be multiple records where a value is "unknown", but no two records are allowed to have the same "known" value, then the unknown values should be assigned artificial unique identifiers before they are added to the table.
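One way to sketch that approach (table and column names are hypothetical, not from the answer): derive a per-row surrogate token for unknown values via a persisted computed column, so "unknown" rows stay distinct while known values stay unique.

```sql
-- Sketch: each "unknown" row gets a distinct surrogate token,
-- while known values are still forced to be unique.
CREATE TABLE dbo.items
(
    id      int IDENTITY PRIMARY KEY,
    val     varchar(50) NULL,
    -- When val is NULL, substitute a token derived from the row's own id.
    val_key AS COALESCE(val, CONCAT('unknown:', id)) PERSISTED UNIQUE
);

INSERT dbo.items(val) VALUES ('A');   -- ok
INSERT dbo.items(val) VALUES (NULL);  -- ok: distinct surrogate token
INSERT dbo.items(val) VALUES (NULL);  -- ok: another distinct token
INSERT dbo.items(val) VALUES ('A');   -- fails: duplicate known value
```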
There are a few rare cases in which it is useful for a column with a UNIQUE constraint to contain a single null value; for example, if a table maps column values to localized text descriptions, a row for NULL makes it possible to define the description that should appear when that column in some other table is NULL. The behavior of NULL allows for that use case.
Otherwise, I see no basis for a database to allow many identical records in a column with a UNIQUE constraint, and I see no way to prevent that while allowing multiple records whose key values are not distinguishable. Declaring that NULL isn't equal to itself won't make NULL values distinguishable from each other.