4

I'd like to ask two supplementary/follow-on questions, further to this previous/existing question: Is there any tangible difference between a unique clustered index and a clustered primary key?

  • That question starts with, "I understand that there may be a difference in meaning or intent between the two". I'm a programmer not a DBA, and this fundamental might be unclear to me: what is the difference in meaning or intent between the two?

  • My summary of the accepted answer, i.e. its most important statement IMO, is that it says, "I don't think there's any difference". If that's so then why did Microsoft implement "clustered indexes"? Why not just say instead, "It's always clustered on the primary key, and you should define as the primary key whatever you want it to be clustered on"?

It seems to me that a primary key (already) is a unique clustered index.


Furthermore, here's a specific problem by way of example.

Let's say I have a table of Users (with a userId as its primary key), and a second table (e.g. Items) which defines items owned by each user. A user can own many items; each item is owned by one user, and has an itemId.

So the itemId could be the primary key of the Items table; and each row in the Items table (which has an itemId) also has a userId to identify its owner.

That's a good way to define a 1-N relationship, isn't it? Assume a foreign key contraint on userId, with Users being the parent table.

At run-time I usually want to retrieve all the items owned by a user, therefore the Items table should be clustered on its userId column.

[Users]
  userId
  + plus other user-specific fields

[Items]
  userId
  itemId
  + plus other item-specific fields

I think there are two ways to define this Items table:

  • itemId is primary non-clustered key, and (userId,itemId) is unique clustered index

... or:

  • (userId,itemId) is primary clustered key, and itemId is unique non-clustered index.

Which of the above two is better or more correct, semantically and/or practically, and why?


In case it makes a difference, the itemID is an artificial key: its purpose is to disambiguate/identify the item (and/or identify the item, within the set of items owned by the user).

  • 'Physically' the itemID is probably globally unique (or unique within the table, anyway): because databases make it easy to create a globally-unique artificial key.

  • 'Logically' I wouldn't mind if it were not-globally-unique, but were instead only unique-within-each-user, such that I needed both userId and itemId to uniquely identify an item, i.e.

    (userId,itemId) is primary clustered key and itemId is unique non-clustered index.

So I think it isn't altogether wrong to see (userId,itemId) as a composite primary key?

Apparently it's fine and normal to use "two separate attributes" as the primary key of an associative table when it's an N-to-N relationship. Is it wrong (e.g. harmful in some way, for some reason) to use two attributes as the primary key of an object in a 1-to-N relationship? Is it wrong to say that the owner ID is part of the object's identity?

ChrisW
  • 187
  • 1
  • 7

4 Answers4

2

why did Microsoft implement "clustered indexes"? Why not just say instead, "It's always clustered on the primary key, and you should define as the primary key whatever you want it to be clustered on"?

To answer this question: The primary key on the table does not have to be the Clustered Index Key. A Clustered Index is the place where SQL Server stores the data of the entire table. The Index itself has what are called "key values." Key Values (1 to many) define how the index is ordered, along with statistics and other useful things.

In fact, often it's useful to have a Primary Key that is not the Clustered Index Key. For example, some applications create their own keys or GUIDs in order to relate different tables through foreign keys. Using a GUID as a Clustered Index Key is a bad idea though, since GUIDs don't always have an order, unless you create them sequentially. To solve this, you could create a Clustered Index on an identity column, and place the Primary Key on the GUID column.

Your question has many more questions. I think we need to break this down into many questions, since you're asking about a lot of index concepts.

Arthur D
  • 952
  • 5
  • 11
2

Just dropping this here because I haven't seen it mentioned in other discussions on the same topic...

Primary key constraints in Sql Server have to have a unique name database wide while index names only have to be unique on the table.

So if you have a system that stamps out tables of the same form (e.g. a table per day to store orders or something like that), you might want to lean towards a UNIQUE CLUSTERED INDEX over a PRIMARY KEY constraint to avoid the naming collisions.

The "UNIQUE CLUSTERED will allow 1 NULL" point mentioned can be brought into alignment with PRIMARY KEY by declaring the column(s) in question NOT NULL.

user1664043
  • 307
  • 4
  • 13
1

I'm going to take a stab at answering this, please correct me if I'm wrong.

There's little or no 'real' (tangible) difference between the two. The only real difference is that a UNIQUE CLUSTERED INDEX can contain NUL values whereas a PRIMARY KEY cannot.

The ability to define them separately (i.e. to define clustering or something other than the primary key) was probably implemented so that you can do your design in two independent stages:

  1. Logical design where you choose the primary/identity key (artificial or otherwise)
  2. Physical design where you optimize for performance (by choosing indexes)

Default behaviour (if not otherwise specified) is to cluster on the primary key; but there's no need to let that physical design choice define the choice of primary key (because you can define a clustered index that's not the default key).

As suggested in @ArthurD's comment there may be two ways where either way might arguably be used, with no clear-cut preference for one or the other.


Edited to add:

There's a specific, theoretical definition of what a 'primary key' is: see What does this definition of a Primary key mean?

So for example if itemId were unique then (userId,itemId) would not (should not or could not) be a primary key: so itemId would be the primary key, and the clustering that you want on (userId,itemId) would be on a non-primary-key index.

Conversely if itemId were only unique within each userId, then obviously `itemd couldn't by itself be a primary key.

ChrisW
  • 187
  • 1
  • 7
-1

Unique clustered index : when created, it creates an index not any other key and also enforce the uniqueness for the key columns used. this is one way of keeping non-primary key columns unique and have the table shorted physically according to the key columns used.

Clustered primary key : When created, it creates primary key & index both and used when table is needed to be shorted physically in same order as primary key column. Because by definition primary key is unique, it can be clustered or non-clustered without loosing its uniqueness