21

I have been researching the concept of ROWGUID's recently and came across this question. This answer gave insight, but has led me down a different rabbit hole with the mention of changing the primary key value.

My understanding has always been that a primary key should be immutable, and my searching since reading this answer has only provided answers which reflect the same as a best practice.

Under what circumstances would a primary key value need to be altered after the record is created?

5crammed
  • 321
  • 2
  • 3
  • 8

8 Answers8

27

If you were using a person's name as a primary key and their name changed you would need to change the primary key. This is what ON UPDATE CASCADE is used for since it essentially cascades the change down to all related tables that have foreign-key relationships to the primary key.

For example:

USE tempdb;
GO

CREATE TABLE dbo.People
(
    PersonKey VARCHAR(200) NOT NULL
        CONSTRAINT PK_People
        PRIMARY KEY CLUSTERED
    , BirthDate DATE NULL
) ON [PRIMARY];

CREATE TABLE dbo.PeopleAKA
(
    PersonAKAKey VARCHAR(200) NOT NULL
        CONSTRAINT PK_PeopleAKA
        PRIMARY KEY CLUSTERED
    , PersonKey VARCHAR(200) NOT NULL
        CONSTRAINT FK_PeopleAKA_People
        FOREIGN KEY REFERENCES dbo.People(PersonKey)
        ON UPDATE CASCADE
) ON [PRIMARY];

INSERT INTO dbo.People(PersonKey, BirthDate)
VALUES ('Joe Black', '1776-01-01');

INSERT INTO dbo.PeopleAKA(PersonAKAKey, PersonKey)
VALUES ('Death', 'Joe Black');

A SELECT against both tables:

SELECT *
FROM dbo.People p
    INNER JOIN dbo.PeopleAKA pa ON p.PersonKey = pa.PersonKey;

Returns:

enter image description here

If we update the PersonKey column, and re-run the SELECT:

UPDATE dbo.People
SET PersonKey = 'Mr Joe Black'
WHERE PersonKey = 'Joe Black';

SELECT *
FROM dbo.People p
    INNER JOIN dbo.PeopleAKA pa ON p.PersonKey = pa.PersonKey;

we see:

enter image description here

Looking at the plan for the above UPDATE statement, we clearly see both tables are updated by a single update statement by virtue of the foreign key defined as ON UPDATE CASCADE:

enter image description here click the image above to see it in more clarity

Finally, we'll cleanup our temporary tables:

DROP TABLE dbo.PeopleAKA;
DROP TABLE dbo.People;

The preferred1 way to do this using surrogate keys would be:

USE tempdb;
GO

CREATE TABLE dbo.People
(
    PersonID INT NOT NULL IDENTITY(1,1)
        CONSTRAINT PK_People
        PRIMARY KEY CLUSTERED
    , PersonName VARCHAR(200) NOT NULL
    , BirthDate DATE NULL
) ON [PRIMARY];

CREATE TABLE dbo.PeopleAKA
(
    PersonAKAID INT NOT NULL IDENTITY(1,1)
        CONSTRAINT PK_PeopleAKA
        PRIMARY KEY CLUSTERED
    , PersonAKAName VARCHAR(200) NOT NULL
    , PersonID INT NOT NULL
        CONSTRAINT FK_PeopleAKA_People
        FOREIGN KEY REFERENCES dbo.People(PersonID)
        ON UPDATE CASCADE
) ON [PRIMARY];

INSERT INTO dbo.People(PersonName, BirthDate)
VALUES ('Joe Black', '1776-01-01');

INSERT INTO dbo.PeopleAKA(PersonID, PersonAKAName)
VALUES (1, 'Death');

SELECT *
FROM dbo.People p
    INNER JOIN dbo.PeopleAKA pa ON p.PersonID = pa.PersonID;

UPDATE dbo.People
SET PersonName = 'Mr Joe Black'
WHERE PersonID = 1;

For completeness, the plan for the update statement is very simple, and shows one advantage to surrogate keys, namely only a single row needs to be updated as opposed to every row containing the key in a natural-key scenario:

enter image description here

SELECT *
FROM dbo.People p
    INNER JOIN dbo.PeopleAKA pa ON p.PersonID = pa.PersonID;

DROP TABLE dbo.PeopleAKA;
DROP TABLE dbo.People;

The output from the two SELECT statements above are:

enter image description here

Essentially, the outcome is approximately the same. One major difference is the wide natural key is not repeated in every table where the foreign key occurs. In my example, I'm using a VARCHAR(200) column to hold the person's name, which necessitates using a VARCHAR(200) everywhere. If there are a lot of rows and a lot of tables containing the foreign key, that will add up to a lot of wasted memory. Note, I'm not talking about disk space being wasted since most people say disk space is so cheap as to be essentially free. Memory, however, is expensive and deserves to be cherished. Using a 4-byte integer for the key will save a large amount of memory when you consider the average name length of around 15 characters.

Tangential to the question about how and why keys can change is the question about why to choose natural keys over surrogate keys, which is an interesting and perhaps more important question, especially where performance is a design-goal. See my question here about that.


1 - http://weblogs.sqlteam.com/mladenp/archive/2009/10/06/Why-I-prefer-surrogate-keys-instead-of-natural-keys-in.aspx

Hannah Vernon
  • 70,928
  • 22
  • 177
  • 323
9

While you can use a key which is natural and/or mutable as your PK, in my experience that leads to problems, which can often be prevented by the use of a PK which meets these conditions:

 Guaranteed Unique, Always Exists, Immutable, and Concise.

For example, many companies in the US try to use Social Security Numbers as personal ID numbers, (and PKs) in their systems. Then they run into the following problems - data entry errors leading to multiple records that have to be repaired, people who do not have an SSN, people whose SSN is changed by the government, people who have duplicate SSNs.

I've seen every single one of those scenarios. I've also seen companies who didn't want their customers to be "just a number", which meant that their PK ended up being 'first+middle+last+DOB+zip' or some similar nonsense. While they did add enough fields to almost guarantee uniqueness, their queries were horrendous, and updating any one of those fields meant chasing down data consistency issues.

In my experience, a PK generated by the database itself is almost always a better solution.

I recommend this article for additional pointers: http://www.agiledata.org/essays/keys.html

Byron Jones
  • 206
  • 1
  • 6
7

The primary key could be altered when synchronization is involved. This could be the case when you have a disconnected client and it synchronizes the data with the server at certain intervals.

A few years ago I worked on a system where all event data on the local machine had negative row Ids, like -1, -2, etc. When the data was synchronized to the server, the row Id on the server was applied to the client. Let's say the next row Id on the server was 58. Then -1 would become 58, -2 59 and so forth. That row ID change would be cascaded to all child FK records on the local machine. The mechanism was also used to determine which records were previously synched.

I am not saying this was a good design, but it is an example of the primary key changing over time.

Jon Raynor
  • 191
  • 6
5

Any design that involves changing the PRIMARY KEY on a regular basis is a recipe for disaster. The only good reason for changing it would be an amalgamation of two previously separate databases.

As pointed out by @MaxVernon occasional changes may occur - then use ON UPDATE CASCADE, although the majority of systems nowadays use an ID as a surrogate PRIMARY KEY.

Purists such as Joe Celko and Fabian Pascal (a site worth following) disagree with the use of surrogate keys, but I think that they have lost this particular battle.

Vérace
  • 30,923
  • 9
  • 73
  • 85
3

Interestingly enough, the linked question about ROWGUID sort of provides its own use case: when you have conflicting primary keys in databases that need to be synchronized. If you have two databases that need to be reconciled and they use sequences for primary keys, you'll want one of the keys to change so it can remain unique.

In an ideal world, this would never happen. You would use GUIDs for the primary keys to begin with. Realistically, though, you might not even have a distributed database when you start designing, and converting it to GUIDs may have been an effort that got prioritized below making it distributed because it was deemed to be higher impact than implementing the key update. This could happen if you have a large code base that depends on integer keys and would require major revision to convert to GUID. There's also the fact that sparse GUIDs (GUIDs that aren't very close to each other, which happens if you generate them randomly as you should) can cause problems for certain kinds of indexes, as well, which means you way want to avoid using them as primary keys (mentioned by Byron Jones).

jpmc26
  • 1,652
  • 3
  • 20
  • 38
3

Stability is a desirable property for a key but it is a relative thing and not an absolute rule. In practice it's often useful to change the values of keys. In relational terms data is identifiable only by its (super)keys. It follows that if there is only one key in a given table then the distinction between A) changing a key value, or B) replacing the set of rows in a table with some similar or different set of rows containing other key values, is essentially an issue of semantics rather than logic.

A more interesting example is the case of a table having multiple keys where the values of one or more of those keys might have to change in relation to other key values. Take the example of an Employee table with two keys: LoginName and Badge Number. Here's a sample row from that table:

+---------+--------+
|LoginName|BadgeNum|
+---------+--------+
|ZoeS     |47832   |
+---------+--------+

If ZoeS loses her badge then maybe she is allocated a new one and gets a new badge number:

+---------+--------+
|LoginName|BadgeNum|
+---------+--------+
|ZoeS     |50282   |
+---------+--------+

Later, she might decide to change her login name:

+---------+--------+
|LoginName|BadgeNum|
+---------+--------+
|ZSmith   |50282   |
+---------+--------+

Both key values changed - in relation to each other. Note that it doesn't necessarily make any difference which one is deemed to be "primary".

In practice "immutability", i.e. absolutely never changing a value, is unachievable or at least impossible to verify. To the extent that change makes a difference at all, the safest course is probably to assume that any key (or any attribute) might need to change.

nvogel
  • 3,807
  • 21
  • 25
0

One possible scenario is let's say you have affiliates that have unique ID and you know they will not duplicate across affiliates as they have unique starting character. The affiliates load data to a master table. There records are processed and then assigned a master ID. Users need access to the records as soon as they are loaded even if they are not yet processed. You want the master ID to be based on the order processed and you will not always process in the order the records were loaded. I know a bit fabricated.

paparazzo
  • 5,048
  • 1
  • 19
  • 32
-1

Imagine a situation like when someone chose National Insurance Number (NIN) as a Primary Key and somehow an operator inserts a row with the wrong NIN. After inserting the value, there are two ways to correct the error:

  1. Delete the mistaken record and insert a new one
  2. Update the value to the correct one and use On Update Cascade if there is a referential integrity constraint on that column
Paul White
  • 94,921
  • 30
  • 437
  • 687