7

I have a large table, but not huge (less than 2 million rows on aging hardware), and when adding a non-nullable column to an existing table, I usually follow the structure set out here to avoid problems with the script timing out when our database migrations run on deployment (FYI - This isn't a full text index problem).

So, in summary, I:

  • Alter the table and add the column as NULL and do not add a default constraint
  • Backfill the column in batches
  • Alter the table and change the column to be NOT NULL and add the default constraint

However, in the following case I want to add a new UNIQUEIDENTIFER column and fill it with NEWSEQUENTIALID() rather than NEWID() values.

Without running in batches, my script would look like this:

IF NOT EXISTS (
    SELECT NULL FROM INFORMATION_SCHEMA.COLUMNS 
    WHERE TABLE_NAME = 'Invoice' 
    AND COLUMN_NAME = 'InternalId')
BEGIN
    ALTER TABLE Invoice 
        ADD InternalId UNIQUEIDENTIFIER NOT NULL 
        CONSTRAINT [DF_Invoice_InternalId] DEFAULT (NEWSEQUENTIALID())
END
GO

However if I split this up into batches, and attempt to fill the nullable InternalId with the following:

IF NOT EXISTS (
    SELECT NULL FROM INFORMATION_SCHEMA.COLUMNS 
    WHERE TABLE_NAME = 'Invoice' 
    AND COLUMN_NAME = 'InternalId')
BEGIN
    ALTER TABLE Invoice 
        ADD InternalId UNIQUEIDENTIFIER NULL
END

DECLARE @MaxId INT, @LoopStart INT, @LoopEnd INT, @LoopSize INT = 50000 SELECT @MaxId = MAX(InvoiceId) FROM Invoice SELECT @LoopStart = MIN(InvoiceId) FROM Invoice SET @LoopEnd = @LoopStart + @LoopSize

PRINT 'Updating InternalIds to a new GUID' WHILE @LoopStart <= @MaxId BEGIN -- update internal id UPDATE I SET InternalId = NEWSEQUENTIALID() FROM Invoice I WHERE I.InvoiceId BETWEEN @LoopStart AND @LoopEnd

SET @LoopStart = @LoopEnd + 1
SET @LoopEnd = @LoopEnd + @LoopSize

END

IF EXISTS ( SELECT NULL FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = 'Invoice' AND COLUMN_NAME = 'InternalId' AND IS_NULLABLE = 'YES') BEGIN ALTER TABLE Invoice ALTER COLUMN InternalId UNIQUEIDENTIFIER NOT NULL END

IF NOT EXISTS (SELECT NULL FROM sys.objects WHERE name = 'DF_Invoice_InternalId') BEGIN ALTER TABLE Invoice ADD CONSTRAINT [DF_Invoice_InternalId]
DEFAULT ((NEWSEQUENTIALID())) FOR [InternalId] END

I get the following error:

Msg 302, Level 16, State 0, Line 40 The newsequentialid() built-in function can only be used in a DEFAULT expression for a column of type 'uniqueidentifier' in a CREATE TABLE or ALTER TABLE statement. It cannot be combined with other operators to form a complex scalar expression.

Any tips on how to work around this? Or am I over-thinking this?

The reason for doing this change is to expose the Sequential IDs (InternalId or could be called PublicId) externally in an API, as a replacement for the current sequential numeric Ids (InvoiceId). The numeric Id (the primary key) should have been kept internal, as it exposes a sequential and guessable internal value. The Sequential GUID is still sequential, but also not so easily guessable. To illustrate the point, I'm doing something like this, but @First is being provided via an API call. It is used for polling and processing new invoices using a watermarking process.

CREATE TABLE #Test (
    Id INT NOT NULL IDENTITY(1,1) PRIMARY KEY,
    DateCreated DATETIME NOT NULL DEFAULT(GETDATE()),
    Code NVARCHAR(50) NOT NULL
)

DECLARE @Id INT DECLARE @NO_OF_CHARS INT = 10 SET @Id = 1

WHILE @Id <= 12000 BEGIN

INSERT INTO #Test (Code) VALUES (SUBSTRING (REPLACE(CONVERT(VARCHAR(40), NEWID()), '-',''), 1, @NO_OF_CHARS)) SET @Id = @Id + 1 END

ALTER TABLE #Test ADD InternalId UNIQUEIDENTIFIER NOT NULL DEFAULT(NEWSEQUENTIALID())

DECLARE @First UNIQUEIDENTIFIER SELECT * FROM #Test SELECT @First = InternalId FROM #Test WHERE Id = 1 SELECT * FROM #Test WHERE InternalID > @First

DROP TABLE #Test

nateirvin
  • 756
  • 1
  • 6
  • 22
Rebecca
  • 299
  • 2
  • 12

3 Answers3

2

You could add the column as nullable with the constraint and then update the column with DEFAULT value. Conceptually something like this:

ALTER TABLE Invoice 
    ADD InternalId UNIQUEIDENTIFIER NULL 
    CONSTRAINT [DF_Invoice_InternalId] DEFAULT (NEWSEQUENTIALID())

UPDATE I SET InternalId = DEFAULT FROM Invoice I

ALTER TABLE Invoice ALTER COLUMN InternalId UNIQUEIDENTIFIER NOT NULL

But based on your motivation I would advice against using NEWSEQUENTIALID. It IS guessable. If someone gets one of the generated IDs he can easilly guess former and following values. Take a look at this StackOverflow answer

In most cases, the next newsequentialid can be predicted by taking the current value and adding one to the first hex pair.

In other words:

1E29E599-45F1-E311-80CA-00155D008B1C

is followed by

1F29E599-45F1-E311-80CA-00155D008B1C

is followed by

2029E599-45F1-E311-80CA-00155D008B1C

rois
  • 595
  • 1
  • 4
  • 12
1

A colleague pointed out how you can use the DEFAULT keyword in this instance, although I'm not 100% sure how it works, it works.

Using the example from above, the following shows how this works:

-- create the example table
CREATE TABLE #Test (
    Id INT NOT NULL IDENTITY(1,1) PRIMARY KEY,
    DateCreated DATETIME NOT NULL DEFAULT(GETDATE()),
    Code NVARCHAR(50) NOT NULL
)

DECLARE @Id INT DECLARE @NO_OF_CHARS INT = 10 SET @Id = 1

WHILE @Id <= 12000 BEGIN

INSERT INTO #Test (Code) VALUES (SUBSTRING (REPLACE(CONVERT(VARCHAR(40), NEWID()), '-',''), 1, @NO_OF_CHARS)) SET @Id = @Id + 1 END

-- add the new column with a default but it is still nullable ALTER TABLE #Test ADD InternalId UNIQUEIDENTIFIER NULL DEFAULT(NEWSEQUENTIALID()) GO

DECLARE @MaxId INT, @LoopStart INT, @LoopEnd INT, @LoopSize INT = 50000 SELECT @MaxId = MAX(Id) FROM #Test SELECT @LoopStart = MIN(Id) FROM #Test SET @LoopEnd = @LoopStart + @LoopSize

PRINT 'Updating InternalIds to a new GUID' WHILE @LoopStart <= @MaxId BEGIN -- update internal id using the DEFAULT -- https://dba.stackexchange.com/questions/72604/adding-non-nullable-newsequentialid-column-to-existing-table UPDATE #Test SET InternalId = DEFAULT WHERE Id BETWEEN @LoopStart AND @LoopEnd

SET @LoopStart = @LoopEnd + 1
SET @LoopEnd = @LoopEnd + @LoopSize

END

-- now make this not null ALTER TABLE #Test ALTER COLUMN InternalId UNIQUEIDENTIFIER NOT NULL

-- do a basic test DECLARE @First UNIQUEIDENTIFIER SELECT * FROM #Test SELECT @First = InternalId FROM #Test WHERE Id = 1 SELECT * FROM #Test WHERE InternalId > @First

DROP TABLE #Test

Alternatively, this SQL Fiddle shows it in action:

SQL Fiddle

MS SQL Server 2017 Schema Setup:

CREATE TABLE Test (
    Id INT NOT NULL IDENTITY(1,1) PRIMARY KEY,
    DateCreated DATETIME NOT NULL DEFAULT(GETDATE()),
    Code NVARCHAR(50) NOT NULL
)

Query 1:

DECLARE @Id INT
DECLARE @NO_OF_CHARS INT = 10
SET @Id = 1

WHILE @Id <= 10 BEGIN
INSERT INTO Test (Code) VALUES (SUBSTRING (REPLACE(CONVERT(VARCHAR(40), NEWID()), '-',''), 1, @NO_OF_CHARS)) SET @Id = @Id + 1 END

-- add the new column with a default but it is still nullable ALTER TABLE Test ADD InternalId UNIQUEIDENTIFIER NULL DEFAULT(NEWSEQUENTIALID())

Results:

Query 2:

DECLARE @MaxId INT, @LoopStart INT, @LoopEnd INT, @LoopSize INT = 50000
SELECT @MaxId = MAX(Id) FROM Test
SELECT @LoopStart = MIN(Id) FROM Test
SET @LoopEnd = @LoopStart + @LoopSize

PRINT 'Updating InternalIds to a new GUID' WHILE @LoopStart <= @MaxId BEGIN -- update internal id using the DEFAULT -- https://dba.stackexchange.com/questions/72604/adding-non-nullable-newsequentialid-column-to-existing-table UPDATE Test SET InternalId = DEFAULT WHERE Id BETWEEN @LoopStart AND @LoopEnd

SET @LoopStart = @LoopEnd + 1
SET @LoopEnd = @LoopEnd + @LoopSize

END

-- now make this not null ALTER TABLE Test ALTER COLUMN InternalId UNIQUEIDENTIFIER NOT NULL

Results:

Query 3:

-- do a basic test
SELECT * FROM Test

Results:

|  Id |              DateCreated |       Code |                           InternalId |
|-----|--------------------------|------------|--------------------------------------|
| 191 | 2022-01-28T13:59:08.947Z | D61874FEB6 | BFC0423E-F36B-1410-80BF-800000000000 |
| 192 | 2022-01-28T13:59:08.947Z | C596780C9F | C0C0423E-F36B-1410-80C0-800000000000 |
| 193 | 2022-01-28T13:59:08.947Z | 3356A26232 | C1C0423E-F36B-1410-80C1-800000000000 |
| 194 | 2022-01-28T13:59:08.947Z | D6D509EDE7 | C2C0423E-F36B-1410-80C2-800000000000 |
| 195 | 2022-01-28T13:59:08.947Z | E956CBCF4B | C3C0423E-F36B-1410-80C3-800000000000 |
| 196 | 2022-01-28T13:59:08.947Z | E32EAB6E92 | C4C0423E-F36B-1410-80C4-800000000000 |
| 197 | 2022-01-28T13:59:08.947Z | B2137001AA | C5C0423E-F36B-1410-80C5-800000000000 |
| 198 | 2022-01-28T13:59:08.947Z | 0FB26856C9 | C6C0423E-F36B-1410-80C6-800000000000 |
| 199 | 2022-01-28T13:59:08.947Z | 5DFCD86CAC | C7C0423E-F36B-1410-80C7-800000000000 |
| 200 | 2022-01-28T13:59:08.947Z | C4BFCCDC59 | C8C0423E-F36B-1410-80C8-800000000000 |

Query 4:

-- test the watermark
DECLARE @First UNIQUEIDENTIFIER = (SELECT TOP 1 InternalId FROM Test ORDER BY Id ASC)
--SELECT 'Watermark is: ' AS [Note], @First AS [WatermarkId]
SELECT * FROM Test WHERE InternalId > @First

Results:

|  Id |              DateCreated |       Code |                           InternalId |
|-----|--------------------------|------------|--------------------------------------|
| 192 | 2022-01-28T13:59:08.947Z | C596780C9F | C0C0423E-F36B-1410-80C0-800000000000 |
| 193 | 2022-01-28T13:59:08.947Z | 3356A26232 | C1C0423E-F36B-1410-80C1-800000000000 |
| 194 | 2022-01-28T13:59:08.947Z | D6D509EDE7 | C2C0423E-F36B-1410-80C2-800000000000 |
| 195 | 2022-01-28T13:59:08.947Z | E956CBCF4B | C3C0423E-F36B-1410-80C3-800000000000 |
| 196 | 2022-01-28T13:59:08.947Z | E32EAB6E92 | C4C0423E-F36B-1410-80C4-800000000000 |
| 197 | 2022-01-28T13:59:08.947Z | B2137001AA | C5C0423E-F36B-1410-80C5-800000000000 |
| 198 | 2022-01-28T13:59:08.947Z | 0FB26856C9 | C6C0423E-F36B-1410-80C6-800000000000 |
| 199 | 2022-01-28T13:59:08.947Z | 5DFCD86CAC | C7C0423E-F36B-1410-80C7-800000000000 |
| 200 | 2022-01-28T13:59:08.947Z | C4BFCCDC59 | C8C0423E-F36B-1410-80C8-800000000000 |

The "watermark" value being 'BFC0423E-F36B-1410-80BF-800000000000'

Rebecca
  • 299
  • 2
  • 12
0

Be aware that NEWSEQUENTIALID cannot guarantee 100% sequential GUIDs for the life of the table's existence. NEWSEQUENTIALID may start back at a lower value after each server reboot. Please review the Microsoft Doc on this. If that's going to be an issue, you may want to re-evaluate its use for the initial update.

As I understand it, NEWSEQUENTIALID's main purpose is to reduce pain when using a GUID as a clustered key, so as to avoid random inserts all across your data pages and page splits

Brendan McCaffrey
  • 3,444
  • 2
  • 8
  • 29