Lets start with the differences between Primary Key, Clustered or NonClustered Index and Heap.
Primary Key
Restrictions:
- The column used in a Primary Key must be unique
- If a primary key contains more than one column, each combination of values for all these columns must be unique
- Must be NOT NULL
The PK itself is not meant to improve performance. It is a constraint which is used to make sure that the key (1 column or a combination of Column) is unique in the table:
ALTER TABLE dbo.test ADD CONSTRAINT PK_test PRIMARY KEY NONCLUSTERED(ID);
A Primary Key will be either NONCLUSTERED (just a special constraint used for Foreign Keys) or CLUSTERED (clustered index used for storage at the page level). By Default SQL Server Management Studio create them as CLUSTERED.
Clustered Index
A Clustered Index is used to store and order rows and their data at the leaf level. Unlike non clustered indexes, it contains the data for all the columns at the leaf level of the index. Rows will be ordered by the columns used in the Index key.
To be efficient, a Clustered index key should be:
- Narrow = is a small as possible (number of bytes for the index index key)
- Unique = when duplicates exist, it adds a 4 bytes "uniqueifier" value to duplicates (=the key is bigger)
- Static = is never updated
- Ever Increasing = consevutive values are inserted in order to avoid fragmentation. Data will be added at the end of the index (1, 2, 3, 4 but not 1, 2, 4, 3...)
You can only have one Clustered Index on a table. Rows are recorded in the Clustered key order. This is why it is better to have an ever increasing key.
New data will be recorded at the end of the index in that case.
With a random key, rows will have to be inserted in between data pages and it will create fragmentation and row movements.
NonClustered Index
They don't store the row data at the leaf level. They only store the values(s) of the Clustered key(s) from the Clustered Index. The key is then use as a pointer to the real data in the Clustered index.
Because the clustered index keys are used in indexes branches and leaves, a smaller key will make non clustered index smaller. With smaller index, it will have less pages to read, it will keep more data in memory cache and more rows can be retrieved out of each pages of data being read.
Heap
Heap is a table without Clustered Index. It can have a Primary Key or no PK.
In order to distinguish rows, it use a RowIDentifier of 8 bytes.
Because new rows are added at the end of the Heap with no specific order, there is no fragmentation when you insert new rows
Question 1
The primary key won't improve performances. It will just make sure the combination of column used in the PK is unique.
The clustered index may improve performance if the key and other non clustered indexes are well choosen based on you queries and typical usage.
Question 2
You can use either of this queries:
ALTER TABLE dbo.test ADD CONSTRAINT PK_test PRIMARY KEY CLUSTERED(ID);
ALTER TABLE dbo.test ADD CONSTRAINT PK_test PRIMARY KEY NONCLUSTERED(ID);
If you use the first, you cannot add a 2nd custom clustered index.
If you use the second, you table will remain a Heap (no clustered index with a 8 bytes RID key) until you add a Clustered index.
Question 3:
If we look at your columns, what are the candidate keys:
You already have an nonclustered index. Because there is no clustered index, this index use Row IDentifier (RID) of 8 bytes at the moment. 8 is not too big but this is not a narrow key either.
They are not good candidates because they are wide (16 bytes) and are random (not ever increasing). However, they are unique. They will make non clustered index bigger and slower. NEWSEQUENTIALID could be used in order to make it ever increasing but it won't make it smaller.
It can be a good candidate if you insert them in order (ever increasing). You may have duplicates which will have to rely on a uniqueifier (4 bytes) to distinguyish them. It also cannot be a Primary Key if it is not unique.
Datetime storage size is 8 bytes. It would be better to convert it to datetime2 which takes 6 to 8 bytes depending on the precision you need.
- New column of type int with identity
In that case, the storage size would be 4 bytes which is big enough for 1.7M rows. This is much smaller than datetime or uniqueidentifier hence small indexes.
Identity will also make them unique and ever increasing.
Solution?
There is no Black or White answer and it has to be decided according to you needs, queries and usages.
I would probably not keep the Heap and I would create a Clustered index.
Without knowing the queries you execute on this table I would either create a Clustered index on a new int identity column or create it on the datetime2 columns.
It would help to know which query are executed. This way, good indexes can be created with an efficient Clustered Index.
Either Clustered or Non Clustered indexes also create Statistics. They help the engine when it looks for the best execution plan.
Too many, useless or poor indexes, either Clustered or Non Clustered, will reduce performances.