What is the best way to design a table with an arbitrary id?

Question

I have the need to create a table with a unique id as the PK. The ID is a surrogate key. Originally, I had a natural key, but requirement changes have undermined this idea. Then, I considered adding an auto incrementing identity. But, this presents problems.

A. I can't specify my own ID.
B. The ID's are difficult to reset.

Both of these together make it difficult to copy over this table with new data or move the table across domains, e.g. Dev to QA. I need to refer to these ID's from the front end, JavaScript...so they must not change.

So, the only way I am aware of to meet all these challenges is to make a GUID ID. This way, I can overwrite the ID's when I need to or I can generate a new one without concern for order (E.G. an int based id would require I know the last inserted ID).

Is a GUID the best way to accomplish my goals? Considering that a GUID is a string and joining on a string is an expensive task, is there a better way?

JDT · Accepted Answer · 2011-11-22T15:08:03.517

GUID (the uniqueidentifier-data type in SQLServer) seems like the best option based on the information you provided. I work on a large application that has tables with 500000+ rows using GUID's as primary keys with some pretty large joins and the bottleneck is never in the joins of the primary keys.

Various other sources also seem to indicate that GUID's are a good choice, these are worth checking out:

http://www.informit.com/articles/printerfriendly.aspx?p=25862 talks about GUID performance and gives a clear indication with numbers as to the impact.
http://databases.aspfaq.com/database/what-should-i-choose-for-my-primary-key.html gives a decent pro/con analysis of the usage of GUID's versus integers.

Some of the bigger issues seem to be:

Loss of chronology (no way to know which row was inserted last and no ability to remove everything inserted after a certain point in time), can be fixed easily by using an 'insertdate'-column containing the date and time the row was added
Larger storage requirements but unless you deal with very large databases storage is pretty cheap
Harder to remember and work with in testing which is a valid issue but there are often alternatives to using a GUID to identify a row

I'm inclined to say that performance here would not be the issue and based on your requirements GUID are a very good - if not your only - choice.

score 1 · Answer 2 · answered Nov 22 '11 at 14:43

1

Based on what you have told us, GUID is the best "built-in" way to meet your needs. trying to come up with a "roll-your-own" key that meets your needs sounds error prone, difficult, and like you would be reinventing the wheel.

answered Nov 22 '11 at 14:43

automatic

111
2

score 1 · Answer 3 · answered Nov 22 '11 at 14:58

Note that while GUID sounds good, you must be careful about:

1-Your server must support it (Microsoft backend for example) so you can generate it automatically to guarantee its correctness and uniqueness.

2-Your database(s) must support it (Microsoft SQL Server for example), because if you move your data to another database, it may be cumbersome to generate it.

3-You must have a consistent way to generate the values (either from the Client or from the server) - This way you guarantee consistency (this is a personal guess).

4-The user must not have to enter it because it is very cumbersome.

5-It does not preserve sequence, so you can't 'reset' it and start over. but you can assume that it would be unique.

score 1 · Answer 4 · answered Nov 23 '11 at 04:10

Considering that a GUID is a string and joining on a string is an expensive task, is there a better way?

If you make a GUID the PK, it will be a clustered index on the table. The reason why auto incrementing integer IDs are popular for PKs are that over time, the clustered index fragments very little, so little if any maintenance is needed on the PK. The index is sequential.

GUIDs will work out fine, provided regular maintenance is done on the index. If there are large amounts of inserts and deletes on the table, just make sure the process doing those inserts/deletes rebuilds the index after it completes. Also, over time a GUID index will fragment after many inserts/updates, so just rebuild the index/PK at regular intervals to keep the queries that use it running optimally.

What is the best way to design a table with an arbitrary id?

4 Answers4