TL;DR: The question below boils down to: When inserting a row, is there a window of opportunity between the generation of a new Identity value and the locking of the corresponding row key in the clustered index, where an external observer could see a newer Identity value inserted by a concurrent transaction? (In SQL Server.)
Detailed version
I have a SQL Server table with an Identity column called CheckpointSequence, which is the key of the table's clustered index (which also has a number of additional nonclustered indexes). Rows are inserted into the table by several concurrent processes and threads (at isolation level READ COMMITTED, and without IDENTITY_INSERT). At the same time, there are processes periodically reading rows from the clustered index, ordered by that CheckpointSequence column (also at isolation level READ COMMITTED, with the READ COMMITTED SNAPSHOT option being turned off).
I currently rely on the fact that the reading processes can never "skip" a checkpoint. My question is: Can I rely on this property? And if not, what could I do to make it true?
Example: When rows with identity values 1, 2, 3, 4, and 5 are inserted, the reader must not see the row with value 5 prior to seeing the one with value 4. Tests show that the query, which contains an ORDER BY CheckpointSequence clause (and a WHERE CheckpointSequence > -1 clause), reliably blocks whenever row 4 is to be read, but not yet committed, even if row 5 has already been committed.
I believe that at least in theory, there may be a race condition here that might cause this assumption to break. Unfortunately, documentation on Identity doesn't say a lot about how Identity works in the context of multiple concurrent transactions, it only says "Each new value is generated based on the current seed & increment." and "Each new value for a particular transaction is different from other concurrent transactions on the table." (MSDN)
My reasoning is, it must work somehow like this:
- A transaction is started (either explicitly or implicitly).
- An identity value (X) is generated.
- The corresponding row lock is taken on the clustered index based on the identity value (unless lock escalation kicks in, in which case the whole table is locked).
- The row is inserted.
- The transaction is committed (possibly quite a lot of time later), so the lock is removed again.
I think that between step 2 and 3, there is a very tiny window where
- a concurrent session could generate the next identity value (X+1) and execute all the remaining steps,
- thus allowing a reader coming exactly at that point of time to read the value X+1, missing the value of X.
Of course, the probability of this seems extremely low; but still - it could happen. Or could it?
(If you're interested in the context: This is the implementation of NEventStore's SQL Persistence Engine. NEventStore implements an append-only event store where every event gets a new, ascending checkpoint sequence number. Clients read events from the event store ordered by checkpoint in order to perform computations of all sorts. Once an event with checkpoint X has been processed, clients only consider "newer" events, i.e., events with checkpoint X+1 and above. Therefore, it is vital that events can never be skipped, as they'd never be considered again. I'm currently trying to determine if the Identity-based checkpoint implementation meets this requirement. These are the exact SQL statements used: Schema, Writer's query, Reader's Query.)
If I'm right and the situation described above could arise, I can see only two options of dealing with them, both of which are unsatisfactory:
- When seeing a checkpoint sequence value X+1 before having seen X, dismiss X+1 and try again later. However, because
Identitycan of course produce gaps (e.g., when the transaction is rolled back), X might never come. - So, same approach, but accept the gap after n milliseconds. However, what value of n should I assume?
Any better ideas?
