
I am doing a batch update where I process records sequentially (User 1, User 2, etc.). Each record has multiple update queries associated with it. I need to skip a record (a user, in this case) if it has a data issue. Ideally, I would like to process them in parallel, but I haven't reached that level yet (lots of challenges).

Is it possible to do it such that:

  1. There is a single transaction.
  2. If there is a failure, the statements associated with that record get rolled back, without affecting others.
  3. Commit.

For example, I have 4 users in a CSV file. If 3 are good and 1 is bad, the 3 good ones should be committed (or aborted) atomically, and the bad one should be skipped and its errors reported.

Observed:

do everything for user 1;
do everything for user 2;
--> if there is a failure, it rolls back the *entire* transaction
do everything for user 3;

In fact, any error with severity level >= 16 rolls back the entire transaction.

Expected:

do everything for user 1;
do everything for user 2;
--> if there is a failure, roll back this *block* only
do everything for user 3;
do everything for user 4;

This is a normal try-catch requirement in any programming language; however, I couldn't find a SQL Server equivalent (involving transactions). I have read about checkpoints, but I am not sure whether they are an option to consider.

Nishant

1 Answer


It is not possible to commit some part(s) of a transaction and roll back others. That would defeat the point of a transaction: Everything inside the transaction succeeds or fails as a unit. In other words, a transaction is atomic.

It sounds very much like you do not want one transaction for the whole process. That would mean either all records were processed successfully, or none were. Any error would ensure the persistent database returned to the state it was in before any of the changes occurred.

From your description, it seems you should instead start a new transaction for each user.

Perform all the changes necessary for that user, then commit or roll back depending on whether any fatal error conditions were encountered. You might like to build in a limited number of retries for transient (resolvable) errors like a deadlock, or log them somewhere to be retried at a later time.

That arrangement would lead to the outcome you indicate you want: Users without errors are successful, users with errors have all their changes undone.
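A minimal sketch of that per-user pattern, assuming the CSV rows have been loaded into a hypothetical staging table dbo.UserStaging and that a hypothetical procedure dbo.ProcessOneUser performs all the update queries for one user:

DECLARE @UserId int, @Retry int;

-- Hypothetical staging table holding the rows loaded from the CSV
DECLARE user_cursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT UserId FROM dbo.UserStaging;

OPEN user_cursor;
FETCH NEXT FROM user_cursor INTO @UserId;

WHILE @@FETCH_STATUS = 0
BEGIN
    SET @Retry = 3;    -- limited retries for transient errors like deadlocks

    WHILE @Retry > 0
    BEGIN
        BEGIN TRY
            BEGIN TRANSACTION;

            -- Hypothetical procedure: all the update queries for one user
            EXEC dbo.ProcessOneUser @UserId;

            COMMIT TRANSACTION;
            SET @Retry = 0;    -- success, stop retrying
        END TRY
        BEGIN CATCH
            IF XACT_STATE() <> 0
                ROLLBACK TRANSACTION;    -- undo only this user's changes

            IF ERROR_NUMBER() = 1205    -- chosen as deadlock victim, try again
                SET @Retry -= 1;
            ELSE
            BEGIN
                -- data problem: report it and skip this user
                PRINT CONCAT('User ', @UserId, ' skipped: ', ERROR_MESSAGE());
                SET @Retry = 0;
            END;
        END CATCH;
    END;

    FETCH NEXT FROM user_cursor INTO @UserId;
END;

CLOSE user_cursor;
DEALLOCATE user_cursor;

The cursor is used only to make the per-user loop explicit; a client-side loop that issues its own BEGIN TRANSACTION ... COMMIT per user achieves the same effect.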

Ideally, you would also write your routine to restart from the point of failure in case of a disaster. This avoids reprocessing user data that was already successfully processed into the database when the failure occurred.
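One simple way to make the loop above restartable, assuming a hypothetical dbo.ProcessedUsers tracking table, is to record each user in the same transaction as their changes and skip anyone already recorded when the batch is rerun:

-- Hypothetical table recording users that were committed successfully
CREATE TABLE dbo.ProcessedUsers
(
    UserId      int          NOT NULL PRIMARY KEY,
    ProcessedAt datetime2(0) NOT NULL DEFAULT SYSUTCDATETIME()
);

-- Inside the per-user loop: skip work committed by an earlier run
IF NOT EXISTS (SELECT 1 FROM dbo.ProcessedUsers WHERE UserId = @UserId)
BEGIN
    BEGIN TRANSACTION;

    EXEC dbo.ProcessOneUser @UserId;

    -- The marker commits (or rolls back) together with the user's changes
    INSERT dbo.ProcessedUsers (UserId) VALUES (@UserId);

    COMMIT TRANSACTION;
END;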

For an exhaustive guide, see Error and Transaction Handling in SQL Server by Erland Sommarskog.


The change to running a transaction per user might have an effect on performance. Whether you notice it depends on how much other work is done per user and other factors.

Remember that each statement would run in its own auto-commit transaction if you weren't using an explicit transaction.

If this turns out to be an important factor in your particular situation, you could look into using delayed durability, improving your system's log throughput, or running the process on more than one thread, for example.
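As a sketch of the delayed durability option (SQL Server 2014 and later), you would allow it at the database level and then request it per commit, accepting that a small tail of committed work could be lost if the server fails before the log is flushed:

-- Allow (but don't force) delayed durable commits in this database
ALTER DATABASE CURRENT SET DELAYED_DURABILITY = ALLOWED;

-- Inside the per-user loop, request a delayed durable commit
COMMIT TRANSACTION WITH (DELAYED_DURABILITY = ON);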

You don't necessarily have to write all that yourself. Many tools and technologies exist to read data from CSV input and pipeline those changes into a database, perhaps in parallel. For example, SSIS, Azure Data Factory, and Synapse Pipelines.

Feel free to ask separate follow-up questions as needed.

Paul White