2

I need to write a query that will insert a row only once, even if the query is run multiple times. Being new to SQL (well new-ish), I did an if not exists(...) but a friend said he preferred deleting the row if it exists and then adding it again.

  • What might be the advantages of deleting over EXISTS or vice-versa?
  • Is there another way of doing this?
Mat
  • 10,289
  • 4
  • 43
  • 40
robasta
  • 155
  • 1
  • 5

2 Answers2

6

As mentioned in the comments...

  • There is no reason to DELETE/INSERT instead of just UPDATE or checking with EXISTS, but there are some reasons NOT to
    • More IO to remove a record and add a new one than an in-row update
    • Depending on clustered index you may increase fragmentation
    • Log file growth
    • Increased locking on the rows in question - EXISTS is about as non-intrusive as you can be, DELETE is the opposite

I'm sure there are many others as well. Checking for existence is in every way superior to deleting an existing record. The act of deleting the existing record REQUIRES A CHECK FOR EXISTENCE, so why on earth would you do anything extra after that?

JNK
  • 18,064
  • 6
  • 63
  • 98
4

Some people find the code for DELETE-followed-by-INSERT easier to follow especially if some rows are to be UPDATEd as well as some INSERTed, but I would recommend avoiding it. If you do use this method, make sure this or your overall process is wrapped in a transaction so that if something goes wrong nothing happens (or you might delete a row then fail to insert the update).

For large amounts of data, especially when you expect the row(s) to already exist most of the time, DELETE+INSERT can be significanly less efficient: you have one or two row modification operations (DEL+INS instead of INT or UPD) instead of one or zero (INS, UPD, or nothing if the row exists and is correct) that will update the transaction log and data files, and as well as the pages containing the rows you are affecting all relevant index pages will need to be updated too (twice: once for the delete and once for the insert). For small amounts of data this inefficiency issue is not likely to be significant, but keep future scalability in mind. As well as the extra I/O operations that can be immediately measured, you may be increasing heap and/or index fragmentation which can cause other performance problems in the longer term.

In instances where you have a properly defined database with relevant foreign key relations implemented the DELETE operation may simply error due to the rows being referred to by entries in other tables, so the technique definitely can not be universally used. Even worse, if ON DELETE CASCADE is used you could inadvertently delete a great many rows from elsewhere which won't be replaced by your subsequent INSERT. Similar problems can occur if there are triggers set to take action on the table to affect data elsewhere. This is the more important reason not to develop the habbit IMO.

Since SQL2008 the most efficient way to INSERT-or-UPDATE or INSERT-or-NOTHING (or other combinations including DELETEing some rows) is the new MERGE statement. Unlike check+do or delete+insert the structures are only scanned once and the required insert/update/delete operations done in one go as needed. MSSQL's MERGE statement is not dissimilar to (but is more flexible than) what some databases call an UPSERT (update or insert) operation. Similar operations are available in other databases: for example, IIRC, Oracle implements much the same functionality in this respect as MSSQL.

Edit: see the links in Aaron's comment for important caveats to be aware of with MERGE that might make it far less suitable then I suggested.

David Spillett
  • 32,593
  • 3
  • 50
  • 92