10

Please bear with me as I'm not a DBA nor am I a SQL server expert.

We added an index to a table and noticed after about a day or so, the storage for the index grew to around 200 MB.
The reason for this, we surmised, is due to the high number of ghost records.
However when we do looked into it, we got the following results for the tables in the db:

TableName record_count ghost_record_count version_ghost_record_count
Table1 19,936,533,500 5000 473,444,900
Table2 902,990,900 100 9,143,000
Table3 5,502.563,178 800 120,117,300
Table4 1,548,468,000 500 29,863,400

Seeing that the version ghost records is so big, we figured restarting the application (not the database) server might clear the records. This did not seem to be the case however. To be honest I am stumped as to what could be the reason for the persistence of the version ghost records. I'd appreciate any insight into how to troubleshoot/manage this. Or even some additional knowledge or past experience about this topic since I've found very little on the interwebs. Thank you.

Dark Star1
  • 271
  • 2
  • 11

2 Answers2

15

Version ghost records in particular are 'soft deleted' rows kept around because some currently executing statement or a long-running open transaction might need that version of the row to produce the correct results.

If you are lucky, this means that someone has left a transaction open for a long time that is (a) using a row versioning transaction isolation level like Read Committed Snapshot Isolation (RCSI) or Snapshot Isolation (SI); and (b) that transaction has accessed user data in the database.

A quick way to see if there are any open transactions is to run DBCC OPENTRAN;. Resolving that transaction will allow the ghost cleanup process to start to catch up.

SQL Server 2022 has an improvement for ghost cleanup so RCSI (statement-level snapshot) transactions can be cleaned up when the statement completes (not waiting for the transaction to end). Since you're not running SQL Server 2022, you're stuck with the old behaviour.

If you are unlucky, you have a Database Availability Group (AG) and there is either some problem with a replica, or there is a long running transaction on one of the replicas.

Queries on replicas always run under Snapshot Isolation, regardless of the isolation level requested. A long-running transaction on a replica might need old row versions to deliver correct results. Since the primary is an exact copy of the replicas, it cannot clean up old ghost versions until every replica has indicated no current transaction could possibly need that data.

You can see if a replica is holding up version ghost cleanup by querying the low_water_mark_for_ghosts column of sys.dm_hadr_database_replica_states. If that column isn't changing over the course of 30s or so, you almost certainly have a long-running replica transaction or some other problem with the replica(s) that is preventing ghost cleanup from making progress.

In the happy scenario, you will find a long-running transaction on a replica and resolve it as for the non-AG case. Otherwise, you have some sort of AG problem that will need expert attention.


There are other scenarios. You might simply have a very large number of databases on the one instance, or a tremendous rate of ghost production due to heavy delete activity (or updates performed by the engine as a delete-insert pair for various reasons, including replication and index key updates).

In other words, the rate of ghost production exceeds the capabilities of the cleanup process. There is not enough information in the question to say whether this applies or not. If other tables and databases on the same instance have deletes but no ghost problem, capacity probably isn't the issue.

Paul White
  • 94,921
  • 30
  • 437
  • 687
7

I think you need to start by reading up on ghost records & why they exist in the database, this may give you some insights into why with your application the numbers are so high, in the long run it seems likely to me you will need to make some architectural changes.

https://learn.microsoft.com/en-us/sql/relational-databases/ghost-record-cleanup-process-guide?view=sql-server-ver15

That said as ghost records are database constructs I'm not sure why you would think restarting the application server would make a difference, as you have found out it doesn't.

There is however a system stored procedure which can accomplish this task. You should read the documentation carefully as it is far from cost free.

https://learn.microsoft.com/en-us/sql/relational-databases/system-stored-procedures/sp-clean-db-file-free-space-transact-sql?view=sql-server-ver15

Stephen Morris - Mo64
  • 4,656
  • 1
  • 10
  • 18