3

I ran DBCC CheckDB in our Production SQL Server 2014. It reported back with one consistency error:

Failed:(-1073548784) Executing the query "DBCC CHECKDB(N'MYDB')  WITH NO_INF..." failed with the following error:

"Table error: Object ID 629577392, index ID 1, partition ID 72057594387039319, alloc unit ID 72057594045596759 (type LOB data). The off-row data node at page (1:6009), slot 24, text ID 15754068079 is not referenced.

CHECKDB found 0 allocation errors and 1 consistency errors in table 'MYTABLE' (object ID 629577281). CHECKDB found 0 allocation errors and 1 consistency errors in database 'MYDB'.

repair_allow_data_loss is the minimum repair level for the errors found by DBCC CHECKDB (MYDB).". Possible failure reasons: Problems with the query, "ResultSet" property not set correctly, parameters not set correctly, or connection not established correctly.

Since we do not have a good backup for that DB, we have to route to the below scenario.

What I did to troubleshoot:

  1. I restored the recent backup file from Prod to UAT and ran DBCC CheckDB again to replicate the error. The same consistency error came back.
  2. Ran DBCC CHECKTABLE (MYTABLE) = same consistency error shows:
Table error: Object ID 629577392, index ID 1, partition ID 72057594387039319, alloc unit ID 72057594045596759 (type LOB data).

The off-row data node at page (1:6009), slot 24, text ID 15754068079 is not referenced. DBCC results for 'MYTABLE'. There are 53635 rows in 2705 pages for object "MYTABLE". CHECKTABLE found 0 allocation errors and 1 consistency errors in table 'MYTABLE' (object ID 629577281). repair_allow_data_loss is the minimum repair level for the errors found by DBCC CHECKTABLE

  1. Then implemented the following steps below:

-- step 1 ALTER DATABASE [MYDB] SET SINGLE_USER WITH ROLLBACK IMMEDIATE;

-- step 2 DBCC CHECKDB(N'[MYDB]', REPAIR_ALLOW_DATA_LOSS);

Msg 8964, Level 16, State 1, Line 3
Table error: Object ID 629577392, index ID 1, partition ID 72057594387039319, alloc unit ID 72057594045596759 (type LOB data). The off-row data node at page (1:6009), slot 24, text ID 15754068079 is not referenced.
        The error has been repaired.
There are 53635 rows in 2705 pages for object "MYTABLE".

-- step 3 DBCC CHECKDB ('[MYDB]') ---(Re-run to check for any additional errors = no errors reported)

-- step 4 ALTER DATABASE [MYDB] SET MULTI_USER; ---(setting it back to multi user mode = users can access the DB)

My questions/concerns:

  1. How do I know which data has been lost? From my understanding it doesn't seem like any data was lost because before the repair the table had 53635 rows. After the repair it still has 53635 rows.

  2. Do we need to to schedule downtime/ outage for that exact site when setting the DB to single user mode?

  3. After doing the repair in Production, what are some best practices to keep aware of?

sqllover2020
  • 73
  • 2
  • 13

2 Answers2

0

How do I know which data has been lost?

Well that is really difficult to find AFAIK, the best way is test your database and app functionality. You said the data rows are same please also check the constraints and business rules. Since the corruption was limited to one table and you were able to take backup and do a restore, I guess corruption was not that severe. In this case you can run checkdb repair on newly restored database and then compare the table from both the database. Well that is tough but there is hardly ant other way.

Do we need to to schedule downtime/ outage for that exact site when setting the DB to single user mode?

Yes, as name suggests if single user is allowed others needs to be kicked off including application. Repair of a database is offline operation so downtime will be needed.

After doing the repair in Production, what are some best practices to keep aware of?

Run checkdb again on the database and if it does not points to any error than I guess your repair was successful. Please also note there are something which Checkdb cannot fix. If you want to make this short because database is big then you can run dbcc checkconstraint(). Complete checkdb will include checkconstraint.

Also Read How Does EMERGENCY-Mode Repair Work?

Shanky
  • 19,148
  • 4
  • 37
  • 58
0

It looks like it's orphaned data from an off-row LOB column (such as varchar(max) or varbinary(max)) and DBCC has deleted it. No whole rows were deleted.

Given that it wasn't referenced by any row, either the row lost the reference somehow and you'll have to work out which row is missing data; or it was supposed to be set to null anyway and the data ended up orphaned.


As documented, to use a repair option, the database must be in single-user mode. Therefore downtime must be scheduled.

However, you can repair just that one table using DBCC CHECKTABLE WITH REPAIR_ALLOW_DATA_LOSS. This will minimize the downtime.


Databases are normally very resilient. Even an unexpected shutdown is normally recoverable to a consistent state.

Generally, database inconsistency can come from one of only a few things:

  • Physical disk fault. For this, the best solution is mirrored drives.
  • Operating system issues, such as buggy drivers or antivirus, an actual virus/malware, other OS bug. Of these, buggy drivers are the most likely.
  • Memory fault. This is highly unlikely with ECC server RAM.
  • On drives without Power Loss Protection, an unexpected power loss can also cause it. Ensure your drives have PLP.

Regular scheduling of DBCC is essential to monitoring corruption occurrence, the server will not necessarily notice it of its own accord.

Furthermore, a good backup schedule is the best way to avoid needing to mess about with repairs. A full backup every night, plus transaction log backups every 10 or 15 minutes, will get you minimal data loss.

Charlieface
  • 17,078
  • 22
  • 44