3

We have a few Basic Availability groups in production and I've been reading about the limitations of them. I'm concerned about the following limitations:

No backups on secondary replica.

No integrity checks on secondary replicas.

Suppose that we haven't failed over in months. In our case, that seems to imply that we haven't run DBCC CHECKDB for a long time on any of the databases on the secondary. There could have been a storage corruption issue that occurred months ago that we still don't know about. If a disaster occurs on the primary replica and we fail over to the secondary, we might end up with the production application pointing at corrupt data.

Would it be considered a best practice to perform one of the following on a fixed schedule?

  1. Perform a planned manual failover to switch the primary and secondary and leave the former secondary in the primary role until the next planned failover.
  2. Take a database snapshot of the secondary replica and run DBCC CHECKDB against that.

Or am I overthinking the risks here?

J. Mini
  • 1,161
  • 8
  • 32
Joe Obbish
  • 32,976
  • 4
  • 74
  • 153

3 Answers3

2

Definitely not overthinking things when it comes to checkdb. :-)
Backups (and restore process) should periodically be verified by full restore; simlarly AG secondaries - and secondary data copies maintained by any other means such as storage-level replication - should get integrity checks periodically.

Does frequency of failover have any licensing effect/liability for Basic Availability Groups? If not, failing over periodically can be a good idea in its own right to make sure all services and connecting clients maintain failover awareness. Maybe that timeframe could line up with a reasonable checkdb schedule between the systems, too. Of course this also depends on whether the service outage associated with planned failover can be tolerated.

Otherwise, I recommend checking out the option you mentioned of creating the snapshot of the Basic Availability Group secodary and running checkdb against that. Yes, as has been mentioned in a comment checkdb uses an internal snapshot database anyway unless TABLOCK is specified. But the explicit snapshot db may be enough to circumvent the Basic Availability Group secondary database prohibition against checkdb. I've heard this practice was employed in the past to circumvent a similar prohibition against checkdb in Database Mirroring secondary databases :-).

sqL_handLe
  • 1,576
  • 10
  • 15
1

"Or am I overthinking the risks here?"

No, I think you're worrying about the right things.

You can't spell Schrodinger without DR.

"Schoinge" doesn't roll off the tongue the same way.

Just like you can't rely on your backups in a disaster if you haven't tested restores, you can't rely on your DR site if you haven't tested using it.

For the purposes of this question, I am also going to assume that "DR" means "a secondary datacenter in a separate location for when the primary data center is unable to serve users."

What I would do:

I would personally take copy-only backups from the secondary as part of my existing DR plan. This ensures I have a backup copy at the DR location, in addition to any backup copies at the primary location. Boom, now I have my offsite backups for the DR plan. ✅

Then, I need to test restores of those backups on a regular basis to ensure I'm not relying on corrupt backups. I can implement some sort of automated "Restore my most recent offsite backup" plan, possibly by extending what I am already doing with testing restores of my backups from Production. While those backups are restored, I would do a CHECKDB on that restored copy. Since the restored copy originated from the DR AG replica, this should be sufficient for validating that the DR AG replica doesn't have corruption. ✅

Test failovers too

Testing that you can fail over to the DR site & serve users from that location is an important part of DR testing, but I'd separate that kind of DR test from running CHECKDB. If failing over to DR to test DR readiness also requires completing CHECKDB that could require the DR test be a very long test if your databases are big. I wouldn't want to entangle the two.

AMtwo
  • 16,348
  • 1
  • 33
  • 64
-1

I would just failover to the secondary, making it the new primary, and run CHECKDB against the database in the basic availability group. I don't see an advantage to using a database snapshot for CHECKDB and there are disadvantages to using DB snapshots as mentioned here.

Among those disadvantages are that as data changes in the source database the sparse file grows larger and larger. Also, additional overhead happens on the source database for updates due to the copy-on-write technique.

Lee M
  • 396
  • 2
  • 11