
In trying to determine the most appropriate High Availability option, we are focused on reducing downtime, whether planned or unplanned. From reading lots of MSDN documentation and blogs, I've been able to gather stats on Failover Cluster Instances (FCI).

What I haven't found is documentation around the failover stats/times of Availability Groups.

As a comparison (FCI):
Failing over a 2008 R2 cluster can take anywhere from 30 seconds to 5 minutes, depending on traffic volume and the hardware/network setup:

During a manual failover, the cluster completes outstanding writes to the LUNs, moves the LUNs to the new active node, and starts the SQL Server instance on that node.

During an automatic failover, once the SQL Server instance has started on the new node, it runs recovery on each database to bring it to a consistent state, rolling back any in-flight transactions.

Availability Groups
I understand that Availability Groups should fail over more quickly than an FCI on the same hardware and with the same traffic volume.

What I haven't been able to find are any real-world metrics comparing the two.

Specifically, does anyone have metrics on how long it takes to fail over the primary (write) node of an Availability Group that is actively in use, failing over to a synchronous secondary? Ideally this would include references to Microsoft or other trustworthy sources. No opinions, please; only metrics.

Andrew Bickerton



These are personal observations from both FCI and AG failovers on a reasonably high-volume transactional system (40k transactions/sec). In each case there were 6 databases ranging in size from 500 MB to 4 TB. The failover times listed are how long it took for the databases to be up and in a writeable state on the new node. Your mileage can and will vary, but this is at least a data point for you.
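If you want to collect comparable numbers on your own hardware, one approach is to start a timer when you initiate the failover and poll until the database accepts writes again, which matches the "up and in a writeable state" definition above. The sketch below is a minimal, driver-agnostic harness under that assumption. `measure_failover` and `is_writeable` are hypothetical names: `is_writeable` is a function you supply yourself (for example, one that connects to the AG listener or the FCI virtual network name and attempts a small write), and any exception it raises is treated as "not writeable yet".

```python
import time
from typing import Callable, Optional


def measure_failover(
    is_writeable: Callable[[], bool],
    poll_interval: float = 0.5,
    timeout: float = 600.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[float]:
    """Return seconds elapsed until is_writeable() first returns True.

    Call this right after initiating the failover. Returns None if the
    database is still not writeable after `timeout` seconds.
    `is_writeable` is a hypothetical, caller-supplied probe.
    """
    start = clock()
    while True:
        try:
            if is_writeable():
                return clock() - start
        except Exception:
            # Connection or write errors while the failover is in progress
            # are counted as "not writeable yet", so keep polling.
            pass
        if clock() - start >= timeout:
            return None
        sleep(poll_interval)
```

The clock and sleep functions are injectable so the harness can be exercised without a live server. Note that the measured time can overshoot the real outage by up to one `poll_interval`, so keep the interval small relative to the failover times you expect (here, seconds).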

Cluster failover: 47 seconds (avg)

AG Failover: 10 seconds

Nic