
Due to maintenance, our primary site will be offline for a few hours. I have an availability group with 2 nodes at the primary site and one node at the DR site.

Since the primary site will be offline, I will need to fail over to the DR site before the maintenance. My question is: if I fail over to the DR site, shut down all the nodes at the primary site one after another, and then, once the maintenance is over, simply turn the primary-site nodes back on one by one and fail back to the primary site, will that create a split-brain scenario?

Also, since the primary site will be down, will the Windows Server Failover Cluster go down and the availability group become inaccessible?

SQL_NoExpert

2 Answers


if I failover to the DR site and shut off all the nodes at the Primary site one after another and once the maintenance is over I simply turn back on the nodes at the primary server one by one and then failover to the primary site will that create a split brain scenario.

Possibly, and you should plan for it. The primary nodes are supposed to communicate with the DR node rather than form a local quorum on their own and bring the AG online. But don't rely on that behavior, as it depends on the primary nodes being able to communicate with the DR node.

Instead of forcing quorum in DR, remove the quorum votes from the primary-site nodes after you fail over to the DR node (for example, by setting each node's NodeWeight cluster property to 0). Then, after the primary site comes back up, restore the votes and fail back.
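To see why removing the votes helps, here is a minimal sketch of the majority-vote arithmetic behind cluster quorum. It is a simplified model, not how WSFC is implemented: it assumes one vote per node, no witness, and ignores dynamic quorum/dynamic witness, which can adjust votes automatically as nodes shut down gracefully. The node names PRI1, PRI2, and DR1 are hypothetical.

```python
from typing import Dict, Set


def has_quorum(votes: Dict[str, int], online: Set[str]) -> bool:
    """Return True if the online nodes hold a strict majority of all assigned votes.

    Simplified model: ignores the witness and dynamic quorum/dynamic witness.
    """
    total = sum(votes.values())
    held = sum(v for node, v in votes.items() if node in online)
    return total > 0 and held * 2 > total


# Hypothetical cluster: two primary-site nodes and one DR node, one vote each.
default_votes = {"PRI1": 1, "PRI2": 1, "DR1": 1}
# After removing the votes from the primary-site nodes (e.g. NodeWeight = 0).
adjusted_votes = {"PRI1": 0, "PRI2": 0, "DR1": 1}

dr_only = {"DR1"}
primary_only = {"PRI1", "PRI2"}

print(has_quorum(default_votes, dr_only))        # False: 1 of 3 votes
print(has_quorum(adjusted_votes, dr_only))       # True: 1 of 1 vote
print(has_quorum(default_votes, primary_only))   # True: 2 of 3, the split-brain risk
print(has_quorum(adjusted_votes, primary_only))  # False: 0 of 1 vote
```

With the default votes, the two primary-site nodes coming back up while isolated from DR would hold a majority on their own; with their votes removed, only the DR node can hold quorum until the votes are restored.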

David Browne - Microsoft

Do you have automatic failover setup between your primary site and your DR?

  • NO, you don't need to worry about it.
  • YES, you should change to manual failover for the duration of your maintenance.

But this depends on where the witness is. If you are using the second node in your primary site as the witness (likely) and have automatic failover then you do run the risk of split brain. But if you use a cloud witness (i.e. Azure) then you shouldn't need to worry about it; a node won't elect to be primary automatically if it can't talk to the witness.

Do you have enough log space in your DR site to support your transactional load while the primary site is down for maintenance? While the primary-site replicas are disconnected, the transaction log cannot be truncated past the log records they have not yet received, so it will keep growing. For example, if your maintenance is going to last a day or more, you may want to consider breaking the Availability Group and rebuilding it afterward. Keep an eye on the log files at your DR site to make sure you don't have an unexpected outage.
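A rough way to answer the log-space question is to multiply your log generation rate by the expected outage length and compare that with the free log space (including room for autogrowth) at the DR site. The sketch below assumes the log grows linearly and cannot be truncated at all during the outage; the 1.5 safety factor and the example figures are hypothetical, so substitute your own measurements (for instance, the average size of your log backups per hour).

```python
def hours_until_log_full(free_log_gb: float, log_gen_gb_per_hour: float) -> float:
    """Hours before the log fills, assuming linear growth and no truncation."""
    if log_gen_gb_per_hour <= 0:
        return float("inf")
    return free_log_gb / log_gen_gb_per_hour


def enough_log_space(
    free_log_gb: float,
    log_gen_gb_per_hour: float,
    outage_hours: float,
    safety_factor: float = 1.5,  # hypothetical headroom for load spikes
) -> bool:
    """True if the free log space covers the outage with the given headroom."""
    needed_gb = log_gen_gb_per_hour * outage_hours * safety_factor
    return free_log_gb >= needed_gb


# Hypothetical figures: 200 GB free, 10 GB of log generated per hour.
print(hours_until_log_full(200, 10))  # 20.0
print(enough_log_space(200, 10, 8))   # True: 120 GB needed for an 8-hour outage
print(enough_log_space(200, 10, 24))  # False: 360 GB needed for a day-long outage
```

If the estimate comes out short, that is the point at which breaking the Availability Group for the duration of the maintenance becomes worth considering.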

Jonathan Fite