
After a SQL Server 2019 Availability Group failover, I have to wait half an hour before the client can connect to the database through the availability group listener IP.

The setup has the following servers:

  • node1, node2, AD-server (same subnet)
  • client (different subnet)
  • node1 and node2 are the Always On nodes, which use an AD user as the login

This is what happens after a failover:

  • node1 can connect to the database through the listener IP
  • node2 can connect to the database through the listener IP
  • AD-server can connect to the database through the listener IP
  • client can't connect to the database through the listener IP (telnet to it fails as well)

After half an hour, the client can connect to the database through the listener IP again (and telnet works too).
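
To put a number on the delay, the telnet check can be scripted from the client. The following is only a minimal sketch: the function names are hypothetical, and the address in the usage comment is a placeholder from the documentation range, not the real listener IP. Port 1433 is the SQL Server default and may differ in your setup.

```python
import socket
import time


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable networks.
        return False


def wait_until_reachable(host, port, interval=10.0, max_wait=3600.0):
    """Poll host:port; return the elapsed seconds once it is reachable,
    or None if max_wait seconds pass without a successful connection."""
    start = time.monotonic()
    while True:
        if can_connect(host, port):
            return time.monotonic() - start
        if time.monotonic() - start >= max_wait:
            return None
        time.sleep(interval)


# Example usage (placeholder listener IP, default SQL Server port):
#   elapsed = wait_until_reachable("192.0.2.10", 1433)
#   print("gave up" if elapsed is None else f"reachable after {elapsed:.0f}s")
```

Starting this on the client right after the failover gives the actual recovery time instead of an estimate.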

I suspect this is caused by DNS. Has anyone had the same experience?

I tested some solutions, but none of them solved the problem:

2 Answers


node1, node2, AD-server (same subnet)

After SQL Server 2019 Availability Group failover, I need to wait half hour to connect database by available listener ip from client.

Pretty much every technology that does this does it with Gratuitous ARP (GARP), because the IPs are on the same subnet. Windows is no different. You'll need to work with whoever on your team handles network configuration and have them check the switches for the proper GARP setting.

Many places tend to turn it off (all cloud providers do as well) for 'security' reasons.
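
For anyone unfamiliar with GARP, here is a rough sketch of what such a frame looks like on the wire: a broadcast ARP request in which the sender IP and the target IP are the same address. The function name is made up for illustration, the MAC and IP in the usage comment are placeholders, and the code only builds the bytes; it does not send anything.

```python
import socket
import struct

BROADCAST_MAC = b"\xff" * 6
ETHERTYPE_ARP = 0x0806


def build_gratuitous_arp(mac: str, ip: str) -> bytes:
    """Build a 42-byte Ethernet frame carrying a gratuitous ARP request."""
    hw = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(hw) != 6:
        raise ValueError("MAC address must be 6 bytes")
    addr = socket.inet_aton(ip)

    # Ethernet header: broadcast destination, our MAC as source, ARP ethertype.
    ethernet = BROADCAST_MAC + hw + struct.pack("!H", ETHERTYPE_ARP)

    # ARP header: hardware type 1 (Ethernet), protocol type 0x0800 (IPv4),
    # hardware length 6, protocol length 4, operation 1 (request).
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)
    arp += hw + addr          # sender MAC and sender IP
    arp += b"\x00" * 6 + addr  # target MAC unknown, target IP == sender IP

    return ethernet + arp


# Example usage (placeholder values):
#   frame = build_gratuitous_arp("00:11:22:33:44:55", "192.0.2.10")
```

Switches and routers that see this frame refresh the MAC they have cached for that IP, which is why a moved listener IP goes stale when the announcement is suppressed or never sent.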

Sean Gallardy

Update (2023-12-30):

After testing in UAT, we found that the ARP table was not updated after failover. We then found that the registry value "ArpRetryCount" was set to 0 on the DB's Windows Server (under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters).

Changing this value to 3 and rebooting solved the problem!
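
If you want to check the same value on other nodes, something along these lines should work with Python's standard winreg module. This is a sketch rather than a tested tool: the function names are hypothetical, it only reads the value and never changes it, and treating a missing value as harmless is an assumption on my part.

```python
import sys
from typing import Optional

KEY_PATH = r"SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"
VALUE_NAME = "ArpRetryCount"


def read_arp_retry_count() -> Optional[int]:
    """Return the ArpRetryCount value, or None if it is not set."""
    if sys.platform != "win32":
        raise OSError("the Windows registry is only available on Windows")
    import winreg  # Windows-only standard library module

    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, KEY_PATH) as key:
        try:
            value, _value_type = winreg.QueryValueEx(key, VALUE_NAME)
        except FileNotFoundError:
            return None
    return int(value)


def suppresses_gratuitous_arp(value: Optional[int]) -> bool:
    """True only when the value is explicitly 0, the setting we found.
    A missing value (None) is assumed to be harmless."""
    return value == 0


# Example usage on a cluster node:
#   current = read_arp_retry_count()
#   print(current, suppresses_gratuitous_arp(current))
```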

ref: https://www.myitblog.co.uk/microsoft/microsoft-failover-cluster-node-not-sending-out-gratuitous-arp-requests-after-a-failover/

ref2: https://kb.vmware.com/s/article/1028373