2

So when updating a two node cluster, our procedure is this:

  1. Backup all databases.
  2. Failover all SQL cluster roles to Node A.
  3. Install the patches on Node B (passive).
  4. Restart Node B.
  5. Failover all SQL cluster roles to Node B.
  6. Install the patches on Node A (passive).
  7. Restart Node A

My understanding is that in step 5, the updated node, will perform some updates before taking on incoming connections, as can be seen in the log.

What happens if another failover occurs before Node A is updated? If, for example, I have updated to a higher CU on Node B. Will the failover fail or just fail over and start itself on a lower CU level?

2 Answers2

2

When the database is opened on Node B after Node B has been patched with a higher revision of SQL Server code, that database is upgraded via a series of scripts. Those scripts are managed using the same ACID-compliance as any other activity done in the database. The upgrade is an all-or-nothing action controlled via transactions.

If there is a cluster failover during upgrade of the database on node B, any uncommitted changes made will roll backwards or forwards as part of the recovery process when the database is opened on node A. This should allow Node A to open the database reliably. That being said, if the database has completely upgraded to the new version prior to the failover, Node A will not be able to open the database if it hasn't been upgraded to the same engine code as Node B. You can avoid this scenario by having a third member in the cluster such that you upgrade the unused Node C first, then upgrade the unused Node B, then failover to Node B, then upgrade Node A. If there is a problem at Node A preventing upgrade, or a failover needs to happen for any reason while the update of Node A is happening, you simply failover from Node B to Node C, and rectify the issue with Node A at your leisure.

FYI, for business-critical data, part of your backup process in step 1 should probably involve testing that backup via a restore to some other machine just to ensure you have a reliable backup.

Hannah Vernon
  • 70,928
  • 22
  • 177
  • 323
-1

My coworker argued that the following statement was wrong in Hannah's answer above:

Node A will not be able to open the database if it hasn't been upgraded to the same engine code as Node B

After testing it out on a two node cluster it seems my coworker was right. The SQL server downgrades back to a lower CU when failing over to unpatched node. See below:

  • Node A, Active node, unpatched
  • Node B, Passive node, unpatched
  1. Node B, Patched up, from CU22 to CU25

Node A, Active, before failover:

select @@VERSION;

Microsoft SQL Server 2017 (RTM-CU22) (KB4577467) - 14.0.3356.20 (X64)
Aug 20 2020 22:33:27
Copyright (C) 2017 Microsoft Corporation
Enterprise Edition: Core-based Licensing (64-bit) on Windows Server 2019 Standard 10.0 <X64> (Build 17763: ) (Hypervisor)

  1. Failover to Node B

Node B, Active, after failover

select @@VERSION;

Microsoft SQL Server 2017 (RTM-CU25) (KB5003830) - 14.0.3401.7 (X64)
Jun 25 2021 14:02:48
Copyright (C) 2017 Microsoft Corporation
Enterprise Edition: Core-based Licensing (64-bit) on Windows Server 2019 Standard 10.0 <X64> (Build 17763: ) (Hypervisor)

  1. Second failover back to Node A

4. Node A, Active, after failover

select @@VERSION;

Microsoft SQL Server 2017 (RTM-CU22) (KB4577467) - 14.0.3356.20 (X64)
Aug 20 2020 22:33:27
Copyright (C) 2017 Microsoft Corporation Enterprise Edition: Core-based Licensing (64-bit) on Windows Server 2019 Standard 10.0 <X64> (Build 17763: ) (Hypervisor)