Using SQL Server 2016, Always-On
Running a simple SELECT on a table within a secondary Availability Group.
select distinct some_column
from some_table oo (nolock)
inner join some_other_ ss (nolock) on ss.some_column= oo.some_other_column
SSMS eventually sends back this error:
Msg 3948, Level 16, State 2, Line 14061 The transaction was terminated because of the availability replica config/state change or because ghost records are being deleted on the primary and the secondary availability replica that might be needed by queries running under snapshot isolation. Retry the transaction.
The same SELECT with (NOLOCK) works fine on the primary AG.
With or without the NOLOCK it fails most of the time 9/10 and sometimes executes correctly but that's rare. No changes were made to the AG at all.
There are. no changes in the AlwaysOn_Health extended event session.
The low_water_mark_for_ghosts in sys.dm_hadr_database_replica_states is NULL for some and for others there is a value. For those with a value, this is also the primary.
Using the script provided by Nic:
;
WITH PrimaryStats
AS ( SELECT DB_NAME(database_id) AS DatabaseName ,
low_water_mark_for_ghosts ,
ar.replica_server_name ,
ar.availability_mode_desc
FROM sys.dm_hadr_database_replica_states hdrs
JOIN sys.availability_replicas ar ON hdrs.replica_id = ar.replica_id
WHERE ar.replica_server_name = @@SERVERNAME
)
SELECT DB_NAME(database_id) AS DatabaseName ,
hdrs.low_water_mark_for_ghosts AS LowWaterMarkSecondaryReplica,
ps.low_water_mark_for_ghosts AS LowWaterMarkLocalReplica,
ps.low_water_mark_for_ghosts - hdrs.low_water_mark_for_ghosts AS GhostWatermarkDiff,
ar.replica_server_name AS ReplicaNode,
DATEDIFF(SECOND, last_redone_time, GETDATE()) AS RedoDiffSec,
last_redone_time
FROM sys.dm_hadr_database_replica_states hdrs
JOIN sys.availability_replicas ar ON hdrs.replica_id = ar.replica_id
JOIN PrimaryStats ps ON ps.DatabaseName = DB_NAME(database_id)
ORDER BY
DatabaseName ASC,'NODE ' + right(ar.replica_server_name ,1);
The output is:
When the LowWaterMarkSecondaryReplica is NULL, its because that's the instance I'm currently executing this query from.
