tl;dr: why is CHECKDB reading the transaction log for a user database with memory optimized tables?
It appears that CHECKDB is reading the transaction log file of the user database when it's checking on one of my databases - in particular, a database that uses in-memory OLTP tables.
CHECKDB for this database still finishes in a reasonable amount of time, so I'm mostly just curious about the behavior; but it's definitely the longest duration for CHECKDB of all the databases on this instance.
In looking over Paul Randal's epic "CHECKDB From Every Angle: Complete description of all CHECKDB stages," I see that pre-SQL 2005 CHECKDB used to read the log in order to get a consistent view of the database. But since this is 2016, it uses an internal database snapshot.
However, one of the prerequisites for snapshots is that:
The source database must not contain a MEMORY_OPTIMIZED_DATA filegroup
My user database has one of these filegroups, so it looks like snapshots are off the table.
According to the CHECKDB docs:
If a snapshot cannot be created, or TABLOCK is specified, DBCC CHECKDB acquires locks to obtain the required consistency. In this case, an exclusive database lock is required to perform the allocation checks, and shared table locks are required to perform the table checks.
Okay, so we're doing database and table locking instead of snapshots. But that still doesn't explain why it has to read the transaction log. So what gives?
I've provided a script below to reproduce the scenario. It uses sys.dm_io_virtual_file_stats to identify the log file reads.
Note that most of the time it reads a small portion of the log (480 KB), but it occasionally reads much more (48.2 MB). In my production scenario, it reads most of the log file (~1.3 GB of the 2 GB file) every night at midnight when we run CHECKDB.
Here's an example of the outputs I've gotten so far with the script:
collection_time num_of_reads num_of_bytes_read
2018-04-04 15:12:29.203 106 50545664
Or this:
collection_time num_of_reads num_of_bytes_read
2018-04-04 15:25:14.227 1 491520
If I replace the memory optimized objects with regular tables, the output looks like this:
collection_time num_of_reads num_of_bytes_read
2018-04-04 15:21:03.207 0 0
Why is CHECKDB reading the log file? And especially, why does it occasionally read a much larger portion of the log file?
Here is the actual script:
-- let's have a fresh DB
USE [master];
IF (DB_ID(N'LogFileRead_Test') IS NOT NULL)
BEGIN
ALTER DATABASE [LogFileRead_Test]
SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
DROP DATABASE [LogFileRead_Test];
END
GO
CREATE DATABASE [LogFileRead_Test]
GO
ALTER DATABASE [LogFileRead_Test]
MODIFY FILE
(
NAME = LogFileRead_Test_log,
SIZE = 128MB
);
-- Hekaton-yeah, I want memory optimized data
GO
ALTER DATABASE [LogFileRead_Test]
ADD FILEGROUP [LatencyTestInMemoryFileGroup] CONTAINS MEMORY_OPTIMIZED_DATA;
GO
ALTER DATABASE [LogFileRead_Test]
ADD FILE
(
NAME = [LatencyTestInMemoryFile],
FILENAME = 'C:\Program Files\Microsoft SQL Server\MSSQL13.SQL2016\MSSQL\DATA\LogFileRead_Test_SessionStateInMemoryFile'
) TO FILEGROUP [LatencyTestInMemoryFileGroup];
GO
USE [LogFileRead_Test]
GO
CREATE TYPE [dbo].[InMemoryIdTable] AS TABLE (
[InMemoryId] NVARCHAR (88) COLLATE Latin1_General_100_BIN2 NOT NULL,
PRIMARY KEY NONCLUSTERED HASH ([InMemoryId]) WITH (BUCKET_COUNT = 240))
WITH (MEMORY_OPTIMIZED = ON);
GO
CREATE TABLE [dbo].[InMemoryStuff] (
[InMemoryId] NVARCHAR (88) COLLATE Latin1_General_100_BIN2 NOT NULL,
[Created] DATETIME2 (7) NOT NULL,
CONSTRAINT [PK_InMemoryStuff_InMemoryId] PRIMARY KEY NONCLUSTERED HASH ([InMemoryId]) WITH (BUCKET_COUNT = 240)
)
WITH (MEMORY_OPTIMIZED = ON);
GO
-- RBAR is the new black (we need some logs to read)
declare @j int = 0;
while @j < 100000
begin
INSERT INTO [dbo].[InMemoryStuff](InMemoryId, Created) VALUES ('Description' + CAST(@j as varchar), GETDATE());
set @j = @j + 1;
end
-- grab a baseline of virtual file stats to be diff'd later
select f.num_of_reads, f.num_of_bytes_read
into #dm_io_virtual_file_stats
from sys.dm_io_virtual_file_stats(default, default) f
where database_id = db_id('LogFileRead_Test') and file_id = FILE_IDEX('LogFileRead_Test_log');
-- hands off my log file, CHECKDB!
GO
DBCC CHECKDB ([LogFileRead_Test]) WITH NO_INFOMSGS, ALL_ERRORMSGS, DATA_PURITY;
-- grab the latest virtual file stats, and compare with the previous capture
GO
select f.num_of_reads, f.num_of_bytes_read
into #checkdb_stats
from sys.dm_io_virtual_file_stats(default, default) f
where database_id = db_id('LogFileRead_Test') and file_id = FILE_IDEX('LogFileRead_Test_log');
select
collection_time = GETDATE()
, num_of_reads = - f.num_of_reads + t.num_of_reads
, num_of_bytes_read = - f.num_of_bytes_read + t.num_of_bytes_read
into #dm_io_virtual_file_stats_diff
from #dm_io_virtual_file_stats f, #checkdb_stats t;
drop table #checkdb_stats;
drop table #dm_io_virtual_file_stats;
-- CHECKDB ignored my comment
select collection_time, num_of_reads, num_of_bytes_read
from #dm_io_virtual_file_stats_diff d
order by d.collection_time;
drop table #dm_io_virtual_file_stats_diff;
-- I was not raised in a barn
USE [master];
ALTER DATABASE [LogFileRead_Test]
SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
DROP DATABASE [LogFileRead_Test];
Since this repro generally just generates 1 or 106 log file reads, I thought I'd dig into the 1 with a file_read and file_read_completed extended events session.
name timestamp mode offset database_id file_id size duration
file_read 2018-04-06 10:51:11.1098141 Contiguous 72704 9 2 0 NULL
file_read_completed 2018-04-06 10:51:11.1113345 Contiguous 72704 9 2 491520 1
And here's the VLF details (DBCC LOGINFO()) for context on those offsets and such:
RecoveryUnitId FileId FileSize StartOffset FSeqNo Status Parity CreateLSN
0 2 2031616 8192 34 2 64 0
0 2 2031616 2039808 35 2 64 0
0 2 2031616 4071424 36 2 64 0
0 2 2285568 6103040 37 2 64 0
0 2 15728640 8388608 38 2 64 34000000005200001
0 2 15728640 24117248 39 2 64 34000000005200001
0 2 15728640 39845888 40 2 64 34000000005200001
0 2 15728640 55574528 0 0 0 34000000005200001
0 2 15728640 71303168 0 0 0 34000000005200001
0 2 15728640 87031808 0 0 0 34000000005200001
0 2 15728640 102760448 0 0 0 34000000005200001
0 2 15728640 118489088 0 0 0 34000000005200001
So, the CHECKDB operation:
- started reading 63 KB (64,512 bytes) into the first VLF,
- read 480 KB (491,520 bytes), and
- did not read the last 1441 KB (1,475,584 bytes) of the VLF
I also captured the callstacks, in case they are helpful.
file_read callstack:
(00007ffd`999a0860) sqlmin!XeSqlPkg::file_read::Publish+0x1dc | (00007ffd`999a0b40) sqlmin!XeSqlPkg::file_read_enqueued::Publish
(00007ffd`9a825e30) sqlmin!FireReadEvent+0x118 | (00007ffd`9a825f60) sqlmin!FireReadEnqueuedEvent
(00007ffd`9980b500) sqlmin!FCB::AsyncRead+0x74d | (00007ffd`9980b800) sqlmin!FCB::AsyncReadInternal
(00007ffd`9970e9d0) sqlmin!SQLServerLogMgr::LogBlockReadAheadAsync+0x6a6 | (00007ffd`9970ec00) sqlmin!LBH::Destuff
(00007ffd`9970a6d0) sqlmin!LogConsumer::GetNextLogBlock+0x1591 | (00007ffd`9970ab70) sqlmin!LogPoolPrivateCacheBufferMgr::Lookup
(00007ffd`9a9fcbd0) sqlmin!SQLServerLogIterForward::GetNext+0x258 | (00007ffd`9a9fd2d0) sqlmin!SQLServerLogIterForward::GetNextBlock
(00007ffd`9aa417f0) sqlmin!SQLServerCOWLogIterForward::GetNext+0x2b | (00007ffd`9aa418c0) sqlmin!SQLServerCOWLogIterForward::StartScan
(00007ffd`9aa64210) sqlmin!RecoveryMgr::AnalysisPass+0x83b | (00007ffd`9aa65100) sqlmin!RecoveryMgr::AnalyzeLogRecord
(00007ffd`9aa5ed50) sqlmin!RecoveryMgr::PhysicalRedo+0x233 | (00007ffd`9aa5f790) sqlmin!RecoveryMgr::PhysicalCompletion
(00007ffd`9aa7fd90) sqlmin!RecoveryUnit::PhysicalRecovery+0x358 | (00007ffd`9aa802c0) sqlmin!RecoveryUnit::CompletePhysical
(00007ffd`9a538b90) sqlmin!StartupCoordinator::NotifyPhaseStart+0x3a | (00007ffd`9a538bf0) sqlmin!StartupCoordinator::NotifyPhaseEnd
(00007ffd`9a80c430) sqlmin!DBTABLE::ReplicaCreateStartup+0x2f4 | (00007ffd`9a80c820) sqlmin!DBTABLE::RefreshPostRecovery
(00007ffd`9a7ed0b0) sqlmin!DBMgr::SyncAndLinkReplicaRecoveryPhase+0x890 | (00007ffd`9a7edff0) sqlmin!DBMgr::DetachDB
(00007ffd`9a7f2cd0) sqlmin!DBMgr::CreatePhasedTransientReplica+0x869 | (00007ffd`9a7f3630) sqlmin!DBMgr::StrandTransientReplica
(00007ffd`9a7f2ae0) sqlmin!DBMgr::CreateTransientReplica+0x118 | (00007ffd`9a7f2cd0) sqlmin!DBMgr::CreatePhasedTransientReplica
(00007ffd`99ec6d30) sqlmin!DBDDLAgent::CreateReplica+0x1b5 | (00007ffd`99ec6f90) sqlmin!FSystemDatabase
(00007ffd`9abaaeb0) sqlmin!UtilDbccCreateReplica+0x82 | (00007ffd`9abab000) sqlmin!UtilDbccDestroyReplica
(00007ffd`9ab0d7e0) sqlmin!UtilDbccCheckDatabase+0x994 | (00007ffd`9ab0ffd0) sqlmin!UtilDbccRetainReplica
(00007ffd`9ab0cfc0) sqlmin!DbccCheckDB+0x22d | (00007ffd`9ab0d380) sqlmin!DbccCheckFilegroup
(00007ffd`777379c0) sqllang!DbccCommand::Execute+0x193 | (00007ffd`77737d70) sqllang!DbccHelp
(00007ffd`777e58d0) sqllang!CStmtDbcc::XretExecute+0x889 | (00007ffd`777e6250) sqllang!UtilDbccSetPermissionFailure
(00007ffd`76b02eb0) sqllang!CMsqlExecContext::ExecuteStmts<1,1>+0x40d | (00007ffd`76b03410) sqllang!CSQLSource::CleanupCompileXactState
(00007ffd`76b03a60) sqllang!CMsqlExecContext::FExecute+0xa9e | (00007ffd`76b043d0) sqllang!CCacheObject::Release
(00007ffd`76b03430) sqllang!CSQLSource::Execute+0x981 | (00007ffd`76b039b0) sqllang!CSQLLock::Cleanup
file_read_completed callstack:
(00007ffd`99995cc0) sqlmin!XeSqlPkg::file_read_completed::Publish+0x1fc | (00007ffd`99995fe0) sqlmin!XeSqlPkg::file_write_completed::Publish
(00007ffd`9a826630) sqlmin!FireIoCompletionEventLong+0x227 | (00007ffd`9a8269c0) sqlmin!IoRequestDispenser::Dump
(00007ffd`9969bee0) sqlmin!FCB::IoCompletion+0x8e | (00007ffd`9969c180) sqlmin!IoRequestDispenser::Put
(00007ffd`beaa11e0) sqldk!IOQueue::CheckForIOCompletion+0x426 | (00007ffd`beaa1240) sqldk!SystemThread::GetCurrentId
(00007ffd`beaa15b0) sqldk!SOS_Scheduler::SwitchContext+0x173 | (00007ffd`beaa18a0) sqldk!SOS_Scheduler::Switch
(00007ffd`beaa1d00) sqldk!SOS_Scheduler::SuspendNonPreemptive+0xd3 | (00007ffd`beaa1db0) sqldk!SOS_Scheduler::ResumeNoCuzz
(00007ffd`99641720) sqlmin!EventInternal<SuspendQueueSLock>::Wait+0x1e7 | (00007ffd`99641ae0) sqlmin!SOS_DispatcherPool<DispatcherWorkItem,DispatcherWorkItem,SOS_DispatcherQueue<DispatcherWorkItem,0,DispatcherWorkItem>,DispatcherPoolConfig,void * __ptr64>::GetDispatchers
(00007ffd`9aa437c0) sqlmin!SQLServerLogMgr::CheckLogBlockReadComplete+0x1e6 | (00007ffd`9aa44670) sqlmin!SQLServerLogMgr::ValidateBlock
(00007ffd`9970a6d0) sqlmin!LogConsumer::GetNextLogBlock+0x1b37 | (00007ffd`9970ab70) sqlmin!LogPoolPrivateCacheBufferMgr::Lookup
(00007ffd`9a9fcbd0) sqlmin!SQLServerLogIterForward::GetNext+0x258 | (00007ffd`9a9fd2d0) sqlmin!SQLServerLogIterForward::GetNextBlock
(00007ffd`9aa417f0) sqlmin!SQLServerCOWLogIterForward::GetNext+0x2b | (00007ffd`9aa418c0) sqlmin!SQLServerCOWLogIterForward::StartScan
(00007ffd`9aa64210) sqlmin!RecoveryMgr::AnalysisPass+0x83b | (00007ffd`9aa65100) sqlmin!RecoveryMgr::AnalyzeLogRecord
(00007ffd`9aa5ed50) sqlmin!RecoveryMgr::PhysicalRedo+0x233 | (00007ffd`9aa5f790) sqlmin!RecoveryMgr::PhysicalCompletion
(00007ffd`9aa7fd90) sqlmin!RecoveryUnit::PhysicalRecovery+0x358 | (00007ffd`9aa802c0) sqlmin!RecoveryUnit::CompletePhysical
(00007ffd`9a538b90) sqlmin!StartupCoordinator::NotifyPhaseStart+0x3a | (00007ffd`9a538bf0) sqlmin!StartupCoordinator::NotifyPhaseEnd
(00007ffd`9a80c430) sqlmin!DBTABLE::ReplicaCreateStartup+0x2f4 | (00007ffd`9a80c820) sqlmin!DBTABLE::RefreshPostRecovery
(00007ffd`9a7ed0b0) sqlmin!DBMgr::SyncAndLinkReplicaRecoveryPhase+0x890 | (00007ffd`9a7edff0) sqlmin!DBMgr::DetachDB
(00007ffd`9a7f2cd0) sqlmin!DBMgr::CreatePhasedTransientReplica+0x869 | (00007ffd`9a7f3630) sqlmin!DBMgr::StrandTransientReplica
(00007ffd`9a7f2ae0) sqlmin!DBMgr::CreateTransientReplica+0x118 | (00007ffd`9a7f2cd0) sqlmin!DBMgr::CreatePhasedTransientReplica
(00007ffd`99ec6d30) sqlmin!DBDDLAgent::CreateReplica+0x1b5 | (00007ffd`99ec6f90) sqlmin!FSystemDatabase
(00007ffd`9abaaeb0) sqlmin!UtilDbccCreateReplica+0x82 | (00007ffd`9abab000) sqlmin!UtilDbccDestroyReplica
(00007ffd`9ab0d7e0) sqlmin!UtilDbccCheckDatabase+0x994 | (00007ffd`9ab0ffd0) sqlmin!UtilDbccRetainReplica
(00007ffd`9ab0cfc0) sqlmin!DbccCheckDB+0x22d | (00007ffd`9ab0d380) sqlmin!DbccCheckFilegroup
(00007ffd`777379c0) sqllang!DbccCommand::Execute+0x193 | (00007ffd`77737d70) sqllang!DbccHelp
These stacktraces correlate with Hannah's answer indicating that CHECKDB is using an internal snapshot in spite of the presence of the Hekaton tables.
I've read that snapshots perform recovery to undo uncommitted transactions:
Uncommitted transactions are rolled back in a newly created database snapshot because the Database Engine runs recovery after the snapshot has been created (transactions in the database are not affected).
But this still doesn't explain why a large chunk of the log file is often read in my production scenario (and occasionally in the repro provided here). I don't think I have that many in-flight transactions at a given time in my app, and there certainly aren't any in the repro here.