2

I've been trying to track down the source of my tempdb log filling up every few weeks where the tempdb log space used will continue to grow until it reaches 99% and stays there for hours while the log file grows. After adding in some additional logging and email alerts, I think I identified the source of the problem, at least for this incident, not sure if the cause is the same in past incidents.

dbcc opentran in the tempdb reported this:

Oldest active transaction:
    SPID (server process ID): 47s
    UID (user ID) : -1
    Name          : sort_init
    LSN           : (41:3017:42)
    Start time    : Dec 19 2020 10:03:20:410AM
    SID           : 0x01
DBCC execution completed. If DBCC printed error messages, contact your system administrator.

The spid is 47s which I discovered is a system spid. I ran sp_WhoIsActive with @show_system_spids = 1 which brought back data on the process. After running sp_WhoIsActive over the course of an hour, tempdb_allocations ended up at 4,221,088, tempdb_current was 4,128,656, open_tran_count was 1 (throughout the entire incident) and sql_text was always null. After running out of ideas, I ended up restarting SQL Server. Several times prior to restarting SQL server, I issued a checkpoint in tempdb but that didn't help, likely since the transaction from the system process was still active.

The Start time from the dbcc opentran output is 10:03am and it was 1:00pm when I restarted SQL Server. I ran dbcc opentran a few times over the course of an hour and the same output was returned each time.

In this particular case, the server was rebooted and it restarted at 10:02:54 -- 26 seconds before this long running system process started at 10:03:20. Past incidents don't appear to have happened immediately after a system reboot. I don't have many details on past incidents, but at least the last time this occurred 9 days ago, in the sp_WhoIsActive output I collected (without @show_system_spids set to 1) there were no long running processes captured which likely means it was a similar system process running / filling up the tempdb log. This 99% full tempdb log incident 9 days ago was ongoing for 7 hours.

Nothing stands out in the SQL Server log and Windows logs. I'm running Microsoft SQL Server 2017 (RTM-CU15-GDR) (KB4505225) - 14.0.3192.2 (X64).

Is there a way to determine what a system SPID is doing? Or any tips on how I can better troubleshoot this problem when it happens next time? The only clue from this incident is the Name value of sort_init in the dbcc opentran output which could be helpful but seems generic at the same time.

Ben Amada
  • 293
  • 3
  • 9

1 Answers1

1

There are some newer DMVs designed specifically for troubleshooting what is happening in TEMPDB. Take a look at sys.dm_db_session_space_usage and sys.dm_db_task_space_usage. See this link for a discussion of those DMSVs

Lee M
  • 396
  • 2
  • 11