
I've run into a problem on our production SQL Server where temp table objects take a long time to drop (noticeable even when the tables are small and the drop is synchronous). I can't reproduce this on other SQL Servers that are similarly spec'd, with the same number of spindles serving the tempdb data files, split into the same number of files (one per physical core). This is SQL Server 2005 Enterprise (SP2, build 3042).

Update: one more factor, which is looking the most likely: this server has more than 500 databases on it. Another server with more than 800 also runs these drops slowly. That's the only other server I've got with a lot of databases on it.

Second update: restarting the problem servers allows the create and drop statements to execute instantaneously. Performance of the test then degrades over the next few hours (while the application is running) until it hits what appears to be a plateau. I've got a job running in the background that tests this every 30 minutes; I'll check the results after a few days and see whether execution times are still the same. I think they will be.

Third update: none of the executing statements have shown latch waits. On the CPU side, using sp_WhoIsActive with a delta_interval of 30 (seconds), the CPU_delta column reports roughly 30,000 (milliseconds?), and watching perfmon during execution there appears to be about one core's worth of CPU spike. These are 16-CPU boxes, so it can be a bit tough to see via perfmon when other traffic is occurring, but the drop statements do appear to consume roughly one CPU while they execute.
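
The sp_WhoIsActive capture was along these lines (a sketch, assuming Adam Machanic's procedure is installed in master; @delta_interval is the parameter that produces the *_delta columns, including CPU_delta):

-- Take two snapshots 30 seconds apart and report the deltas between them
EXEC master.dbo.sp_WhoIsActive
    @delta_interval = 30;   -- seconds between the two snapshots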

Creation and destruction of 20 tiny temp tables with unique names (one column, no rows) takes less than 20 ms on most servers I test it on. On one server it takes more than 5 seconds. The vast majority (>95%) of the time is spent on the drop statements.

During execution there are no explicit waits and no blocking reported, and perfmon doesn't show any load on the I/O subsystem for either data or log files.
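
Roughly how the waits and blocking were checked (a simplified sketch against the standard DMVs, not necessarily the exact query used):

-- Look for waits or blocking on user sessions while the batch runs
SELECT r.session_id,
       r.status,
       r.command,
       r.wait_type,
       r.wait_time,
       r.blocking_session_id
FROM sys.dm_exec_requests AS r
WHERE r.session_id > 50;   -- ignore system sessions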

I've looked at peak and low usage times, and at times when the number of tables marked for destruction is high and when it is low. The operation takes around 5 seconds to handle the 20 drop statements no matter what. The issue is causing a slowdown in responsiveness that is perceivable by clients.

Sample code (I created 20 such objects to get the 5-second timing; it works out to roughly 300 ms per drop):

PRINT CONVERT(varchar(30),GETDATE(),113)


CREATE TABLE [#Objects1]
(
  [Id] uniqueidentifier NOT NULL
)


CREATE TABLE [#Objects12]
(
  [Id] uniqueidentifier NOT NULL
)
...
DROP TABLE [#Objects1]
DROP TABLE [#Objects12]
...
PRINT CONVERT(varchar(30),GETDATE(),113)

Timing is consistently 5 to 6 seconds to execute:

11 Nov 2011 12:56:52:073 - Begin temp table creation

11 Nov 2011 12:56:52:090 - End temp table creation

11 Nov 2011 12:56:52:090 - Begin temp table drop

11 Nov 2011 12:56:57:230 - Complete temp table drop

In response to the request below to run USE tempdb; DBCC LOGINFO; and record the number of rows returned: I noticed originally that I had about 271 VLFs, so I shrank and regrew the log to see if fragmentation was a problem. It made no difference. Current DBCC LOGINFO output:

FileId,FileSize,StartOffset,FSeqNo,Status,Parity,CreateLSN
2,253952,8192,101603,2,64,0
2,262144,262144,101604,2,64,0
2,262144,524288,101605,2,64,85000000038300574
2,262144,786432,101606,2,64,85000000038300574
2,262144,1048576,101578,2,128,86000000001600001
2,262144,1310720,101579,2,128,86000000001600001
2,262144,1572864,101580,2,128,86000000001600001
2,262144,1835008,101581,2,128,86000000001600001
2,262144,2097152,101582,2,128,86000000001600001
2,262144,2359296,101583,2,128,86000000023400002
2,262144,2621440,101584,2,128,86000000023500756
2,327680,2883584,101585,2,128,86000000023500756
2,327680,3211264,101586,2,128,86000000023500756
2,393216,3538944,101587,2,128,86000000023500756
2,393216,3932160,101588,2,128,86000000023500756
2,458752,4325376,101589,2,128,86000000023500756
2,253952,4784128,101590,2,128,86000000023500756
2,270336,5038080,101591,2,128,86000000023500756
2,253952,5308416,101592,2,128,86000000023500756
2,270336,5562368,101593,2,128,86000000023500756
2,253952,5832704,101594,2,128,86000000023500756
2,335872,6086656,101595,2,128,86000000023500756
2,253952,6422528,101596,2,128,86000000023500756
2,401408,6676480,101597,2,128,86000000023500756
2,253952,7077888,101598,2,128,86000000023500756
2,466944,7331840,101599,2,128,86000000023500756
2,253952,7798784,101600,2,128,86000000023500756
2,253952,8052736,101601,2,128,86000000023500756
2,278528,8306688,101602,2,128,86000000023500756
2,133627904,8585216,101607,2,64,101336000000013600462
2,133627904,142213120,101563,0,128,101336000000013600462
2,133627904,275841024,101564,0,128,101336000000013600462
2,133627904,409468928,101565,0,128,101336000000013600462
2,133627904,543096832,101566,0,128,101336000000013600462
2,133627904,676724736,101567,0,128,101336000000013600462
2,133627904,810352640,101568,0,128,101336000000013600462
2,133627904,943980544,101569,0,128,101336000000013600462
2,133627904,1077608448,101570,0,128,101336000000013600462
2,133627904,1211236352,101571,0,128,101336000000013600462
2,133627904,1344864256,101572,0,128,101336000000013600462
2,133627904,1478492160,101573,0,128,101336000000013600462
2,133627904,1612120064,101574,0,128,101336000000013600462
2,133627904,1745747968,101575,2,128,101336000000013600462
2,133627904,1879375872,101576,2,128,101336000000013600462
2,134479872,2013003776,101577,2,128,101336000000013600462

IO stats output:

Database Name,physical_name,io_stall_read_ms,num_of_reads,avg_read_stall_ms,io_stall_write_ms,num_of_writes,avg_write_stall_ms,io_stalls,total_io,avg_io_stall_ms
msdb,H:\Microsoft SQL Server\MSSQL.1\MSSQL\DATA\MSDBData.mdf,47691565,817329,58.4,10747142,533509,20.1,58438707,1350838,43.3
tempdb,H:\Microsoft SQL Server\MSSQL.1\MSSQL\DATA\templog.ldf,54457,30177,1.8,145691717,8235651,17.7,145746174,8265828,17.6
model,H:\Microsoft SQL Server\MSSQL.1\MSSQL\DATA\modellog.ldf,547,122,4.4,2273,239,9.5,2820,361,7.8
tempdb,P:\tempdb_data3.mdf,2606066,1112043,2.3,17829023,1919954,9.3,20435089,3031997,6.7
tempdb,P:\tempdb_data2.mdf,2484793,1111808,2.2,17270161,1922735,9.0,19754954,3034543,6.5
tempdb,P:\tempdb_data5.mdf,2514469,1112086,2.3,17066589,1919802,8.9,19581058,3031888,6.5
tempdb,P:\tempdb_data7.mdf,2542070,1112551,2.3,17049649,1920204,8.9,19591719,3032755,6.5
tempdb,P:\tempdb_data6.mdf,2517767,1112237,2.3,17043756,1924983,8.9,19561523,3037220,6.4
tempdb,P:\tempdb_data0.mdf,2476811,1113570,2.2,17084779,1926333,8.9,19561590,3039903,6.4
tempdb,P:\tempdb_data4.mdf,2462179,1111649,2.2,17073336,1920058,8.9,19535515,3031707,6.4
tempdb,P:\tempdb_data1.mdf,2456317,1111859,2.2,16997589,1922438,8.8,19453906,3034297,6.4
model,H:\Microsoft SQL Server\MSSQL.1\MSSQL\DATA\model.mdf,5194,798,6.5,612,240,2.5,5806,1038,5.6
master,H:\Microsoft SQL Server\MSSQL.1\MSSQL\DATA\master.mdf,40640,7326,5.5,2868,1548,1.9,43508,8874,4.9
msdb,H:\Microsoft SQL Server\MSSQL.1\MSSQL\DATA\MSDBLog.ldf,8015,950,8.4,1012107,312084,3.2,1020122,313034,3.3
master,H:\Microsoft SQL Server\MSSQL.1\MSSQL\DATA\mastlog.ldf,640,141,4.5,198283,99134,2.0,198923,99275,2.0

Waitstats output:

wait_type,wait_time_s,pct,running_pct

PAGEIOLATCH_EX,0.02,100.00,100.00

TokenAndPermUserStore size is 2,952 KB, as returned by:

SELECT SUM(single_pages_kb + multi_pages_kb) AS "SecurityTokenCacheSize(kb)"
FROM sys.dm_os_memory_clerks
WHERE name = 'TokenAndPermUserStore'
JorgeSandoval
4 Answers


You didn't mention the IOPS used by the tempdb LUN. Are you seeing any IO waits on the LUN device? Is anything else on that LUN? Is another system using the disk group/pool that the LUN is carved from?

The same questions apply to the master database.

Also, are you using local temp tables (#tablename) or global temp tables (##tablename)? If global, could another SPID be holding a shared lock on the global temp tables when you're trying to drop them?
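
If it helps, per-file IO stalls for tempdb and master can be pulled from sys.dm_io_virtual_file_stats, along these lines (a sketch; filter and join as needed):

-- Per-file read/write stalls for tempdb and master
SELECT DB_NAME(vfs.database_id) AS database_name,
       mf.physical_name,
       vfs.io_stall_read_ms,
       vfs.num_of_reads,
       vfs.io_stall_write_ms,
       vfs.num_of_writes
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN sys.master_files AS mf
  ON mf.database_id = vfs.database_id
 AND mf.file_id = vfs.file_id
WHERE DB_NAME(vfs.database_id) IN ('tempdb', 'master')
ORDER BY vfs.io_stall_write_ms DESC;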

bretlowery

The answer to my perf problem turned out to be a little-known issue (though I had seen it a long time ago) related to the collection of DMV statistics on systems with many databases where code uses temp tables heavily (or, presumably, where you simply have a lot of tables). On our slowest servers sys.dm_db_index_usage_stats had more than 200k records, and servers returning 10k records still showed a measurable lag.

This KB article describes the problem: http://support.microsoft.com/kb/2003031

Query performance decreases when sys.dm_db_index_usage_stats has large number of rows

Consider the following scenario:

  • In Microsoft SQL Server 2005, you frequently perform DDL operations that involve dropping and re-creating many tables (especially temporary tables in the tempdb database).
  • You have a large number of entries (100,000 or more) in the sys.dm_db_index_usage_stats dynamic management view (DMV).

This issue is fixed in SQL Server 2008; in 2005 it can only be alleviated by code changes that reduce dependence on temp tables, or by enabling trace flag 2330.

This KB has saved my bacon. Restarting after adding this trace flag to the startup parameters has reduced drop times to under 100 ms (from 40,000 ms!), and I expect other perf gains on these servers as well.
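
For anyone hitting the same thing, a rough sketch of checking exposure and enabling the workaround (my own illustration, not the KB's script; in our case the flag went into the startup parameters as -T2330 followed by a restart):

-- How many rows are in the index usage stats DMV? (100,000+ matches the KB scenario)
SELECT COUNT(*) AS index_usage_rows
FROM sys.dm_db_index_usage_stats;

-- Enable trace flag 2330 globally; for a permanent fix add -T2330 to the
-- instance's startup parameters and restart, as described in the KB.
DBCC TRACEON (2330, -1);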

Additional reading that led me to the conclusion that this was the problem:

http://www.pythian.com/news/5211/sql-server-more-light-shed-on-non-yielding-scheduler-and-indexes-stats/

Thanks to all who contributed to this question; I've gained quite a few very useful tools for diagnosing perf issues in the process.

JorgeSandoval

When a drop-table statement is waiting for more than a second or two, run Adam Machanic's sp_WhoIsActive. It's a great tool for finding out why any query is running slow. In the waits column you'll see the waits for that specific task, and if it's stuck waiting on CPU (which won't show up in sys.dm_os_wait_stats), you'll see that too.

Also, I'd be remiss if I didn't point out that your build is out of support. As a consultant, I tell my clients, "I would hate to do hours of troubleshooting only to find that it's been fixed in a service pack - and end up billing you thousands of dollars for something you could have fixed for free. Plus you're out of support anyway." You can find the most recent SQL Server service packs & cumulative updates at http://sqlserverbuilds.blogspot.com. Build 3042 is way out of date - the supported version is SP4.
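
For reference, the build and service pack level can be checked with the standard SERVERPROPERTY calls:

-- Confirm the build you're running (9.00.3042 corresponds to 2005 SP2)
SELECT SERVERPROPERTY('ProductVersion') AS build,
       SERVERPROPERTY('ProductLevel')   AS service_pack,
       SERVERPROPERTY('Edition')        AS edition;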

Brent Ozar

Edit: 2011-11-12 13:00

As TokenAndPermUserStore looks to be a "normal" size, it's unlikely to be a factor. The next place to look is probably spinlock stats. Can you also (as I mentioned in chat) determine whether the problem persists if you remove the explicit drop statements and let the tables fall out of scope instead?
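
A sketch of that fall-out-of-scope variant (the procedure and table names are hypothetical; on 2005, spinlock statistics can be viewed with the undocumented DBCC SQLPERF('spinlockstats') command):

-- Wrap the creates in a procedure and skip the explicit DROPs; the temp tables
-- are cleaned up automatically when the procedure goes out of scope.
CREATE PROCEDURE dbo.TempTableScopeTest   -- hypothetical name
AS
BEGIN
    CREATE TABLE #Objects1  ([Id] uniqueidentifier NOT NULL);
    CREATE TABLE #Objects12 ([Id] uniqueidentifier NOT NULL);
    -- ... remaining tables ...
END;
GO

PRINT CONVERT(varchar(30), GETDATE(), 113);
EXEC dbo.TempTableScopeTest;
PRINT CONVERT(varchar(30), GETDATE(), 113);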

Edit: 2011-11-12 01:00

As you're on 2005 SP2, I'm starting to wonder if this might be a TokenAndPermUserStore issue. It would explain why the problem is alleviated by a restart of the instance and why it is prevalent on servers with large numbers of databases.

Can you run the following query and add the output to your question?

SELECT SUM(single_pages_kb + multi_pages_kb) AS "SecurityTokenCacheSize(kb)"
FROM sys.dm_os_memory_clerks
WHERE name = 'TokenAndPermUserStore'

The quick hack fix for this is to create a job that monitors the size of the cache and runs DBCC FREESYSTEMCACHE ('TokenAndPermUserStore') when it exceeds a certain size. IIRC (I'll check tomorrow and update) I've set the limit at 100 MB on servers where this caused a problem. SP3 introduces a mechanism to control the cache size without having to brute-force clear it.
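
A minimal sketch of the check such a job could run (my own example; the 100 MB threshold is just the figure mentioned above, so tune it for your environment):

-- Clear the security token cache if it has grown past ~100 MB
DECLARE @size_kb bigint;

SELECT @size_kb = SUM(single_pages_kb + multi_pages_kb)
FROM sys.dm_os_memory_clerks
WHERE name = 'TokenAndPermUserStore';

IF @size_kb > 102400   -- 100 MB expressed in KB
BEGIN
    DBCC FREESYSTEMCACHE ('TokenAndPermUserStore');
END;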

Original answer:

This might take a bit of to and fro, but can we start with a look at the IO and wait statistics? (Scripts are in the gist.)

Modify your create/drop script to run DBCC SQLPERF('sys.dm_os_wait_stats', CLEAR); at the start, followed by the wait stats script after your final drop. The results will probably be easier to read if you post them as CSV (so they can be copied into Excel).
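
The shape of that harness is roughly as follows (the wait-stats query here is simplified; use the full script from the gist for the real run):

DBCC SQLPERF ('sys.dm_os_wait_stats', CLEAR);   -- reset the counters before the test

-- ... run the CREATE TABLE / DROP TABLE batch here ...

SELECT wait_type,
       wait_time_ms / 1000.0 AS wait_time_s,
       waiting_tasks_count
FROM sys.dm_os_wait_stats
WHERE wait_time_ms > 0
ORDER BY wait_time_ms DESC;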

Could you also run USE tempdb; DBCC LOGINFO; and record the number of rows returned.

Mark Storey-Smith