1

Our database has been giving this error occassionally... "Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding. ---> System.ComponentModel.Win32Exception (0x80004005): The wait operation timed out at System.Data.SqlClient.SqlInternalConnection.OnError"

I have run perfmon with below results. How can I check what is causing such high disk I/O? The Avg. Disk Queue Length counter (blue) is very high.

Thanks!

enter image description here enter image description here

K09
  • 1,454
  • 13
  • 39
  • 61

2 Answers2

7

To the question you asked:

I wouldn't rely on disk queue alone. In fact I rarely even ever look at disk queue lengths unless I'm getting in deep with a problem. It is best to look at your disk's latency. Those are the Avg. Disk Sec/Read (or /Write and /Transfer) counters. That tells you what your disk latency is from Windows' perspective. So the time that the request was taking after sent to the disk and brought back.. Disk Queuing nowadays doesn't tell you a lot because most IO subsystems are able to handle a disk queue depth and have multiple spindles doing work in your RAID group often. Finally - In this case - your disk queue length doesn't even look that bad. From here it looks like the max it was in the time of this screenshot (for the average length) was 1.377. That's nothing on most SQL Server systems. Look at your actual latency. Also I don't look at % Disk Time.. I look at the idle time instead. That is a more reliable counter and you just have to do a little math to read it.. The more idle, the less activity.

To The General Question Behind Your Question

I'll ask this one by starting with a question - Why did you go right to your IO? There could be any number of things causing your slowdown. And to answer that exhaustively here is tough but a high level of some things to look at/consider:

  • Are you experiencing blocking? I would download SP_Whoisactive and have a look at that while you are getting these errors. Do you see blocking? Do you see the query behind the request(s) that are timing out? What is the duration?
    • Have you analyzed your SQL Server Wait Stats to see what your chief cause of waits are?
    • Do you know which query or queries are causing the timeouts? If so can you look at those and see if there is any room for tuning?

There could be many other things here. This could be on the connection or network. It could be blocking, it could be a need for index tuning and query tuning, it could be that you expect the queries to take longer than the default 30 second timeout, etc.

But I'd try and gather more data and then choose a path to go down. This is an old whitepaper but it is very useful to performance tuning by waits. While there will be new wait type Tom didn't mention in this paper, it still very much applies and will help you out.

Mike Walsh
  • 18,278
  • 6
  • 50
  • 74
4

Just a side note: I have heard from a few presentations that average disk queue length can be a bit skewed if you are looking at a virtual machine. With the different cache points between that VM to the physical disk, it can give you different readings. Just something to be cautious of when checking it.

When I hear an application is receiving timeout errors, the first question I ask the developers or application admin is have you changed the default behavior of the SQLClient being used. Applications I have come across lately have messed with this setting, usually increased it or ended up making it shorter.

As Mike already mentioned there are a ton of things that can be involved in this one, not just disk issues. Do you have a baseline or previous measures of that disk activity to know if it is actually increased?

With timeout errors I usually go toward checking Profiler or Extended Events for long running queries. You can find examples of this for Extended Events here and here. The quickest check would be Profiler on the specific database the application uses and just filter for duration close to the default timeout setting being used.

You can also just go after the expensive queries in your plan cache as well. I will check that if I can often just to see what is being done.