Disclaimer: I am a developer. Please be nice to me. I am not the developer that is responsible for what follows. I try to be one of the good ones.
I have inherited a support ticket that involves a specific process causing a block on the client's SQL Server 2008R2 installation. I can trigger the block at any time on the client's server, but cannot reproduce it anywhere else. We even went so far as to create a virtual server with exactly the same hardware stats as the client's server, restored the exact same database to the exact same SQL Server setup, but no dice--can't replicate it. The process itself is ugly: a stored procedure is called, which then executes several other stored procedures, all using named transactions, some nested in cursors. The process follows this pseudocode:
sp_Outermost (named transaction Trans_Outermost)
sp_Nested1 (Trans_Nested1)
sp_Nested2 (Trans_Nested2)
sp_Nested3 (Trans_Nested3)
sp_Nested3_1 (Trans_Nested3_1)
sp_Nested3_1_1 (Trans_Nested3_1_1)
sp_Nested3_1_1_1 (Trans_Nested3_1_1_1)
sp_Nested3_1_1_1_1 (Trans_Nested3_1_1_1_1)
Sorry...not sure how else to describe it.
There is no TRY-CATCH logic in any of the stored procedures, though there is some "custom" error handling that involves GOTOs and setting an "error number" (more on that later).
When I look at the process in Activity Monitor, the Task State is RUNNING, Command is SELECT, and the Wait Type is ASYNC_NETWORK_IO.
If I run DBCC OPENTRAN or look at sys.dm_tran_session_transactions and sys.dm_tran_active_transactions, it lists the outermost transaction (Trans_Outermost) as being the open transaction. However, when I run a query against sys.dm_exec_connections and sys.dm_exec_sessions, I am informed that the query being executed is actually sp_Nested3_1_1_1. This is always the case. Further, running a query ganked from this answer, I see that the statement being waited on is always this:
SET @ErrorNum = 85656
This @ErrorNum variable is declared and used in every one of these stored procedures. It strikes me as very, very odd that a simple SET statement could cause so much trouble, but I don't believe in coincidence.
I commented out every usage of @ErrorNum in sp_Nested3_1_1_1 to see if that made a difference and, well, it kind of did. There's a stored proc that writes to an audit log table that is called from every other stored procedure. Now, the error statement comes from that procedure, but still involves @ErrorNum.
SET @ErrorNum = 85026
So, my question is, how can I figure out what the root cause of this blocking is? If a local variable with the same name is declared and used in nested locations on a server with inadequate hardware, could that cause a problem? Where else can I look here?
I just figured out that these procedures use RAISERROR WITH SETERROR with a set of custom message IDs, all more than 85000. Not sure if this matters, but it's where I'm googling now.
I commented out some code in sp_Nested3_1_1_1 and sp_Nested3_1_1_1_1, specifically surrounding this @ErrorNum business, and now it's telling me that what appears to be a perfectly legitimate line of code in sp_Nested3_1_1 is the problem.
SELECT CASE WHEN @Attached_ID = -1 THEN
SCOPE_IDENTITY()
ELSE
@Attached_ID
END AS Attached_ID
This seems completely arbitrary to me and makes me wonder if it has to do with the poor hardware, and the fact that they're running two other enterprise DBs on it in addition to ours.
I'm using the following queries as well as Activity Monitor to identify when/where locks occur:
SELECT t.text,
QUOTENAME(OBJECT_SCHEMA_NAME(t.objectid, t.dbid)) + '.'
+ QUOTENAME(OBJECT_NAME(t.objectid, t.dbid)) proc_name,
c.connect_time,
s.last_request_start_time,
s.last_request_end_time,
s.status
FROM sys.dm_exec_connections c
JOIN sys.dm_exec_sessions s
ON c.session_id = s.session_id
CROSS APPLY sys.dm_exec_sql_text(c.most_recent_sql_handle) t
WHERE c.session_id = 61;--your blocking spid
As well as the first query from this answer.