Negative session_ids

2

January 27, 2014 by Kenneth Fisher

I probably had the most fun all week when a query I was running came up blocked. Sounds strange right? Well the blocking_session_id was a negative 2 (-2)! I’ve never seen anything like it before. Once I had resolved my problem (see below). I started doing some research on negative session_ids in general.

TL;DR: If you are getting blocked by a -2 session id the code to fix it is at the bottom.

To start with there are the BOL entries for sys.dm_exec_requests

blocking_session_id
ID of the session that is blocking the request. If this column is NULL, the request is not blocked, or the session information of the blocking session is not available (or cannot be identified).
-2 = The blocking resource is owned by an orphaned distributed transaction.
-3 = The blocking resource is owned by a deferred recovery transaction.
-4 = Session ID of the blocking latch owner could not be determined at this time because of internal latch state transitions.

And sys.dm_tran_locks

request_session_id
Session ID that currently owns this request. The owning session ID can change for distributed and bound transactions. A value of -2 indicates that the request belongs to an orphaned distributed transaction. A value of -3 indicates that the request belongs to a deferred recovery transaction, such as, a transaction for which a rollback has been deferred at recovery because the rollback could not be completed successfully.

Note that between the two of them we see possible negative values of -2, -3 and -4.

I’m going to go in reverse order since that matches how difficult it was to find information on each (-4 was by far the hardest).

-4 : Session ID of the blocking latch owner could not be determined at this time because of internal latch state transitions.
I found a couple of links on this. Here and Here. Both links had some interesting information and in both cases it appeared to be an issue with tempdb. The second in particular had a block of information from Microsoft.

This is directly from Microsoft:
Troubleshooting contention in DDL operations
Evaluate your application and query plans and see if you can minimize the creation of temporary tables. To do this, monitor the perfmon counters Temp Tables Creation Rate and Temp Tables For Destruction. You can also run SQL Profiler to correlate the values of these counters with the currently running queries. This will help you identify the queries that are causing the contention in system catalog. This might occur, for example, if a temporary object is being created inside a loop or a stored procedure.
Verify if temp objects (temp tables and variables) are being cached. SQL2005 caches Temp objects only when the following conditions are satisfied:

  • Named constraints are not created.
  • Data Definition Language (DDL) statements that affect the table are not run after the temp table has been created, such as the CREATE INDEX or CREATE STATISTICS statements.
  • Temp object is not created by using dynamic SQL, such as: sp_executesql N’create table #t(a int)’.
  • Temp object is created inside another object, such as a stored procedure, trigger, and user-defined function; or is the return table of a user-defined, table-valued function.

-3 : The blocking resource is owned by a deferred recovery transaction.
Really the only good information I could find was a post by Paul Randal and a BOL entry. Of course given the sources I consider both of these pretty definitive. Both state that this is a problem with an uncommitted transaction that was trying to be rolled back when a database is being brought online. From what I understood of Paul’s post the only way to recover from this was to restore the database from backup.

-2 : The blocking resource is owned by an orphaned distributed transaction.
This is the particular problem I ran into. I did some research and didn’t come up with anything quickly so I posted on dba.stackexchange.com and between Martin Smith and Thomas Stringer they pointed me in the right direction. I found this link on www.sqlservercentral.com and it had the solution I ended up using. (Getting the req_transactionUOW from syslockinfo and using that value with KILL.) The problem here is again a transaction that is trying to be rolled back. In this case it’s a distributed transaction that got lost by MSDTC (Microsoft Distributed Transaction Coordinator). A distributed transaction is one that is taking place in multiple databases, frequently on multiple servers. If a problem occurs and the correct state of the transaction can’t be determined you run into this problem.

Edit: Since -2s are the most common problem I decided I should probably pull the fix out of the links above and put an updated version of them here. First use sys.dm_tran_locks to find the request_owner_guid. This is also known as the UOW, the Unit of Work ID of a distributed transaction. The KILL command can take this is a parameter as well as a session_id. So once we have it we can actually KILL the connection.

SELECT request_owner_guid
FROM sys.dm_tran_locks 
WHERE request_session_id = -2

Take each row from the result and use it to create a KILL statement.

KILL 'D21D8DAB-C75C-411A-879B-FFDE10E3F397'

There you go. Negative session_ids. Hope you had as much fun with them as I did. I do want to point out that these are not always a problem. You will see the occasional negative session id that isn’t an error. Only kill them if you really need to. (Like any other session.)

2 thoughts on “Negative session_ids

  1. Raju says:

    Hi Ken,

    Awesome article, seen the -ve session_ids before, but never had a chance to look into the internals of it. Thank you for sharing.

    I have a question related to dm_xe_session_targets and dm_xe_sessions, can you please share some knowledge(digging into DL). Oh also how long does the dmv hold the data before it deletes or overwrites the information? Can the threshold value be altered?

    Thank you
    Raju

    • Thanks. I was a bit worried I didn’t include enough detail but since the information I had in the links was so good I decided not to duplicate. I don’t have a lot of experience with the XE stuff so I’ll certainly give it a shot. Can’t say when though.

      The DMV (DMObject) question I can answer fairly easily. In general if your instance is restarted then they are cleared. Some specific cases can clear out certain data more quickly. The TRAN and certain EXEC DMOs for example are all current data only. The DMOs that give information into the procedure cache (sys.dm_exec_query_stats, sys.dm_exec_procedure_stats etc) lose data as the cache itself clears out. There is no way to my knowledge to change how this is handled. If you want to clear the data more quickly you can of course do something like clear the procedure cache using DBCC CLEARPROCACHE to clear out that data but I don’t recommend it. In order to keep the data longer you will need to set up some type of data pull to separate storage tables and manage them yourself.

      Hope that helps & thanks for the comment!

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Enter your email address to follow this blog and receive notifications of new posts by email.

Join 1,646 other followers

Follow me on Twitter

ToadWorld Pro of the Month November 2013
%d bloggers like this: