Finding the worst running query in a stored procedure


September 10, 2015 by Kenneth Fisher

The other day I was asked to tune a stored procedure. Not exactly an uncommon task, but I worked something out in the process. I typically want to start by determining what the slowest part of the SP is. Normally I use sys.dm_exec_query_stats and run a query something like this:

FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) st
WHERE st.objectid = object_id('Test.dbo.Sneezy');

There is a bit of a problem with this though. sys.dm_exec_sql_text is slow, particularly when running against a large cache. I have one system where this can take 5-10 minutes to run. So this time I decided to use sys.dm_exec_procedure_stats instead. It’s fast and has an object_id column.

SELECT * FROM sys.dm_exec_procedure_stats 
WHERE object_id = object_id('Test.dbo.Sneezy');

It doesn’t, however, break the code down. It’s just the stats for the procedure as a whole. So for my first pass I grabbed the sql_handle (or plan_handle) out and then ran this query:

SELECT * FROM sys.dm_exec_query_stats 
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) st
WHERE sql_handle = 0x0300060016B804056B1D2101FCA4000001000000000000000000000000000000000000000000000000000000

And I’m good to go! Each of the queries in the SP broken down. I still need more information though.

Well to make a long story, well, not quite as long, here is what I ended up with.

SELECT CAST(qp.query_plan AS XML) AS XML_Plan,
            ((CASE WHEN qs.statement_end_offset = -1 THEN DATALENGTH(st.text)
                ELSE qs.statement_end_offset END) - qs.statement_start_offset)/2 + 1)  AS SqlText,
FROM sys.dm_exec_query_stats qs
JOIN sys.dm_exec_procedure_stats ps
	ON qs.sql_handle = ps.sql_handle
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) st
CROSS APPLY sys.dm_exec_text_query_plan(qs.plan_handle, 
		statement_start_offset, statement_end_offset) qp
WHERE PS.object_id = object_id('Test.dbo.Sneezy');

I ended up with the query plan and text for each individual query within the SP and the associated stats. For example min/max/total duration/cpu time, number of executions, etc. All of which really helps you pin down the parts of the SP you really need to work on.

14 thoughts on “Finding the worst running query in a stored procedure

  1. […] A T-SQL Hack for Mahalanobis Distance Work of art – the intra query parallelism deadlock graphs Finding the worst running query in a stored procedure The #TSQL2sday #70 roundup Why is my tSQLt test not […]

  2. Commenter with question says:

    Thanks for this! I got the following error when I tried one procedure:

    Msg 6335, Level 16, State 101, Line 1
    XML datatype instance has too many levels of nested nodes. Maximum allowed depth is 128 levels.

    I know this probably is itself an indicator that the procedure is in dire need of improvement, but I’m wondering if you can explain why this error happens and what it is telling me about the design problem in the procedure code.


    • Unfortunately I don’t have a good answer for most of your issue. I can tell you that if you remove the cast statement in the first line that will get rid of your error but it also means you won’t be able to easily see the visual query plan. You will at least be able to see the queries. I’ll see if I can get someone else to look at your comment and see if they have any better answers for you.

      • The XML data type supports up to 128 nest levels, and that’s why you get this error.
        It’s one of the very rare cases where working with sys.dm_exec_text_query_plan doesn’t work.

        I think there is a way to load the textual form to SQL Sentry Llan Explorer, but not sure..

      • In fact, as weird as it may sound, that’s the reason this DMV was introduced in the first place – to allow viewing execution plans that are bigger than the nesting limit in a textual way.
        You can hear about it in Bob Beauchemin ‘s session at around 21:00 in this link:

      • Alex Friedman says:

        Matan, nice reference!

        Just to add another tidbit: if you’re using Adam Machanic’s excellent sp_whoisactive with the @get_plans parameter, it faces the same issue and actually catches error 6335 to rework the xml to still display it somehow (although without the ability to open the plan directly, you still have to save into a file or paste into Plan Explorer).

  3. Ray Herring says:

    Thanks Kenneth.
    I have a similar query I have worked on for weeks and had just about given up on getting the query plan for the individual queries. Your concise query showed me exactly where my problem was.

  4. Alex Friedman says:

    Very useful, thanks!
    In my case I had to do a small adjustment to the join to get the blazing fast performance:

    FROM sys.dm_exec_procedure_stats ps
    INNER LOOP JOIN sys.dm_exec_query_stats qs

    With the order change + the LOOP hint it works pretty much instantly, which really is much better than the long running queries on dm_exec_query_stats I was also using.

    • Alex Friedman says:

      And another small fix, joining also by plan_handle to avoid duplicate rows when there are several plans for the same sql_handle:

      FROM sys.dm_exec_procedure_stats ps
      INNER LOOP JOIN sys.dm_exec_query_stats qs
      ON qs.sql_handle = ps.sql_handle
      AND qs.plan_handle = ps.plan_handle

  5. gserdijn says:

    Great post. Shows why the cached_time in sys.dm_exec_procedure_stats can give the wrong indication of a stable plan. I got a question which puzzles me.

    When I executed the query for a stored procedure which timed out today, the query gave back 7 rows. The problematic part, subplan 8 with a unwanted parallel execution, was missing from the resultset.

    After adding OPTION (MAXDOP 1) to the stored procedure, the query in your blog post resulted in 8 rows.

    Is it possible that something like rebuilding indexers and updating statistics could invalidate the subplan? So that it was missing from the output?

    • There are any number of reasons a plan/subplan might be removed the cache. Updating stats is definitely among them. In fact I’ve read that most of the time when you update stats and get a better plan it isn’t because the stats changed. It’s because the old plan was invalidated and a new one created.

      On a side note remember to be careful with MAXDOP = 1. If you are just doing it to remove CXPACKET waits I wouldn’t. Usually CXPACKET gets a bad rap and very well may not be your problem at all. Take a look at sys.dm_os_waiting_tasks when your query is running to see the individual threads. The CXPACKET waits are the threads that are done and waiting on the rest. What you want to look at is those other threads. What are they waiting on?

      • gserdijn says:

        Thank you for your feedback. It’s always good to learn more. 🙂

        Regarding your side note: In this case the parallel plan used clustered index scans on quite a few tables each containing millions of rows, whereas the ideal plan would involve clustered index seeks.

        Parallelism is something which can be very useful, but for me it is a huge warning sign that either the query is poorly written and/or that indices are missing. It should trigger thorough investigation, and the query in your post will be used quite a lot I think.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Enter your email address to follow this blog and receive notifications of new posts by email.

Join 1,655 other followers

Follow me on Twitter

ToadWorld Pro of the Month November 2013
%d bloggers like this: