You can’t use aggregate/windowing functions with CROSS APPLY

11

October 6, 2014 by Kenneth Fisher

One of my favorite features of CROSS APPLY is the ability to use it instead of a calculated variable. Well recently I was working on performance tuning a rather annoying query (which I will blog about in more detail later) and one of the steps I took was an attempt to combine CROSS APPLY and a windowing function. As you can tell from the title it didn’t work.

-- Setup code 
CREATE TABLE CrossApplyWindow ( 
Code char(3), 
RegionId char(3) 
) 
GO 
 
INSERT INTO CrossApplyWindow VALUES  
('abc','123'), ('def','123'), ('ghi','123'), 
('abc','456'), ('ghi','456'), 
('abc','789'), ('def','789'), ('ghi','789'), ('jkl','789') 
GO 

The initial query was pulling a distinct count of codes by regionid.

SELECT RegionId, COUNT(DISTINCT Code) AS Calc 
FROM CrossApplyWindow 
GROUP BY RegionId; 
GO

Simple enough but in this particular case I wanted to do it using a windowing function. Unfortunately COUNT DISTINCT doesn’t work with windowing functions. (Side note: it would be nice if you voted for the connect item in the link.) So after getting some help I ended up using DENSE RANK() and MAX(). Interestingly enough in the case of my query it was much more efficient. My example here is a simplified version of the query, so it may not show any performance improvement. I haven’t actually checked.

SELECT  
    RegionId,  
    Calc = MAX(dr)  
FROM 
( 
    SELECT   
        RegionId, 
        dr = DENSE_RANK() OVER  
                (PARTITION BY RegionId ORDER BY Code) 
    FROM CrossApplyWindow 
) AS X 
GROUP BY RegionId; 
GO

So far so good. As part of my tuning effort I tried to get rid of the subquery by using CROSS APPLY.

SELECT RegionId, MAX(x.dr) AS Calc 
FROM CrossApplyWindow 
CROSS APPLY (SELECT DENSE_RANK() OVER  
    (ORDER BY Code) AS dr) x 
GROUP BY RegionId; 
GO

It certainly looks better right? Well here is the output from the previous query.

CrossApplyxWindowing1

And here is the output from the CROSS APPLY query.

CrossApplyxWindowing2

That doesn’t look right does it? Calc is only showing up as 1 for each region when obviously from the data and the previous query we can see that these results aren’t correct. The best I can tell is that CROSS APPLY only receives the row it is associated with. So no windowing functions (or aggregate functions for that matter) will work with it. Here is the best proof I could come up with.

SELECT CrossApplyWindow.*, x.* 
FROM CrossApplyWindow 
CROSS APPLY (SELECT Code, RegionId) x; 
GO

CrossApplyxWindowing3

11 thoughts on “You can’t use aggregate/windowing functions with CROSS APPLY

  1. Benson says:

    Hey Fisher, I could have approached this way with window functions:
    SELECT RegionId, MAX(x.dr) AS Calc
    FROM CrossApplyWindow ca
    CROSS APPLY (SELECT DENSE_RANK() OVER (ORDER BY Code ) AS dr
    FROM CrossApplyWindow caw
    WHERE ca.regionid = caw.regionid
    ) x
    GROUP BY RegionId;
    GO

  2. StevenFox says:

    You are correct that cross apply is effectively called per row, in fact, that was the original intention (to facilitate joining to a table valued function)

    Benson’s cross apply certainly works, but using a cross apply here will result in a better plan than an inner join because it limits the set before the join happens.

    Quick question though,
    Why couldn’t you do the following:

    SELECT regionId,COUNT(code)
    FROM CrossApplyWindow
    GROUP BY RegionId, code

    That should effectively give you a count distinct too.

    • I had realized that the intent of the CROSS APPLY was single row. I was hoping that windowing functions would go outside of that though.

      I don’t see any reason why your idea wouldn’t work. I may have to give that a shot. It does have the limitation that if the query is more complex that those 2 columns it may not work properly.

  3. Qaiser Mehmood says:

    Hey Kenneth and StevenFox,
    This query will not give us the correct result

    SELECT regionId, COUNT(code)
    FROM CrossApplyWindow
    GROUP BY RegionId, code

    Reason: Grouping on above two columns will give us the the result like this
    RegionId Calc
    123 1
    456 1
    789 1
    123 1
    789 1
    123 1
    456 1
    789 1
    789 1
    because GROUP BY groups values as distinct from RegionId and Code. Please let me know if I’m wrong.

    • StevenFox says:

      You are of course 100% correct!

      my intention more like the following:

      select x.regionId, count(x.regionId)
      from
      (
      SELECT regionId,code
      FROM CrossApplyWindow
      GROUP BY RegionId, code
      ) as x
      group by regionId

      in this case the group by ensures there is only 1 region/code combination, counting the region is equivalent to count distinct.

      in terms of query plan, this is the same as the query using count distinct.

  4. eri says:

    ID Rslt1 CO1 Rslt2 C02 ResultTotal
    1484 NULL 1 NULL 1 0
    1486 NULL 1 NULL 1 0
    1487 NULL 1 1 1 1
    1494 NULL 1 NULL 1 0
    1495 NULL 1 NULL 1 0
    1496 NULL 1 NULL 1 0
    1499 NULL 1 NULL 1 0
    1500 NULL 1 NULL 1 0
    1501 NULL 1 NULL 1 0
    1502 NULL 1 NULL 1 0

    Need your help. This is the table. The goal is to calculate ResultTotal column.
    My question is how will I use cross apply in this table where if either of Rslt 1 and Rslt2 columns have 1 then Co1 * rslt1 + Rslt2 *c02. If either Rslt 1 and Rslt2 have NULLs then the product of the two columns is 0.

    • First of all you are better off posting questions like this to the various forums like http://www.sqlservercentral.com or dba.stackexchange.com. There are a lot smarter people than me available and a lot of them. Your more likely to get an answer more quickly.

      My question for you is why are you trying to use CROSS APPLY in this case? I would just use a case statement. Something along the lines of:

      CASE Rslt1 IS NULL or Rslt2 IS NULL THEN 0 ELSE Col1 * Rslt1 + Rslt2 * Col2 END
  5. Benson says:

    Eri, can you try the below if I understand your requirements well.But basically the below cross apply becomes an inner JOIN. If am misunderstanding the requirements let me know then I can quickly fix it…. The case statement might not be the best depending on your dataset as it can be slow for large extracts.

    IF OBJECT_ID(‘TempDB..#Temp’) IS NOT NULL
    DROP TABLE #temp

    CREATE TABLE #temp(ID INT NOT NULL, Rslt1 INT NULL , CO1 INT NOT NULL, Rslt2 INT NULL, CO2 INT NOT NULL, ResultTotal INT NOT NULL)
    GO

    INSERT #temp
    ( ID ,
    rslt1 ,
    CO1 ,
    Rslt2 ,
    CO2 ,
    ResultTotal
    )

    select 1484, NULL, 1, NULL, 1, 0 union all
    select 1486, NULL, 1, NULL, 1, 0 union all
    select 1487, NULL, 1, 1 , 1, 1 union all
    select 1494, NULL, 1, NULL, 1, 0 union all
    select 1495, NULL, 1, NULL, 1, 0 union all
    select 1496, NULL, 1, NULL, 1, 0 union all
    select 1499, NULL, 1, NULL, 1, 0 union all
    select 1500, NULL, 1, NULL, 1, 0 union all
    select 1501, NULL, 1, NULL, 1, 0 union all
    select 1502, NULL, 1, NULL, 1, 0

    SELECT A.ID ,
    A.rslt1 ,
    A.CO1 ,
    A.Rslt2 ,
    A.CO2 ,
    A.ResultTotal,
    T.TestResultTotal
    FROM #temp A
    CROSS APPLY
    (
    SELECT CASE WHEN (Rslt1 = 1) OR (Rslt2 = 1)
    THEN (CO1 * COALESCE(Rslt1,0)) + (COALESCE(rslt2,0) *CO2)
    ELSE 0
    END AS TestResultTotal
    FROM #temp B WHERE B.ID = A.ID
    ) T

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Enter your email address to follow this blog and receive notifications of new posts by email.

Join 1,674 other followers

Follow me on Twitter

ToadWorld Pro of the Month November 2013
%d bloggers like this: