Except and Intersect

7

January 19, 2015 by Kenneth Fisher

EXCEPT and INTERSECT are two uncommon commands. Not that they do anything odd but they aren’t exactly well known in any detail or used very often. I’ve been using them myself a bit recently and thought I would explore.

Per BOL:

EXCEPT returns any distinct values from the left query that are not also found on the right query.
INTERSECT returns any distinct values that are returned by both the query on the left and right sides of the INTERSECT operand.

These commands have a very similar feel to UNION and UNION ALL. They all join two queries together and require that the queries have the same number of columns in the same order with similar data types. Also the column names for the output will be taken from the first query.

I frequently see EXCEPT and INTERSECT compared to LEFT OUTER JOIN and JOIN and to be fair they are somewhat similar. There are a couple of important differences though.

Aside from the conditions above (number of columns and datatypes) EXCEPT and INTERSECT do not require a JOIN condition. They also return a distinct list of values, and in the case of the LEFT OUTER JOIN a WHERE clause will be needed.

Another option, and much closer to the actual definitions, is to use EXISTS and NOT EXISTS. In fact if you use a DISTINCT (or GROUP BY) and EXISTS or NOT EXISTS you have almost exactly the same thing. (For example you only have access to the columns in table1 and not those in table2.)

-- Setup code
CREATE TABLE Except1 (colField varchar(50))
CREATE TABLE Except2 (colField varchar(50))

INSERT INTO Except1 VALUES ('A'),('A'),('B'),('D'),('E'),('E'),('G'),('H'),('J'),('K')
INSERT INTO Except2 VALUES ('B'),('C'),('E'),('F'),('H'),('I'),('K'),('L')

SET STATISTICS TIME ON
SET NOCOUNT ON

Except

PRINT '
Query 1'
SELECT colField FROM Except1
EXCEPT
SELECT colField FROM Except2

PRINT '
Query 2'
SELECT DISTINCT Except1.colField
FROM Except1
LEFT OUTER JOIN Except2
	ON Except1.colField= Except2.colField
WHERE Except2.colField IS NULL

PRINT '
Query 3'
SELECT DISTINCT Except1.colField
FROM Except1
WHERE NOT EXISTS (SELECT 1 FROM Except2
				WHERE Except1.colField= Except2.colField)

Except1

Except2

Except3

Query 1

 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.

 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 24 ms.

Query 2

 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.

 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 236 ms.

Query 3

 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.

 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 62 ms.

Three different plans all different but all show about the same percentage. The EXCEPT version is quite a bit faster than the LEFT OUTER JOIN and slightly faster than NOT EXISTS.

Intersect

PRINT '
Query 1'
SELECT colField FROM Except1
INTERSECT
SELECT colField FROM Except2

PRINT '
Query 2'
SELECT DISTINCT Except1.colField
FROM Except1
JOIN Except2
	ON Except1.colField= Except2.colField

PRINT '
Query 3'
SELECT DISTINCT Except1.colField
FROM Except1
WHERE EXISTS (SELECT 1 FROM Except2
				WHERE Except1.colField= Except2.colField)

Intersect1

Intersect2

Intersect3

Query 1

 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.

 SQL Server Execution Times:
   CPU time = 16 ms,  elapsed time = 10 ms.

Query 2

 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.

 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 120 ms.

Query 3

 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.

 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.

So this time the query plan for the JOIN shows the highest percentage. It also has a significantly longer execution time. The other two are pretty similar.
 

Summary

I should point out that with this few records the times are pretty suspect. Not to mention that you should take the percentages within a query plan with a grain of salt as they are frequently unreliable. Another issue is that I’m only dealing with a single column. If you are dealing with multiple columns the ON clauses (and the WHERE clause in the EXISTS/NOT EXISTS) are going to get more and more complex and probably take up more and more overhead. These results, however, do seem to point out that if they will work for your output EXCEPT and INTERSECT are worth taking a looking at.

Here is a really nice example of using EXCEPT to find differences between two identical tables

7 thoughts on “Except and Intersect

  1. Joe Celko says:

    While T-SQL is not up to spec yet, the ANSI/ISO Standard SQL has INTERSECT [ALL] and EXCEPT [ALL] options that parallel the UNION [ALL] construct.

  2. Thanks! I look forward to T-SQL getting those added in a future version.

  3. Lok says:

    Well explained.

  4. mjswart says:

    Excellent post. I think one of the reasons that UNION is more common than the other set operators is that INTERSECT and EXCEPT can be thought of as a filter applied to a set and can be rewritten using the WHERE clause. Not true for UNION.

  5. […] the comments I did have it pointed out that using EXCEPT simplifies things quite a bit (and it does) and I was asked for an example. Even with EXCEPT […]

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Enter your email address to follow this blog and receive notifications of new posts by email.

Join 1,664 other followers

Follow me on Twitter

ToadWorld Pro of the Month November 2013
%d bloggers like this: