UNION vs UNION ALL
October 16, 2014 by Kenneth Fisher
You might be wondering why I’m going into such a simple subject. Well, the way I see it, there are four options here.
- You already know the difference, it seems really obvious and you are probably wondering why I’m mentioning it.
- You think you know the difference but it turns out you are wrong (don’t worry, it happens).
- You don’t know the difference and once I’ve pointed it out you will wonder why on earth you never thought of it before.
- You don’t care.
If you don’t care there isn’t much help I can give you. If you already know what I have to say then you don’t need help. That leaves a 50% chance that you will find this interesting. So here goes.
At its simplest the difference is that UNION returns a distinct list of rows and UNION ALL returns all rows.
--Table setup
CREATE TABLE UnionTable1 (Id Int);
CREATE TABLE UnionTable2 (Id Int);
INSERT INTO UnionTable1 VALUES (2), (4), (6), (8), (10), (12);
INSERT INTO UnionTable2 VALUES (3), (6), (9), (12);
--Union example
SELECT Id AS [UNION] FROM UnionTable1
UNION
SELECT Id FROM UnionTable2;

SELECT Id AS [UNION ALL] FROM UnionTable1
UNION ALL
SELECT Id FROM UnionTable2;
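One way to read the difference: UNION behaves like a SELECT DISTINCT wrapped around the UNION ALL result. The sketch below assumes the two tables from the setup above; the derived table aliases (AllRows, DistinctRows) are just names picked for illustration.

```sql
-- UNION ALL keeps every row: 6 rows from UnionTable1 + 4 from UnionTable2 = 10 rows
SELECT COUNT(*) AS UnionAllRows
FROM (
    SELECT Id FROM UnionTable1
    UNION ALL
    SELECT Id FROM UnionTable2
) AS AllRows;

-- UNION removes the duplicates (6 and 12 appear in both tables), leaving 8 rows
SELECT COUNT(*) AS UnionRows
FROM (
    SELECT Id FROM UnionTable1
    UNION
    SELECT Id FROM UnionTable2
) AS DistinctRows;

-- Returns the same set of rows as the UNION: DISTINCT applied over the UNION ALL
SELECT DISTINCT Id
FROM (
    SELECT Id FROM UnionTable1
    UNION ALL
    SELECT Id FROM UnionTable2
) AS AllRows;
```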
Where things get a little bit interesting is how UNION handles generating that distinct list. You will notice that the UNION output is in order while the UNION ALL output is not. In this example, UNION sorts the values in order to generate the distinct list, which means an additional Sort operator in the execution plan.
For comparison here is the execution plan for UNION ALL.
Notice that the sort operator for the UNION is by far the most expensive part of the whole process.
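If you want to look at the plans yourself, one option is to ask for the estimated plan as text. This is a sketch, assuming SQL Server and the tables from the setup above. SET SHOWPLAN_TEXT has to be the only statement in its batch, hence the GO separators, which assume a client such as SSMS or sqlcmd.

```sql
SET SHOWPLAN_TEXT ON;
GO
-- While SHOWPLAN_TEXT is ON the queries are not executed;
-- only the estimated plan for each statement is returned.
SELECT Id FROM UnionTable1
UNION
SELECT Id FROM UnionTable2;

SELECT Id FROM UnionTable1
UNION ALL
SELECT Id FROM UnionTable2;
GO
SET SHOWPLAN_TEXT OFF;
GO
```

In the UNION plan, look for the operator that removes the duplicates. The UNION ALL plan should only need to concatenate the two scans.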
So what does that mean for you? Unless you actually need to use UNION (i.e., you need to get rid of duplicates), you want to use UNION ALL, as it’s the much cheaper and faster option.
There are a couple of exceptions. If you are doing a UNION in an EXISTS clause, SQL Server knows enough not to bother with the sort, and the execution times are the same. Also, if you are already sorting the output (using an ORDER BY), then most of the cost is already taken care of.
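The two exceptions might look something like this. It is a minimal sketch against the same example tables; the PRINT message is only there for illustration.

```sql
-- Exception 1: inside EXISTS only "is there at least one row?" matters,
-- so removing duplicates cannot change the answer.
IF EXISTS (
    SELECT Id FROM UnionTable1
    UNION
    SELECT Id FROM UnionTable2
)
    PRINT 'At least one row exists';

-- Exception 2: the output is being sorted anyway.
-- The ORDER BY applies to the combined result of the UNION.
SELECT Id FROM UnionTable1
UNION
SELECT Id FROM UnionTable2
ORDER BY Id;
```

Comparing the plans for each of these with UNION swapped for UNION ALL is an easy way to check the claim on your own data.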
Like I said, this is all fairly simple and straightforward, but you would be surprised how often people don’t think about it.
Is it guaranteed that UNION will use a distinct sort? Couldn’t it use, say, a Hash Match, in which case order is not guaranteed?
You know, you have a good point. I don’t know for certain but I can find out. I will say that I’ve never done a UNION where the results didn’t come out in that order. Which of course doesn’t mean much.
It looks like you can force it to use a HASH MATCH using the HASH UNION query hint. (See link). I’m still trying to find out if it will ever use it natively though. https://www.simple-talk.com/sql/performance/controlling-execution-plans-with-hints/
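For anyone who wants to try it, the hint goes in an OPTION clause at the end of the statement. A sketch against the example tables from the post, assuming SQL Server; HASH UNION and MERGE UNION are documented query hints, but whether either one helps depends on your data and indexes.

```sql
-- Ask the optimizer to implement the UNION with a hash-based operator
SELECT Id FROM UnionTable1
UNION
SELECT Id FROM UnionTable2
OPTION (HASH UNION);

-- Ask for a merge-based UNION instead
SELECT Id FROM UnionTable1
UNION
SELECT Id FROM UnionTable2
OPTION (MERGE UNION);

-- If the order of the output matters, ask for it explicitly
-- rather than relying on whichever operator the plan happens to use
SELECT Id FROM UnionTable1
UNION
SELECT Id FROM UnionTable2
ORDER BY Id;
```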
Hey Kenneth,
Yeah, it can natively choose the appropriate approach to UNION the information together. It may be a DISTINCT SORT, a HASH, or it can MERGE if it has the right indexes in the right order. And as you noted, you can try to force these behaviors on it if you think you can get better performance.
One point about that article, it’s from the first version of my book and has some errors, specifically where I compare the cost of one plan to the cost of another. That’s just wrong and shouldn’t be done. It’s fixed in the second edition of the book (and yeah, I’m starting a 3rd edition now).
Oh, and nice post too.