Comparing the contents of two tables16
May 12, 2016 by Kenneth Fisher
I’ve been working on tuning a query for the last week and I’ve had to make several changes to the query’s logic. I’m always a little nervous when making those types of changes so I try to do some careful testing before I even suggest the changes to the developers. Who presumably at that point will put the changes into the test environment and run their own tests.
One of the tests I like to run is to generate a fair amount of output (10-50k rows) from both the original and the new queries (if possible), store them into temporary (not temp) tables, and then compare the two tables.
There are lots of ways to do compare the contents and structure of two tables but all have pros and cons.
Purchased tools (Sql Delta, Redgate, Idera etc)
- Nice GUI.
- Heavily tested so you know the result is correct.
- Costs money.
- Some companies don’t like you to install extra software on your machine.
- Typically these tools only compare two databases. You’d have to dump each table into a different database. Not that this is a big deal and there certainly may be ways to get the tools to compare two tables in the same database, I just don’t know it.
Roll your own
Ie write one T-SQL query to compare the structure of the two tables (this one will always be the same) and another to compare the data (new each time).
- You get practice in writing T-SQL (yes, I’m stretching there.)
- Can take quite a bit of time to write. (Don’t forget to check for NULLs separately.)
- Mistakes happen.
- Performance may not be spectacular.
This one is new to me and is actually meant for use with replication. But it will work for our purposes.
- Well tested.
- Command line.
- Lots of parameters and options that need to be filled out and reviewed.
Like I said. TABLEDIFF is new to me so I do want to point out a few options I think are important.
- -STRICT : By default there is some leeway on data types. For my purposes I don’t want to allow that.
- -b : Number of bytes to check on large data types. For our purposes here you may not need to check to make sure 1000’s of characters of data is all the same. You may be able to save some time with this one.
- -f : Script out changes to make the files match. There is an optional [filename] parameter to dump the script to a file. This is really helpful for trying to track down why your new script does do the same thing as the old one.
Category: Microsoft SQL Server, Performance, SQLServerPedia Syndication, T-SQL | Tags: microsoft sql server, Performance, tools
16 thoughts on “Comparing the contents of two tables”
Leave a Reply to Mark Bell (@mwbell77) Cancel reply
This site uses Akismet to reduce spam. Learn how your comment data is processed.
Thanks for article, Kenneth.
Could you share some examples of TSQL comparing scripts?
I’ll see what I can come up with in the next day or two and add an example.
Thanks, will waiting new post.
I’ve tried to use TABLEDIFF.exe for comparison. From memory, there were other limitation such as you can only compare a table as whole because the comparison tables need to be almost like-to-like. This is all fine except when you have column which you do not want to compare i.e. timestamp column
The kicker is that TABLEDIFF uses some cursor-type behaviour behind the scene for doing comparison. So, when comparing a very large table it would take hours and hours.
In both cases above, I’ve used EXCEPT clause to exclude columns which cannot be compared (binary, etc) and it performs faster than TABLEDIFF (not running in cursor-type loop).
TABLEDIFF is OK but I’m finding its use cases quite limited due to the restrictions.
Very nice. I never thought of using the except clause. You’d still want to do a separate check to make sure the structure is the same but you are right I can see how that would be much faster. TABLEDIFF did pretty well for me on a 100k table but I haven’t tried larger than that.
I did comparison on tables with tens of millions on both compare tables and it seem to be taking forever. I used SQL profiler and saw the row-by-row check behaviour. It would be a great if you could include this as part of your test in your coming article.
Obviously I’m going to have to do a part 2 🙂
I also use the EXCEPT clause. I found it performs better than using a MERGE statement and NULLs are handled without special coding in straight INNER JOIN statements
Method used for Delete/Update/Insert:
DELETE [tableName] WHERE NOT EXISTS(SELECT 1 FROM #working AS w WHERE w.primaryKey=tableName.primaryKey)
SET t.field1 = s.field1 /*…*/ ,t.field64 = s.fieldN
FROM [tableName] t
INNER JOIN ( SELECT primaryKey, field1, /*…*/ field64 FROM #working
EXCEPT SELECT primaryKey, field1, /*…*/ fieldN FROM [tableName]) AS s
INSERT [tableName] (primaryKey, feidl1, /*…*/ fieldN)
SELECT primaryKey, feidl1, /*…*/ fieldN)
FROM #working w
WHERE NOT EXISTS(SELECT 1 FROM [tableName] WHERE tableName.primaryKey=w.primaryKey)
you can get the lengthy field lists by running something like:
SELECT ‘,t.[‘+name + ‘]=s.[‘+ name +’]’ as updateFields
, ‘,[‘+name + ‘]’ AS insertFields
WHERE id = OBJECT_ID(‘tableName’)
you could also eliminate any unneeded columns
Performance is better than MERGE statement in really large tables. Most of my tables that I’m doing this with are >250K records.
Nice. Yea someone else had mentioned the idea of using EXCEPT. I’m going to be writing an update and I’ll certainly be including that in there.
SQL Server Data Tools also has a “Data Comparison” feature, which I’ve found handy for comparing data:
You might want to take a look at the open-source SqlDiffFramework (https://sqldiffframework.codeplex.com/). As the name says, it is a framework so you have to provide a query against each of two data sources. That could be as simple as “select * from x”. And it provides some historical support, too, so you can even compare data over time in the same table. That is, you can save a query result to a CSV file, and then compare live data sometime later against a saved snapshot.
Note that SqlDiffFramework has moved to https://github.com/msorens/SqlDiffFramework
Thanks for the new link, but in new repo location on github I cant find installer .exe file and can’t understand how to install github version
Thanks to your feedback, I have updated the “readme” file so it points you to where to find the installer. Thanks for pointing that out!
I have developed a C# application that will compare the data between two tables. The software is currently in beta version and is available to download for free to evaluate. The performance is much better than TABLEDIFF. It is not using a cursor, but in fact it is using the EXCEPT and the INTERCEPT clauses behind the scenes. It has a nice friendly user interface and it will also compare database objects. You can download this software from here http://sqlc.azurewebsites.net/
[…] did a post the other day on comparing two tables. It was meant as a general overview but I got a fair number of comments and requests for more […]