Comparing the contents of two tables

16

May 12, 2016 by Kenneth Fisher

I’ve been working on tuning a query for the last week and I’ve had to make several changes to the query’s logic. I’m always a little nervous when making those types of changes so I try to do some careful testing before I even suggest the changes to the developers. Who presumably at that point will put the changes into the test environment and run their own tests.

One of the tests I like to run is to generate a fair amount of output (10-50k rows) from both the original and the new queries (if possible), store them into temporary (not temp) tables, and then compare the two tables.

There are lots of ways to do compare the contents and structure of two tables but all have pros and cons.

Purchased tools (Sql Delta, Redgate, Idera etc)

  • Benefits
    • Nice GUI.
    • Heavily tested so you know the result is correct.
  • Flaws
    • Costs money.
    • Some companies don’t like you to install extra software on your machine.
    • Typically these tools only compare two databases. You’d have to dump each table into a different database. Not that this is a big deal and there certainly may be ways to get the tools to compare two tables in the same database, I just don’t know it.

 
Roll your own
Ie write one T-SQL query to compare the structure of the two tables (this one will always be the same) and another to compare the data (new each time).

  • Benefits
    • Free.
    • You get practice in writing T-SQL (yes, I’m stretching there.)
  • Flaws
    • Can take quite a bit of time to write. (Don’t forget to check for NULLs separately.)
    • Mistakes happen.
    • Performance may not be spectacular.

 
TABLEDIFF.exe
This one is new to me and is actually meant for use with replication. But it will work for our purposes.

  • Benefits
    • Free.
    • Well tested.
  • Flaws
    • Command line.
    • Lots of parameters and options that need to be filled out and reviewed.
  •  
    Like I said. TABLEDIFF is new to me so I do want to point out a few options I think are important.

  • -STRICT : By default there is some leeway on data types. For my purposes I don’t want to allow that.
  • -b : Number of bytes to check on large data types. For our purposes here you may not need to check to make sure 1000’s of characters of data is all the same. You may be able to save some time with this one.
  • -f : Script out changes to make the files match. There is an optional [filename] parameter to dump the script to a file. This is really helpful for trying to track down why your new script does do the same thing as the old one.

16 thoughts on “Comparing the contents of two tables

  1. Konstantin says:

    Thanks for article, Kenneth.

    Could you share some examples of TSQL comparing scripts?

  2. Simon Liew says:

    I’ve tried to use TABLEDIFF.exe for comparison. From memory, there were other limitation such as you can only compare a table as whole because the comparison tables need to be almost like-to-like. This is all fine except when you have column which you do not want to compare i.e. timestamp column
    The kicker is that TABLEDIFF uses some cursor-type behaviour behind the scene for doing comparison. So, when comparing a very large table it would take hours and hours.
    In both cases above, I’ve used EXCEPT clause to exclude columns which cannot be compared (binary, etc) and it performs faster than TABLEDIFF (not running in cursor-type loop).
    TABLEDIFF is OK but I’m finding its use cases quite limited due to the restrictions.

    • Very nice. I never thought of using the except clause. You’d still want to do a separate check to make sure the structure is the same but you are right I can see how that would be much faster. TABLEDIFF did pretty well for me on a 100k table but I haven’t tried larger than that.

      • Simon Liew says:

        I did comparison on tables with tens of millions on both compare tables and it seem to be taking forever. I used SQL profiler and saw the row-by-row check behaviour. It would be a great if you could include this as part of your test in your coming article.

    • Daxzeal says:

      I also use the EXCEPT clause. I found it performs better than using a MERGE statement and NULLs are handled without special coding in straight INNER JOIN statements

      Method used for Delete/Update/Insert:

      DELETE [tableName] WHERE NOT EXISTS(SELECT 1 FROM #working AS w WHERE w.primaryKey=tableName.primaryKey)

      UPDATE T
      SET t.field1 = s.field1 /*…*/ ,t.field64 = s.fieldN
      FROM [tableName] t
      INNER JOIN ( SELECT primaryKey, field1, /*…*/ field64 FROM #working
      EXCEPT SELECT primaryKey, field1, /*…*/ fieldN FROM [tableName]) AS s
      ON s.primaryKey=t.primaryKey

      INSERT [tableName] (primaryKey, feidl1, /*…*/ fieldN)
      SELECT primaryKey, feidl1, /*…*/ fieldN)
      FROM #working w
      WHERE NOT EXISTS(SELECT 1 FROM [tableName] WHERE tableName.primaryKey=w.primaryKey)

      you can get the lengthy field lists by running something like:
      SELECT ‘,t.[‘+name + ‘]=s.[‘+ name +’]’ as updateFields
      , ‘,[‘+name + ‘]’ AS insertFields
      FROM syscolumns
      WHERE id = OBJECT_ID(‘tableName’)
      you could also eliminate any unneeded columns

      Performance is better than MERGE statement in really large tables. Most of my tables that I’m doing this with are >250K records.

  3. SQL Server Data Tools also has a “Data Comparison” feature, which I’ve found handy for comparing data:
    https://blogs.msdn.microsoft.com/ssdt/2013/06/24/announcing-sql-server-data-tools-june-2013/

  4. msorens says:

    You might want to take a look at the open-source SqlDiffFramework (https://sqldiffframework.codeplex.com/). As the name says, it is a framework so you have to provide a query against each of two data sources. That could be as simple as “select * from x”. And it provides some historical support, too, so you can even compare data over time in the same table. That is, you can save a query result to a CSV file, and then compare live data sometime later against a saved snapshot.

  5. davros30 says:

    I have developed a C# application that will compare the data between two tables. The software is currently in beta version and is available to download for free to evaluate. The performance is much better than TABLEDIFF. It is not using a cursor, but in fact it is using the EXCEPT and the INTERCEPT clauses behind the scenes. It has a nice friendly user interface and it will also compare database objects. You can download this software from here http://sqlc.azurewebsites.net/

  6. […] did a post the other day on comparing two tables. It was meant as a general overview but I got a fair number of comments and requests for more […]

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Enter your email address to follow this blog and receive notifications of new posts by email.

Join 2,134 other followers

Follow me on Twitter

ToadWorld Pro of the Month November 2013
%d bloggers like this: