Comparing the contents of two tables Redux

Leave a comment

June 8, 2016 by Kenneth Fisher

I did a post the other day on comparing two tables. It was meant as a general overview but I got a fair number of comments and requests for more specifics. So here we go.

  • TABLEDIFF.exe

    I had one comment that TABLEDIFF uses REBAR functionality. So let’s try an XE session and see.

    First we set up a test table to compare with a few differences.

    SELECT * INTO Adv_BusinessEntityAddress_Copy 
    FROM AdventureWorks2014.Person.BusinessEntityAddress
    WHERE AddressID NOT BETWEEN 15600 AND 15700
    
    UPDATE Adv_BusinessEntityAddress_Copy 
    SET ModifiedDate = DATEADD(day,50,ModifiedDate)
    WHERE AddressID > 29865

    Set up a basic simple XE event trace that checks only sp_statement_completed and sql_statement_completed events.

    CREATE EVENT SESSION [SQL Statements] ON SERVER 
    ADD EVENT sqlserver.sp_statement_completed(
        ACTION(sqlserver.sql_text)),
    ADD EVENT sqlserver.sql_statement_completed(
        ACTION(sqlserver.sql_text))
    WITH (MAX_MEMORY=4096 KB,EVENT_RETENTION_MODE=ALLOW_SINGLE_EVENT_LOSS,MAX_DISPATCH_LATENCY=30 SECONDS,MAX_EVENT_SIZE=0 KB,MEMORY_PARTITION_MODE=NONE,TRACK_CAUSALITY=OFF,STARTUP_STATE=OFF)
    GO

    Here is the code for the tablediff. I did mention there were a lot of parameters didn’t I?

    tablediff.exe -sourceserver "(local)\sql2014cs" -sourcedatabase AdventureWorks2014 -sourcetable "BusinessEntityAddress" -sourceschema "Person" -destinationserver "(local)\sql2014cs" -destinationdatabase "Test" -destinationtable "Adv_BusinessEntityAddress_Copy"

    CompareTablesPt2

    Honestly I’m not seeing any evidence here that what we are seeing is REBAR. I ran the same test using profiler with the same results. Only 34 rows in the output and it’s a ~20k row table. Now it could be that I’m looking at the wrong events or that with larger tables that it will break it up into chunks. End result I still consider this an option but use with care on larger tables.

  • Roll your own

    Now this can get a bit complicated but it’s easily the most versatile of any comparison methods. You can compare only certain columns by specifying just those in the field list and/or using a WHERE clause only compare certain portions of the table. A single division or country for example.

    In the comments I did have it pointed out that using EXCEPT simplifies things quite a bit (and it does) and I was asked for an example. Even with EXCEPT it’s not overly simple and we still need several queries. Using the same two tables:

    This query checks to make sure the schemas of the two tables are close enough to compare. I’m using INFORMATION_SCHEMA. I know some people don’t like it but for this it’s the easiest and will work just fine.

    SELECT COLUMN_NAME COLLATE SQL_Latin1_General_CP1_CS_AS COLUMN_NAME, 
    	ORDINAL_POSITION, 
    	IS_NULLABLE COLLATE SQL_Latin1_General_CP1_CS_AS IS_NULLABLE, 
    	DATA_TYPE COLLATE SQL_Latin1_General_CP1_CS_AS DATA_TYPE, 
    	CHARACTER_MAXIMUM_LENGTH, CHARACTER_OCTET_LENGTH, NUMERIC_PRECISION, 
    	NUMERIC_PRECISION_RADIX, NUMERIC_SCALE, DATETIME_PRECISION, 
    	CHARACTER_SET_CATALOG COLLATE SQL_Latin1_General_CP1_CS_AS CHARACTER_SET_CATALOG, 
    	CHARACTER_SET_SCHEMA COLLATE SQL_Latin1_General_CP1_CS_AS CHARACTER_SET_SCHEMA, 
    	CHARACTER_SET_NAME COLLATE SQL_Latin1_General_CP1_CS_AS CHARACTER_SET_NAME, 
    	COLLATION_CATALOG COLLATE SQL_Latin1_General_CP1_CS_AS COLLATION_CATALOG, 
    	COLLATION_SCHEMA COLLATE SQL_Latin1_General_CP1_CS_AS COLLATION_SCHEMA, 
    	COLLATION_NAME COLLATE SQL_Latin1_General_CP1_CS_AS COLLATION_NAME
    FROM AdventureWorks2014.INFORMATION_SCHEMA.COLUMNS
    WHERE TABLE_NAME = 'BusinessEntityAddress'
    AND TABLE_SCHEMA = 'Person'
    EXCEPT
    SELECT COLUMN_NAME, ORDINAL_POSITION, IS_NULLABLE, DATA_TYPE, 
    	CHARACTER_MAXIMUM_LENGTH, CHARACTER_OCTET_LENGTH, NUMERIC_PRECISION, 
    	NUMERIC_PRECISION_RADIX, NUMERIC_SCALE, DATETIME_PRECISION, CHARACTER_SET_CATALOG, 
    	CHARACTER_SET_SCHEMA, CHARACTER_SET_NAME, COLLATION_CATALOG, COLLATION_SCHEMA, 
    	COLLATION_NAME
    FROM Test.INFORMATION_SCHEMA.COLUMNS
    WHERE TABLE_NAME = 'Adv_BusinessEntityAddress_Copy'
    AND TABLE_SCHEMA = 'dbo'

    Since that returned no rows we can now start to compare the actual data. First we use EXCEPT to check for rows in the source table that don’t have exact matches in the destination table. This is going to include both the rows missing from the destination table and those that exist but are different. I’m going to dump all of the output into a temp table #FirstPass. Next we go the other direction. All rows from the destination table that aren’t in the source table and unfortunately all of the rows that are different, again. This time I’m going to put the data into a temp table #SecondPass.

    SELECT * INTO #FirstPass FROM AdventureWorks2014.Person.BusinessEntityAddress
    EXCEPT
    SELECT * FROM Adv_BusinessEntityAddress_Copy 
    
    SELECT * INTO #SecondPass FROM Adv_BusinessEntityAddress_Copy 
    EXCEPT
    SELECT * FROM AdventureWorks2014.Person.BusinessEntityAddress

    The problem is that both #FirstPass and #SecondPass have the differences not just the missing rows. So let’s pull those out and while we are at it get the missing rows separated out. To do that we need to know at least one unique key. In this case the primary key of Person.BusinessEntityAddress is BusinessEntityID, AddressID, AddressTypeID.

    SELECT * INTO #SourceOnly
    FROM #FirstPass
    WHERE NOT EXISTS (SELECT 1 FROM #SecondPass
    				WHERE #FirstPass.BusinessEntityID = #SecondPass.BusinessEntityID
    				  AND #FirstPass.AddressID = #SecondPass.AddressID
    				  AND #FirstPass.AddressTypeID = #SecondPass.AddressTypeID)
    
    SELECT * INTO #DestinationOnly
    FROM #SecondPass
    WHERE NOT EXISTS (SELECT 1 FROM #FirstPass
    				WHERE #FirstPass.BusinessEntityID = #SecondPass.BusinessEntityID
    				  AND #FirstPass.AddressID = #SecondPass.AddressID
    				  AND #FirstPass.AddressTypeID = #SecondPass.AddressTypeID)
    
    SELECT * INTO #DifferencesOnly
    FROM #FirstPass
    WHERE EXISTS (SELECT 1 FROM #SecondPass
    				WHERE #FirstPass.BusinessEntityID = #SecondPass.BusinessEntityID
    				  AND #FirstPass.AddressID = #SecondPass.AddressID
    				  AND #FirstPass.AddressTypeID = #SecondPass.AddressTypeID)
    UNION ALL
    SELECT *
    FROM #SecondPass
    WHERE NOT EXISTS (SELECT 1 FROM #FirstPass
    				WHERE #FirstPass.BusinessEntityID = #SecondPass.BusinessEntityID
    				  AND #FirstPass.AddressID = #SecondPass.AddressID
    				  AND #FirstPass.AddressTypeID = #SecondPass.AddressTypeID)

    As you can see while not overly difficult this does require a number of seperate queries. Also I put both versions of the different rows into the #DifferencesOnly temp table. You may or may not want to do that.

    Leave a Reply

    Fill in your details below or click an icon to log in:

    WordPress.com Logo

    You are commenting using your WordPress.com account. Log Out / Change )

    Twitter picture

    You are commenting using your Twitter account. Log Out / Change )

    Facebook photo

    You are commenting using your Facebook account. Log Out / Change )

    Google+ photo

    You are commenting using your Google+ account. Log Out / Change )

    Connecting to %s

    Enter your email address to follow this blog and receive notifications of new posts by email.

    Join 2,134 other followers

    Follow me on Twitter

    ToadWorld Pro of the Month November 2013
    %d bloggers like this: