Duplicate rows in a table

February 4, 2013 by Kenneth Fisher

I recently got the following email:

“I just keyed in two rows into a table with all identical column values. How do I get rid of just one of them?”

Now even way back when in the dark ages when I started as a database developer I ran into this problem occasionally. Every now and again I would create a table and forget to put a primary key on it. (I’m fairly sure I’m the only one who has ever done this.) As a result I inevitably ended up with some rows that were exact duplicates and I needed to get rid of just one of them. So the first thing I tried was this:

 DELETE TOP 1 FROM MyHeap
WHERE WhatShouldHaveBeenMyKeyValue = 'WhatWasIThinking'

And it failed. Looking in BOL (written on stone tablets at the time) I found out that TOP wasn’t available in DELETE, or UPDATE for that matter. Remember this was back in SQL 2000. I honestly don’t remember if this could be done in FoxPro (my first database language). I developed a couple of tricks to manage the problem, and they worked fairly well, but I had always wished for DELETE TOP #. Well my wish finally came true, as of SQL 2005 TOP is available for use with DELETE, UPDATE and even INSERT. Unfortunately I only learned this recently while studying for my 2008 Development MCTS. Here is the definition in 2005-2008 R2. I haven’t looked at 2012 yet but I imagine that it is the same or similar.

TOP (expression) [PERCENT]
     [ WITH TIES ]

So back to my developer’s problem, I could now give him the simple solution.

 DELETE TOP 1 FROM MyHeap
WHERE WhatShouldHaveBeenMyKeyValue = 'WhatWasIThinking'

and it worked perfectly. In case you were wondering the original solutions I came up with was this:

SELECT DISTINCT * INTO #temp
FROM MyHeap
WHERE WhatShouldHaveBeenMyKeyValue = 'WhatWasIThinking'

DELETE FROM MyHeap
WHERE WhatShouldHaveBeenMyKeyValue = 'WhatWasIThinking'

INSERT INTO MyHeap
SELECT * FROM #temp

With complications for identity columns of course. So you can see how much easier the TOP keyword makes things.

FYI You can also use the TOP keyword as a way to limit a transaction size. I’m using 5000 rows as a transactions size for the following examples.

DELETE TOP (5000) SET ColumnA = 'NewValue'
	WHERE ColumnA = 'OldValue'
WHILE @@ROWCOUNT = 5000
BEGIN
	WAITFOR DELAY ’00:00:02’ – Wait 2 seconds between transactions
	DELETE TOP (5000) SET ColumnA = 'NewValue'
		WHERE ColumnA = 'OldValue'
END

UPDATE TOP (5000) SET ColumnA = 'NewValue'
	WHERE ColumnA = 'OldValue'
WHILE @@ROWCOUNT = 5000
BEGIN
	WAITFOR DELAY '00:00:02' – Wait 2 seconds between transactions
	UPDATE TOP (5000) SET ColumnA = 'NewValue'
		WHERE ColumnA = 'OldValue'
END

Note: Scripts corrected based on comments below.

I haven’t had a chance to do any extensive testing on this yet but the theory is sound and I didn’t run into any problems the couple of times I’ve tried it.

Category: Microsoft SQL Server, SQLServerPedia Syndication, T-SQL | Tags: code language, language sql, microsoft sql server, sql statements, T-SQL, TOP

19 thoughts on “Duplicate rows in a table”

Laurie Alvey says:

February 11, 2013 at 4:30 AM

In FoxPro, you can use the following:
DELETE NEXT 1 FROM mytable WHERE RECNO() = pnRec

Kenneth Fisher says:

February 11, 2013 at 6:47 AM

Ahh, thank you. I had forgotten that. It’s amazing what you forget when you don’t use a language for 10 years :).

Reply

Richard Tovsig says:

February 11, 2013 at 5:13 AM

Ok for 1 duplicate record in at table, but if you have multiple sets of duplicate records, you will still need to use the old method, I guess?

BR. Richard

Kenneth Fisher says:

February 11, 2013 at 6:50 AM

I would think that there would be a way to do it using TOP N, but to be honest I can’t think of a way off the top of my head. Not to mention that it may end up being complicated enough that the old method may still be the easier choice.

Reply

Jonathan Roberts says:

February 11, 2013 at 7:06 AM

WHILE @@ROWCOUNT = 5000
BEGIN
UPDATE TOP (5000) SET ColumnA = ‘NewValue’
WHERE ColumnA = ‘OldValue’
WAITFOR DELAY ’00:02’ – Wait 2 seconds between transactions
END

This won’t work as @@ROWCOUNT would have to equal 5000 on entry and I think the WAITFOR DELAY may reset the @@rowcount.
Better like this:
DECLARE @ROWCOUNT int
SET @ROWCOUNT = -1 –Initialise
WHILE @ROWCOUNT 0
BEGIN
UPDATE TOP (5000) SET ColumnA = ‘NewValue’
WHERE ColumnA = ‘OldValue’
SET @ROWCOUNT = @@ROWCOUNT
END

Kenneth Fisher says:

February 11, 2013 at 7:23 AM

Good point on the wait. Also I messed up and put wait for 2 minutes not 2 seconds. I’ll have to fix that in my post.

You’re code looks good, although I think you would have to initialize to 0 for the same reason you gave for my @@rowcount. You could also re-write mine like this.

UPDATE TOP (5000) SET ColumnA = ‘NewValue’
WHERE ColumnA = ‘OldValue’
WHILE @@ROWCOUNT = 5000
BEGIN
WAITFOR DELAY ’00:00:02′ – Wait 2 seconds between transactions
UPDATE TOP (5000) SET ColumnA = ‘NewValue’
WHERE ColumnA = ‘OldValue’
END

At least I don’t think I made any mistakes there.

Reply
- Jonathan Roberts says:
  
  February 11, 2013 at 11:33 AM
  
  The WHILE I wrote was WHILE @ROWCOUNT NOT EQUAL 0
  But the “” less than greater than signs seem to be deleted from the comment probably as it’s html.
  
  Reply

Joshua Patchak says:

February 11, 2013 at 8:31 AM

FYI, in SQL Server (since 2005), there’s a tidy way to do this with common table expressions and the ROW_NUMBER() windowed function using PARTITION BY like so. When you delete from a CTE, it deletes the data from the source table as well:

;WITH cte(rowNumber) AS
(SELECT ROW_NUMBER() OVER (PARTITION BY [List of Natural Key Fields] ORDER BY [List of Order By Fields]) FROM dbo.TableName)

DELETE FROM cte WHERE rowNumber>1

Kenneth Fisher says:

February 11, 2013 at 9:27 AM

Awesome. I think this one is my favorite so far. I’ll have to add this one to my regular scripts. Thanks!

Reply
Jason W says:

February 11, 2013 at 9:28 AM

Aw, I was too slow. 🙂

Reply
- Joshua Patchak says:
  
  February 11, 2013 at 9:29 AM
  
  🙂
  
  Reply
Ricks says:

February 11, 2013 at 10:02 AM

this is the better, I used many times in a cube and works so good… is clear and can make specific partitions…

Reply

Jason W says:

February 11, 2013 at 9:23 AM

SQL Server lets you use ROW_NUMBER() to handle this sort of thing as well:

CREATE TABLE #a (a INT);
–Numbers is a Tally table, with Number values from 1 to n
INSERT INTO #a
( a )
SELECT 1 FROM NUMBERS WHERE Number BETWEEN 1 AND 10;

SELECT * FROM #a;

WITH sort AS
(
SELECT a, ROW_NUMBER() OVER(PARTITION BY a ORDER BY a) AS rid FROM #a
)
DELETE FROM sort WHERE rid > 1;

SELECT * FROM #a;

DROP TABLE #a;

Carla says:

February 11, 2013 at 9:29 AM

I usually just add a new identity column and set it as the primary key. Then I can easily delete the duplicate row.

Kenneth Fisher says:

February 11, 2013 at 9:37 AM

You can, but there are a few gotchas. First you need to make sure that your new identity column is the last column, otherwise any code or view (see here) that is dependent on column position will get messed up. You still run a risk with code that assumes # of columns. Also sometimes you will be working with vendor code where you aren’t allowed to make changes no matter how much sense it makes.

Assuming all of that is good then absolutely add your identity column. If it’s a big table though you may want to add it after hours :).

Reply
- Carla says:
  
  February 11, 2013 at 9:43 AM
  
  You are right. I usually run into this problem in development, It would be another thing entirely to encounter the problem in production.
  
  Reply

Jeff Springer says:

February 21, 2013 at 11:14 AM

Old style, before they had all of the fun commands others have suggested

set rowcount 1
DELETE FROM MyHeap
WHERE WhatShouldHaveBeenMyKeyValue = ‘WhatWasIThinking’

Kenneth Fisher says:

February 21, 2013 at 11:18 AM

Boy I had forgotten that one. I actually didn’t learn that one for quite awhile. I started with the SELECT DISTINCT into a temp table version, but it was years before I learned about SET ROWCOUNT.

Reply

We could be immortals, just not for long when using a duplicate delete! | Andrea Allred presents RoyalSQL says:

January 22, 2015 at 11:50 AM

[…] Fisher (@sqlstudent144) also wrote a blog post about another way you can accomplish this task if you only have a few of them to […]

SQL Studies

Duplicate rows in a table

19 thoughts on “Duplicate rows in a table”

Leave a comment Cancel reply

Follow me via RSS

Follow me via Email

Follow me on Twitter

Archives

Categories

Meta

SQL Studies

Duplicate rows in a table

Share this:

19 thoughts on “Duplicate rows in a table”

Leave a comment Cancel reply

Follow me via RSS

Follow me via Email

Follow me on Twitter

Archives

Categories

Meta