Deleting a lot of data
April 12, 2018 by Kenneth Fisher
I recently had the task of deleting a bit over a billion rows from a table. Now I could have done just this:
DELETE FROM tablename WHERE createdate >= '1/1/2017'
But there are a few problems here. The table has no index on createdate, so the delete has to scan the table, and the sort on createdate can spill into tempdb (although in this case tempdb is pretty large because of some large batch work done at various times). I’m also going to be deleting more than a billion of roughly six billion rows, which is probably going to fill up the database’s log (fortunately the database isn’t in use yet) and end up rolling back my delete anyway. Even if I don’t fill up the log, autogrowth is still going to bloat it pretty badly. And last, but anything but least, this is on a production server. Even if this database were on its own drive (so growth of the log couldn’t cause a problem for any other databases), that tempdb usage (let alone other resource usage) is going to be an issue.
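Before a delete this size, it’s worth checking how much room the log actually has and which recovery model you’re in (both matter below). A quick sanity check with standard commands, nothing specific to this table:

```sql
-- How full is each database's transaction log?
DBCC SQLPERF(LOGSPACE);

-- Recovery model of the current database (SIMPLE vs FULL changes the plan)
SELECT name, recovery_model_desc
FROM sys.databases
WHERE name = DB_NAME();
```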
So what do I do?
If I were deleting most of the table there would be several options: for example, copying out just the rows I’m interested in keeping, truncating the table, and putting the kept rows back. But since I’m keeping such a large number of rows, that would be even worse than just trying to delete the rows I want to get rid of. So, since I can’t delete it all at once, how about I delete a bit of it at a time?
DELETE TOP (2500000) /*2.5 mil*/ FROM tablename WHERE createdate >= '1/1/2017'
I’m going to have to run that a few times though, right?
1,000,000,000 / 2,500,000 = 400
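Rather than guessing the row count, the number of batches can be computed directly. A sketch using the table and column names from the example above (ceiling division so a partial last batch still counts):

```sql
-- rows to delete / batch size, rounded up = number of DELETEs needed
SELECT (COUNT_BIG(*) + 2499999) / 2500000 AS batches_needed
FROM tablename
WHERE createdate >= '20170101';
```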
The first time I did this I tried using a WHILE loop and @@rowcount.
DELETE TOP (2500000) /*2.5 mil*/ FROM #tablename WHERE createdate >= '1/1/2017'
WHILE @@ROWCOUNT > 0
    DELETE TOP (2500000) /*2.5 mil*/ FROM #tablename WHERE createdate >= '1/1/2017'
Unfortunately, that ran into a slight problem. Since it’s all a single batch, it counts (as best I can tell) as a single transaction, and so I ran into the problems above: the transaction log ran out of space and all of the deletes had to roll back.
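One pattern people use to keep the batches from piling up in the log is a loop where each DELETE commits on its own (autocommit) with a CHECKPOINT between batches so that, in SIMPLE recovery, the log space can be reused. A sketch, not the code from the post:

```sql
-- Each DELETE is its own autocommit transaction; CHECKPOINT lets
-- SIMPLE recovery truncate the log between batches.
WHILE 1 = 1
BEGIN
    DELETE TOP (2500000) FROM tablename WHERE createdate >= '20170101';
    IF @@ROWCOUNT = 0 BREAK;  -- nothing left to delete
    CHECKPOINT;
END;
```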
So instead I did this:
DELETE TOP (2500000) /*2.5 mil*/ FROM #tablename WHERE createdate >= '1/1/2017'
GO 401
Well, really I started out doing 50 batches at a time to make sure things went well. After that I did a quick row count and divided by the 2.5 million to see how many more times I needed to run the delete. Last but not least, the 2.5 million was an educated guess at the batch size that would run fastest: too large a value and each delete takes too long; too small and you have to run too many deletes. I was also very lucky here that the database is in SIMPLE recovery. Otherwise, I would have had to alternate: a number of batches, then a log backup, more batches, another log backup, and so on.
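For completeness, here’s roughly what that "batches, log backup, batches, log backup" pattern could look like under FULL recovery. The database name and backup path here are made up for illustration:

```sql
-- Back up the log every 50 batches so it can be truncated and
-- doesn't grow unbounded while the delete churns through.
DECLARE @batch int = 0;
WHILE 1 = 1
BEGIN
    DELETE TOP (2500000) FROM tablename WHERE createdate >= '20170101';
    IF @@ROWCOUNT = 0 BREAK;
    SET @batch += 1;
    IF @batch % 50 = 0
        BACKUP LOG MyDatabase TO DISK = N'X:\backups\MyDatabase_log.trn';
END;
```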
Regardless, it takes a while to delete a billion rows so have fun!
5 thoughts on “Deleting a lot of data”
Learned something there. I’d never have thought that a WHILE loop would execute everything in a single batch. When you think about it, I suppose it makes sense.
Yep. I had the same feeling the first time it happened to me 😀
I originally wanted to post that using the SET ROWCOUNT method (with WHILE and COMMIT) is superior to the DELETE TOP(n) method due to sorting, but then I found, looking at the execution plans, that both use the expensive sort operator. I had a similar task recently and found that what REALLY speeds things up is dropping all indexes except the PK and re-creating them afterwards.
Great article though, very useful. GO N
Thanks 🙂 And the index idea is a good one for certain tasks, but in my case I was deleting say 1 billion rows of 6. Getting rid of the indexes & re-creating might actually have increased my time.