Deleting a lot of data
April 12, 2018 by Kenneth Fisher
I recently had the task of deleting a bit over a billion rows from a table. Now I could have done just this:
DELETE FROM tablename WHERE createdate >= '1/1/2017'
But there are a few problems here. The table has no index on createdate, so the delete has to scan the table, and the sort on createdate can spill into tempdb (although in this case tempdb is pretty large because of some large batch work done at various times). I’m also going to be deleting more than a billion of roughly six billion rows, which is probably going to fill up the database’s log (fortunately the database isn’t in use yet) and end up rolling back my delete anyway. Even if I don’t fill up the log, autogrowth is still going to bloat it pretty badly. And last, but anything but least, this is on a production server. Even if this database were on its own drive (so growth of the log couldn’t cause a problem for any other databases), that tempdb usage (let alone other resource usage) is going to be an issue.
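Before a delete this size, it’s worth checking how much room the log actually has and which recovery model you’re in (both matter below). A quick sanity check with standard commands, nothing specific to this table:

```sql
-- How full is each database's transaction log?
DBCC SQLPERF(LOGSPACE);

-- Recovery model of the current database (SIMPLE vs FULL changes the plan)
SELECT name, recovery_model_desc
FROM sys.databases
WHERE name = DB_NAME();
```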
So what do I do?
If I were deleting most of the table there would be several options: for example, copying out just the rows I’m interested in keeping, truncating the table, and putting the kept rows back. But since I’m keeping such a large number of rows, that would be even worse than just trying to delete the rows I want to get rid of. So, since I can’t delete it all at once, how about I delete a bit of it at a time?
DELETE TOP (2500000) /*2.5 mil*/ FROM tablename WHERE createdate >= '1/1/2017'
I’m going to have to run that a few times though, right?
1,000,000,000 / 2,500,000 = 400
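Rather than guessing the row count, the number of batches can be computed directly. A sketch using the table and column names from the example above (ceiling division so a partial last batch still counts):

```sql
-- rows to delete / batch size, rounded up = number of DELETEs needed
SELECT (COUNT_BIG(*) + 2499999) / 2500000 AS batches_needed
FROM tablename
WHERE createdate >= '20170101';
```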
The first time I did this I tried using a WHILE loop and @@rowcount.
DELETE TOP (2500000) /*2.5 mil*/ FROM #tablename WHERE createdate >= '1/1/2017'
WHILE @@ROWCOUNT > 0
    DELETE TOP (2500000) /*2.5 mil*/ FROM #tablename WHERE createdate >= '1/1/2017'
Unfortunately, that ran into a slight problem. Since it’s all a single batch, it counts (as best I can tell) as a single transaction, and so I ran into the problems above: the transaction log ran out of space and all of the deletes had to roll back.
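One pattern people use to keep the batches from piling up in the log is a loop where each DELETE commits on its own (autocommit) with a CHECKPOINT between batches so that, in SIMPLE recovery, the log space can be reused. A sketch, not the code from the post:

```sql
-- Each DELETE is its own autocommit transaction; CHECKPOINT lets
-- SIMPLE recovery truncate the log between batches.
WHILE 1 = 1
BEGIN
    DELETE TOP (2500000) FROM tablename WHERE createdate >= '20170101';
    IF @@ROWCOUNT = 0 BREAK;  -- nothing left to delete
    CHECKPOINT;
END;
```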
So instead I did this:
DELETE TOP (2500000) /*2.5 mil*/ FROM #tablename WHERE createdate >= '1/1/2017'
GO 401
Well, really I started out doing 50 batches at a time to make sure things went well. After that I did a quick row count and divided by the 2.5 million to see how many more times I needed to run the delete. Last but not least, the 2.5 million was an educated guess at the batch size that would run fastest: too large a value and each delete takes too long; too small and you have to run too many deletes. I was also very lucky here that the database is in SIMPLE recovery. Otherwise, I would have had to alternate: a number of batches, then a log backup, more batches, another log backup, and so on.
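For completeness, here’s roughly what that "batches, log backup, batches, log backup" pattern could look like under FULL recovery. The database name and backup path here are made up for illustration:

```sql
-- Back up the log every 50 batches so it can be truncated and
-- doesn't grow unbounded while the delete churns through.
DECLARE @batch int = 0;
WHILE 1 = 1
BEGIN
    DELETE TOP (2500000) FROM tablename WHERE createdate >= '20170101';
    IF @@ROWCOUNT = 0 BREAK;
    SET @batch += 1;
    IF @batch % 50 = 0
        BACKUP LOG MyDatabase TO DISK = N'X:\backups\MyDatabase_log.trn';
END;
```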
Regardless, it takes a while to delete a billion rows so have fun!
5 thoughts on “Deleting a lot of data”
Learned something there. I’d never have thought that a WHILE loop would execute everything in a single batch. When you think about it, I suppose it makes sense.
Yep. I had the same feeling the first time it happened to me 😀
I originally wanted to post that using the SET ROWCOUNT method (with WHILE and COMMIT) is superior to the DELETE TOP(n) method due to sorting, but then I found, looking at the execution plans, that both use the expensive sort operator. I had a similar task recently and found that what REALLY speeds things up is dropping all indexes except the PK and re-creating them afterwards.
Great article though, very useful. GO N
Thanks 🙂 And the index idea is a good one for certain tasks, but in my case I was deleting say 1 billion rows of 6. Getting rid of the indexes & re-creating might actually have increased my time.