RBAR vs Batch10
August 17, 2016 by Kenneth Fisher
Many years ago Jeff Moden (of SQL Server Central fame) came up with the concept of RBAR. Row-By-Agonizing-Row. At it’s most basic it means you are inserting one row at a time. A more broad interpretation says it’s any type of loop even the type caused by a recursive CTE.
And the point? Loops are slower than batches.
USE AdventureWorks2014; GO SELECT TOP 0 * INTO vSWA FROM Sales.vStoreWithAddresses; GO DECLARE @StartTime datetime = getdate() INSERT INTO vSWA SELECT * FROM Sales.vStoreWithAddresses; SELECT DATEDIFF(millisecond,@StartTime,getdate()) GO DROP TABLE vSWA; GO -- 23 milliseconds
USE AdventureWorks2014; GO SELECT TOP 0 * INTO vSWA FROM Sales.vStoreWithAddresses; GO DECLARE curSWA CURSOR KEYSET FOR SELECT * FROM Sales.vStoreWithAddresses; DECLARE @BusinessEntityID int ,@Name Name ,@AddressType Name ,@AddressLine1 nvarchar(60) ,@AddressLine2 nvarchar(60) ,@City nvarchar(30) ,@StateProvinceName Name ,@PostalCode nvarchar(15) ,@CountryRegionName Name; OPEN curSWA DECLARE @StartTime datetime = getdate(); FETCH NEXT FROM curSWA INTO @BusinessEntityID, @Name, @AddressType, @AddressLine1, @AddressLine2, @City, @StateProvinceName, @PostalCode, @CountryRegionName; WHILE (@@fetch_status <> -1) BEGIN IF (@@fetch_status <> -2) BEGIN INSERT INTO vSWA (BusinessEntityID, Name, AddressType, AddressLine1, AddressLine2, City, StateProvinceName, PostalCode, CountryRegionName) VALUES (@BusinessEntityID, @Name, @AddressType, @AddressLine1, @AddressLine2, @City, @StateProvinceName, @PostalCode, @CountryRegionName) END FETCH NEXT FROM curSWA INTO @BusinessEntityID, @Name, @AddressType, @AddressLine1, @AddressLine2, @City, @StateProvinceName, @PostalCode, @CountryRegionName; END SELECT DATEDIFF(millisecond,@StartTime,getdate()); CLOSE curSWA; DEALLOCATE curSWA; GO DROP TABLE vSWA; GO --850
So 23 milliseconds for the batch version and 850 milliseconds for RBAR. What a difference.
Now in this case the code for the RBAR is also a lot more complicated. But that isn’t always the case. It also isn’t always the case that RBAR is slower. But it’s almost always a lot slower than batch.
So, while the code for RBAR is often easier to write, even though it might be physically longer, it’s probably going to be slower too.
Edit: On twitter today I had it pointed out to me that my cursor may not be the fastest it could be. And in specific I was shown this post by Aaron Bertrand (b/t) that says a FAST_FORWARD cursor might very well be faster than what I had. (It didn’t test against KEYSET but even so.)
So in the interest of fairness to cursors (and honestly I don’t object to them when you actually need them) I ran some additional tests. The following are average times (I ran each test 10 times):
KEYSET Cursor: 558.6ms
FAST_FORWARD Cursor: 595.4ms
So batch is still considerably faster (by 10+x) and KEYSET was actually faster (marginally) than FAST_FORWARD. At least in my test.
Category: Microsoft SQL Server, Performance, SQLServerPedia Syndication, T-SQL | Tags: best practices, microsoft sql server, Performance, T-SQL
10 thoughts on “RBAR vs Batch”
Leave a Reply to Rick Anderson Cancel reply
This site uses Akismet to reduce spam. Learn how your comment data is processed.
Great blog post. I might borrow that animation for my class. The way I typically explain it is if you wanted to make a pizza. You could go to the store and get all your ingredients all at once. (Batch) or you could make several trips to the store and on each trip just buy one item at a time. (RBAR).
Good explanation, I may use that in an upcoming session I’m doing :D. And by all means borrow the image(s). They turned out pretty well. I was a bit worried I wouldn’t be able to pull them off.
It seems you are talking about “Set based” as apposed to Batch mode, which is a specific mode of executing operators, usually (although not exclusively anymore) on ColumnStore indexes.
Would be good to see a comparison on traditional Row mode vs Batch mode too!
I agree with John, the animations look great! I tend to find them the hardest part of producing presentations is the graphics.
You are correct 🙂 I do mean “set based”. I’ve just always heard it as “batch processing” so that’s the way I describe it. I’ll add Row mode vs Batch mode to my list!
Thanks again on the graphics. I actually made them for a presentation I’m doing and liked them so much I had to write the blog around them 🙂
There is another factor to consider with set based versus RBAR processing. When I do things one set element to the time, I have to check element (n +1) against all the prior (n) elements ahead of it. But if I have a set constraint, I just put the whole set all at once. And know I am good. The fact that it is faster is very very nice; the fact that is correct is probably more important in the long run.
For example, if I have a constraint that says no customer can get more than five refunds in a given month, it is a lot easier to load the whole months worth of work than it is to cursor back through everything else they have done in that month.
This is topical for us. We have a couple ETL devs, me being one:) One of us much prefers using lookups and complex data flows in SSIS….while my tried-and-true approach is to stage data on the destination and use batch SQL to integrate the data. There are drawbacks to this, namely that you have to create tables on another system, and as the number of integrations grows to that system this can get tricky (do you use different tables, or share them etc). Also a large amount of data of course has to be planned for in terms of storage. If the set of data is small, and the business logic isn’t too involved, I will use lookups certainly. I just tend to fall back to my comfort zone of SQL, more than one way to skin a cat 🙂
That said, a recent project using RBAR (for-each looping on a recordset, which reached out to another system to integrate the data) just couldn’t hold meet the performance requirements. I had to change it to a “move data to destination stage; exec sql; copy resulting data back to the original system” approach, and performance went from minutes to a couple seconds. If this was a daily overnight thing I might have kept the more “elegant” gui-driven RBAR approach, but this needed to work quickly.
Yea, I’ve seen a lot of stuff like that. Programming is almost always a series of compromises. And frequently RBAR is just easier to code. The logic is easier. Unfortunately it almost always performs significantly worse.
[…] told you not to use cursors but I like […]
[…] faster than using ORM. The inherent problem with ORM’s is that generally they use an RBAR (Row By Agonising Row) or Collection style approach. This is a great example of a mechanism that […]
[…] To be honest, your question sounds like an XY problem. Are you sure you want to determine which code uses XACT_ABORT? A stored procedure won’t cause an open transaction to “hang around”. The most likely cause is a code window open in SSMS that has BEGIN TRANSACTION where the corresponding ROLLBACK or COMMIT haven’t been executed. Alternately, there could be some legitimate long-running code keeping transactions open. One common issue that results in long running transaction is code using the RBAR method. […]