RBAR vs Batch

7

August 17, 2016 by Kenneth Fisher

Many years ago Jeff Moden (of SQL Server Central fame) came up with the concept of RBAR. Row-By-Agonizing-Row. At it’s most basic it means you are inserting one row at a time. A more broad interpretation says it’s any type of loop even the type caused by a recursive CTE.

And the point? Loops are slower than batches.

RBARBatch

Simple example:

Batch
USE AdventureWorks2014;
GO
SELECT TOP 0 * INTO vSWA FROM Sales.vStoreWithAddresses;
GO
DECLARE @StartTime datetime = getdate()
INSERT INTO vSWA
SELECT * FROM Sales.vStoreWithAddresses;
SELECT DATEDIFF(millisecond,@StartTime,getdate())
GO
DROP TABLE vSWA;
GO
-- 23 milliseconds
RBAR
USE AdventureWorks2014;
GO
SELECT TOP 0 * INTO vSWA FROM Sales.vStoreWithAddresses;
GO

DECLARE curSWA CURSOR KEYSET FOR
SELECT * FROM Sales.vStoreWithAddresses;

DECLARE @BusinessEntityID int
		,@Name Name
		,@AddressType Name
		,@AddressLine1 nvarchar(60)
		,@AddressLine2 nvarchar(60)
		,@City nvarchar(30)
		,@StateProvinceName Name
		,@PostalCode nvarchar(15)
		,@CountryRegionName Name;

OPEN curSWA

DECLARE @StartTime datetime = getdate();
FETCH NEXT FROM curSWA INTO @BusinessEntityID, @Name, @AddressType, @AddressLine1,
					@AddressLine2, @City, @StateProvinceName, @PostalCode,
					@CountryRegionName;
WHILE (@@fetch_status <> -1)
BEGIN
	IF (@@fetch_status <> -2)
	BEGIN
		INSERT INTO vSWA (BusinessEntityID, Name, AddressType, AddressLine1,
						AddressLine2, City, StateProvinceName, PostalCode,
						CountryRegionName)
			VALUES (@BusinessEntityID, @Name, @AddressType, @AddressLine1,
					@AddressLine2, @City, @StateProvinceName, @PostalCode,
					@CountryRegionName)
	END
	FETCH NEXT FROM curSWA INTO @BusinessEntityID, @Name, @AddressType, @AddressLine1,
					@AddressLine2, @City, @StateProvinceName, @PostalCode,
					@CountryRegionName;
END
SELECT DATEDIFF(millisecond,@StartTime,getdate());

CLOSE curSWA;
DEALLOCATE curSWA;
GO

DROP TABLE vSWA;
GO
--850

So 23 milliseconds for the batch version and 850 milliseconds for RBAR. What a difference.

Now in this case the code for the RBAR is also a lot more complicated. But that isn’t always the case. It also isn’t always the case that RBAR is slower. But it’s almost always a lot slower than batch.

So, while the code for RBAR is often easier to write, even though it might be physically longer, it’s probably going to be slower too.

Edit: On twitter today I had it pointed out to me that my cursor may not be the fastest it could be. And in specific I was shown this post by Aaron Bertrand (b/t) that says a FAST_FORWARD cursor might very well be faster than what I had. (It didn’t test against KEYSET but even so.)

So in the interest of fairness to cursors (and honestly I don’t object to them when you actually need them) I ran some additional tests. The following are average times (I ran each test 10 times):

Batch: 48.7ms
KEYSET Cursor: 558.6ms
FAST_FORWARD Cursor: 595.4ms

So batch is still considerably faster (by 10+x) and KEYSET was actually faster (marginally) than FAST_FORWARD. At least in my test.

7 thoughts on “RBAR vs Batch

  1. johndeardurff says:

    Great blog post. I might borrow that animation for my class. The way I typically explain it is if you wanted to make a pizza. You could go to the store and get all your ingredients all at once. (Batch) or you could make several trips to the store and on each trip just buy one item at a time. (RBAR).

    • Good explanation, I may use that in an upcoming session I’m doing :D. And by all means borrow the image(s). They turned out pretty well. I was a bit worried I wouldn’t be able to pull them off.

  2. Mat says:

    Hi Kenneth
    It seems you are talking about “Set based” as apposed to Batch mode, which is a specific mode of executing operators, usually (although not exclusively anymore) on ColumnStore indexes.
    Would be good to see a comparison on traditional Row mode vs Batch mode too!
    I agree with John, the animations look great! I tend to find them the hardest part of producing presentations is the graphics.
    Cheers
    Mat

    • You are correct 🙂 I do mean “set based”. I’ve just always heard it as “batch processing” so that’s the way I describe it. I’ll add Row mode vs Batch mode to my list!

      Thanks again on the graphics. I actually made them for a presentation I’m doing and liked them so much I had to write the blog around them 🙂

  3. Joe Celko says:

    There is another factor to consider with set based versus RBAR processing. When I do things one set element to the time, I have to check element (n +1) against all the prior (n) elements ahead of it. But if I have a set constraint, I just put the whole set all at once. And know I am good. The fact that it is faster is very very nice; the fact that is correct is probably more important in the long run.

    For example, if I have a constraint that says no customer can get more than five refunds in a given month, it is a lot easier to load the whole months worth of work than it is to cursor back through everything else they have done in that month.

  4. Rick Anderson says:

    This is topical for us. We have a couple ETL devs, me being one:) One of us much prefers using lookups and complex data flows in SSIS….while my tried-and-true approach is to stage data on the destination and use batch SQL to integrate the data. There are drawbacks to this, namely that you have to create tables on another system, and as the number of integrations grows to that system this can get tricky (do you use different tables, or share them etc). Also a large amount of data of course has to be planned for in terms of storage. If the set of data is small, and the business logic isn’t too involved, I will use lookups certainly. I just tend to fall back to my comfort zone of SQL, more than one way to skin a cat 🙂

    That said, a recent project using RBAR (for-each looping on a recordset, which reached out to another system to integrate the data) just couldn’t hold meet the performance requirements. I had to change it to a “move data to destination stage; exec sql; copy resulting data back to the original system” approach, and performance went from minutes to a couple seconds. If this was a daily overnight thing I might have kept the more “elegant” gui-driven RBAR approach, but this needed to work quickly.

    • Yea, I’ve seen a lot of stuff like that. Programming is almost always a series of compromises. And frequently RBAR is just easier to code. The logic is easier. Unfortunately it almost always performs significantly worse.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Enter your email address to follow this blog and receive notifications of new posts by email.

Join 1,664 other followers

Follow me on Twitter

ToadWorld Pro of the Month November 2013
%d bloggers like this: