The Halloween Problem

8

October 31, 2016 by Kenneth Fisher

halloweenprotection

DBA: Help me! Help me! It’s Halloween! I have a problem! Oh won’t you please help me!

Ok, so that’s really got nothing to do with the Halloween problem. But it is Halloween today so it seemed like a good time to bring the subject up. Of course if we are going to talk about the Halloween problem it would probably be nice to know why it’s called that, and what the heck, it would be nice to know what it actually is.

Why is it called The Halloween Problem?
Simply enough. It was discovered on Halloween. A bit anti-climatic maybe, but true.

What is The Halloween Problem?
This is a bit more complicated. Let’s say you are trying to give a 10% raise to everyone who makes less than $25k.

UPDATE TableName
SET YearlySalary = YearlySalary * 1.1 -- 10% raise
WHERE YearlySalary < 25000;

Couple of quick notes here. This is a common example because this in fact the problem that exposed the issue. Also, while UPDATEs are probably the easiest way to explain what’s going on, it can affect any type of write.

So back to our update statement. There are several ways this could be implemented. I’m going to use pseudo T-SQL to demonstrate a couple and explain each.

-- Collect the rows to be updated
SELECT KeyVal INTO #Temp
FROM TableName
WHERE YearlySalary < 25000;

-- Perform the update on the rows selected
UPDATE TableName
SET YearlySalary = YearlySalary * 1.1 -- 10% raise
JOIN #Temp
ON TableName.KeyVal = #Temp.KeyVal

So this obviously has the down side of having to hit each row to be updated twice. That might be the best performing plan, but it might not. The other option is to hit each row one at a time.

DECLARE @KeyVal
SELECT TOP 1 @KeyVal = KeyVal
FROM TableName
WHERE YearlySalary < 25000;

WHILE @[clustered index columns or row identifier] IS NOT NULL
BEGIN
UPDATE TableName
SET YearlySalary = YearlySalary * 1.1 — 10% raise
WHERE KeyVal = @KeyVal

SELECT TOP 1 @KeyVal = KeyVal
FROM TableName
WHERE YearlySalary < 25000;
END

In this particular case we only have to hit each row once, but the more suspicious minded of you might have noticed a slight issue. The while loop will only end when everyone has a salary higher than $25k. If someone started at $10k they will get raise after raise until they are actually at $25,937.42. And while that's great for them, it's probably not the actual intent.

Now I do want to point out that this probably not how anyone would actually perform the update. This is just my interpretation via pseudo-code.

So how do we fix this? Basically by switching from a WHILE loop to a CURSOR. Ie Pull the data, making a list of each of the rows to be updated, and updating each one. I know this sounds pretty similar to the first method but it isn't. Quite. This would be more like using a cursor if I understand it correctly. The query is still only run through once, each row being updated during that single run through.

DECLARE GiveRaise CURSOR
FOR SELECT KeyVal FROM TableName
WHERE Salary < 25000;

DECLARE @KeyVal int;

OPEN GiveRaise;

FETCH NEXT FROM GiveRaise INTO @KeyVal;
WHILE (@@fetch_status -1)
BEGIN
IF (@@fetch_status -2)
BEGIN
UPDATE TableName
SET Salary = Salary * 1.1
WHERE TableName.KeyVal = @KeyVal;
END;
FETCH NEXT FROM GiveRaise INTO @KeyVal;
END;

CLOSE GiveRaise;
DEALLOCATE GiveRaise;
GO

Of course all of this is handled behind the scenes and we don’t have to worry about it. Unless of course you happen to be writing your own database system.

Additional Reading:

An excellent series of articles by Paul White (b/t)

The wiki page.

8 thoughts on “The Halloween Problem

  1. dewitte says:

    This is probably just a good example on how to properly handle a simple problem and use a cursor, but wouldn’t one just do something like:

    Update TableName
    Set YearlySalary = YearlySalary * 1.1 where YearlySalary<25000;

    Then you only touch each row once.

    I've always heard cursors are demons so I usually avoid them and try to use while statements instead. I can think of two alternatives using while –

    1) You create a #TempDone table and insert @keyval as you go and each time you select a new @keyval, you check for where not in #TempDone – which would get hairy if you have a million employees making less than $25k

    or

    2) You alter the update statement so you check all rows, but have a case statement for the update – set YearlySalary = YearlySalary * case when YearlySalary<25000 then 1.1 else 1 end, but that's wasteful since you'd alter the where to check for all rows.

    Still, a great article because I'm sure we've all sat staring at the screen and thinking – "This is taking a lot longer than it should have", and "Did the backups run last night?"

  2. Dirk and The Mac says:

    I completely agree with Dewitte. I asked a sample of the developers here and not one of them would write the SQL that way. We would all use straight updates. We would also generally avoid using cursors or loops unless held at gunpoint. Set based over RBAR every time!

    • Right. But that isn’t the code anyone would write. It’s an example of how the proceser might handle a single update statement. I just used code to make it simpler to see.

      Or at least that was the idea 🙂

  3. Wise Old Man says:

    Since I would need to document what was done, I would write a script that wrote each individual’s update statement, and then run them in batches making sure that every one of them completed. Of course, my company has less that 1,000 employees, so it’s a manageable list.

  4. Joe Celko says:

    Back in the early days, one of the standard problems in the training manuals from Sybase feature bookstore. The goal was to raise the price on cheap books by percentage while lowering the price on expensive books by another percentage. Definition of expensive was $25, which gives you an idea just how old this exercise was.

    The first proposed solution was to do two updates

    BEGIN
    UPDATE Books –– cheap books
    SET price = price * 1.10
    WHERE price 25.00;
    END;

    This fails; a book that is $25 first goes up to $27.50 and then goes down to $24.75. If you change the order of the update statements you still have the same basic problem. The price first goes down to $22.50 and then down to $24.75.

    This is why we needed a cursor that could do this process, row by row. The logic is very simple and used IF-THEN-ELSE flow control to do the math. Welcome to the early days of SQL! Today would simply use the single statement with a CASE expression:

    UPDATE Books
    SET price
    = CASE WHEN price <= 25.00
    THEN price * 1.10
    ELSE price * 0.90 END;

Leave a Reply to Joe Celko Cancel reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Enter your email address to follow this blog and receive notifications of new posts by email.

Join 3,755 other subscribers

Follow me on Twitter

ToadWorld Pro of the Month November 2013
%d bloggers like this: