What is the “cost” in Cost Threshold for Parallelism?
3April 17, 2017 by Kenneth Fisher
tl;dr; While at one point the cost of a query was an estimated time to run in seconds, today it’s just an estimated number used to give an idea of scale.
Cost Threshold for Parallelism has come up several times recently. Just so far this year I’ve found the following posts without going through any major effort. (If I missed yours, sorry. It was just a quick search):
- Why Cost Threshold for Parallelism Shouldn’t Be Set To 5 by Erik Darling (b)
- Why you should change the Cost Threshold for Parallelism. by Grant Fritchey (b/t)
- Determining the Cost Threshold for Parallelism by Grant Fritchey (b/t)
- And an update to one of the MS pages Configure the cost threshold for parallelism Server Configuration Option
Obviously, it’s an important subject, right? And yet we keep seeing comments about how the cost is in seconds.
And to be fair, it is. It’s an estimate of how many seconds a query would take, if it was running on a developers workstation from back in the 90’s. At least that’s the story. In fact Dave Dustin (t) posted this interesting story today:
In case that’s hard to read:
The story goes that when the new query optimizer was developed for SQL Server 7.0 in the Query Optimizer team there was a programmer called nick (I am sorry but I do not know his last name), he was responsible for calculating query costs (among other things…), and he had to decide how to generate the cost number for a query, so he decided that if query runs for 1 second on his own pc the cost will be ….. 1, so we can finally answer the question what is “estimated subtree cost = 1” means ladies and gentleman that it runs for 1 second on nick’s machine!
I don’t know how true it is, but I’ve heard this from a number of different people so let’s go with it.
And there you go. Yes, it’s seconds, but boy is that misleading. If you see a query cost of 10,000 it’s not actually going to take 2.75 hours. First, remember that it’s an estimate based on statistics and so has a distinct chance of being wrong. Add in the increases in processing speed, memory etc (oh and as I understand it the cost doesn’t even take into account IO, and IO speeds are insane these days) it’s going to take a fraction of that time. Unless the estimate is wrong, then it could even take more than that 2.75 hrs.
The end result is that these days cost is nothing more than a scale based on an estimate based on a guess. A cost of 100 is probably a small fast query, 1k+ is probably a long, obnoxious query.
For some reason this story is irresistible. It seems like no presentation about query cost can avoid the re-telling of the “some PC in Building X” where X is the number of the building of the Microsoft campus where the SQL Server team is/was. However, all cost-based optimizers use a cost number, so each database team must have a Nick and his PC.
That’s because it’s a great story! And of course because people are constantly asking what the heck that cost is.
[…] metric. When your hardware is different from the hardware the costing algorithms are based on (“Nick’s machine”), it’s bound to […]