What is the “cost” in Cost Threshold for Parallelism?

2

April 17, 2017 by Kenneth Fisher

tl;dr; While at one point the cost of a query was an estimated time to run in seconds, today it’s just an estimated number used to give an idea of scale.

Cost Threshold for Parallelism has come up several times recently. Just so far this year I’ve found the following posts without going through any major effort. (If I missed yours, sorry. It was just a quick search):

 
Obviously, it’s an important subject, right? And yet we keep seeing comments about how the cost is in seconds.

And to be fair, it is. It’s an estimate of how many seconds a query would take, if it was running on a developers workstation from back in the 90’s. At least that’s the story. In fact Dave Dustin (t) posted this interesting story today:

In case that’s hard to read:

The story goes that when the new query optimizer was developed for SQL Server 7.0 in the Query Optimizer team there was a programmer called nick (I am sorry but I do not know his last name), he was responsible for calculating query costs (among other things…), and he had to decide how to generate the cost number for a query, so he decided that if query runs for 1 second on his own pc the cost will be ….. 1, so we can finally answer the question what is “estimated subtree cost = 1” means ladies and gentleman that it runs for 1 second on nick’s machine!

I don’t know how true it is, but I’ve heard this from a number of different people so let’s go with it.

And there you go. Yes, it’s seconds, but boy is that misleading. If you see a query cost of 10,000 it’s not actually going to take 2.75 hours. First, remember that it’s an estimate based on statistics and so has a distinct chance of being wrong. Add in the increases in processing speed, memory etc (oh and as I understand it the cost doesn’t even take into account IO, and IO speeds are insane these days) it’s going to take a fraction of that time. Unless the estimate is wrong, then it could even take more than that 2.75 hrs.

The end result is that these days cost is nothing more than a scale based on an estimate based on a guess. A cost of 100 is probably a small fast query, 1k+ is probably a long, obnoxious query.

2 thoughts on “What is the “cost” in Cost Threshold for Parallelism?

  1. notarian says:

    For some reason this story is irresistible. It seems like no presentation about query cost can avoid the re-telling of the “some PC in Building X” where X is the number of the building of the Microsoft campus where the SQL Server team is/was. However, all cost-based optimizers use a cost number, so each database team must have a Nick and his PC.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Enter your email address to follow this blog and receive notifications of new posts by email.

Join 1,672 other followers

Follow me on Twitter

ToadWorld Pro of the Month November 2013
%d bloggers like this: