Uniquifier is a rather unique word isn’t it?

8

September 18, 2017 by Kenneth Fisher

tl;dr; The uniquifier is used to make a non-unique key in a clustered index unique.

Uniquifier is a rather funny name, and yet it’s very descriptive. If there is a key that must be made unique then SQL server will add a uniquifier to, well, make it unique.

Now, why would you want to do that?

Well, to start with, you need to think about how non-clustered indexes (NCIs) and clustered indexes (CIs) work. If a query uses an NCI to filter (or sort, or whatever) information but not all of the information needed is in the NCI then SQL has to go back to the CI to get that additional information. In order to do that it has to have a pointer back to the original row (contained in the clustered index) and just that row. That’s pretty easy if the clustered index is also a unique index, but it doesn’t have to be. You can create a non-unique clustered index. In the case of a non-unique clustered index SQL adds the uniquifier to make sure that the value is in fact unique.

Examples make life easier, so let’s say you have a table with a non-unique clustered index of CreateDate. You then insert 10 rows in a single batch.

CREATE TABLE NonUniqueCI (
	CreateDate datetime NOT NULL 
		CONSTRAINT df_CreateDate DEFAULT getdate()
	, Col1 int
	, index ci_NonUniqueCI clustered (CreateDate)
	);

INSERT INTO NonUniqueCI (Col1)
SELECT TOP (10) 1
FROM sys.columns a;

SELECT * FROM NonUniqueCI;

So we have a CI on CreateDate but every row is identical. How does SQL keep track of which row is which? To check we have to use the undocumented DBCC commands PAGE and IND so do NOT do this in production. It’s probably safe but undocumented means NO PRODUCTION!!

--Get the Page Number of the Clustered Index
DBCC IND (Test, NonUniqueCI, -1)

-- Create a table to store the results of the DBCC Page
CREATE TABLE #DBCCResults (
	[ParentObject] varchar(100),
	[Object] varchar(1000),
	[Field] varchar(100),
	[VALUE] varchar(max)
	);

-- Collect information about the page
INSERT INTO #DBCCResults
EXEC sp_executesql N'DBCC PAGE(Test, 1, 520, 3) WITH TABLERESULTS;';

-- View just the column information from the 
-- DBCC Page results
SELECT * 
FROM #DBCCResults
WHERE [Object] LIKE '%Column%'
ORDER BY [Object];

As you can see each row has a uniquifier that makes the row unique. With a combination of the clustered index key (in this case CreateDate) and the uniquifier we can uniquely identify each row.

Now, what happens if the values are in fact unique?

-- Insert 2 new rows
INSERT INTO NonUniqueCI (Col1) VALUES (1);
INSERT INTO NonUniqueCI (Col1) VALUES (1);

-- Collect information about the page
TRUNCATE TABLE #DBCCResults
INSERT INTO #DBCCResults
EXEC sp_executesql N'DBCC PAGE(Test, 1, 520, 3) WITH TABLERESULTS;';

-- View just the column information from the 
-- DBCC Page results
SELECT * 
FROM #DBCCResults
WHERE [Object] LIKE '%Column%'
ORDER BY [Object];

Notice how the uniquifier for the two new rows is a 0. That’s fine though since we don’t need that value to identify the row.

So now let’s start over and insert two sets of identical rows.

DROP TABLE NonUniqueCI;

CREATE TABLE NonUniqueCI (
	CreateDate datetime NOT NULL 
		CONSTRAINT df_CreateDate DEFAULT getdate()
	, Col1 int
	, index ci_NonUniqueCI clustered (CreateDate)
	);

-- Load the table with two unique values
INSERT INTO NonUniqueCI (Col1) VALUES (1);
WAITFOR DELAY '00:00:00.003' -- guarantee unique value of getdate()
INSERT INTO NonUniqueCI (Col1) VALUES (1);
-- Insert duplicates of the first two rows.
INSERT INTO NonUniqueCI
SELECT * FROM NonUniqueCI

--Get the new Page Number of the Clustered Index
DBCC IND (Test, NonUniqueCI, -1)

-- Collect information about the page
TRUNCATE TABLE #DBCCResults
INSERT INTO #DBCCResults
EXEC sp_executesql N'DBCC PAGE(Test, 1, 568, 3) WITH TABLERESULTS;';

-- View just the column information from the 
-- DBCC Page results
SELECT * 
FROM #DBCCResults
WHERE [Object] LIKE '%Column%'
ORDER BY [Object];

Notice that we have uniquifiers 0 and 1 for each pair of non-unique values. If I added a third there would be a 2 for that set of values. Etc.

Last piece of information (thank goodness?). The uniquifier is a variable length integer. Variable length integer? I’ve never heard of that before? Well, as I understand it it’s either 4 bytes (if in use) or 0 bytes (sort of) if not. So what do I mean by sort of? Even if it’s not in use there has to be a place to store it. Therefore there has to be a space for it in the variable-length column offset array. However, since that is a rather complicated subject (for me at least) I’m going to skip it for now.

Now, the uniquifier being an integer has an interesting effect. I saw a demo once (and no I’m not going to duplicate it) where someone inserted 2,147,483,647 (max int) identical values into a non-unique clustered index. After that, they could no longer insert that value any more. I don’t exactly consider this a big issue since if you have that many duplicate values it shouldn’t be your CI anyway. Still, it’s interesting from an academic point of view. Actually, 95% of everything about the uniquifier is academic really. The only important thing you need to know is that if you have a non-unique clustered index it’s going to be 4 bytes wider than you expected.

Note: I’d always read that a uniquifier was only added to a row if, and only if, it was needed to make the row unique. I didn’t see any evidence of this in the DBCC Page output. From what I can see it looks like there is always a uniquifier. I’ll be interested to see if anyone knows why that is?

8 thoughts on “Uniquifier is a rather unique word isn’t it?

  1. Hi there. Regarding

    > it it’s either 4 bytes (if in use) or 0 bytes (sort of) if not. So what do I mean by sort of? Even if it’s not in use there has to be a place to store it.

    and

    > I’d always read that a uniquifier was only added to a row if, and only if, it was needed to make the row unique. I didn’t see any evidence of this in the DBCC Page output. From what I can see it looks like there is always a uniquifier.

    I will say: ah, but you _did_ see evidence of the variable nature of the uniquifier as it is in all 3 output images in this post ;-). Well, half of the evidence is showing, and the other half is cut-off due to the width of the **Object** column in the grid view.

    The first half is shown in the **ParentObject** column. If you look closely at the `Length` portion of each row, you will notice that some rows are 19 bytes while others are 27 bytes. The difference is whether or not the uniquifier is present. And you will see a correlation between all rows having a uniquifier of 0 being 19 bytes, and all non-zero uniquifier rows being 27 bytes. This is due to the first entry of any Key (or multi-Key combination) being inherently unique. But starting with value 1 and onwards, the space is taken up to hold that value.

    The second half is at the far right of the **Object** column which is currently cut-off in all images posted here. But, you can at least see the start of where the info is as the right side of that column shows “`Length (phys…`”. If you expand that column in the grid view so that you can see all of the text, it will say “`Length (physical) 0`” when uniquifier is zero, else it will show “`Length (physical) 4`”.

    So, this is being handled just like how Data Compression will allow `NULL` and `0` fixed-length types to take up 0 bytes.

    I am working on a post right now to demonstrate this particular behavior of the uniquifier :-).

  2. John Halunen says:

    Yah, actually hit that error in Prod back in 2007 or so. Had a dev that thought it would be good to have a CI on a column with only date in it, and eventually hit that many inserts in a day. It was interesting migrating that table to a new schema, to say the least.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Enter your email address to follow this blog and receive notifications of new posts by email.

Join 3,755 other subscribers

Follow me on Twitter

ToadWorld Pro of the Month November 2013
%d bloggers like this: