Guid vs Identity columns (Ints)11
January 13, 2014 by Kenneth Fisher
I came across an interesting question on SE last week. Guid vs INT – Which is better as a primary key? In addition to the quite good accepted answer I thought I would throw in my own take.
- GUIDs are 16 bytes and hold more values you then could ever use.
- With an identity column you can choose a data type dependent on your need.
- tinyint 1 byte 0-255
- smallint 2 bytes -2^15 (-32,768) to 2^15-1 (32,767)
- int 4 bytes -2^31 (-2,147,483,648) to 2^31-1 (2,147,483,647)
- bigint 8 bytes -2^63 (-9,223,372,036,854,775,808) to 2^63-1 (9,223,372,036,854,775,807)
Remember that the size of your column affects not just how much space the table takes up but how many pages (both index and data) need to be read to perform a given operation. Bigger the column the less you can fit in a page, the more pages need to be read, the slower your queries. Even if by a very small amount.
- GUIDs are considered universally unique. This isn’t exactly true but if you look here and here you will see that it’s really close enough to true to work with.
- Identity columns are only as unique as you make them. If you put a unique constraint on the column (or PRIMARY KEY) then you will be at least guaranteed that you have a unique value in that table. But only in the table itself, when you compare to other tables, databases etc there is no uniqueness
You have to decide here how unique you need your column to be and if it’s worth the space it’s going to take up.
- Because they are universally unique GUIDs are completely portable. You can move the values from place to place with no difficulty.
- Identity columns are not really portable. Anyone who has tried to merge two tables with identity columns, between prod and test for example, knows what a pain this is.
- Ease of use
- GUIDs require the use of NEWSEQUENTIALID() or NEWID() either as a default, part of the insert, a trigger etc
- Identity columns are created and then you actually have to avoid them to make them work properly.
Personally I find identity columns MUCH easier but on the other hand I use them far more often than GUIDs so I have a lot more experience there. They do say you tend to go with what you know.
If you are never going to merge data with another table/location then you are probably ok with an identity column. If on the other hand you expect to need to merge data from multiple tables/locations then you should probably think about GUIDs.
And last but not least here is BOLs take on each of them.
As with many design considerations this is an important decision. When deciding to use a GUID vs an integer Identity column you should balance the portability of a GUID vs it’s additional space required and the smaller size of the identity column vs the major pain that moving a row with an identity column from one table to another can be. The fact that integers are easier to work with when debugging is true but somewhat insignificant if you truly need a GUID. And to be honest I’ll bet you get used to it. Integers “looking” better when displayed to the end user is a factor but somewhat less important when compared to other considerations.
You, your co-workers and your replacement (there will always be one) will have to live with your decision so at least think about it before you decide. One of the things I dislike most is the “Always do it this way” mentality.
Category: Microsoft SQL Server, SQLServerPedia Syndication, T-SQL | Tags: code language, language sql, microsoft sql server, sql statements, T-SQL
11 thoughts on “Guid vs Identity columns (Ints)”
Leave a Reply Cancel reply
This site uses Akismet to reduce spam. Learn how your comment data is processed.
One other major difference:
Identity columns are generated sequentially and make an excellent candidate for the clustered index on the table. GUIDs, on the other hand, should never be clustered due to the method in which they are generated.
Unless of course you use NEWSEQUENTIALID. Then the GUIDs are in order (except after a system restart) and can be clustered without too much of a problem. Also while identity columns are usually generated sequentially you can get exceptions although those are usually one offs.
I worked with a database full of unnecessary GUIDs for a long time and never got used to it. They created a lot of extra work over the years. They have their place, but usually integer is the way to go.
I had to deal with the same question when implementing a datawarehouse with different data sources and the need for a unique ID across data sources => Thus the only option was GUID. The snag was that SSAS doesn’t like GUIDs at all. A computed column did the trick. Formula is: (CONVERT([bigint],CONVERT([varbinary](128),[myGUID]))*(-1)) The resulting bigint is only half the size of a GUID and by far better for use in SSAS Cubes. A test with 4.5 Million records produced only unique “newIDs “.
Regarding portability I sometimes add a small char column next to the identity column (combined primary key). So, the codes 2475 US and 2475 FR can refer completely different objects and can be easily merged into one table for reporting … or in case databases of one of more branch offices need to be centralized. Of course this implies extra storage space, 2 conditions in join statements, extra column in related tables … but it is easier to read/remember than a GUID.
I commonly work on systems with tons of remote databases where records are added, then replicated to a central database. I haven’t came up with a good way to use identity columns while ensuring uniqueness. It’s much easier to deal with the index fragmentation than devising a scheme with identity column
Well as Peter said adding an additional char(2) column can provide the uniqueness you want and still avoid fragmentation. Another option is to seed the identity column with large gaps for the different locations (1mil for one 2 mil for another etc). Guids are certainl a good choice here though.
I think it all depends on what you’re trying to achieve. If you’re doing a lot of replication, use a Sequential GUID rather than a GUID. Fragmentation is a lot less.
If all your records sit in one database and performance is important, just stick with int.
I’m not really sure replication has much to do with it. There are forms of replication where I might take guid vs int into account but even then it’s only one factor. Performance and fragmentation are certainly other factors but again they are factors and there are reasons to override them at times. Really it all comes down to making an informed decision and not just saying “We’ve always done it that way.”
Reblogged this on nuWave Development Team.
I know this post is pretty old but there are couple of things people really need to know about GUIDs and Identity columns and index maintenance. To summarize, index fragmentation is a myth perpetuated by a more than 20 year of “Best Practice” that’s actually a WORST PRACTICE continued by not knowing your insert/update patterns with it comes to identity columns.
I have a “summary” video on the subject. There’s a whole lot more to it. I “just” hit the 3 highpoints I mentioned above in this one.
Seriously… take the time to watch it until the end. It will blow your mind because almost everything they say about Random GUIDs and “Best Practice” index maintenance is wrong and always has been.
Here’s the link.