Natural vs Artificial Primary Keys


August 29, 2016 by Kenneth Fisher

There has been a long standing debate over the use of a natural primary key vs an artificial one. I thought I would throw in my two cents.

First of all why is it important? The primary key is used for two major purposes.


The primary key is one of the candidate (unique) keys possible for a table. It uniquely identifies a given row of data. This is particularly important in case you need to update or delete a specific row.


The primary key is frequently used in related tables to tie the two tables together. Typically this is a Foreign Key relationship.

  • The smaller the key the less space taken up in the other tables.
  • The smaller the key the smaller the related index. The smaller the related index the fewer reads are required. Joins are typically based on these indexes. So the smaller the key the faster the joins.
  • Because this column or columns will be used as a relationship in other tables any changes to these values will also have to be performed in all of the tables that contain them.


The Clustered index

The primary key is frequently the clustered index. It certainly doesn’t have to be but since that’s the default it almost always is. That being the case let’s also look at what makes a particularly useful clustered index.

  • Ever increasing (or decreasing depending on the direction of the index). This decreases fragmentation and increases insert speeds.
  • Narrow – Same reasoning as above. Smaller = faster.
  • Unique – Primary keys have to be unique anyway but this helps with the Narrow requirement. If the clustered index wasn’t unique a 4 byte uniqueifier has to be added.
  • Static – Every time a value clustered index has to be changed the data has to be moved. This of course slows things down and again can lead to fragmentation.

Ok, now that we know what we are working with what are the pros and cons for natural and artificial keys?


A natural key is one constructed of data that already exists in the table. For example using latitude and longitude in a table of addresses. Or the social security number in a table of employees. (Before you say anything, yes, the social security number is a horrible primary key. Be patient.)

The pros and cons


  • The data actually means something. If instead of using an integer artificial key for a list of states (for example) you use a two character field (TX, AL, CA, etc) then when you see a value in another table it actually means something.
  • Potential space savings. All things being equal there is no need to add additional space to a table to hold an artificial key.


  • Can be hard to pick a good one. Things that might seem obvious (like a social security number) make horrible primary keys. Remember they must be unique (SSN isn’t) and shouldn’t change (SSN can). (See I told you we would get there if you waited.)
  • Frequently (but not always) costs additional space. Even though the artificial key is an additional column it’s frequently smaller than a natural key. Because of this when it’s repeated in other tables additional space will be taken up. Also remember if the primary key is also the clustered index then it’s going to exist in every other index.
  • Remember smaller = faster and again, natural keys are usually bigger.



An artificial key (sometimes called a surrogate) is one that was created specifically to be the unique identifier of a row. Identity columns are the most common form of artificial keys. They are unique, ever increasing (typically) and have absolutely nothing to do with the rest of the data in the row.

The pros and cons


  • Small! Everything that I said above about being smaller. FYI common mistake is to use INT everywhere. A SmallINT or even TinyINT is more than big enough for a lot of lookup tables.
  • The chances of an artificial key value changing is next to nothing. I’m not saying it can’t happen. But it’s highly unlikely.
  • Ever increasing. Identity columns are ever increasing (or decreasing) which helps if this is also the clustered index.
  • Identity columns are also usually unique. They don’t have to be but usually are unless something strange happens.


  • Because the artificial key doesn’t mean anything on it’s own, if it’s used in a foreign key you are more likely to have to join to the lookup table.
  • They have absolutely no relation to the rest of the data on the table breaking third normal form. Honestly I don’t consider that to be a big deal but some people do so I thought I would mention it.



So based on all of the above which do you pick? It’s certainly easier to pick an artificial key. Although as a favor to me still pay attention to the datatype you use. If you really need > 32k (smallint, 2 bytes) entries for your employee table your company is huge. That said my personal opinion is that, like many other things, it depends. If you can find an appropriate small natural key then frequently that’s better. Not to mention the fact that if the table won’t have any children (the key won’t be used in any other tables as a foreign key) then size isn’t as big an issue and it’s easier to find a natural key.

Also don’t forget that the primary key and the clustered index don’t have to be the same thing. So using something like a modified date as the clustered index and something else as the primary key could be perfectly reasonable. In fact I have one whole database where the clustered indexes are the modified date and there are no primary keys. It’s a special case, but over the nears I’ve noticed that special cases seem to be the norm.

6 thoughts on “Natural vs Artificial Primary Keys

  1. Gabriel says:

    One potential risk with artificial keys is that you might not catch duplicates. For example if you have an identity column on your employee table, you could insert the exact same row two or more times without realizing.

    If you’re running a data warehouse and your employee table is tracking changes, then the artificial (a.k.a. surrogate) key works great for that purpose. Otherwise, you may need to add a unique constraint/index somewhere to ensure that you’re not getting unwanted duplicates.

  2. […] Again, not a lawyer, not an expert, but I’m going to guess no. This does mean that in the Natural vs Artificial Keys argument the artificial keys side just got a big bump. Because to be honest, I’m going to […]

  3. […] a long-standing divide in the database community between two camps: the patriots of natural keys and the proponents of […]

  4. Byron Mayes says:

    Always use the unique field or fields as your primary key. Auto identities has no meaning to the data row, unless it is a historical or log file.

    • That’s certainly one way to go. Of course if there is no unique combination of fields, or for whatever reason the unique combination includes a column that has to be nullable then you are going to have a problem.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Enter your email address to follow this blog and receive notifications of new posts by email.

Join 3,755 other subscribers

Follow me on Twitter

ToadWorld Pro of the Month November 2013
%d bloggers like this: