March 19, 2014 by Kenneth Fisher
I had a recent run in with collation problems and it got me started reading. As I read I started to realize that for a seemingly simple subject there was a lot to it. So I thought I would go over some of what I found out. First things first of course.
What is a collation? Collation is a property assigned to any character field that defines the sort and comparison rules for the field. The information I’m going over is mostly found in BOL under Collation and Unicode Support. It’s something like 10 pages long which just goes to show I wasn’t exaggerating about the depth of the subject.
Prior to SQL Server 2012 there were four options associated with collation with a 5th added in SQL Server 2012. They are specified by appending them to the collation name along with a _. So Collation_CS_AS_KS_WS_SC is case sensitive, accent sensitive, kana sensitive, width sensitive and supplementary character sensitive.
- Case sensitivity (CS/CI) – Will the sort/comparison take case (upper/lower) of the characters into account. If CS (case sensitive) it will and if CI (case insensitive) it will not.
- Accent sensitivity (AS/AI) – Will the sort/comparison take into account accented characters (a/ấ). If AS (accent sensitive) then it will. If AI (accent insensitive) it will not.
- Kana sensitivity (KS) – This option is specific to Japanese kana characters. There are two types of kana characters, Hiragana and Katakana. If KS is specified then the sort/comparison will be treat the kana characters differently. If it is not specified then they are treated the same. There is no KI.
- Width sensitivity (WS) – I find this an interesting option. This determines if the storage (one byte or two) is taken into account in the sort/comparison. There is no WI. If WS is included then the collation is width sensitive. If it isn’t included then the collation is not width sensitive
- UTF-16 Supplementary Characters (SC) – Thanks to Bob Beauchemin for pointing this one out to me. But basically this is new as of SQL 2012 and it supports UTF16 characters that require more than a 16bit (2 byte) word. It is supported in 90 level and up collations.
SQL Server supports three different collation sets. Windows collations, Binary collations, and SQL Server collations. I honestly found the differences rather confusing so I’ll let you read up on them yourself.
In addition to the sort/comparison options, for non-Unicode data types (char/varchar/text) collation also determines the code page. Because non-Unicode types only have one byte per character they are more restricted in the number of characters available. The code page determines what character set is available. So for example Hebrew collations will use an entirely different set of characters than English ones. Unicode datatypes (nchar/nvarchar/ntext) don’t have the same problem. Because they have the additional byte per character they can hold a large enough number of possible characters that a code page isn’t really necessary.
Over the next few weeks I’m going to do a number of different posts around the surprisingly deep subject of collation.
As a start listen to Rob Farley’s hilarious talk on collation
Collation: Correction -> Expression, Column, Database
Collation: Correction -> Instance
Collation: Temporary objects & variables: Demonstration
Category: Collation, Microsoft SQL Server, SQLServerPedia Syndication, T-SQL | Tags: code language, collation, language sql, microsoft sql server, T-SQL
5 thoughts on “Collation: Definition”
Leave a Reply Cancel reply
This site uses Akismet to reduce spam. Learn how your comment data is processed.
Actually as of SQL Server 2012 there is a fifth possible collation option, SC for supplemental character support (UTF-16). See: http://msdn.microsoft.com/en-us/library/09f0096e-ab95-4be0-8c01-f98753255747
Well I did mention that it was more complicated than I expected didn’t I? Thanks for the additional info. I’ll add it in.
[…] can take up more than two. I knew this to a certain extent from one of the comments in my Collation: Definition post. As of 2012 there is a collation option called UTF-16 Supplementary Characters (SC) that […]
Hello. I have an issue with one application. The application can not execut
the following short script on an SQL Server database:
set language us_english
select ‘TS=;’ + rtrim(name) + ‘;’ + rtrim(filename) + ‘;end_of_line’ from sysfiles
The message we get is:
Msg 451, Level 16, State 1, Line 3
Cannot resolve collation conflict between “Latin1_General_100_CS_AS_SC”
and “Latin1_General_100_CI_AS_KS_WS_SC” in add operator occurring in
SELECT statement column 1.
The “Latin1_General_100_CS_AS_SC” is the collation of [mydatabase].
How can we fix the issue?
I’m surprised that sysfiles has two different collations in it and you are better off using sys.master_files or sys.database_files. That said if you look here:
You’ll see that you can add the COLLATE clause to change the collation of a single expression. So in your case something like this:
Note: I put the COLLATE on both columns because I didn’t know which had which collation. This forces them both to be the same.