SQL Server Choosing Collation

22 Oct
October 22, 2011

your SQL Server Collation choice for your database (or server) will control how characters will be sorted and compared within the database or SQL Server in general.

DBAs must choose a collation carefully that matches the type of characters encountered in a particular database. It is important to take into account as well how this database will interact with other database in the same instances, or through a linked SQL Server instances.

With choosing a collation, its important to select something suitable for both the data in the database, and how this data will interact with other data in other SQL Server databases, so there must be a global and a local collation selection strategy. Sometimes it is recommended to keep the collation consistent across the whole environment; this is because SQL Server stores character based columns in the specified collation, and comparing two different columns (maybe across different tables or tables) that have different collation will force the user to call the COLLATE function on that column, resulting in inefficient utilization of the available indexes on those character based columns.

This could be a nightmare, as I have found, if you have multiple database or multiple SQL Server instances with different default collation values, each collaborating together, for example in a scale-out environment.

What is a Collation?

Collation is a collection of rules that dictates how SQL Server encodes, stores, sorts and compares character types.

Collations have no effects on the way Unicode types (nvarchar or nchar) are stored , but the Collation rules still govern how these types are sorted and compared. The Collation type directly effects the way non-Unicode types (varchar or char) are stored.

More about Collations can be found at MSDN

Collation Types and Options

Collations have the following configurable attributes, which can generally be deduced from the Collation type name

  • Character Encoding: What type of character encoding to use (for example for most Western European languages, the Latin1_General is a good character encoding type)
  • SQL Version: Collations can either be specific to a SQL Server version (for example if Collation type name has “100”, then this is a SQL Server 08 Collation, “90” indicates a SQL Server 05 Collation
  • Case Sensitivity: CI indicates “Case Insensitivity” and CS indicates “Case Sensitivity”
  • Accent Sensitivity: AI indicates “Accent Insensitivity” and AS indicates “Accent Sensitivity”, Lack of accent identifier in Collation type name indicates “Accent Insensitivity”
  • Width Sensitivity: WI indicates “Width Insensitivity” and WS indicates “Width Sensitivity”, Lack of width identifier in Collation type name indicates “Width Insensitivity”
  • Kanna Sensitivity: KI indicates “Kanna Insensitivity” and KS indicates “Kanna Sensitivity”, Lack of Kanna identifier in Collation type name indicates “Kanna Insensitivity”

This is with the exception of Binary Collations

Setting SQL Server Collations

There are many ways Collations can be set:

  • When installing SQL Server, you will be asked to set the default Collation of the SQL Server instance, this will initialize some of the system databases under that Collation.
  • When creating a new database, you will be prompted to set the Collation of that database, the default option will be the SQL Server instance Collation.
  • When creating objects such as Tables or Table Types, you can set the Collation on a per column basis. Sometimes for optimisation purposes, Collations are set on a Computed Column.
  • Inline you can use the “COLLATE [Collation-Name]” Function to convert from one Collation to another (you can also use the database property “DATABASE_DEFAULT” to get the database’s default Collation.

Choosing a SQL Server Collation

When selecting a Collation, its important to consider the following:

  • What are the best combination of options (shown above) that best describe the type of character based data you will be encountering in the database. For example, if you are dealing with French characters, then you might choose the Latin1_General character encoding, with Accent Sensitivity (AS)
  • Will this database interact with other databases in its environment in terms of its character based columns? if so, what is the Collation of these other databases/tables?
  • Are there any sort specific requirements in the database that you need to adhere by?, considering those requirements early on could help speed up your Sort queries later on
* * *     1 vote
0 replies

Leave a Reply

Want to join the discussion?
Feel free to contribute!

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>