Working With Collations In SQL Azure
[This article was contributed by the SQL Azure team.]
A collation encodes the rules governing the proper use of characters for either a language, such as Greek or Polish, or an alphabet, such as Latin1_General (the Latin alphabet used by western European languages). The default collation for character data in SQL Azure databases is SQL_Latin1_General_CP1_CI_AS. This collation is also used across the SQL Azure infrastructure to sort and compare metadata that defines database objects. The server and database level collations are not configurable in SQL Azure. However, you can use a collation of your choice at the column and expression level. This article will show you how.
Although the server and database collations cannot be configured in SQL Azure, you still can query both of these properties, for example:
SELECT SERVERPROPERTY('Collation') SELECT DATABASEPROPERTYEX('TestDB', 'Collation')
Currently, both of queries will return the default collation: SQL_Latin1_General_CP1_CI_AS.
If the solution you are building on SQL Azure requires a different collation for character data you will need to set the collation at the column level or use the expression level collation to explicitly cast to a specific collation. Keep reading to learn how.
When using SQL Server Management Studio’s Generate Script Wizard (more about using the Generate Script Wizard in this blog post), column collations are included by default. You can verify this by checking that the “Include collation” option is set to “True” (the default value).
This sample shows how to create columns with a specific collation:
CREATE TABLE t ( id int IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED, c1 nvarchar(20) COLLATE SQL_Latin1_General_CP1_CI_AS, c2 nvarchar(20) COLLATE Japanese_CI_AS )
To retrieve column collation property for the example above:
SELECT name, collation_name FROM sys.columns WHERE object_id = OBJECT_ID('t', 'U') AND name <> 'id'
Along with metadata objects (such as logins, databases, tables, views, and column names, etc.), the default collation also applies to identifiers for variables, GOTO labels, and temporary tables. Temporary tables and table-valued parameters (and associated user-defined table types) represent an interesting case to be aware of. While the default collation of databases in SQL Azure is the same as collation used for character data columns in temporary tables or user-defined table types, you can get a mismatch in collations and collation conflicts when you use column-level collations in your database. Also, the database_default option for setting a column-level collation will always map to SQL_Latin1_General_CP1_CI_AS.
If you need to create temporary tables (or table types) that will be used in conjunction with tables that have columns that use non-default collation you should create columns in temporary tables with matching collation. For example:
CREATE TABLE #TestTempTable ( id int IDENTITY(1,1) PRIMARY KEY, c1 nvarchar(20), c2 nvarchar(20) COLLATE database_default, c3 nvarchar(20) COLLATE Japanese_CI_AS, )
Note that columns c1 and c2 will have the same collation in the example above.
The following queries will return a collation conflict error:
SELECT * FROM t INNER JOIN #TestTempTable on t.c2 = #TestTempTable.c1 SELECT * FROM t INNER JOIN #TestTempTable on t.c2 = #TestTempTable.c2
While these two will work:
SELECT * FROM t INNER JOIN #TestTempTable on t.c2 = #TestTempTable.c3 SELECT * FROM t INNER JOIN #TestTempTable on t.c2 = #TestTempTable.c2 COLLATE Japanese_CI_AS
The above correction in the Transact-SQL is an example of a declarative collation in the expression, that is executed when the statement is executed and doesn’t affect the data stored in the table.
It is also a good practice to ensure that Unicode strings are preceded with a capital letter N that stands for National Language in the SQL-92 standard, and must be uppercase. If the Unicode string is not preceded by the N-prefix the code page that corresponds to the current database collation will be used to convert characters if N-prefix which in some cases could cause data corruption.
In the example below first entry of Hiragana Letter E will be converted correctly, whereas the second entry will cause data corruption. This example also illustrates that column collation is not used for conversion.
INSERT INTO t VALUES(N'え', N'え'), ('え', 'え') SELECT * FROM t
Do you have questions, concerns, comments? Post them below and we will try to address them.