Case Sensitivity and Collations

  • Hi

    Hope someone can shed some light on this for me...

    I have a Sql 2000 installation that a 3rd party vendor requires to be case sensitive.

    Innitially when i installed the server i specified case sensitive sort order (I believe)

    When they arrived to perform their tasks i ended up re-installing the machine useing the binary sort order.

    The entire server is case sensitive.

    I'm sure there is a way to allow selected database/objects to be case sensitive.

    The reason i am making this post is becasue i'm a bit confused between the character sets/sort order collations and locales... how they affect each other etc...

    Although i have worked with Sql for quite a while, this is the first time i have had to deal with case sensitivity.

    I have some procedures that collect information from all sql servers and this server is a problem everytime due to this setup method. (Not consistent with the rest of the machines in our enterprise)

    Thanks in advance ....

    Tony

  • Search the index in Books Online (BOL) for "collations" and you will find a lot of info.

    Collations define "rules" for what type of characters you can store in your database and how they are sorted against eachother.

    For example if you live in Sweden you probably want to use the collation named Finnish_Swedish so that when you do a SELECT with an ORDER BY the swedish characters ÅÄÖ will be in the correct order (last after XYZ).

    If you instead use the default SQL Server collation Latin1_General the swedish characters ÅÄÖ wouldn't end up last as swedes are used too but instead among A and O.

    Sort order is just one thing you specify when you choose a collation. Once you selected the langauge specific collation (language specific in the sense that all the local characters for this country are represented in the collation and they are sorted against eachother correctly) that you want, you can then specify if it should be case sensitive or case insensitive. Other things like Accent and Kana can also be specified but skip those for now.

    In SQL 2000 the default collation is Latin1_General_CI_AS. This practically means that you support the english language (and other languages using same alphabet) and that small a is equal to capital A (CI = Case Insensitive). If you would do a ORDER BY the result would be like

    a

    A

    A

    a

    B

    b

    B

    b

    If you on the other hand choose a case sensitive collation like Latin1_General_CS_AS (CS = Case Sensitive) it would look like

    a

    a

    A

    A

    b

    b

    B

    B

    The above collations where you can select case sensitivity etc. are called a dictonary sort. You can also select what is called a binary sort. Binary sort will not sort your data according to a countries alphabet but instead according to each characters numerical representation on the ASCII chart.

    If you would choose a binary sort the ORDER BY looks like

    A

    A

    B

    B

    a

    a

    b

    b

    since captial letters have lower ASCII values they come first. With SQL 6.5 you could probably gain some performance by using a binary sort but on SQL 7 and 2000 the difference isnt that big. My personal recommendation is not to use a binary sort since it only results in a lot of frustration.

    It's not just the sort order of characters that is effected by a case sensitive collation. Compare operations etc are also effected. With a case insensitive collation Abc = ABC would be true, but it would be false on a case sensitive (or binary). Only exceptions to this is if you use Unicode data but read more about that in BOL.

    One more thing to think (geez starting to be a long reply now 🙂 ) is that on SQL 2000 you can select collation on:

    1. Server Level

    2. Database Level

    3. Column level

    You would think that the server level collation doesnt matter since you can select a database level collation. Wrong. The server level collation will decide the collation for the system databases (master, model, tempdb, msdb and distribution). This means that if I have a user database with Finnish_Swedish_CS_AS collation on a SQL 2000 server with Latin1_General_CI_AS as default, I will run into trouble if I for example in a stored procedure create a #temp table and try to do string comparison. This since the #temp table will have the server specific collation. You would most likely run into collation errors here unless you specificly specify what collation your #temp table should use when you create it. This is where most 3rd party apps originally written for SQL 7.0 will run into trouble when running them on SQL 2000.

    Anyway, enough rambling from me. In your specific case I would probably skip the binary sort collation and instead choose a case sensitive dictonary sort like Latin1_General_CS_AS on the server level.

    Then set all user databases to use the same collation.

    If the 3rd party vendor has requirements on langage specific sort (say swedish) I would choose the case sensitive collation for that language, i.e Finnish_Swedish_CS_AS

    /Argyle

    Edited by - argyle on 03/23/2002 04:06:06 AM

  • Hi

    THanks for your reply. Trust me, it is appreciated. You have confirmed some things and shed some light on others...

    THanks again

    Tony

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply