Full-Text Search – Thesaurus Languages

  • (Bob Brown) (3/19/2013)


    Hugo Kornelis (3/19/2013)


    Can anyone fill me in on the missing details?

    I don't know if this is any help but it is what I used to answer this question:

    http://msdn.microsoft.com/en-us/library/ms142491.aspx

    http://msdn.microsoft.com/en-us/library/ms142491(v=sql.100).aspx

    Thanks, Bob.

    But I was specifically looking for how to find the three-letter language code for any given language. Those pages do not include that information (unless I overlooked it).


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • demonfox (3/19/2013)


    only English , when it comes to britain ...

    I'm looking forward to Tom's reply to that one 😉

  • +1 🙂 I recognized that I should be looking for enu... in my moment of mental distraction.

    good question though... it's one of those little things that isn't hard to miss.



    --Mark Tassin
    MCITP - SQL Server DBA
    Proud member of the Anti-RBAR alliance.
    For help with Performance click this link
    For tips on how to post your problems

  • (Bob Brown) (3/19/2013)


    Yay. Great question. Had to do a lot of research to get it right. Thanks.

    +1

    (50 minutes of study)

  • Hugo Kornelis (3/19/2013)


    (Bob Brown) (3/19/2013)


    Hugo Kornelis (3/19/2013)


    Can anyone fill me in on the missing details?

    I don't know if this is any help but it is what I used to answer this question:

    http://msdn.microsoft.com/en-us/library/ms142491.aspx

    http://msdn.microsoft.com/en-us/library/ms142491(v=sql.100).aspx

    Thanks, Bob.

    But I was specifically looking for how to find the three-letter language code for any given language. Those pages do not include that information (unless I overlooked it).

    I am not sure , if this is anywhere related to iso639-2 codes ..

    These are the references I could find ..

    http://www.loc.gov/standards/iso639-2/php/code_list.php

    Here is a discussion reference that includes further references. I think this might be the standard followed by MS in SQL Server, but then again, it's a guess 😉

    http://social.msdn.microsoft.com/Forums/en-US/wpf/thread/efa9b596-3bc4-4be7-aeeb-4d97ad31f1dd

    http://msdn.microsoft.com/en-us/library/system.globalization.cultureinfo.threeletterisolanguagename.aspx

    ~ demonfox
    ___________________________________________________________________
    Wondering what I would do next , when I am done with this one :ermm:

  • demonfox (3/19/2013)


    These are the references I could find ..

    http://www.loc.gov/standards/iso639-2/php/code_list.php

    Here is a discussion reference that includes further references. I think this might be the standard followed by MS in SQL Server, but then again, it's a guess 😉

    http://social.msdn.microsoft.com/Forums/en-US/wpf/thread/efa9b596-3bc4-4be7-aeeb-4d97ad31f1dd

    http://msdn.microsoft.com/en-us/library/system.globalization.cultureinfo.threeletterisolanguagename.aspx

    Thanks for your digging, Demonfox! Much appreciated.

    The first reference is to a standards body. I would not automatically assume that Microsoft adheres to any standard they didn't invent themselves. 😉 And indeed - I checked the ENU code that is relevant to this discussion, and it's not included in the list.

    The second link is a discussion on the non-standard nature of the three-letter codes used by MS, and the third reference lists a C# program one could use to output the list from Windows. The cropped output shown there indicates that, at least for American English, SQL Server does not use the code listed as "ISO", but does use the code listed as "WIN".

    I'm not sure if that means that I could run that program and use the entire list for my Thesaurus files, as I still have not seen a reference telling me that the three-letter code used by full-text search is always equal to that "WIN" code. Or that all languages in that output are supported by full-text search. Or that that list includes all supported languages. And even if all of that were the case, I still maintain what I previously replied to Tom - this information should be included in Books Online, in a place that is easy to find, and in the form of a table listing all supported languages and the corresponding three-letter code. Not in the form of a program I'd have to copy, paste, compile and run first. In my opinion, Microsoft really dropped the ball here.


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Hugo Kornelis (3/19/2013)


    L' Eomot Inversé (3/19/2013)


    Good question, but the definite cultural bias is perhaps unfortunate. I suppose it's fair enough, as the default installation will use LCID 1033, not 2057. But there may be some Brits around for whom tseng.xml is the right file and they wouldn't stand much chance of spotting the right answer, would they?

    Even for Brits, the tseng.xml file is NOT the right choice when "working with an American English SQL Server instance" (quote from question text; emphasis added by me). I guess you overlooked that part of the question?

    Yes, I should remember to read the question properly before commenting! I'm getting too careless these days.

    Tom
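
As a rough illustration of the file-name point in the exchange above, the sketch below maps the two LCIDs mentioned (1033 and 2057) to thesaurus file names. The "ts" + three-letter code + ".xml" pattern is inferred from the tseng.xml name used in this thread. The table and function names are made up for the example, and the table deliberately covers only the two languages discussed here, so it is not a list of what full-text search supports.

```python
# Hypothetical helper: the names and the table are illustrative only,
# not part of SQL Server. Only the two LCIDs from this thread are listed.
LCID_TO_CODE = {
    1033: "enu",  # English (United States)
    2057: "eng",  # English (United Kingdom)
}


def thesaurus_file_for_lcid(lcid: int) -> str:
    """Return the assumed thesaurus file name for an LCID in the sample table."""
    try:
        code = LCID_TO_CODE[lcid]
    except KeyError:
        # Anything outside the sample table is rejected rather than guessed.
        raise ValueError(f"LCID {lcid} is not in this sample table") from None
    return f"ts{code}.xml"


if __name__ == "__main__":
    for lcid in (1033, 2057):
        print(lcid, thesaurus_file_for_lcid(lcid))
```

Rejecting unknown LCIDs instead of guessing a code mirrors the concern raised in this thread: without a documented table, a guessed file name may simply be the wrong file.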

  • Toreador (3/19/2013)


    demonfox (3/19/2013)


    only English , when it comes to britain ...

    I'm looking forward to Tom's reply to that one 😉

    Well, just to keep Toreador happy I'll reply, although completely off topic here.

    There are at least four versions of spoken English in Britain: Scottish English, Welsh English, and two English Englishes: that awful baabaa they speak in SE England, and the English of the rest of England. If you want to count phrases like "all mang the cudders akay" (travellers language - maybe Rom) or "bickering brattle" (Scots - lallans/doric) as English - I don't, especially since "bickering" in that phrase means something completely different from what the English word with the same spelling means, but some do - then there are at least another two versions; and if you want to count minor dialectal variations like Geordie English and Brumagen English, ie versions with lots of pronunciation variation but only trivial grammar variation, as well as variants with seriously different grammar and vocabulary (I don't, it would be pointless - as silly as in the USA counting Boston English as different from Cambridge English would be) there are hundreds.

    But even though there are at least four versions, those four versions have a lot in common, especially in written form: while one version uses "I am after going" and another uses "I am gone" and yet another uses "I have gone" everyone understands all those variants, so in that sense there is a single British English that is a union of those versions. Unless of course you count things like the two non-English examples I gave above as English - if you did that you would have to accept that there are three or more mutually incomprehensible English languages in Britain.

    I suspect someone from SE England would take exception to the lower case "b" in demonfox's "britain". I'm perfectly happy with lower case for the first letters of country names and language names. I usually use upper case for them when writing English because so many English speakers take exception to lower case and always when writing German because all nouns get initial capitals in German, but usually stick to lower case for them except at the beginning of a sentence when writing in other languages, especially in languages like Spanish, Scots Gaelic, and Irish where capitalising language names is formally incorrect. I even use lower case in english when the capital slips my mind or I'm bent on teasing na sasunnaich.

    Tom

  • L' Eomot Inversé (3/19/2013)


    Toreador (3/19/2013)


    demonfox (3/19/2013)


    only English , when it comes to britain ...

    I'm looking forward to Tom's reply to that one 😉

    Well, just to keep Toreador happy I'll reply, although completely off topic here.

    There are at least four versions of spoken English in Britain: Scottish English, Welsh English, and two English Englishes: that awful baabaa they speak in SE England, and the English of the rest of England. If you want to count phrases like "all mang the cudders akay" (travellers language - maybe Rom) or "bickering brattle" (Scots - lallans/doric) as English - I don't, especially since "bickering" in that phrase means something completely different from what the English word with the same spelling means, but some do - then there are at least another two versions; and if you want to count minor dialectal variations like Geordie English and Brumagen English, ie versions with lots of pronunciation variation but only trivial grammar variation, as well as variants with seriously different grammar and vocabulary (I don't, it would be pointless - as silly as in the USA counting Boston English as different from Cambridge English would be) there are hundreds.

    But even though there are at least four versions, those four versions have a lot in common, especially in written form: while one version uses "I am after going" and another uses "I am gone" and yet another uses "I have gone" everyone understands all those variants, so in that sense there is a single British English that is a union of those versions. Unless of course you count things like the two non-English examples I gave above as English - if you did that you would have to accept that there are three or more mutually incomprehensible English languages in Britain.

    I suspect someone from SE England would take exception to the lower case "b" in demonfox's "britain". I'm perfectly happy with lower case for the first letters of country names and language names. I usually use upper case for them when writing English because so many English speakers take exception to lower case and always when writing German because all nouns get initial capitals in German, but usually stick to lower case for them except at the beginning of a sentence when writing in other languages, especially in languages like Spanish, Scots Gaelic, and Irish where capitalising language names is formally incorrect. I even use lower case in english when the capital slips my mind or I'm bent on teasing na sasunnaich.

    now , that's something 😀 something as a wholesome picture of english in Britain 🙂 may be more is there ; makes me curious to dig into it ..

    and, as for the "britain" and the first letter caps , it is laziness to press SHIFT .. 😉

    Edit 🙁 Now a days , I am typing something else than what I think I am typing .. Missing a word completely .. ) English

    ~ demonfox
    ___________________________________________________________________
    Wondering what I would do next , when I am done with this one :ermm:

  • Hugo Kornelis (3/19/2013)


    demonfox (3/19/2013)


    These are the references I could find ..

    http://www.loc.gov/standards/iso639-2/php/code_list.php

    Here is a discussion reference that includes further references. I think this might be the standard followed by MS in SQL Server, but then again, it's a guess 😉

    http://social.msdn.microsoft.com/Forums/en-US/wpf/thread/efa9b596-3bc4-4be7-aeeb-4d97ad31f1dd

    http://msdn.microsoft.com/en-us/library/system.globalization.cultureinfo.threeletterisolanguagename.aspx

    Thanks for your digging, Demonfox! Much appreciated.

    The first reference is to a standards body. I would not automatically assume that Microsoft adheres to any standard they didn't invent themselves. 😉 And indeed - I checked the ENU code that is relevant to this discussion, and it's not included in the list.

    The second link is a discussion on the non-standard nature of the three-letter codes used by MS, and the third reference lists a C# program one could use to output the list from Windows. The cropped output shown there indicates that, at least for American English, SQL Server does not use the code listed as "ISO", but does use the code listed as "WIN".

    I'm not sure if that means that I could run that program and use the entire list for my Thesaurus files, as I still have not seen a reference telling me that the three-letter code used by full-text search is always equal to that "WIN" code. Or that all languages in that output are supported by full-text search. Or that that list includes all supported languages. And even if all of that were the case, I still maintain what I previously replied to Tom - this information should be included in Books Online, in a place that is easy to find, and in the form of a table listing all supported languages and the corresponding three-letter code. Not in the form of a program I'd have to copy, paste, compile and run first. In my opinion, Microsoft really dropped the ball here.

    Yes, that's true. Since I couldn't find any reference either, I will have to agree with you.

    Moreover, did you check the link provided by Steve in the explanation?

    http://msdn.microsoft.com/en-us/library/39cwe7zf(v=vs.110).aspx

    http://msdn.microsoft.com/en-us/library/39cwe7zf(v=vs.100).aspx

    If you switch between versions, you can see the mention of three-letter language codes. I am not sure why it was not carried over into the 2012 documentation, but it does give a hint about ENU and ENG.

    ~ demonfox
    ___________________________________________________________________
    Wondering what I would do next , when I am done with this one :ermm:

  • demonfox (3/19/2013)


    I am not sure , if this is anywhere related to iso639-2 codes ..

    These are the references I could find ..

    http://www.loc.gov/standards/iso639-2/php/code_list.php

    It seems to use ISO-639-2 some of the time, but not always: for example bgr, chs, cht, and enu are not in ISO-639-2 but are 3 letter language codes used by MS.

    Here is a discussion reference that includes further references. I think this might be the standard followed by MS in SQL Server, but then again, it's a guess 😉

    http://social.msdn.microsoft.com/Forums/en-US/wpf/thread/efa9b596-3bc4-4be7-aeeb-4d97ad31f1dd

    http://msdn.microsoft.com/en-us/library/system.globalization.cultureinfo.threeletterisolanguagename.aspx

    Those look useful - presumably the "ThreeLetterISOLanguageName" property in the CultureInfo object is not actually what it is called but what MS uses (which is sometimes but not always a three letter ISO language code).

    Isn't it wonderful that you have to grub about either in the registry or in .NET objects to discover information that ought to be properly documented? And that for all we know grubbing about in the two places may deliver different answers? And that even the number of SQL Server supported languages (documented clearly as 33 in BoL) is perhaps 40 or 41 or 44 or 48 depending on which web page one looks at and whether one believes the directory entries installed with SQL Server instead of BoL or some other MSDN web page?

    Tom
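
The mismatch described in the post above can be made concrete with a small comparison. The sketch below takes the four Microsoft codes named there (bgr, chs, cht, enu), plus eng for contrast, and checks each against the ISO 639-2 code(s) for the same language. The table is hand-entered sample data and the names are hypothetical; it does not claim to cover every language, nor to show which code full-text search actually uses in each case.

```python
# Sample data only, hand-entered for the codes named in this thread.
# Microsoft three-letter code -> (language, ISO 639-2 codes for that language).
MS_TO_ISO = {
    "bgr": ("Bulgarian", {"bul"}),
    "chs": ("Chinese (Simplified)", {"chi", "zho"}),
    "cht": ("Chinese (Traditional)", {"chi", "zho"}),
    "enu": ("English (United States)", {"eng"}),
    "eng": ("English (United Kingdom)", {"eng"}),
}


def non_iso_codes(table: dict[str, tuple[str, set[str]]]) -> list[str]:
    """Return the Microsoft codes that differ from the ISO 639-2 code(s)."""
    return sorted(ms for ms, (_, iso) in table.items() if ms not in iso)


if __name__ == "__main__":
    for ms_code in non_iso_codes(MS_TO_ISO):
        language, iso_codes = MS_TO_ISO[ms_code]
        print(f"{ms_code}: {language}, ISO 639-2 uses {sorted(iso_codes)}")
```

In this sample, only eng coincides with its ISO 639-2 code, which matches the "sometimes but not always" observation above.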

  • presumably the "ThreeLetterISOLanguageName" property in the CultureInfo object is not actually what it is called but what MS uses (which is sometimes but not always a three letter ISO language code).

    Isn't it wonderful that you have to grub about either in the registry or in .NET objects to discover information that ought to be properly documented?

    😛

    If we combine the link with this one

    http://msdn.microsoft.com/en-us/library/39cwe7zf(v=vs.100).aspx

    we might get it all :w00t:

    but , that might be a writ to grit.

    so +1 for the proper documentation Questionmark .

    ~ demonfox
    ___________________________________________________________________
    Wondering what I would do next , when I am done with this one :ermm:

  • Nice question, thanks.

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • Nice question - based on painful experience Steve? 😎

  • david.wright-948385 (3/20/2013)


    Nice question - based on painful experience Steve? 😎

    Yes. I was working with this for a talk and kept editing what I thought was the English file. Eventually I researched and realized I was editing the wrong file.

Viewing 15 posts - 16 through 30 (of 38 total)
