ADO.NET, SqlConnection and codepage conversions

Question

Post reply

ADO.NET, SqlConnection and codepage conversions

Carl B.

SSCertifiable

Points: 7931
More actions
March 17, 2004 at 1:51 pm

#84722

Hello everyone,
We have made some tests with ADO.NET connecting on SQL Server 2000.
We saw that a string read from a SqlDataReader from a server with the codepage 850 is automatically converted.
Just to give you an example:
The caracter + (195 in the codepage 850) is converted to the caracter + (43 in the codepage 1252).
And when there is no possible conversion it put an '?'.
Do we have any control over this behavior?
Where this conversion happens?
Thank's
Carl

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply

Peter Evans Ten Centuries Points: 1187 More actions · Answer 1

You don't mention setting the CultureInfo member of the DataSet!

Try setting it to System.Globalization.CultureInfo.InvariantCulture

EG

public DataTable MakeGoodTable()

{

DataTable goodTable = new DataTable("Customers");

// Satisfies FxCop rule: LocaleShouldBeSetForDataTypes.

goodTable.Locale = CultureInfo.InvariantCulture;

DataColumn keyColumn = goodTable.Columns.Add("ID", typeof(Int32));

keyColumn.AllowDBNull = false;

keyColumn.Unique = true;

goodTable.Columns.Add("LastName", typeof(String));

goodTable.Columns.Add("FirstName", typeof(String));

return goodTable;

}

See this GOTDOTNET Link below referring to one of the main rules for globalization with .NET Framework. I think the dataset defaults to the CurrentCulture which is most likely 1252 in your case and that is why the iso850 sql data is getting automatically convert when retrieved from the server.

http://www.gotdotnet.com/team/fxcop/docs/Rules/GDNRuleIndexStyle.aspx?url=Globalization/LocaleShouldBeSet.html

Peter Evans (__Peter Peter_)

Carl B. SSCertifiable Points: 7931 More actions · Answer 2

Thank's Peter for your input.

But as I can see the InvariantCulture is related to english (codepage 850 or 437 ??? not sure, not clear) and this is not what we need.

We need no conversion at all no matter we are reading from a 850 server, a 1252 or a 437 server. Its Beacause we are doing conversion on our own.

Maybe we could set the System.Globalization.CultureInfo consequently for the value of the codepage of the server we are connected to.

Best regards,

Carl

Peter Evans Ten Centuries Points: 1187 More actions · Answer 3

Carl,

My understanding of this issue is that InvariantCulture is used to keep the data neutral internally until you decide to put it into other .NET Strings or Winform string properties at which time you'll still have to process fields of type (Text Char Varchar) with either the System.Text classes or the implicit ToString conversions depending upon how the SQLString.CultureInfo compares with Thread.CurrentThread.CurrentCulture or Thread.CurrentThread.CurrentUICulture depending on your task serializing data or displaying it. So your reader code will need to deal with these situations, becuase every SQLDataSet rowset could possibly have mixed encodings on a column by column basis.

Peter Evans (__PETER Peter_)

Carl B. SSCertifiable Points: 7931 More actions · Answer 4

Peter,

I'll try to be clear...

We use ADO.NET only in our own database acces layer, this database access layer will be in managed c++ but the applications that uses this database acces are written in c++.

So in our database acces layer (lets call it DAL), we do not display data to the end user at all we are just feeding "records" in memory that are used by the applications.

If I use InvariantCulture will I be able to read from a serveur with codepage 1252 the same (I mean without any conversion) as I read from a 850 codepage server? What is the codepage associate to the InvariantCulture?

Thank's

Carl

Carl B. SSCertifiable Points: 7931 More actions · Answer 5

Pêter,

I just want to add that we don't use DataSets, we just use SQLDataReader.

Regards,

Carl

Peter Evans Ten Centuries Points: 1187 More actions · Answer 6

Carl,

Heres is I believe the two essential points from msdn that apply to your goal as outlined from the previous two posts.

[snip from msdn]

The cultures are generally grouped into three sets: the invariant culture,

the neutral cultures, and the specific cultures.

1) The invariant culture is culture-insensitive.

You can specify the invariant culture by name

using an empty string ("") or by its culture identifier 0x007F.

InvariantCulture retrieves an instance of the invariant culture.

It is associated with the English language but

not with any country/region. It can be used in almost

any method in the Globalization namespace that requires a culture.

If a security decision depends on a string comparison or

a case-change operation, use the InvariantCulture to ensure

that the behavior will be consistent regardless of

the culture settings of the system. However, the invariant culture

must be used only by processes that require culture-independent results, such as system services; otherwise, it produces results that might be linguistically incorrect or culturally inappropriate.

[end snip msdn]

So for in your managed code use InvariantCulture internally unless your service needs to interact with the user in some way. Otherwise you need to worry about the cultureinfo resources being available on the system and deal with the resource fallback issues just like any other assembly that needs to interact with the user when an error or exceptional case occurs and it needs to need be logged or shown to the user.

[snip msdn]

For more information on the resource fallback process, see Resource Fallback Process.

The list of cultures in the Windows API is slightly different from the

list of cultures in the .NET Framework. For example, the neutral culture

zh-CHT "Chinese (Traditional)" with culture identifier 0x7C04 is not available in the Windows API.

If interoperability with Windows is required (for example, through the p/invoke mechanism), use a specific culture that is defined in the .NET Framework. This will ensure consistency with the equivalent Windows locale, 2) which is identified with the same LCID.

[end snip from msdn]

So all your SQLString handling will need to use the SQLString.LCID when marshalling the data to your legacy code. Note your code is playing middle man between the provider and the legacy code so it will have to trust that the .NET SQLProvider is encoding the SQLstrings with respect to the declaration of the collation at the SQL Server.

HTH,

Peter Evans (__PETER Peter_)

Carl B. SSCertifiable Points: 7931 More actions · Answer 7

Peter,

I have read those topics from MSDN yesterday, thank you nevertheless.

What I can see is that I can affect System.Globalization.CultureInfo.InvariantCulture to the Locale member of a Dataset. But we don't use Dataset.

There is no Locale member for SqlDataReader and we must ensure that the sqlDataReader does not convert anything from the server.

Best regards,

Carl

Carl B. SSCertifiable Points: 7931 More actions · Answer 8

Peter,

Since now what I said is about what we are reading from the server but we have the same requirements for the strings we are writing to the server: we dont want them to be converted.

All those requirements comes from the need to rewrite an old database acces layer that never let our old API (DB-LIBRARY) makes any conversion at all. They decided, at that time, that they would convert everything from themself in the database acces layer. Bad idea but maybe at this time (1992) they didn't have the choice.

Best regards,

Carl

Peter Evans Ten Centuries Points: 1187 More actions · Answer 9

Carl,

So there's several transcode points involved here. SQL Server Instances code pages constraints which are limited to what ever the underlying os supports and its configured value.

The SQL collations attribute overriding the default instance behavior at either the database or column level. Cross database problems because of differing collations most notably because of tempdb which defaults to that of model.

Also, the locale of the database determines the Parameters for stored procedures or functions, user-defined data types, and variables are assigned the default collation of the database

The limits of the client OS to translate codepages, because they are unavailable for conversion. eg. 98 and NT 4.

Let's assume that ADO.NET SqlDataProvider gets us server level data, I've seen that already by using the SqlDataReader.GetSchemaTable and using it to find non unicode sqlstring and then examine the LCID on each of these values and they match the servers. And can be intercepted using SqlString.GetNonUnicodeBytes.

What you probably need to do is determine the range of code pages that your legacy code expects and limit the Sql Server collation types to just one of those at a time when forwarding this building the database, after all the expectation was that db-lib was feeding it a single code page based set of server data at a time right?

You'll probably also want to constrain it to limitations similar to that of the MSUnicode Layer for 98. Any unicode >= 2.0 spec is probably out because your legacy code may not handle the modern surrogates or some of the character codes added since the code was written. I'm a little vague on the the timeline of MBCS and unicode 1/2 here so figuring out what the old code can do via code inspection or testing might be in order.

Your right, the harder problem you got is being able to query the old layer for its inbound LCID (locale/ code page) of the commandtext its sending you and encoding that into a UNICODE query string such that any of the instance, database, column, parameter or temp db level conflict from above will avoid converting the value on evaluation.

That means knowing something about the command parameters and locations of these string literals in the commandtext and for that matter information about the variables and functions used in the command text to generate other char values. And finally any potential conflict with the final column still requiring conversion into a collation that doesn't match the encoded input. Mapping encodings aint easy its work and its just down right hard.

Then of course you need to make all this stuff work with the P/Invoke stuff and your managed types acting as intermediaries.

Well take it easy and good luck.

Peter Evans (__PETER Peter_)

Peter Evans Ten Centuries Points: 1187 More actions · Answer 10

Carl,

Here's a good article demonstrating all the issues around what I think you've stated to be your problem.

Peter Evans (__PETER Peter_)

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnsql2k/html/intlfeaturesinsqlserver2000.asp

Carl B. SSCertifiable Points: 7931 More actions · Answer 11

Peter,

Thank's for all your research. It is very appreciated.

For now we will make some tests with the Getchars method of the SqlDatReader that is not supposed to make conversions.

Best regards,

Carl