Latin small letter u with diaeresis

  • Hi,

    I have a source table which is under SQL 2014 and I am doing ETL on it (no transformation logic - just one to one mapping). And this gets loaded to again on a SQL 2014.

    Both source and target columns are varchar type.

    However, after loading, when i query the table, using SSMS, I see ASCII code 252, which is Latin small letter u with diaeresis.

    Do I need to change any visual settings under SSMS?

    thanks

  • What exactly is your question here? Are you saying that the character, ü, is not the character it should be? What character should it be?

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • Hi,

    I am seeing that special character on my SSMS query, where as the actual data does not contain that character.

    Basically, data type is varchar for both source and destination (not nvarchar), so there is no possibility of extended ASCII 252 code being held...thus, i see it as an SSMS visual settings issue...do you agree?

    Thanks

  • Is the code page the same for source and destination? Is the ETL tool doing some kind of data conversion?

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • etl2016 (12/12/2016)


    Hi,

    I am seeing that special character on my SSMS query, where as the actual data does not contain that character.

    Basically, data type is varchar for both source and destination (not nvarchar), so there is no possibility of extended ASCII 252 code being held...thus, i see it as an SSMS visual settings issue...do you agree?

    Thanks

    What shows instead in the initial table? What ETL tool are you using, and how is it moving the data across?

    SSMS simply displays what's in the record, if it's showing a ü, there's a ü in there record. If it isn't in the source data, this implies your ETL process is doing it.

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • etl2016 (12/12/2016)


    Hi,

    I am seeing that special character on my SSMS query, where as the actual data does not contain that character.

    Basically, data type is varchar for both source and destination (not nvarchar), so there is no possibility of extended ASCII 252 code being held...thus, i see it as an SSMS visual settings issue...do you agree?

    Thanks

    1 byte is 8 bits, which allows 256 characters.

    Therefore, there is a perfect possibility to hold the character ASCII 252.

    the actual data does not contain that character

    Are you sure?

    Try convert(varbinary(max), SourceData) - see what you can find at the place where you're getting 'ü' at the destination.

    _____________
    Code for TallyGenerator

  • As Sergiy has already indicated, the assumption that it can't be in the source data because it's CHAR/VARCHAR is incorrect.

    That can be shown in multiple ways.

    To pick a couple at random, the return type of the CHAR() function is CHAR(1), and it can display ü with no issues (just pass 252 to the function); CASTing that character or a string containing it as CHAR/VARCHAR will also work.

    Cheers!

  • etl2016 (12/12/2016)


    Hi,

    I have a source table which is under SQL 2014 and I am doing ETL on it (no transformation logic - just one to one mapping). And this gets loaded to again on a SQL 2014.

    Both source and target columns are varchar type.

    However, after loading, when i query the table, using SSMS, I see ASCII code 252, which is Latin small letter u with diaeresis.

    Do I need to change any visual settings under SSMS?

    thanks

    Check if there's a collation difference in the tables, run this query for the source table and destination table:

    SELECT TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME, COLLATION_NAME

    FROM INFORMATION_SCHEMA.COLUMNS

    WHERE TABLE_SCHEMA = 'YourSchema'

    AND TABLE_NAME = 'YourTable'

  • We found, much to our surprise and dismay, that starting with 2012 (I believe) that some previously unicode characters were accepted by non-unicode fields.

    This caused much head scratching and profanity when moving data to systems that weren't so tolerant.

  • JustMarie (12/14/2016)


    starting with 2012 (I believe) that some previously unicode characters were accepted by non-unicode fields.

    the original post said both the source table and destination table are on SQL Server 2014, so that may be a red herring, as both would have had that character right? If your SSIS package wasn't doing any transformations, then there's likely either a collation mismatch between the source table and the destination table, or the SSIS connections themselves had the wrong code page defined somewhere.

  • Hi,

    Latin1_General_CI_AS is the collation at both source and destination (both of which are SQL server databases)

    At source, the character is – (hyphen) and at destination, it is coming as û

    Between Source and Destination, there is SSIS ETL package, and it is doing a basic one to one mapping without any Transformations.

    Source, middle-man ETL, Destination all are Microsoft products, so, a good compatibility while data propagates.

    Under SSIS, the settings are Code page 1252, Data type string [DT_STR] Length 100 precision 0 scale 0

    Source and target both have data types as [varchar](100) NULL

    thanks

  • Check hex view of your inbound string.

    Chances are that hyphen is not actually a normal hyphen but in fact some UTF8 hyphen.

    _____________
    Code for TallyGenerator

  • Sergiy (12/15/2016)


    Check hex view of your inbound string.

    Chances are that hyphen is not actually a normal hyphen but in fact some UTF8 hyphen.

    Hi,

    true, actually as a next step, i was trying to explore options to see raw bytes, than rely on what is shown on screen.

    if it were unix kind of system I would use OD -c command (octal dump), could you please suggest how would I use hex view ? Can I do it from SSMS or do I need to do it from command line using sqlcmd etc?

    thanks for your suggestion....

  • okay, I think I got the difference when i did the hex dump using convert(varbinary(max), sourcecolumn)

    At source, the content looks like the character "En dash" with ASCII HEXcode of 96, which is DECIMAL 150 . Visually, on Screen using SSMS, it appears as hyphen.

    Whereas, at destination, it is appearing as Latin small letter u with circumflex, which is û, with ASCII HEX code of FB, which is DECIMAL 251.

    VARCHAR stores 1 Character per byte. So, maximum ASCII codes it can store are 256, so, both these characters can be comfortably stored without internal tampering, behind the scenes.

    Then, why is characters getting deformed?

    thanks

  • etl2016 (12/15/2016)


    okay, I think I got the difference when i did the hex dump using convert(varbinary(max), sourcecolumn)

    At source, the content looks like the character "En dash" with ASCII HEXcode of 96, which is DECIMAL 150 . Visually, on Screen using SSMS, it appears as hyphen.

    Whereas, at destination, it is appearing as Latin small letter u with circumflex, which is û, with ASCII HEX code of FB, which is DECIMAL 251.

    VARCHAR stores 1 Character per byte. So, maximum ASCII codes it can store are 256, so, both these characters can be comfortably stored without internal tampering, behind the scenes.

    Then, why is characters getting deformed?

    thanks

    What is the character right before the dash, and right before û?

    _____________
    Code for TallyGenerator

Viewing 15 posts - 1 through 15 (of 15 total)

You must be logged in to reply to this topic. Login to reply