Latin small letter u with diaeresis

  • etl2016 (12/15/2016)


    Okay, I think I found the difference when I did a hex dump using convert(varbinary(max), sourcecolumn).

    At the source, the content is the character "En dash", with hex code 96 (decimal 150) in code page 1252. Visually, on screen in SSMS, it looks like a hyphen.

    At the destination, however, it appears as Latin small letter u with circumflex (û), with hex code FB (decimal 251) in code page 1252.

    VARCHAR stores one character per byte (with a single-byte code page such as 1252), so it can represent 256 distinct codes. Both of these characters therefore fit comfortably and can be stored without any tampering behind the scenes.
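
    For anyone who wants to check the two byte values outside SQL Server, here is a minimal Python sketch. It is not part of the original test; the constant names and the describe() helper are made up for illustration, and it assumes the column uses code page 1252. It only decodes the two bytes from the hex dump and names the characters they map to.

```python
import unicodedata

# Byte values reported by the varbinary hex dump in the post.
SOURCE_BYTE = b"\x96"   # decimal 150, seen in the source column
DEST_BYTE = b"\xfb"     # decimal 251, seen in the destination column


def describe(raw: bytes, codepage: str = "cp1252") -> str:
    """Decode a single byte under `codepage` and name the resulting character."""
    ch = raw.decode(codepage)
    return f"0x{raw.hex().upper()} (decimal {raw[0]}) -> {unicodedata.name(ch)}"


for raw in (SOURCE_BYTE, DEST_BYTE):
    # Values above 127 are outside 7-bit ASCII; their meaning depends on the code page.
    print(describe(raw), "| 7-bit ASCII:", raw[0] < 128)
```

    Both values are above 127, so strictly speaking they are code page 1252 codes rather than plain ASCII, which is why the code page in use matters.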

    Then why are the characters getting deformed?

    thanks

    As a next step, I found a strange thing. Do this and see for yourself:

    1) Open Notepad and set the font to Lucida Console.

    2) Copy and paste both the en dash and the hyphen from a reliable character chart, as follows (don't try to key them in yourself):

    try - me (hyphen)

    try – me (En dash)

    3) Change the font to "Terminal" and you will see that the hyphen stays the same, whereas the en dash changes to û:

    try - me (hyphen)

    try û me (En dash)
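
    The Notepad experiment can be roughly reproduced in Python. This is only a sketch under one loud assumption: that the Terminal font displays bytes using OEM code page 437, while Lucida Console shows them as code page 1252. The reinterpret() helper and the ANSI/OEM names are hypothetical, introduced just for this illustration.

```python
import unicodedata

ANSI = "cp1252"  # Windows "ANSI" code page named in the post
OEM = "cp437"    # ASSUMPTION: OEM code page behind the Terminal font


def reinterpret(text: str, written_as: str, read_as: str) -> str:
    """Encode text with one code page, then decode the same bytes with another."""
    return text.encode(written_as).decode(read_as)


for label, ch in (("hyphen", "-"), ("en dash", "\u2013")):
    raw = ch.encode(ANSI)
    misread = reinterpret(ch, ANSI, OEM)
    restored = misread.encode(ANSI)  # bytes if the misread character is saved as 1252
    print(label, raw.hex(), "->", unicodedata.name(misread), "->", restored.hex())
```

    Under that assumption the hyphen survives unchanged, while byte 96 is read as û and would be written back as FB, which matches the hex dump at the destination. Whether any step in the pipeline actually performs this kind of reinterpretation is not something this sketch can confirm.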

    So it boils down to the database user entering an en dash (which means they actually copied and pasted it from somewhere, because keying in an en dash is not as well known as keying in a hyphen). That value then goes through SSIS with its code page of 1252, and THIS IS WHERE THE TAMPERING to û HAPPENS.

    To prove this point, I did an "Export Data" from SSMS, from the source DB to the target DB, without involving SSIS. The en dash arrives as-is at the destination, without being converted to û.

    thanks.
