Find special characters

  • Good morning.

    I would like to search in my BD, descriptions containing special characters such as Ç Ã Â ° '

    Even determining the demand for these returns is not the same characters, even though I know that the register has descriptions with these characters.

    I am using the command:

    SELECT Description, FROM TBProduct WHERE Description LIKE '%"º"

    The query returns nothing, but I know I have several old records in the register containing these characters.

    Any tips?

    Grateful

  • here's one way to do it;

    the typical chars you are looking for are highascii between 160 and 255;

    here i'm just using a row_number() function generate a table of numbers so i can get the chars to test for.

    just changed my limits for the MiniTally to match the ranges you want to test for.

    SELECT Description,

    CHAR(MiniTally.n)

    FROM TBProduct

    CROSS APPLY (SELECT TOP 255 ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n

    FROM sys.columns) MiniTally

    WHERE Description LIKE '%' + CHAR(MiniTally.n) + '%'

    AND MiniTally.n BETWEEN 126 AND 255

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • firstly that query looks incorrect as your only ever going to be looking at the last character in the string being a special character not the whole string

    SELECT Description, FROM TBProduct WHERE Description LIKE '%º%'

    The above will cause a scan of all the data which can be a performance hit if you have a big TBProducts table, so instead I would look at CHARINDEX and if it comes back with a result of > 0 then the string contains that character

    with cte as

    (

    select description, charindex('º', description, 1) as CharInd from TBProduct

    )

    select * from cte where CharInd > 0

  • Lowell (3/28/2012)


    here's one way to do it;

    the typical chars you are looking for are highascii between 160 and 255;

    here i'm just using a row_number() function generate a table of numbers so i can get the chars to test for.

    just changed my limits for the MiniTally to match the ranges you want to test for.

    SELECT Description

    FROM TBProduct

    CROSS APPLY (SELECT TOP 255 ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n

    FROM sys.columns) MiniTally

    WHERE Description LIKE '%' + CHAR(MiniTally.n) + '%'

    AND MiniTally.n BETWEEN 126 AND 255

    i like that a lot, grabs the whole special character set in one go,

  • Lowell (3/28/2012)


    here's one way to do it;

    the typical chars you are looking for are highascii between 160 and 255;

    here i'm just using a row_number() function generate a table of numbers so i can get the chars to test for.

    just changed my limits for the MiniTally to match the ranges you want to test for.

    SELECT Description,

    CHAR(MiniTally.n)

    FROM TBProduct

    CROSS APPLY (SELECT TOP 255 ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n

    FROM sys.columns) MiniTally

    WHERE Description LIKE '%' + CHAR(MiniTally.n) + '%'

    AND MiniTally.n BETWEEN 126 AND 255

    Great use of tally!

    Jared
    CE - Microsoft

  • I can not even find Nº20

    for example....

    Thanks

  • did you try my exmaple? it worked perfectly for me:

    /*

    DescriptionFoundChar

    Nº20º

    */

    WITH TBProduct (Description)

    AS

    (

    SELECT 'Nº20' UNION ALL

    SELECT 'No Special Characters'

    )

    SELECT Description,

    CHAR(MiniTally.n) AS FoundChar

    FROM TBProduct

    CROSS APPLY (SELECT TOP 255 ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n

    FROM sys.columns) MiniTally

    WHERE Description LIKE '%' + CHAR(MiniTally.n) + '%'

    AND MiniTally.n BETWEEN 126 AND 255

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • The result is the same??

    No Special CharactersŠ

    No Special Charactersš

    No Special Charactersª

    Nºº

    No Special Charactersº

    No Special CharactersÀ

    No Special CharactersÁ

    No Special CharactersÂ

    No Special CharactersÃ

    No Special CharactersÄ

    No Special CharactersÅ

    No Special CharactersÇ

    No Special CharactersÈ

    No Special CharactersÉ

    No Special CharactersÊ

    No Special CharactersË

    No Special CharactersÌ

    No Special CharactersÍ

    No Special CharactersÎ

    No Special CharactersÏ

    NºÑ

    No Special CharactersÑ

    NºÒ

    No Special CharactersÒ

    NºÓ

    No Special CharactersÓ

    NºÔ

    No Special CharactersÔ

    NºÕ

    No Special CharactersÕ

    NºÖ

    No Special CharactersÖ

    NºØ

    No Special CharactersØ

    No Special Charactersà

    No Special Charactersá

    No Special Charactersâ

    No Special Charactersã

    No Special Charactersä

    No Special Characterså

    No Special Charactersç

    No Special Charactersè

    No Special Charactersé

    No Special Charactersê

    No Special Charactersë

    No Special Charactersì

    No Special Charactersí

    No Special Charactersî

    No Special Charactersï

    Nºñ

    No Special Charactersñ

    Nºò

    No Special Charactersò

    Nºó

    No Special Charactersó

    Nºô

    No Special Charactersô

    Nºõ

    No Special Charactersõ

    Nºö

    No Special Charactersö

    Nºø

    No Special Charactersø

  • clearly you've modified what i posted in order to get the results you are showing. show us the code YOU created.

    we can find the issues there.

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • i found something weird. i have two databases with different collations. so if i do:

    WITH TBProduct (Description)

    AS

    (

    SELECT 'Nº20' UNION ALL

    SELECT 'No Special Characters' union all

    select 'Coedcae Comprehensive' union all

    select col1 collate SQL_Latin1_General_CP1_CI_AS from myTable

    )

    SELECT Description ,

    CHAR(MiniTally.n) AS FoundChar

    FROM TBProduct

    CROSS APPLY (SELECT TOP 255 ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n

    FROM sys.columns) MiniTally

    WHERE Description LIKE '%' + CHAR(MiniTally.n) + '%'

    AND MiniTally.n BETWEEN 126 AND 255

    it correctly only shows one record - Nº20

    if i do:

    WITH TBProduct (Description)

    AS

    (

    SELECT 'Nº20' UNION ALL

    SELECT 'No Special Characters' union all

    select 'Coedcae Comprehensive' union all

    select Col1 collate Latin1_General_CI_AS from myTable

    )

    SELECT Description ,

    CHAR(MiniTally.n) AS FoundChar

    FROM TBProduct

    CROSS APPLY (SELECT TOP 255 ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n

    FROM sys.columns) MiniTally

    WHERE Description LIKE '%' + CHAR(MiniTally.n) + '%'

    AND MiniTally.n BETWEEN 126 AND 255

    it brings back 14126 rows. (the amount of rows in my table are 38621) some example rows are:

    Coedcae Comprehensive , Œ

    Joel Robert , Œ

    Ethan , Þ

    Michael ,æ

  • ok i see that; case sensitive collations, and accent insensitive colaltions will see Ä as matching both 'a' and 'A' for example.

    i fiddled with it a little, not much difference...maybe switch to a binary collation is the final solution.

    WITH TBProduct (Description)

    AS

    (

    SELECT 'Nº20' UNION ALL

    SELECT 'No Special Characters' union all

    select 'Coedcae Comprehensive'

    --select Col1 from myTable

    )

    SELECT Description ,

    CHAR(MiniTally.n) AS FoundChar

    FROM TBProduct

    CROSS APPLY (SELECT TOP 255 ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n

    FROM sys.columns) MiniTally

    WHERE Description COLLATE Latin1_General_CS_AS

    LIKE '%' + CHAR(MiniTally.n) + '%' COLLATE Latin1_General_CS_AS

    AND MiniTally.n BETWEEN 126 AND 255

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • When I did:

    WITH TBProduct (Description)

    AS

    (

    SELECT 'Nº20' UNION ALL

    SELECT 'No Special Characters'

    )

    SELECT Description,

    CHAR(MiniTally.n) AS FoundChar

    FROM TBProduct

    CROSS APPLY (SELECT TOP 255 ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n

    FROM sys.columns) MiniTally

    WHERE Description LIKE '%' + CHAR(MiniTally.n) + '%'

    AND MiniTally.n BETWEEN 126 AND 255

    The result :

    No Special CharactersŠ

    No Special Charactersš

    No Special Charactersª

    Nºº

    No Special Charactersº

    No Special CharactersÀ

    No Special CharactersÁ

    No Special CharactersÂ

    No Special CharactersÃ

    No Special CharactersÄ

    No Special CharactersÅ

    No Special CharactersÇ

    No Special CharactersÈ

    No Special CharactersÉ

    No Special CharactersÊ

    No Special CharactersË

    No Special CharactersÌ

    No Special CharactersÍ

    No Special CharactersÎ

    No Special CharactersÏ

    NºÑ

    No Special CharactersÑ

    NºÒ

    No Special CharactersÒ

    NºÓ

    No Special CharactersÓ

    NºÔ

    No Special CharactersÔ

    NºÕ

    No Special CharactersÕ

    NºÖ

    No Special CharactersÖ

    NºØ

    And Wen I did:

    WITH TBProduct (Description)

    AS

    (

    SELECT 'Nº20' UNION ALL

    SELECT 'No Special Characters' union all

    select 'Coedcae Comprehensive' union all

    select Description collate SQL_Latin1_General_CP1_CI_AS from TBProduct

    )

    SELECT Description ,

    CHAR(MiniTally.n) AS FoundChar

    FROM TBProduct

    CROSS APPLY (SELECT TOP 255 ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n

    FROM sys.columns) MiniTally

    WHERE Description LIKE '%' + CHAR(MiniTally.n) + '%'

    AND MiniTally.n BETWEEN 126 AND 255

    The result :

    Msg 240, Level 16, State 1, Line 1

    Types don't match between the anchor and the recursive part in column "Description" of recursive query "TBProduct".

  • Following his guidance was the result:

    No Special CharactersŠ

    Coedcae ComprehensiveŠ

    Coedcae ComprehensiveŒ

    No Special Charactersš

    Coedcae Comprehensiveš

    Coedcae Comprehensiveœ

    No Special Charactersª

    Coedcae Comprehensiveª

    Nº20²

    Nº20º

    No Special Charactersº

    Coedcae Comprehensiveº

    No Special CharactersÀ

    Coedcae ComprehensiveÀ

    No Special CharactersÁ

    Coedcae ComprehensiveÁ

    No Special CharactersÂ

    Coedcae ComprehensiveÂ

    No Special CharactersÃ

    Coedcae ComprehensiveÃ

    No Special CharactersÄ

    Coedcae ComprehensiveÄ

    No Special CharactersÅ

    Coedcae ComprehensiveÅ

    Coedcae ComprehensiveÆ

    No Special CharactersÇ

    Coedcae ComprehensiveÇ

    No Special CharactersÈ

    Coedcae ComprehensiveÈ

    No Special CharactersÉ

    Coedcae ComprehensiveÉ

    No Special CharactersÊ

    Coedcae ComprehensiveÊ

    No Special CharactersË

    Coedcae ComprehensiveË

    No Special CharactersÌ

    Coedcae ComprehensiveÌ

    No Special CharactersÍ

    Coedcae ComprehensiveÍ

    No Special CharactersÎ

    Coedcae ComprehensiveÎ

    No Special CharactersÏ

    Coedcae ComprehensiveÏ

    Coedcae ComprehensiveÐ

    Nº20Ñ

    No Special CharactersÑ

    Coedcae ComprehensiveÑ

    Nº20Ò

    No Special CharactersÒ

    Coedcae ComprehensiveÒ

    Nº20Ó

    No Special CharactersÓ

    Coedcae ComprehensiveÓ

    Nº20Ô

    No Special CharactersÔ

    Coedcae ComprehensiveÔ

    Nº20Õ

    No Special CharactersÕ

    Coedcae ComprehensiveÕ

    Nº20Ö

    No Special CharactersÖ

    Coedcae ComprehensiveÖ

    Nº20Ø

    No Special CharactersØ

    Coedcae ComprehensiveØ

    No Special Charactersà

    Coedcae Comprehensiveà

    No Special Charactersá

    Coedcae Comprehensiveá

    No Special Charactersâ

    Coedcae Comprehensiveâ

    No Special Charactersã

    Coedcae Comprehensiveã

    No Special Charactersä

    Coedcae Comprehensiveä

    No Special Characterså

    Coedcae Comprehensiveå

    Coedcae Comprehensiveæ

    No Special Charactersç

    Coedcae Comprehensiveç

    No Special Charactersè

    Coedcae Comprehensiveè

    No Special Charactersé

    Coedcae Comprehensiveé

    No Special Charactersê

    Coedcae Comprehensiveê

    No Special Charactersë

    Coedcae Comprehensiveë

    No Special Charactersì

    Coedcae Comprehensiveì

    No Special Charactersí

    Coedcae Comprehensiveí

    No Special Charactersî

    Coedcae Comprehensiveî

    No Special Charactersï

    Coedcae Comprehensiveï

    Coedcae Comprehensiveð

    Nº20ñ

    No Special Charactersñ

    Coedcae Comprehensiveñ

    Nº20ò

    No Special Charactersò

    Coedcae Comprehensiveò

    Nº20ó

    No Special Charactersó

    Coedcae Comprehensiveó

    Nº20ô

    No Special Charactersô

    Coedcae Comprehensiveô

    Nº20õ

    No Special Charactersõ

    Coedcae Comprehensiveõ

    Nº20ö

    No Special Charactersö

    Coedcae Comprehensiveö

    Nº20ø

    No Special Charactersø

    Coedcae Comprehensiveø

  • alternatively one can use .NET Regex.Match Class method to create a SQL scalar-valued CLR function to find those junk chars

    in the range between \u0020 and \u007F [those one above ascii 127 i think]

    select 'TRUE' as myresult

    where dbo.fn_Is_RegexMatch(isnull('Nevacanezza123€Â¿å®ÆÐ',''), '[^\u0020-\u007F]') = 1

    returns: 'TRUE', bad char(s) found

    once found, another CLR function that utilizes Regex.Replace class method can help to clean all bad chars in the same region [between acsii 127 and 256 in this case]

    select dbo.fn_RegexReplace(isnull('Nevacanezza123€Â¿å®ÆÐ',''), '[^\u0020-\u007F]', '') as vcwebsiteurl

    returns the cleaned string: Nevacanezza123; all bad chars were replaced by an empty string.

    Sergel Z

  • I do have 1 question

    Exactly what is a special character to you? Some people might include a period or dollar sign as considered special. So need to be sure there first. Another option you could use is something like PATINDEX so you don't need another table to generate numbers for you value.

    WITH TBProduct (Description)

    AS

    (

    SELECT 'Nº20' UNION ALL

    SELECT null UNION ALL

    SELECT 'No Special Characters' union all

    select 'Coedcae Comprehensive' union all

    select Description collate SQL_Latin1_General_CP1_CI_AS from TBProduct

    )

    SELECT Description, substring(IsNull(Description,''),PATINDEX('%[^A-Za-z0-9 ]%',IsNull(Description,'')),1)

    FROM TBProduct

    WHERE PATINDEX('%[^A-Za-z0-9 ]%',IsNull(Description,'')) > 0

Viewing 15 posts - 1 through 15 (of 19 total)

You must be logged in to reply to this topic. Login to reply