Removing duplicate character from string

  • Hi All,

    I need a neat way please guys to write a function to remove duplicated characters only if they come in sequence.

    Examples

    darrk should return dark

    eeagle should return eagle

    redd should return red

    corner should corner as it is as the r's are not in sequence.

    Thanks in advance.

    Nader

  • I have written the following code but it has a syntax error that i am not able to fix, not sure if case clause is supported inside CTE or what.

    declare @Name as varchar(50) ='parrdetam';

    declare @result as varchar(50) ='';

    WITH

    N(N) AS (

    SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1)) N (N)

    ) -- 8 rows

    , TALLY (N) AS (

    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM N N1, N N2

    )

    ,

    NewWord(L) AS (

    SELECT @result=case when SUBSTRING(@Name, T.N, 1) = SUBSTRING(@Name, T.N-1, 1)

    then

    'rr'

    else

    @Name

    end

    FROM TALLY T

    WHERE SUBSTRING(@Name, T.N, 1) LIKE '[a-zA-Z]' COLLATE Latin1_General_BIN

    )

    select * from NewWord

  • Quick solution

    😎

    USE tempdb;

    GO

    SET NOCOUNT ON;

    DECLARE @INPUT_STR VARCHAR(500) = 'WWWWWWHY DDDDDDDDDDDOOOOOOOOO YYYYYYYYOOOOOOOOOOOOOOOOOUUUUUUUUUU WWWWWWWWWWANNNNNNNNNNNNNTTTTTTT TOOOOOOOOO DOOOOOOOO TTTTTTTTHIIIIIIIIIIISSSSSSS??????????';

    ;WITH T(N) AS (SELECT N FROM (VALUES (NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)) AS X(N))

    , NUMS(N) AS (SELECT TOP(LEN(@INPUT_STR)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS N FROM T T1,T T2,T T3)

    ,CHAR_LIST AS

    (

    SELECT

    NM.N

    ,SUBSTRING(@INPUT_STR,NM.N,1) AS TCHAR

    FROM NUMS NM

    )

    ,COMPARE_CHARS AS

    (

    SELECT

    CL.TCHAR

    ,LAG(CL.TCHAR,1,CHAR(0)) OVER

    (

    ORDER BY CL.N

    ) AS PREV_CHAR

    FROM CHAR_LIST CL

    )

    SELECT

    (SELECT

    CC.TCHAR

    FROM COMPARE_CHARS CC

    WHERE CC.TCHAR <> CC.PREV_CHAR

    FOR XML PATH(''), TYPE).value('.[1]','VARCHAR(500)') AS OUT_STR

    ;

    Results

    OUT_STR

    ----------------------------

    WHY DO YOU WANT TO DO THIS?

  • nadersam (10/18/2015)


    Hi All,

    I need a neat way please guys to write a function to remove duplicated characters only if they come in sequence.

    Examples

    darrk should return dark

    eeagle should return eagle

    redd should return red

    corner should corner as it is as the r's are not in sequence.

    Thanks in advance.

    Nader

    What will you want to do about legal words such as "BOOK", "SEEN", or "ABBOTT"?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • WHY DO YOU WANT TO DO THIS?

    [/code][/quote]

    First i would like to thank you for your reply, first time i hear of this LAG function 🙂

    This function is needed to cross check a new brand name against existing ones.

    There are some rules for this new name to pass , one of them is the number of different characters in it must be more than a certain number.

    Some companies add repetitive characters in sequence to pas this check :-), if repetitive characters come in sequence they will not differ significantly in pronunciation which is the main point so consumer is not confused between them.

  • What will you want to do about legal words such as "BOOK", "SEEN", or "ABBOTT"?[/quote]

    Even in those legal names repetition will be removed because what matters more is how is they are pronounced.

    There are some rules for this new name to pass , one of them is the number of different characters in it must be more than a certain number.

    Some companies add repetitive characters in sequence to pas this check :-), if repetitive characters come in sequence they will not differ significantly in pronunciation which is the main point so consumer is not confused between them.

  • Maybe soundex() would be a valid approach if it is the pronunciation that matters.

    'bok', 'book' and 'booookkk'' all produces a value of B200 in a quick test.

  • Bob JH Cullen (10/19/2015)


    Maybe soundex() would be a valid approach if it is the pronunciation that matters.

    'bok', 'book' and 'booookkk'' all produces a value of B200 in a quick test.

    I have added soundex as well to my list of conditions but sometimes it gets totally irrelevant words with regards to pronunciation , that's why i had to add other conditions as well.

  • Unfortunately the lag function is not supported on the servers on site, they have SQL Server 2008.

    Any other option please?.

  • nadersam (10/19/2015)


    Unfortunately the lag function is not supported on the servers on site, they have SQL Server 2008.

    Any other option please?.

    Self-join the character list cte is another option

    😎

  • Eirikur Eiriksson (10/19/2015)


    nadersam (10/19/2015)


    Unfortunately the lag function is not supported on the servers on site, they have SQL Server 2008.

    Any other option please?.

    Self-join the character list cte is another option

    😎

    Thank you very much Eirikur Eiriksson.

    I really appreciate if you could write the syntax, i am trying to grab the tally tables and cte and that will take some time and i need to quickly finish that.

    Once again thanks for your help.

  • Eirikur Eiriksson (10/19/2015)


    nadersam (10/19/2015)


    Unfortunately the lag function is not supported on the servers on site, they have SQL Server 2008.

    Any other option please?.

    Self-join the character list cte is another option

    😎

    I have written it like what follows but not getting correct result, not sure exactly what's wrong.

    Declare @InputValue as varchar(50)='ttttest';

    WITH T(N) AS

    (SELECT N FROM (VALUES (NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)) AS X(N))

    , NUMS(N) AS (SELECT TOP(LEN(@InputValue)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS N FROM T T1,T T2,T T3)

    ,CHAR_LIST AS

    (

    SELECT

    NM.N

    ,SUBSTRING(@InputValue,NM.N,1) AS TCHAR

    FROM NUMS NM

    )

    ,COMPARE_CHARS AS

    (

    select distinct t1.PREV_CHAR, t2.tchar

    from

    (

    select min(n) mnval, tchar as PREV_CHAR

    from CHAR_LIST

    group by tchar

    ) t1

    inner join CHAR_LIST t2

    on t1.PREV_CHAR = t2.tchar

    )

    SELECT

    (SELECT

    CC.TCHAR

    FROM COMPARE_CHARS CC

    WHERE CC.TCHAR <> CC.PREV_CHAR

    FOR XML PATH(''), TYPE).value('.[1]','VARCHAR(500)')

  • nadersam (10/19/2015)


    Unfortunately the lag function is not supported on the servers on site, they have SQL Server 2008.

    Any other option please?.

    In future posts, I recommend you post to the correct forum. This forum is for 2012. Not 2008.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Would something like this work?

    declare @InputValue as varchar(50)='ttttest',

    @Word varchar(50);

    with e1(n) as (select 1 from (values (1),(1),(1),(1),(1),(1),(1),(1),(1),(1))dt(n)),

    eTally(n) as (select row_number() over (order by (select null)) from e1 a cross join e1 b), -- don't need more than a 100 (actually 50)

    pivotletters as (

    select

    n,

    ascii(substring(@InputValue,n,1)) LetterValue,

    substring(@InputValue,n,1) letters,

    rn = n - row_number() over (partition by substring(@InputValue,n,1) order by n)

    from eTally where n <= len(@InputValue)

    )

    select @Word = stuff(( select -- min(n), letters from pivotletters group by letters, rn order by min(n)

    '' + letters

    from

    pivotletters

    group by

    letters, rn

    order by

    min(n)

    for xml path(''),TYPE).value('.','varchar(50)'),1,0,'');

    select @InputValue, @Word;

  • Jeff Moden (10/19/2015)


    nadersam (10/19/2015)


    Unfortunately the lag function is not supported on the servers on site, they have SQL Server 2008.

    Any other option please?.

    In future posts, I recommend you post to the correct forum. This forum is for 2012. Not 2008.

    The post was not about 2012 or 2008 syntax at the beginning , then i got help using lag function which worked perfectly but unfortunately it didn't work on testing server as it was 2008, sorry about that.

Viewing 15 posts - 1 through 15 (of 18 total)

You must be logged in to reply to this topic. Login to reply