Deleting duplicate record

  • Hi,

    I have a table likeCreate table demo(nvarchar(20))

    It has following recordsinsert into demo values('abc')

    insert into demo values('abc')

    insert into demo values('abc')

    insert into demo values('def')

    insert into demo values('jkl')

    insert into demo values('jkl')

    Now my requirement is that duplicate record should get deleted and only distinct records should be present in the query how is it possible

  • use a windowing function

    try this

    Create table demo(x nvarchar(20))

    insert into demo values('abc')

    insert into demo values('abc')

    insert into demo values('abc')

    insert into demo values('def')

    insert into demo values('jkl')

    insert into demo values('jkl')

    delete y

    from

    (select x,ROW_NUMBER() over(partition by x order by x) rnk from demo) y

    where rnk>1

    select * from demo

    drop table demo

    MVDBA

  • Do you really have a table with a single column? Or you have some ID there. Id so, you may be able to de-dupe without use of windowed function.

    _____________________________________________
    "The only true wisdom is in knowing you know nothing"
    "O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
    (So many miracle inventions provided by MS to us...)

    How to post your question to get the best and quick help[/url]

  • Hi,

    Firstly always name your columns:

    CREATE TABLE DEMO ([COLUMN1] NVARCHAR(20))

    Then insert your values:

    INSERT INTO DEMO VALUES ('ABC')

    INSERT INTO DEMO VALUES ('ABC')

    INSERT INTO DEMO VALUES ('ABC')

    INSERT INTO DEMO VALUES ('DEF')

    INSERT INTO DEMO VALUES ('FGK')

    INSERT INTO DEMO VALUES ('FGK')

    INSERT INTO DEMO VALUES ('HIJ')

    Lastly select distinct records:

    SELECT DISTINCT [COLUMN1]

    FROM DEMO

    If you don't want replicated data in a column, make sure it has a primary key constraint.

    See here: http://qa.sqlservercentral.com/stairway/75773/ by Gregory Larson

    This outlines the constructs of databases and relationships between tables based on Edgar Codd's relational database model. Really worth a read...

    As Eugene pointed out, if you're deleting records a unique ID would be an advantage. And make use of a temporary table to de-duplicate records based on that ID.

    ALTER TABLE DEMO

    ADD ID INT IDENTITY(1,1)

    SELECT MAX(ID) AS ID, MAX([COLUMN1]) AS [COLUMN1]

    INTO DEMO_TEMP

    FROM DEMO

    GROUP BY [COLUMN1]

    DELETE FROM DEMO

    WHERE ID NOT IN (SELECT ID FROM DEMO_TEMP)

  • Just two samples without windowed functions:

    1. Use of temporary storage for values to keep. It may play very well if you need to large set of data:

    create table #demo(val nvarchar(20))

    insert into #demo values('abc')

    insert into #demo values('abc')

    insert into #demo values('abc')

    insert into #demo values('def')

    insert into #demo values('jkl')

    insert into #demo values('jkl')

    select distinct val into #t from #demo

    truncate table #demo

    insert #demo select val from #t

    2. Example with numeric ID:

    create table #demo1(id int identity(1,1), val nvarchar(20))

    insert into #demo1 values('abc')

    insert into #demo1 values('abc')

    insert into #demo1 values('abc')

    insert into #demo1 values('def')

    insert into #demo1 values('jkl')

    insert into #demo1 values('jkl')

    delete d1

    from #demo1 d1

    join #demo1 d2 on d2.val = d1.val

    and d1.id > d2.id

    _____________________________________________
    "The only true wisdom is in knowing you know nothing"
    "O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
    (So many miracle inventions provided by MS to us...)

    How to post your question to get the best and quick help[/url]

  • Thanks to all for suggesting the solution

  • eugene... just out of interest... why avoid the windowing???

    MVDBA

  • example 2 wont work -you'll end up with a table called demo1 not demo

    MVDBA

  • MVDBA (9/6/2012)


    eugene... just out of interest... why avoid the windowing???

    Because using them may be slower then implementation without them...

    _____________________________________________
    "The only true wisdom is in knowing you know nothing"
    "O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
    (So many miracle inventions provided by MS to us...)

    How to post your question to get the best and quick help[/url]

  • That might be because Eugene already had a #demo table and didn't want to drop ti and create a new one.

    You can change the name to #demo.

    If you avoid windowing functions you can use your code in SQL 2000 and previous. However, you might not need to do it.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • MVDBA (9/6/2012)


    example 2 wont work -you'll end up with a table called demo1 not demo

    Are you serious? :w00t:

    It is just an EXAMPLE 🙂

    So, I've named the table in a second example as #demo1 so you can run both examples at once!

    _____________________________________________
    "The only true wisdom is in knowing you know nothing"
    "O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
    (So many miracle inventions provided by MS to us...)

    How to post your question to get the best and quick help[/url]

  • A windowed function won't be slower than a two step function if you nest the conditions correctly. My last example is a two step using a temp table, but you can retrieve the same outcome with a windowed function.

  • I wonder what we are rally discussing here?

    There are many different ways to remove duplicate data. Decision on which one to use will depend on exact situation. Sometimes, you may want (and able) to get data to keep into temp and truncate the crap, sometimes you may be able to use existing key (anchors) which will allow to de-dupe by deleting, sometimes you will need to use windowed functions...

    It all does depend...

    _____________________________________________
    "The only true wisdom is in knowing you know nothing"
    "O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
    (So many miracle inventions provided by MS to us...)

    How to post your question to get the best and quick help[/url]

  • Luis Cazares (9/6/2012)


    That might be because Eugene already had a #demo table and didn't want to drop ti and create a new one.

    You can change the name to #demo.

    If you avoid windowing functions you can use your code in SQL 2000 and previous. However, you might not need to do it.

    ah but solution 2 assumed the table had an int column

    MVDBA

  • Eugene Elutin (9/6/2012)


    MVDBA (9/6/2012)


    eugene... just out of interest... why avoid the windowing???

    Because using them may be slower then implementation without them...

    hmmm.... yes, but also alternate solutions might use other resources (such as disk space to create new table 🙂 )

    many ways to skin a cat though.. really depends on what "other data" was in the table that determined which one to keep - if the table was 50 columns wide with varchar(max) columns then i'd use the windowing function every day of the week

    MVDBA

Viewing 15 posts - 1 through 15 (of 21 total)

You must be logged in to reply to this topic. Login to reply