How to delete duplicated rows in a table

  • Hi sql gurus,

    I can write a simple code to find out the duplicated rows, but I don't know how to delete all duplicated rows only leave one row there.

    select <column name>, count(column name) as ' # of duplicates'

    from <table name>

    group by <column name>

    having (count(column name) > 1)

    order by <column name>

    Please help. Thanks.

    Robert

  • Here is another article that might help:

    http://www.databasejournal.com/features/mssql/article.php/2235081

    Gregory Larsen, DBA

    If you looking for SQL Server Examples check out my website at http://www.geocities.com/sqlserverexamples

    Gregory A. Larsen, MVP

  • Allen and Greg,

    Thank you guys very much.

    These scripts all work! Great!!

    Robert

  • How about something like this - would'nt this do the trick as well:

    delete from table_name

    where (Pk_value )

    not in

    ( select min(pk_value)

    from table_name group by PK_Value)

  • Unfortunately not.

    The pk_value will be the same in the duplicate rows.

    Because min(pk_value) will return the same value for each row in the duplicates, you'll end up deleting all the rows.

    example:

    Field1, Field2 pk_value

    0 0 1

    0 0 1

    min(pk_value) will return 1.

    This query will delete the whole table.

  • You can try to set rowcount to the 'number of row to be deleted -1' like this:

    set rowcount # of duplicates - 1

    delete from table where key_column=key_value

    set rowcount 0

    (0 means unlimited)

    The delete statement will affect only # of duplicates - 1 rows

    Bye,

    Pepito

  • If you have a smallish number of rows (in total) in the table then just select distinct all columns into a new table, truncate the old table and then insert them all back from the new table to the old table.

    If you have a large table then select distinct only the rows that are dup (group by having count(*) > 1 clause) into a new table. Delete from the old table using the "from from" format (join table you are deleting to the new table) where all columns match. (sorry, typed to fast here and forgot to put the distinct rows back) and remember to insert the rows from the new table back into the old table.

    Either method will give the result of only having 1 occurance of a row regardless of how many duplicates there were initially for a row.

    Edited by - songmeister on 07/24/2003 08:52:44 AM

  • Here's one that we recently did in the real world:

    CREATE PROCEDURE spGMSRMRemovePODetailDupes

    AS

    select count(*) as NumberOfDupes,

    max(grsinterfacePODetailsID) as MaxIDNumber,

    RequestNo as ReqNo,

    StatusCode as StatCode,

    StatusDate as StatDate

    into #TempFindDupes

    from GRSInterfacePODetails

    group by RequestNo, StatusCode, StatusDate

    having count(*) > 1

    /* For debugging */

    /*

    select * from #TempFindDupes

    select * from GRSInterfacePODetails

    inner join #TempFindDupes

    on RequestNo = ReqNo

    and StatusCode = StatCode

    and StatusDate = StatDate

    where GRSInterfacePODetailsID not in (select MaxIDNumber from #TempFindDupes)

    */

    delete from GRSInterfacePODetails

    from GRSInterfacePODetails inner join #TempFindDupes

    on RequestNo = ReqNo

    and StatusCode = StatCode

    and StatusDate = StatDate

    where GRSInterfacePODetailsID not in (select MaxIDNumber from #TempFindDupes)

    drop table #TempFindDupes

    Of course you'll have to substitute your own tables and columns!

    Dave Orkis

    dave.orkis@eds.com

  • songmeister's method has always worked for me. quick n' dirty, true, but does the job. don't forget to add constraints that won't allow these dupes going forward.

  • OK, if this data is REALLY completely duplicated, then the articles will help you get rid of them. And like geomon said, make sure this doesn't happen in the future.

    But, if these values aren't truly duplicate then the solution is much easier. Say you have duplicate IDs with a different "CreationDate" in Table1

    Delete t1

    From Table1 t1 (nolock)

    JOIN Table1 t2 (nolock) on t1.ID = t2.ID and t2.CreationDate < t1.CreationDate

    Join the table to itself on ID and only delete records where the CreationDate < max(CreationDate), essentially all your dups.

    Pretty sweet!..Saw this piece of code on this very site and I've used it a few times already.

    Signature is NULL

  • I've used songmeister's technique in the past and some of the articles are rather complexed. You may try this:

    Create a PK for the table with an identity seed.

    DELETE FROM tblTest

    WHERE (intPK NOT IN

    (SELECT MIN(intPK) AS Expr1

    FROM tblTest

    GROUP BY intFld1, intFld2, intFld3))

    Then, delete the PK and set intFld1 to PK.

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply