How to delete duplicated rows in a table

Question

Post reply

How to delete duplicated rows in a table

Robert Lu

SSCrazy

Points: 2428
More actions
July 23, 2003 at 2:56 pm

#64841

Hi sql gurus,
I can write a simple code to find out the duplicated rows, but I don't know how to delete all duplicated rows only leave one row there.
select <column name>, count(column name) as ' # of duplicates'
from <table name>
group by <column name>
having (count(column name) > 1)
order by <column name>
Please help. Thanks.
Robert

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply

Allen Cui-55137 SSC Guru Points: 51650 More actions · Answer 1

http://support.microsoft.com/default.aspx?scid=kb;en-us;70956

http://www.sql-server-performance.com/rd_delete_duplicates.asp

Edited by - Allen_Cui on 07/23/2003 3:04:40 PM

Greg Larsen SSC-Insane Points: 20891 More actions · Answer 2

Here is another article that might help:

http://www.databasejournal.com/features/mssql/article.php/2235081

Gregory Larsen, DBA

If you looking for SQL Server Examples check out my website at http://www.geocities.com/sqlserverexamples

Gregory A. Larsen, MVP

Robert Lu SSCrazy Points: 2428 More actions · Answer 3

Allen and Greg,

Thank you guys very much.

These scripts all work! Great!!

Robert

Vik Nat Hall of Fame Points: 3930 More actions · Answer 4

How about something like this - would'nt this do the trick as well:

delete from table_name

where (Pk_value )

not in

( select min(pk_value)

from table_name group by PK_Value)

SeekQuel SSCommitted Points: 1818 More actions · Answer 5

Unfortunately not.

The pk_value will be the same in the duplicate rows.

Because min(pk_value) will return the same value for each row in the duplicates, you'll end up deleting all the rows.

example:

Field1, Field2 pk_value

0 0 1

min(pk_value) will return 1.

This query will delete the whole table.

pepito SSC Journeyman Points: 80 More actions · Answer 6

You can try to set rowcount to the 'number of row to be deleted -1' like this:

set rowcount # of duplicates - 1

delete from table where key_column=key_value

set rowcount 0

(0 means unlimited)

The delete statement will affect only # of duplicates - 1 rows

Bye,

Pepito

songmeister Old Hand Points: 334 More actions · Answer 7

If you have a smallish number of rows (in total) in the table then just select distinct all columns into a new table, truncate the old table and then insert them all back from the new table to the old table.

If you have a large table then select distinct only the rows that are dup (group by having count(*) > 1 clause) into a new table. Delete from the old table using the "from from" format (join table you are deleting to the new table) where all columns match. (sorry, typed to fast here and forgot to put the distinct rows back) and remember to insert the rows from the new table back into the old table.

Either method will give the result of only having 1 occurance of a row regardless of how many duplicates there were initially for a row.

Edited by - songmeister on 07/24/2003 08:52:44 AM

lzd503 Grasshopper Points: 11 More actions · Answer 8

Here's one that we recently did in the real world:

CREATE PROCEDURE spGMSRMRemovePODetailDupes

AS

select count(*) as NumberOfDupes,

max(grsinterfacePODetailsID) as MaxIDNumber,

RequestNo as ReqNo,

StatusCode as StatCode,

StatusDate as StatDate

into #TempFindDupes

from GRSInterfacePODetails

group by RequestNo, StatusCode, StatusDate

having count(*) > 1

/* For debugging */

/*

select * from #TempFindDupes

select * from GRSInterfacePODetails

inner join #TempFindDupes

on RequestNo = ReqNo

and StatusCode = StatCode

and StatusDate = StatDate

where GRSInterfacePODetailsID not in (select MaxIDNumber from #TempFindDupes)

*/

delete from GRSInterfacePODetails

from GRSInterfacePODetails inner join #TempFindDupes

on RequestNo = ReqNo

and StatusCode = StatCode

and StatusDate = StatDate

where GRSInterfacePODetailsID not in (select MaxIDNumber from #TempFindDupes)

drop table #TempFindDupes

Of course you'll have to substitute your own tables and columns!

Dave Orkis

dave.orkis@eds.com

geomon Old Hand Points: 316 More actions · Answer 9

songmeister's method has always worked for me. quick n' dirty, true, but does the job. don't forget to add constraints that won't allow these dupes going forward.

Calvin Lawson SSChampion Points: 11030 More actions · Answer 10

OK, if this data is REALLY completely duplicated, then the articles will help you get rid of them. And like geomon said, make sure this doesn't happen in the future.

But, if these values aren't truly duplicate then the solution is much easier. Say you have duplicate IDs with a different "CreationDate" in Table1

Delete t1

From Table1 t1 (nolock)

JOIN Table1 t2 (nolock) on t1.ID = t2.ID and t2.CreationDate < t1.CreationDate

Join the table to itself on ID and only delete records where the CreationDate < max(CreationDate), essentially all your dups.

Pretty sweet!..Saw this piece of code on this very site and I've used it a few times already.

Signature is NULL

da_stew Newbie Points: 6 More actions · Answer 11

I've used songmeister's technique in the past and some of the articles are rather complexed. You may try this:

Create a PK for the table with an identity seed.

DELETE FROM tblTest

WHERE (intPK NOT IN

(SELECT MIN(intPK) AS Expr1

FROM tblTest

GROUP BY intFld1, intFld2, intFld3))

Then, delete the PK and set intFld1 to PK.