DUPLICATE RECORDS

  • I have a table containing about 2500000 rows out of this i have found that 750 rows have been duplicated thats 1500 rows, is there a possibility to delete the duplicated 750 rows only

    Thanks in advance

  • Begin Transaction

    Create table #temp(inculde the columns which are in Source table)

    Insert into #temp values Select distinct * from

    source table

    delete from source table

    Insert into source table select * from #temp

    if @@error=0

    commit transaction

    else

    rollback transaction

    drop table #temp

    erajendar


    erajendar

  • Here's a script by Chris Cubley which I have used a couple times.

    http://qa.sqlservercentral.com/scripts/contributions/395.asp

    Cheers,

    Crispin

    Something as incredibly simple as

    binary still gives you too many options

    Cheers,CrispinI can't die, there are too many people who still have to meet me!It's not a bug, SQL just misunderstood me!

  • Chris Cubley also wrote an excellent article on this topic that includes the scripts you will need to identify and remove the duplicate records. I tested the scripts on my own system and everything works A-OK !!!

    Here's the article:

    http://qa.sqlservercentral.com/columnists/ccubley/findinganddeletingduplicatedata.asp

  • You can also add an identity column to the table to help with deleting duplicates. Here is an example of some code to do that:

    alter table dup_table

    add seq_num int identity

    go

    delete from a

    from dup_table a join

    (select col1, col2, max(seq_num) max_seq_num from dup_table

    group by col1, col2

    having count(*) > 1) b

    on a.col1 = b.col1 and

    a.col2 = b.col2 and

    a.seq_num < b.max_seq_num

    go

    alter table dup_table

    drop column seq_num

    Gregory Larsen, DBA

    If you looking for SQL Server Examples check out my website at http://www.geocities.com/sqlserverexamples

    Gregory A. Larsen, MVP

  • Many Thanks to all you Guys for your help

    Special Thanks Greg, your Solution did the trick.

    Thanks once again.

  • If you can't touch the table structure, a loop using SET ROWCOUNT will limit the number of records deleted. Perhaps a cursor with using a select something like

    SELECT Col1, Col2, Col3, Count(*) - 1

    FROM TheTableWithDups

    GROUP BY @Col1, @Col2, @Col3

    HAVING COUNT(*) > 1

    Fetch into @Col1, @Col2, @Col3, @NumberToDelete

    In the cursor loop...

    BEGIN TRAN -- make sure the count is and stays correct or bad things could happen!

    SELECT @NumberToDelete = COUNT(*) - 1

    FROM TheTableWithDups

    WHERE Col1 = @Col1 AND

    Col2 = @Col2 AND

    Col3 = @Col3

    IF @NumberToDelete > 0

    BEGIN

    SET ROWCOUNT = @NumberToDelete

    DELETE FROM TheTableWithDups

    WHERE Col1 = @Col1 AND

    Col2 = @Col2 AND

    Col3 = @Col3

    END

    COMMIT

    I have not tried this, but I think it would work. If there is no chance that the table will be changed (perhaps via a table lock), then no extra check or transaction would be required.

    Randy
    Helpdesk: Perhaps Im not the only one that does not know what you are doing. 😉

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply