Removing Duplicates

  • Dear All,

    I am migrating data over to a new system written in a completely different lauguage, the database is not SQL but one completely different.

    In this DB we have a persons table, which can be summerised as follows; -

    1Smith Peter

    2Smith Mary

    3Smith Tony

    4Smith Heather

    5Smith Peter

    6Smith Mary

    7Jones Matt

    8Jones George

    9Jones Henry

    10Jones George

    I need SQL to return the first occurances of the duplicates ID, so I am afrer sql which will return 1, 2, 3, 4, 7, 8, 9.

    Can anyone help ?

    Thanks

    J

  • SELECT Name, MIN( ID )

    FROM <table>

    GROUP BY Name

    Guarddata-

  • Hello,

    My apologies I meant to put

    1 Smith Peter

    2 Smith Mary

    3 Smith Tony

    4 Smith Heather

    7 Jones Matt

    8 Jones George

    9 Jones Henry

    As the result so the Min(ID) solution will not work.

    Thanks

    J

  • OK - help me understand. If you just change the order of the column list, do you not get the information?

    SELECT MIN(ID) as ID, Name

    FROM <table>

    GROUP BY Name

    ORDER BY MIN(ID), Name

    Guarddata-

  • Well, I guess these queries were enough if you ONLY need to return the first record for each DISTINCT nameIf you are migrating data and want to remove the extra duplicates here is another query:

    DELETE FROM persons p1
    WHERE (SELECT COUNT(*)
     FROM persons p2
     WHERE p1.name = p2.name) >1AND p1.ID <>
      (SELECT MIN(ID)
        FROM persons p3
        WHERE p1.name = p2.name)

    This query is not so complex as it might appear; For each record in the persons table, if the same name did appear more than once, all these records will be deleted except for the first one (the one having the MIN(ID))



    Mohamad Ad-Deeb
    Senior Software Developer, Microtech
    3 Mossadak st. Dokki 12311
    Giza, Egypt.
    Tel. +20 2 336 9855 (Ext.112)
    Fax +20 2 749 8784
    http://www.microtech-eg.com

  • A direct way to do this is using the "distinct" qualifier, as in:

    select distinct names

    into newtable

    from oldtable

    if you have a lot of columns in the old table then you can do something like:

    select *

    into newtable

    from oldtable

    where names in (select names from oldtable group by names having count(*)=1)

    order by names

    Charles

  • delete p

    From #People p (nolock)

    JOIN#People p1 (nolock) on p.pName = p1.pName

    wherep.pID > p1.Pid

    Here's a extremely simple high performance way to do it in one statement:

    Join the table to itself on the value you want to be unique (pName is this case). In the where clause filter all records where the pID is less than itself (the minimum). Everything else gets' deleted:

    FULL code below:

    /*------------------------------------------

    Create test data

    */----------------------------------------------

    if object_id('tempdb..#people') is not null drop table #People

    create table #People (pID int identity, pName varchar(50))

    Insert #People Values ('Smith Peter')

    Insert #People Values ('Smith Mary')

    Insert #People Values ('Smith Tony')

    Insert #People Values ('Smith Heather')

    Insert #People Values ('Smith Peter')

    Insert #People Values ('Smith Mary' )

    Insert #People Values ('Jones Matt' )

    Insert #People Values ('Jones George' )

    Insert #People Values ('Jones Henry' )

    Insert #People Values ('Jones George')

    /*---------------------------------

    test deletion

    */------------------------------------

    select * from #People

    delete p

    From #People p (nolock)

    JOIN#People p1 (nolock) on p.pName = p1.pName

    wherep.pID > p1.Pid

    select * from #People

    Signature is NULL

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply