SQL query help to save my bacon

  • Hi,

    I made a boo-boo with an import query and would like some help. I maintain a hockey stats database, and for somereason the IS NULL command did not work on my prod server (worked great on my test system) and as a result, I imported the stats of some 450 players in DUPLICATE.

    So, I am hoping that I can write a query which will allow me to identify the duplicates and delete them. The stats table has a team_id, player_id and season_id. So I know that if there is more than 1 instance of the same values for those 3 columns, it is a duplicate.

    I am ssuming I need to write a query that groups on team_id, player_id and season_id where the count for each of those items is > 1. But I have had no luck so far.

    Any help appreciated

    JW

     

  • There are other cooler ways to do this but the fastest way might be to do a select distinct (those 3 columns) into a second table, dump the original table and re-import from the second table into the original

    ------------

    Ray Higdon MCSE, MCDBA, CCNA

  • Discussion of some alternatives

     

  • http://www.databasejournal.com/features/mssql/article.php/1438651

     

    duh!! Sending the link might help.

  • Hmm, that was too easy. I'm always looking for the hard way!

    JW

  • You could save yourself a lot of heartache if you put a unique index (constraint) on team_id, player_id and season_id then you could not insert duplicates without an error.

  • Deleting duplicates can be done in a single T-SQL command,

    when the table has two unique keys that don't have any columns in common.

    For a table with only one unique key, add an identity column,

    perform the delete, then remove the identity column.

    create table sample_tbl

    (

       id   int identity,

       key1 int,

       key2 int,

       attr1 varchar (12),

       attr2 varchar (12),

       attr3 varchar (12)

    )

    Remove the duplicate rows, keeping only one of each

    delete a

      from      dbo.sample_tbl as a

           join dbo.sample_tbl as b

             on  a.key1 = b.key1

             and a.key2 = b.key2

       where a.id < b.id     -- "<" keeps the row with the highest identity value

                                    -- ">" keeps the row with the lowest identity value

    If you want to review the data before deleting...

    select num_dups, *

      from      sample_tbl as a

                -- get duplicate keys

           join (select key1, key2, count (*) as num_dups

                   from dbo.sample_tbl

                   group by key1, key2

                   having count (*) > 1

                ) as b

             on  a.key1 = b.key1

             and a.key2 = b.key2

      order by a.key1, a.key2

  • Hi !

    If the table structure is like.

    Tbl1

    (

    pid integer,

    .... some fields

    )

    then u can delete the duplicates as

    delete from tbl1 where pid not in (select max(pid) from tbl1)

    If u don;t have one then u can just add one identity column with seed 1 and it will work like pid.

    It shud work...

    --Vijendra


    "Beginning is Half Done"

  • "delete from tbl1 where pid not in (select max(pid) from tbl1)"

    Wouldn't that delete all rows exept the one with the max PID?

    ------------

    Ray Higdon MCSE, MCDBA, CCNA

  • I think that was needed !!!

    regards

    Vijendra...


    "Beginning is Half Done"

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply