Eliminating duplicate rows

  • Hello all, my problem is that I was running update statements and at the bottom of the page there was a nice little insert statement that i didn't see. So every time I was executing my updates I was also inserting new rows into another table (rookie mistake, live and learn I guess).

    So my question is does anyone know how to get rid these duplicate rows?

    I want to keep the originals but just delete the duplicates... thanks in advance for any help

  • 1 : Restore table from backup

    2 : Make sure you always have a backup

    3 : NEVER EVER WORK ON PRODUCTION SERVER.

    4 : Search these forums.  That answer is posteds few times a week here.

  • I have tried something like:

    select

    distinct emimageid, * into image_3 from image_1st

    but nothing, I have also scanned over some different forums and i don't see anything that does what I am looking for.

  • Ummm.... you need to identify to us what makes a "duplicate" and, perhaps, post the DDL for the table where the dupes exist.  Elimination of duplicates is not difficult but we do need to know what makes a duplicate by column name(s).

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • sorry i wasn't clear, the whole row is duplicate except for the ID column which is a GUID..

    sometimes there is 3 of a row and sometimes just one

  • Are you using SQL Server 2000 or SQL Server 2005?

     


    N 56°04'39.16"
    E 12°55'05.25"

  • No, I understood that... but typically there is some unique key (column or combination of columns) other than the GUID that would identify the unique row.  Need that info to do this...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Congrats on the 4K mile Jeff.

  • SQL 2005 management studio...

    No sorry the only thing thats different is the GUIDs..

    columns I have: 

    ID----GUID

    ManCode----nvarchar(3)

    emImageID----int

    description----nvarchar(250)

    path----nvarchar(150)

    Everything is the same on the duplicate rows except for the GUID... also not all rows have a duplicate just some, thanks for your patience and input.

  • This might be considered "cheating" but the most straightforward way is to extract one row per "unique" set, blow everything out and rebuild:

    --find the unique file names and extract them to a temp table

    Select mancode, emimageID,description,path, min(id) as ID into #temptable from <mytable>

    GROUP BY mancode, emimageID,description,path

    --clean out the table

    delete * from <mytable>

    --put the unique values back in

    Insert mytable (id,mancode,emimageID,description,path)

    select id,mancode,emimageID,description,path from #temptable

    --clean up after yourself

    drop table #temptable

     

    This is going to force table scans and all sorts of nasty performance since we have to do this on all the fields NOT indexed, so you probably want to make sure you only have to do this once.  Do that by putting a unique constraint in place (so it is not possible to insert duplicate values).  You should however make sure that this kind of constraint won't blow up your application.

    It's ugly, but it will do the job.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Thanks matt I think I am on the right path but when I run the query I get the error:

    Msg 8117, Level 16, State 1, Line 1

    Operand data type uniqueidentifier is invalid for min operator.

    I tried changing the ID field from a GUID to a nvarchar and the query runs fine but I get the same results as when I started.
    The GUIDs are the only thing different between the rows and not all rows have a duplicate just some
    thanks again for helping

  • ok figured out what i needed to do, just bad programming to start with, thanks all

  • Thanks Remi!

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • C'mon Zach... folks helped you out... the least you could do is post your solution

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Well.... its kind of difficult to explain.

    My situation was that I had inserted several rows into one table that were duplicates, but I had already related the table ID's to other tables therefore forcing me to keep the duplicates because I do not know which GUID id is tied to the other tables.

    On the duplicate rows the column I use to relate is the ID column which is GUID, so when I had a one to many relation the GUID id's alternate between the GUID values even though the rest of the data is identical.

    Example:

    5lk34-adfnjo8  somedata

    334d-fad9324 somedata

    I apologize if this isn't clear, it is difficult to explain with out seeing my situation.

    Thanks to all for the input

Viewing 15 posts - 1 through 15 (of 20 total)

You must be logged in to reply to this topic. Login to reply