How to remove duplicate record?

  • Hi all,

       I have a staging table need to be imported to the employee table which has a primary key on employee id.

    This staging table contains many duplicated employees records which has same employee id, name but have different department names or different pay rate or addresses.

    I want to delete all duplicate employee records and just leave the first one.  How can I do it?

    Thank you.

  • >>and just leave the first one.  How can I do it?

    You start with the business requirements and a definition of "first one".

    Oldest ? Newest ? Most complete address ? Highest Pay rate ? Lowest Pay Rate ?

    You need the rules which define which records to discard & which to keep before even opening the SQL code editor.

     

     

  • This should grab the first row it runs accross based on the EmpID:

     

    INSERT INTO EmployeeTable (Columns) SELECT Top 1 FROM StagingDatabase..StagingTable WHERE .......

    John Rowan

    ======================================================
    ======================================================
    Forum Etiquette: How to post data/code on a forum to get the best help[/url] - by Jeff Moden

  • Finish your statement:

    ORDER BY <criteria>

     

    _____________
    Code for TallyGenerator

  • I've used the below script to remove duplicates (may even have found it on this site):

    ALTER TABLE TableWithDuplicates ADD EliminateDuplicates_RowID int NOT NULL IDENTITY(1, 1)

    DELETE

    FROM  t1

    FROM TableWithDuplicates t1

    JOIN(

     SELECT EliminateDuplicates_RowID=MAX(EliminateDuplicates_RowID),

      Field1,

      Field2,

      Field3,

      Field4

     FROM TableWithDuplicates

     GROUP BY Field1,

      Field2,

      Field3,

      Field4

     HAVING COUNT(*) >= 2

    ) t2

    ON( t1.EliminateDuplicates_RowID <> t2.EliminateDuplicates_RowID

     AND t1.Field1 = t2.Field1

     AND t1.Field2 = t2.Field2

     AND t1.Field3 = t2.Field3

     AND t1.Field4 = t2.Field4)

    ALTER TABLE TableWithDuplicates DROP COLUMN EliminateDuplicates_RowID

    Add a primary key to your staging table and then add in the fields which will make up a unique record.

  • DELETE FROM

     y

    FROM

     tblA AS x

    INNER JOIN

     tblB AS y

    ON

     x.id > y.id

     AND

     x.attribute1 = y.attribute1

     AND

     x.attribute2 = y.attribute2

  • All the employees have more than 1 record, and I only want to have one based on employid and I don't care which one I deleted.

    However, the code I write is deleting the whole table.  PLEASE HELP!!

    DELETE

    FROM TestEmp

    WHERE TestEmp.EmployID IN

      -- List 1 - all rows that have duplicates

      (SELECT F.EmployID

      FROM TestEmp AS F

      WHERE Exists (

    SELECT EmployID, Count(TestEmp.EmployID)

          FROM TestEmp

          WHERE TestEmp.LastName =TestEmp.LastName

               AND TestEmp.Frstname = TestEmp.Frstname

            GROUP BY TestEmp.Employid

            HAVING Count(TestEmp. EmployID) > 1))

  • >>has same employee id, name but have different department names or different pay rate or addresses.

    And that's why I asked my original question. If you can't come up with business rules for which record to keep based on dept name, pay rate or address, then you have no choice but to add an additional IDENTITY column to your staging table because the only way you can arbitrarily delete all but 1 of a group of dupes is to have some way of uniquely determining which of the group you don't want to delete.

     

  • There's almost always at least two or three ways to get something accomplished. Here is an example without a unique identifier:

    INSERT INTO x

    SELECT y.*

    FROM

     tbl2 AS x

    LEFT JOIN

     (

      SELECT * FROM #tbl1

      UNION

      SELECT * FROM #tbl1

     [close paren] AS y

    ON

     x.EmployID = y.EmployID

    WHERE

     x.EmployID IS NULL

    The union eliminates the duplicates and the check for isnull inserts records that are not in the base table yet.

  • Here's a completely different way.

    1) In Enterprise Manager, create a NEW copy of the OLD table, with unique contraints, and "Ignore Duplicate Key" checked.

    2) Copy from OLD table with duplicates, to NEW table.  Only 1 rcd of each dup group will get copied in to NEW table.

    3a) Drop OLD table, and rename NEW table to OLD name

                 OR

    3b) Truncate OLD table, and copy NEW records to OLD table.

    NOTE: This may not be a good way if you have FK relationships.

     

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply