How to detect duplicates ?

  • Hello,

    I'm not exactly a newbie, but I'm far from being a SQL expert.

    So, I have now a problem to solve and I don't know how to work it out in a clever way.

    I have only one table where I can have duplicates based on three fields (if the three fields have the same value in two or more records, then these records are duplicates).

    I must now get the primary key (not included in the three fields) of all duplicate records and I need, for each record having duplicates, to know the primary key of all its duplicates (in my ASP page, duplicates must be displayed in red and a tooltip on each grid line must display the list of duplicates for the current record).

    Of course, from my ASP .Net page, I can get a list of all records, then, for each one, call for a procedure which looks for its duplicates.

    Sure it will work, but I hope there is a more efficient way.

    Is there ?

    Thanks for your advices.

    Gedeon

  • SELECT * FROM

    (SELECT * ,ROW_NUMBER()

    OVER (PARTITION BY FIELD1, Field2, Field3 Order BY PK) as row_no

    FROM URTABLE) T

    WHERE ROW_NO > 1

    This will give you duplicates, and all the other columns from your table

    ---------------------------------------------------------------------------------

  • I'll try this solution this afternoon (I must attend a meeting now).

    I couldn't invent such a piece of code.

    Thanks for your lightning fast answer !

    Gedeon

  • Ok, I tried.

    It's a big step forward, but not exactly what I was waiting for.

    I get all duplicates records where row_no greater than 1.

    But for these records, and only for these, I also need the record where row_now equals 1.

    In other words I must get all duplicates, including the first record in a duplicates group.

    I got the solution using a cursor and a loop.

    Is there a more easy way ?

    Thanks.

  • How about this?

    select YourTable.*

    from

    YourTable

    inner join

    (select

    field1, field2, field3

    from YourTable

    group by field1, field2, field3

    having count(*)>1) dups

    on dups.field1=YourTable.field1

    and dups.field2=YourTable.field2

    and dups.field3=YourTable.field3

    The subquery identifies the duplicates then is joined back to your source table to get the details of all duplicates.

    Regards

    Alun

  • Wow !

    Works fine, with only one (combined) request !

    The power of this langage is always stupendous for me.

    Thanks a lot.

    Gedeon

  • Row_Number version for it. It would give the first record of the duplicate set order by your PK,

    CREATE TABLE #URTABLE (PK int, Field1 int, Field2 int , Field3 int)

    INSERT INTO #URTABLE VALUES (1,1,1,1)

    INSERT INTO #URTABLE VALUES (2,1,1,1)

    INSERT INTO #URTABLE VALUES (3,2,2,2)

    INSERT INTO #URTABLE VALUES (4,2,2,2)

    INSERT INTO #URTABLE VALUES (5,3,3,3)

    DROP table #temp

    Select * from (

    SELECT *,ROW_NUMBER()

    OVER (PARTITION BY T1.FIELD1, T1.Field2, T1.Field3 Order BY PK) as row_no

    FROM #URTABLE T1

    JOIN

    (SELECT

    T3.Field1 as df1,

    T3.Field2 as df2,

    T3.field3 as df3,

    Count(*) as row_cnt

    FROM #URTABLE T3

    GROUP BY FIELD1, Field2, Field3 ) T2

    ON T1.Field1 = T2.DF1 AND T1.Field2 = T2.DF2 AND T1.Field3 = T2.DF3)T4

    Where row_cnt> 1 and row_no = 1

    ---------------------------------------------------------------------------------

  • Works fine, too !

    And it provides the number of duplicates for each group of dups and the dup number in each group.

    Thanks again.

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply