selecting duplicate rows including PK, but ignoring PK field for determining duplicate row

  • I want to select duplicate rows based on the column 'companyname' into a table.

    This statement works:

    SELECT p.companyname, Count(*) - 1

    FROM prospects p

    GROUP BY p.companyname

    HAVING Count(*) > 1

    BUT, I need to know the id belonging to the duplicate row. So I added the "id" column:

    DECLARE @prosp TABLE (

    id int

    ,companyname nvarchar(50)

    ,cnt int

    )

    INSERT INTO @prosp(id,companyname,cnt)

    SELECT p.id,p.companyname, Count(*) - 1

    FROM prospects p

    GROUP BY p.id,p.companyname

    HAVING Count(*) > 1

    select * from @prosp

    But in the latter case, NO records are found (ofcourse) since the id is the PK and thus never duplicate.

    How can I get this scenario to work?

  • See if this works for you

    INSERT INTO @prosp(Id,companyname,cnt)

    SELECT

    P.Id,

    P.companyname,

    P.cnt

    FROM

    prospects P

    INNER JOIN

    (

    SELECT

    T1.companyname,

    Count(*) - 1

    FROM

    prospects T1

    GROUP BY

    T1.companyname

    HAVING Count(*) > 1

    ) AS DQ1

    ON

    P.companyname = DQ1.companyname

    select * from @prosp

    Scott

    Scott
    I am a Senior Data Analyst/DBA
    I work with MS SQL, Oracle & several ETL Tools

    @thewolf66[/url]

  • It does...thx!

  • Ok, sorry, but I do have one more question.

    I need to add a calculated column to the temp table: whether the selected records occur in another table 'prospectsincampaigns'.

    I now have this:

    DECLARE @prosp TABLE (

    id int

    ,companyname nvarchar(50)

    ,isincampaign bit

    ,cnt int

    )

    INSERT INTO @prosp(Id,companyname,cnt)

    SELECT

    P.Id

    ,P.companyname

    ,isincampaign

    ,cnt

    FROM prospects P

    INNER JOIN

    (

    SELECT T1.companyname

    ,(Count(*) - 1) as cnt

    ,(select COUNT(prospectid) from prospectsincampaigns where prospectid=P.Id)>0 as isincampaign

    FROM

    prospects T1

    GROUP BY

    T1.companyname

    HAVING Count(*) > 1

    ) AS DQ1

    ON

    P.companyname = DQ1.companyname

    select * from @prosp

    As you see, I have this line:

    ,(select COUNT(prospectid) from prospectsincampaigns where prospectid=P.Id)>0 as isincampaign

    What I want here, is a bit value that indicates if this value occurs in that other table.

    Besides the facts that the statement syntax above is incorrect, im also unable to add a calculated column to the temp table.

    If adding the bit value is not possible, I would also settle for a value that indicates the amount (int) of times the id occurs in the other table..

    Help is really appreciated (again 🙂

  • This is a sample of how to do this

    --= first set up some test data

    DECLARE @prospects TABLE (

    id int identity

    ,companyname nvarchar(50)

    )

    insert @prospects( companyname)

    select 'test1' union all

    select 'test1' union all

    select 'test1' union all

    select 'test2' union all

    select 'test2' union all

    select 'test3' union all

    select 'test4' union all

    select 'test5' union all

    select 'test6' union all

    select 'test7' union all

    select 'test8'

    --= now use a CTE and ROW_NUMBER() to count the repeating entries

    ;with cte as

    (

    select

    id

    ,companyname

    ,ROW_NUMBER() over(partition by companyname order by id) as counter

    from @prospects

    )

    select

    P.id

    ,P.companyname

    ,P.counter

    ,COALESCE(PIC.flag,'Is NOT in prospectsincampaigns') as picflag

    from cte as P

    outer apply (

    select TOP 1 'Is in prospectsincampaigns'

    from prospectsincampaigns

    where prospectid=P.Id

    ) PIC(flag)

    where counter >1 --== and select only the ones that are duplicates (assuming the first one found for each companyname is not a duplicate)

    Edit: added external lookup to prospectsincampaigns

    MM



    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

  • Forum Etiquette: How to post Reporting Services problems
  • [/url]
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • [/url]
  • How to Post Performance Problems - by Gail Shaw
  • [/url]

  • @magoo: I think you were replying to an earlier post of me and not to my latest?

  • No, the whole thing, to find duplicates in the prospects table (I used a table variable for this table as a test) and show whether the duplicates are also in prospectsincampaigns ...

    If I have misunderstood the requirement, then I apologise.

    The method I am showing does not use a temporary table and will only read Prospects once.

    MM



    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

  • Forum Etiquette: How to post Reporting Services problems
  • [/url]
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • [/url]
  • How to Post Performance Problems - by Gail Shaw
  • [/url]

  • Ah, I didnt see all the code, my bad.

    Anyway, your code seems to do [almost] what I need.

    Currently your code shows: #duplicate records - 1, meaning that if I have 5 companies named 'TEST', it will now show only 4 records.

    I need that all 5 records are shown, since all of them are duplicate.

    how would I achieve that?

  • petervdkerk (11/19/2010)


    Ah, I didnt see all the code, my bad.

    Anyway, your code seems to do [almost] what I need.

    Currently your code shows: #duplicate records - 1, meaning that if I have 5 companies named 'TEST', it will now show only 4 records.

    I need that all 5 records are shown, since all of them are duplicate.

    how would I achieve that?

    Just change the code

    ,ROW_NUMBER() over(partition by companyname order by id) as counter

    to

    ,SUM(1) over(partition by companyname) as counter

    Scott
    I am a Senior Data Analyst/DBA
    I work with MS SQL, Oracle & several ETL Tools

    @thewolf66[/url]

  • shump66 (11/19/2010)


    petervdkerk (11/19/2010)


    Ah, I didnt see all the code, my bad.

    Anyway, your code seems to do [almost] what I need.

    Currently your code shows: #duplicate records - 1, meaning that if I have 5 companies named 'TEST', it will now show only 4 records.

    I need that all 5 records are shown, since all of them are duplicate.

    how would I achieve that?

    Just change the code

    ,ROW_NUMBER() over(partition by companyname order by id) as counter

    to

    ,SUM(1) over(partition by companyname) as counter

    Exactly, thanks shump66! I was in the car at the time, so couldn't reply.

    MM



    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

  • Forum Etiquette: How to post Reporting Services problems
  • [/url]
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • [/url]
  • How to Post Performance Problems - by Gail Shaw
  • [/url]

  • Ok, thanks both you guys. And what if I want to select this result into a temp table?

    I now have (not working):

    DECLARE @prosp TABLE (

    id int

    ,companyname nvarchar(50)

    ,isincampaign bit

    ,cnt int

    ,incamp nvarchar(50)

    )

    with cte as

    (

    select id,companyname,SUM(1) over(partition by companyname) as counter from prospects

    )

    INSERT INTO @prosp(Id,companyname,cnt,incamp)

    select

    P.id

    ,P.companyname

    ,P.counter

    ,COALESCE(PIC.flag,'0') as incamp

    from cte as P

    outer apply (

    select TOP 1 '1'

    from prospectsincampaigns

    where prospectid=P.Id

    ) PIC(flag)

    Thanks again! 🙂

  • It helps if you tell people the error you get...

    Which bit is not working - the select or the insert (try it without the insert)

    MM



    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

  • Forum Etiquette: How to post Reporting Services problems
  • [/url]
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • [/url]
  • How to Post Performance Problems - by Gail Shaw
  • [/url]

  • Forgot a semicolon :$ code works fine 🙂 thanks!

  • Viewing 13 posts - 1 through 12 (of 12 total)

    You must be logged in to reply to this topic. Login to reply