self-join; getting one "unique" record

  • Maybe it's just one of those days, but for some reason I can't figure out the code to get what I want.

    An example of:

    declare @Tmp table(ssn int, name varchar(50), number int)

    INSERT INTO @Tmp VALUES(123456789, 'John Smith', 111111)
    INSERT INTO @Tmp VALUES(123456789, 'John Q Smith', 222222)
    INSERT INTO @Tmp VALUES(123456789, 'JohnSmith', 333333)

    SELECT *
    FROM @Tmp t1
    JOIN @Tmp t2
        ON t1.ssn = t2.ssn
    WHERE t1.name <> t2.name

    Returns rows of:

    ssn         name            number      name            number
    ----------- --------------- ----------- --------------- -----------
    123456789   John Smith      111111      John Q Smith    222222
    123456789   John Smith      111111      JohnSmith       333333
    123456789   John Q Smith    222222      John Smith      111111
    123456789   John Q Smith    222222      JohnSmith       333333
    123456789   JohnSmith       333333      John Smith      111111
    123456789   JohnSmith       333333      John Q Smith    222222

    For what I'm doing, I consider rows 1 & 3, 2 & 5, and 4 & 6 to be the same pair, just listed in reverse order.

    I can't seem to figure out the code to give me just one of each pair. This is just a sample table; the real table may have several name mismatches for an SSN.

    TIA for your help,

    Doug

  • For a start, you can just order the pair and eliminate the rows that are the same pair in a different order, i.e., change

    WHERE t1.name <> t2.name

    to

    WHERE t1.name > t2.name
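
    Putting that together with your query (just a sketch against the sample @Tmp table above; the columns are listed explicitly instead of SELECT * for readability):

    SELECT t1.ssn, t1.name, t1.number, t2.name, t2.number
    FROM @Tmp t1
    JOIN @Tmp t2
        ON t1.ssn = t2.ssn
    WHERE t1.name > t2.name -- keeps only one ordering of each mismatched pair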

    edit:

    You could also consider a more general solution. For instance, if you are trying to see whether you have the same ssn stored with different names, you could just search for ssns that appear on more than one row:

    SELECT ssn, COUNT(*) C
    FROM @Tmp
    GROUP BY ssn
    HAVING COUNT(*) > 1

  • I'd like the number column included as it helps people research the case.

  • dough-378918 (4/8/2014)


    I'd like the number column included as it helps people research the case.

    If it were me, I'd use the generic GROUP BY to gather the ssn's that are duplicated, then rejoin that list to the original table to select the rows I wanted to examine. I'm a pretty literal guy, so I'll break the task into steps, i.e., find the dupes (GROUP BY), then find the info associated with the dupes (join).

    There could be more concise ways to do this but I tend toward the most obvious 🙂

    SELECT ssn, name, number
    FROM
    (
        SELECT ssn selected_ssn, COUNT(*) C
        FROM @Tmp
        GROUP BY ssn
        HAVING COUNT(*) > 1
    ) SELECTED
    JOIN @Tmp
        ON selected_ssn = ssn
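
    One of those more concise ways would be a windowed count (just a sketch, assuming SQL Server 2005 or later for the OVER clause; I haven't run it against your real table):

    SELECT ssn, name, number
    FROM
    (
        SELECT ssn, name, number,
               COUNT(*) OVER (PARTITION BY ssn) AS cnt -- number of rows sharing this ssn
        FROM @Tmp
    ) x
    WHERE cnt > 1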

  • Would something like this help you?

    SELECT ssn
        ,repeatedvalues = STUFF(( SELECT DISTINCT '; ' + name + '-' + CAST( number AS varchar(20))
                                  FROM @Tmp x
                                  WHERE x.ssn = t.ssn
                                  FOR XML PATH('')), 1, 2, '')
    FROM @Tmp t
    GROUP BY ssn
    HAVING COUNT(DISTINCT name + CAST( number AS varchar(20))) > 1

    This will concatenate all the distinct name/number values for each ssn into a single row.
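
    With the sample data above, it should return something like the following (the order of the values inside repeatedvalues comes from the DISTINCT and isn't guaranteed):

    ssn         repeatedvalues
    ----------- ---------------------------------------------------------
    123456789   John Q Smith-222222; John Smith-111111; JohnSmith-333333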

    Luis C.

  • And then, encrypt those SSNs and stop storing them in plain text immediately!!!

  • Sean Lange (4/8/2014)


    And then, encrypt those SSNs and stop storing them in plain text immediately!!!

    Very good point. How would you do it and still allow him to find his dupes and fix them once they're found? Does the data entry involve entering the encrypted value, or do we produce the list decrypted? Can we group by the encrypted value? What are his encryption choices in light of what he wants done? Where do we store the key to allow all of this to happen? And if the column is encrypted with a salt, how in the heck do we dedupe on it?
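
    On the grouping question: a deterministic hash (no per-row salt) could still be grouped on. A rough sketch only, assuming the ssn were stored as a HASHBYTES hash rather than salted ciphertext:

    SELECT HASHBYTES('SHA2_256', CAST(ssn AS varchar(11))) AS ssn_hash, COUNT(*) AS C -- same input always hashes to the same value, so GROUP BY still finds dupes
    FROM @Tmp
    GROUP BY HASHBYTES('SHA2_256', CAST(ssn AS varchar(11)))
    HAVING COUNT(*) > 1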

  • patrickmcginnis59 10839 (4/8/2014)


    Sean Lange (4/8/2014)


    And then, encrypt those SSNs and stop storing them in plain text immediately!!!

    Very good point. How would you do it and still allow him to find his dupes and fix them once they're found? Does the data entry involve entering the encrypted value, or do we produce the list decrypted? Can we group by the encrypted value? What are his encryption choices in light of what he wants done? Where do we store the key to allow all of this to happen? And if the column is encrypted with a salt, how in the heck do we dedupe on it?

    I meant once the dedupe was complete. Get the data cleaned up and then get it encrypted. 😉

