Recursion Problem

  • Here is the scenario, we have a Customer DB that has duplicate Customers with different CUSTID values. We plan on keeping the oldest CUSTOMER Record and merge any newer records into the Oldest one and cancel the newer records:

    tblDUPS:

    CUSTNAME CUSTID CUSTID2 MatchField

    Richard Smith 101 102 NAME

    Dick Smith 102 105 NAME

    Robert Bug 103 104 CITY

    Dickie Smith 105 106 CITY

    The objective is relate all three rows to the smallest CUSTID and PIVOT the MatchField values:

    RESULT SET

    CUSTID1 CUSTID2 NAME DLN CITY

    101 102 Y N N

    101 105 N Y N

    101 106 N N Y

    103 104 N N Y

    In the example above, I have already identified the duplicate rows in the the Customer Table and populated table tblDUPS. I'm just having mental block on rendering the last result set. There is no number to the amount of duplicates that can be in the table. In the example above CUSTID's 101,102,105, and 106 all belong to the same customer. Eventually, we will roll all purchase history into the smallest (oldest) CUSTID's (101,103) and Cancel the high CUSTID's (102,104,105,106).

    Any idea's?

  • Hi hawaiian,

    Can you try following code

    select

    isnull(tt.custid, t.custid) custid,

    t.custid2,

    NAME = CASE when t.MatchField = 'NAME' then 'Y' else 'N' end,

    DLN = CASE when t.MatchField = 'CITY' then 'Y' else 'N' end,

    CITY = CASE when t.MatchField = 'CITY' then 'Y' else 'N' end

    from tblDUPS t

    left join tblDUPS tt on t.custid = tt.custid2

    Sorry for the DLN column, I have no idea how to populate it

  • Provide DDL, DML for sample data and expected results relative to the sample data please.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • Below is the DDL. I should explain that the matched field column means the to customers matched on either their Drivers License Number (DLN); Last 4 of their Social Security Number (SSN); or some combination of their name.

    You will notice after the table build out that Customer 102 matches to customer 105 based on a Name Combination Match. Customer 105 also matches to customer 106 on a Name Combination Match. The Name Combination Match query compares the First letter of the first name and some combination of the Last Name.

    Here is a real life example:

    Customer ID First Name Last Name

    102 Mary Elizabeth Del La Rosa

    105 Mary Rosa

    106 M Elizabeth Rosa

    The customer table is compared to itsself to find duplicates, but to keep the customer ID from showig up on both sides of the results there is a WHERE clause that states WHERE CustID1 < CustID2. So the problem is 102 will relate to 105 on a name match and 105 will relate to 106 on a name match. The solution should:

    Origianl Data

    101102DLN

    101103SSN

    102105NAME

    105106NAME

    Pass1

    Relate 101 to 102

    Relate 101 to 103

    Relate 101 to 105

    Relate 102 to 106

    Pass2

    Relate 101 to 102

    Relate 101 to 103

    Relate 101 to 105

    Relate 101 to 106

    The code below will get me to Pass 1. I tried it with an Update Statement and a Loop but that had its own problem. Hence, a recursion problem.

    USE [SQLDEVL]

    GO

    /****** Object: Table [dbo].[tblCustomerIDs] Script Date: 07/13/2011 22:32:37 ******/

    IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[tblDUPS]') AND type in (N'U'))

    DROP TABLE [dbo].[tblDUPS]

    GO

    USE [SQLDEVL]

    GO

    /****** Object: Table [dbo].[tblCustomerIDs] Script Date: 07/13/2011 22:32:37 ******/

    SET ANSI_NULLS ON

    GO

    SET QUOTED_IDENTIFIER ON

    GO

    CREATE TABLE [dbo].[tblDUPS](

    [custid] [int] NULL,

    [custid2] [int] NULL,

    [MatchField] [CHAR](4)

    ) ON [PRIMARY]

    GO

    INSERT INTO tblDUPS (custid,custid2,MatchField)

    VALUES(101,102,'DLN')

    GO

    INSERT INTO tblDUPS (custid,custid2,MatchField)

    VALUES(101,103,'SSN')

    GO

    INSERT INTO tblDUPS (custid,custid2,MatchField)

    VALUES(102,105,'NAME')

    GO

    INSERT INTO tblDUPS (custid,custid2,MatchField)

    VALUES(105,106,'NAME')

    GO

    SELECT * FROM tblDUPS

    GO

    SELECT

    ISNULL(tt.custid, t.custid) AS custid,

    t.custid2,

    NAME = CASE

    WHEN t.MatchField = 'NAME' THEN 'Y' ELSE 'N'

    END,

    DLN = CASE

    WHEN t.MatchField = 'DLN' THEN 'Y' ELSE 'N'

    END,

    CITY = CASE

    WHEN t.MatchField = 'CITY' THEN 'Y' ELSE 'N'

    END

    FROM

    tblDUPS AS t LEFT JOIN tblDUPS AS tt ON t.custid = tt.custid2;

  • BTT

  • See if this approach will work for you. If you have questions about my choices let me know.

    There may be more efficient ways to do this too...others may drop in and contribute new or improve on this.

    USE tempdb

    GO

    IF EXISTS ( SELECT *

    FROM sys.objects

    WHERE object_id = OBJECT_ID(N'[dbo].[tblDUPS]')

    AND type IN (N'U') )

    DROP TABLE [dbo].[tblDUPS]

    GO

    CREATE TABLE [dbo].[tblDUPS]

    (

    [custid] [int] NULL,

    [custid2] [int] NULL,

    [MatchField] [CHAR](4)

    )

    GO

    INSERT INTO tblDUPS

    (custid, custid2, MatchField)

    VALUES (101, 102, 'DLN'),

    (102, 105, 'NAME'),

    (105, 106, 'NAME'),

    (101, 103, 'SSN'),

    (109, 110, 'NAME'),

    (110, 112, 'NAME'),

    (112, 113, 'NAME') ;

    GO

    IF EXISTS ( SELECT *

    FROM sys.objects

    WHERE object_id = OBJECT_ID(N'dbo.get_leaves')

    AND type IN (N'FN', N'IF', N'TF', N'FS', N'FT') )

    DROP FUNCTION dbo.get_leaves ;

    GO

    CREATE FUNCTION dbo.get_leaves (@custid INT)

    RETURNS TABLE

    AS

    RETURN

    (WITH cte(custid, custid2, [level])

    AS (

    SELECT DISTINCT

    t.custid AS custid,

    t.custid2 AS custid2,

    0 AS [level]

    FROM dbo.tblDUPS t

    WHERE custid = @custid

    UNION ALL

    SELECT t.custid,

    t.custid2,

    cte.[level] + 1 AS [level]

    FROM cte

    JOIN dbo.tblDUPS t ON cte.custid2 = t.custid

    ),

    cte2

    AS (

    SELECT custid,

    custid2,

    level,

    ROW_NUMBER() OVER (PARTITION BY custid2 ORDER BY level DESC) AS row_num

    FROM cte

    )

    SELECT DISTINCT

    @custid AS custid,

    custid2

    FROM cte2

    WHERE row_num = 1 ) ;

    GO

    SELECT DISTINCT

    t.custid,

    gl.custid2

    FROM dbo.tblDUPS t

    CROSS APPLY dbo.get_leaves(t.custid) gl

    WHERE NOT EXISTS ( SELECT *

    FROM dbo.tblDUPS

    WHERE custid2 = t.custid )

    GO

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • opc.three you got it; hence, you got the problem. Two things I want to stress:

    I did come up with a solution but it was very ... ugly and worked for small groups of data. When I process all 6k ROWS, it failed. So it wasn’t really much of a solution.

    Secondly, I made up this scenario. There is no Customer table that begins with tbl. A record is composed of one or more rows from one or more tables. A tuple is a row but you never hear it being used today.

    Anyway, there was no way I could post my customers data DDL. What is true is the fact that the database has multiple "customers" because people can register 200 times in a day with 200 different names or veriations of their name. I was tasked with finding duplicate registrations, reporting on them, and prevent any new duplicates to enter the system.

    A human being has to actually verify if the two customers are in fact the same person. They can then use a merge program to merge the newer customer registration and their history into the oldest customer ID. It would be very helpful if the entire related customer ID's were group as opc has shown me.

    Out of 2.1 million customers there are 6 thousand duplicate pairs. Identifying them was easy, reporting on them was a challenge.

    Thank you again opc!!!

    Have a great day

    Hawaiian

  • You're welcome Hawaiian! I am happy it will work for you...enjoy the weekend 🙂

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • Heheeeee. I got mad props from my customer 🙂

    I couldn't in good conscience take all the credit

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply