Recursive CTE assistance

  • I'm trying to better understand recursive CTE queries, and I've hit a bit of a hurdle. In the recursive portion, I'm trying to build a comma-delimited list of values. I want to end up only with the final (largest) list for each group, but I'm getting every step in the recursion.

    (This is my first post, so I apologize if I screw up the tags...)

    CREATE TABLE #Orders

    (

    EmpID INT,

    OrderID VARCHAR(30),

    OrderDate DATETIME

    )

    INSERT INTO #Orders

    VALUES (101, 95001, '03/01/2011'),

    (101, 95026, '03/01/2011'),

    (101, 95053, '03/01/2011'),

    (104, 95014, '03/06/2011'),

    (104, 95018, '03/06/2011'),

    (106, 95077, '03/03/2011'),

    (106, 95121, '03/03/2011'),

    (106, 95085, '03/03/2011'),

    (106, 95222, '03/09/2011'),

    (115, 95117, '03/05/2011'),

    (115, 95141, '03/05/2011'),

    (130, 95066, '03/11/2011'),

    (142, 95512, '03/19/2011'),

    (142, 95588, '03/19/2011'),

    (142, 95722, '03/26/2011'),

    (142, 95764, '03/26/2011'),

    (142, 95848, '03/26/2011')

    WITH OrderRows AS (

    SELECT ROW_NUMBER() OVER(PARTITION BY EmpID, OrderDate ORDER BY EmpID, OrderID, OrderDate) AS RowNum,

    EmpID, OrderID, CONVERT(VARCHAR(10), OrderDate, 101) AS OrderDate

    FROM #Orders

    ),

    CTE AS (

    SELECT RowNum, EmpID, CAST(OrderID AS VARCHAR(500)) AS OrderID, OrderDate

    FROM OrderRows

    UNION ALL

    SELECT O.RowNum, C.EmpID, CAST(C.OrderID + ISNULL(', ' + O.OrderID, '') AS VARCHAR(500)), C.OrderDate

    FROM OrderRows O

    INNER JOIN CTE C

    ON O.EmpID = C.EmpID

    AND O.OrderDate = C.OrderDate

    AND O.RowNum = C.RowNum + 1

    )

    SELECT *

    FROM CTE C

    ORDER BY EmpID

    The first CTE (OrderRows) is there to derive a table with a rownum, for joining purposes. The CAST() of OrderID is to get the data types to match in the anchor and recursive portions. The CONVERT() is just to drop the time portion.

    So, these are the results I want (don't need RowNum; it's there just for comprehension):

    RowNumEmpIDOrderDateOrderID

    31013/1/201195001, 95026, 95053

    21043/6/201195014, 95018

    31063/3/201195077, 95085, 95121

    11063/9/201195222

    21153/5/201195117, 95141

    11303/11/201195066

    21423/19/201195512, 95588

    31423/26/201195722, 95764, 95848

    And these are the results I get:

    RowNumEmpIDOrderDateOrderID

    11013/1/201195001

    21013/1/201195026

    31013/1/201195053

    31013/1/201195026, 95053

    21013/1/201195001, 95026

    31013/1/201195001, 95026, 95053

    21043/6/201195014, 95018

    11043/6/201195014

    21043/6/201195018

    11063/3/201195077

    21063/3/201195085

    31063/3/201195121

    11063/9/201195222

    31063/3/201195085, 95121

    21063/3/201195077, 95085

    31063/3/201195077, 95085, 95121

    21153/5/201195117, 95141

    11153/5/201195117

    21153/5/201195141

    11303/11/201195066

    11423/19/201195512

    21423/19/201195588

    11423/26/201195722

    21423/26/201195764

    31423/26/201195848

    31423/26/201195764, 95848

    21423/26/201195722, 95764

    31423/26/201195722, 95764, 95848

    21423/19/201195512, 95588

    Help? ๐Ÿ™‚

  • OK, I just figured out to add

    WHERE RowNum = 1 in the anchor portion of the recursive CTE, but I'm still not quite there.

  • Which is more important to you ?

    Learning about recursive CTEs

    OR

    Getting the required query results?

    I ask because we can teach you about CTEs or we can show you the better way to get the query result.

    The two are not the same thing.

    Thanks for the sample data - good work!

    MM



    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

  • Forum Etiquette: How to post Reporting Services problems
  • [/url]
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • [/url]
  • How to Post Performance Problems - by Gail Shaw
  • [/url]

  • Really, both are important, in a sense. I already HAVE the results in the form I wanted (and not by just deleting results rows); I ended up creating a temp table storing the max(len(OrderID)) for each EmpID/OrderDate group, then joined back to that to just get the rows I wanted.

    But I'm sure this could be done in a single query, right? If you're suggesting there is a more elegant solution WITHOUT CTE's, I'd love to see that! But this actually came about as I was spending some time this morning working on understanding CTE's better. (My current puzzle is breaking a string of words into strings of n words each, which is a bit over my head right now.)

    So if I had to pick one of your options, it's to understand CTE's better. I already emailed out the results desired anyway. ๐Ÿ™‚

    ETA: Now that I've added in the restriction of RowNum = 1 in the anchor portion of the recursive CTE, I see that I could do what I did before, but just on MAX(RowNum) instead of worrying about the LEN(). Still looking for the single-query solution, though!

  • Hi jeffem,

    Ok, so first the CTE: you have not "missed" anything really, but you have found one of the problems associated with using a recursive CTE for this type of job ; knowing which output rows to keep!

    I am sure there are as many methods out there as there are varieties of apples, here is one:

    It relies on recording the depth of the recursion, simply by adding 1 each time, then on an extra sort to find the lowest level for each group (read EmpId and OrderDate) so that the results can finally be filtered.

    Here's this one method:

    ;WITH OrderRows AS (

    SELECT ROW_NUMBER() OVER(PARTITION BY EmpID, OrderDate ORDER BY EmpID, OrderID, OrderDate) AS RowNum,

    EmpID, OrderID, CONVERT(VARCHAR(10), OrderDate, 101) AS OrderDate

    FROM #Orders

    ),

    CTE AS (

    SELECT RowNum, EmpID, CAST(OrderID AS VARCHAR(500)) AS OrderID, OrderDate, 1 as OrderCount

    FROM OrderRows

    UNION ALL

    SELECT O.RowNum, C.EmpID, CAST(C.OrderID + ISNULL(', ' + O.OrderID, '') AS VARCHAR(500)), C.OrderDate, 1+OrderCount

    FROM OrderRows O

    INNER JOIN CTE C

    ON O.EmpID = C.EmpID

    AND O.OrderDate = C.OrderDate

    AND O.RowNum = C.RowNum + 1

    ), joined_rows AS (

    SELECT *, row_number() OVER(PARTITION BY EmpId,OrderDate ORDER BY OrderCount DESC) as Marker

    FROM CTE C

    )

    select EmpId,OrderId,OrderDate

    from joined_rows

    where Marker = 1

    order BY EmpId

    Notice the extra column called OrderCount in your CTE (CTE!)? It records the depth of the row (or it's count) so that in the new CTE called "joined_rows", I can generate yet another row number for each EmpID and OrderDate combination, ordered by the OrderCount DESC.

    This new "Marker" then contains the value 1 for the rows we need and we can select by that.

    This is horrible code, very wasteful and will grind to a halt on a decent size set of data.

    Take a look at the execution plan for that recursive ( :sick: :crying: ) cte:

    Now, let's take a look at one method (there will be many of these too) for doing this without a recursive CTE:

    First get a list of the EmpID and OrderDate combinations you need and then go grab the order list for each one...

    SELECT EmpID,OrderId,OrderDate

    FROM

    (

    SELECT EmpID,OrderDate AS OrderDateTime

    FROM #Orders

    GROUP BY EmpID,OrderDate

    ) AS EmpDates

    OUTER APPLY

    (

    SELECT ','+OrderId

    FROM #Orders AS Orders

    WHERE Orders.EmpID = EmpDates.EmpID

    AND Orders.OrderDate = EmpDates.OrderDateTime

    ORDER BY OrderId

    FOR XML PATH('') -- this concatenates the OrderIds

    ) AS OA(List)

    OUTER APPLY

    (

    -- This strips the leading comma from each list and strips the time element from the OrderDate

    SELECT STUFF(OA.List,1,1,''),CONVERT(CHAR(10),OrderDateTime,101)

    ) AS Result(OrderId,OrderDate)

    ORDER BY EmpID

    And the execution plan for this : (Indexing is obviously missing from this trivial example)

    What can you tell from those plans? Not a lot at first sight, but if you examine them you will see 30 scans of the orders table in the recursive cte and the comparitively low (still not great though) 9 scans in the alternative, xml path method.

    Now imagine scaling up the data set to 1 million rows... I bet the cte would not finish.

    Hope this helps.

    MM



    select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

  • Forum Etiquette: How to post Reporting Services problems
  • [/url]
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • [/url]
  • How to Post Performance Problems - by Gail Shaw
  • [/url]

  • As magoo has said, there are better methods for this type of query than a recursive cte.

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • Seรฑor Magoo,

    That is some terrific info and very easy to digest! I greatly appreciate the time you spent on that. It helps me understand CTE's a bit better, and now I see that I need to explore how to best use APPLY some more! Just trying to get more tools in my toolbox.

    Thanks again!

  • Viewing 7 posts - 1 through 6 (of 6 total)

    You must be logged in to reply to this topic. Login to reply