Cross Tabs and Pivots, Part 1 – Converting Rows to Columns

  • Jeffrey Williams (8/19/2008)


    Jeff Moden (8/19/2008)Yeap... 'bout six weeks ago if I recall correctly... the only thing missing for me was doing those bloody dual Pivots...

    Do you happen to have the URL for that thread? I'd lost track of it and would like to have it posted here because a lot of good folks, like yourself, had some pretty good ideas there.

    Wasn't too hard to find - but then again, I don't have over 10,000 posts to search through :w00t:

    Here it is: http://qa.sqlservercentral.com/Forums/Topic521489-338-1.aspx

    Very cool... take a look at the very first post on this thread for this article, now... I'd forgotten it was you that first managed to accomplish the double pivot... nicely done! I'll edit the link into the article, too. Thanks for finding it...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Jeff Moden (8/19/2008)


    Jeffrey Williams (8/19/2008)


    Jeff Moden (8/19/2008)Yeap... 'bout six weeks ago if I recall correctly... the only thing missing for me was doing those bloody dual Pivots...

    Do you happen to have the URL for that thread? I'd lost track of it and would like to have it posted here because a lot of good folks, like yourself, had some pretty good ideas there.

    Wasn't too hard to find - but then again, I don't have over 10,000 posts to search through :w00t:

    Here it is: http://qa.sqlservercentral.com/Forums/Topic521489-338-1.aspx

    Very cool... take a look at the very first post on this thread for this article, now... I'd forgotten it was you that first managed to accomplish the double pivot... nicely done! I'll edit the link into the article, too. Thanks for finding it...

    I've gotta thank Peter Larson for the tip on pre-aggregation... I always did that for readability but he showed that it can help performance, as well.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Another Jewel in your crown Jeff. Great article. Wonderful...:)

  • Jeff Moden (8/19/2008)


    Chris Morris (8/19/2008)


    Really nice to read, Jeff, and absolutely spot on.

    I've had to do pivots in the last couple of weeks, with an unknown number of output (transposed) columns with unknown names, as in the following. It will be interesting to see your solution, you have a knack of coming up with "why the heck didn't I think of that?" solutions!

    Thanks, Chris... I really appreciate the feedback and the example. I'm gonna play with it. 🙂

    Say, would you mind a huge favor please? Your good example code is a bit wide and makes it so everyone has to scroll right to read everything. Could I trouble you to "break" the line that starts with "SET @SQLstr = 'SELECT HospitalName" a couple of times just before a plus sign or two just to narrow it up? Thanks a bunch.

    Sure!

    [font="Courier New"]

    SET NOCOUNT ON

    CREATE TABLE #EDIInvoices (HospitalName VARCHAR(20), Purchaser VARCHAR(20), Sales money)

    INSERT INTO #EDIInvoices (HospitalName, Purchaser, Sales)

    SELECT 'Guy''s', 'BUPA', 10001 UNION ALL

    SELECT 'Guy''s', 'Microsoft', 10002 UNION ALL

    SELECT 'Guy''s', 'NIH', 10003 UNION ALL

    SELECT 'Tommy''s', 'BUPA', 20001 UNION ALL

    SELECT 'Tommy''s', 'Microsoft', 20002 UNION ALL

    SELECT 'Tommy''s', 'NIH', 20003 UNION ALL

    SELECT 'Tommy''s', 'Cornhill', 20004 UNION ALL

    SELECT 'Raigmore', 'BUPA', 30001 UNION ALL

    SELECT 'Raigmore', 'Microsoft', 30002 UNION ALL

    SELECT 'Raigmore', 'NIH', 30003 UNION ALL

    SELECT 'Raigmore', 'Cornhill', 30004 UNION ALL

    SELECT 'Raigmore', 'HPA', 30005 UNION ALL

    SELECT 'Raigmore', 'RaigmoreOnly', 30006 UNION ALL

    SELECT 'Yorkhill', 'BUPA', 40001 UNION ALL

    SELECT 'Yorkhill', 'Microsoft', 40002 UNION ALL

    SELECT 'Yorkhill', 'Cornhill', 40004 UNION ALL

    SELECT 'Yorkhill', 'HPA', 40005 UNION ALL

    SELECT 'Yorkhill', 'YorkhillOnly', 40007

    --SELECT * FROM #EDIInvoices -- Sanity check

    -- PIVOT the results

    DECLARE @SQLstr VARCHAR(5000)

    SET @SQLstr = ''

    SELECT @SQLstr = @SQLstr + CHAR(10) +

       '   MAX(CASE Purchaser WHEN ''' + Purchaser + ''' THEN Sales ELSE 0 END) AS [' + Purchaser + '], '

    FROM (SELECT Purchaser FROM #EDIInvoices GROUP BY Purchaser) d

    SET @SQLstr = 'SELECT HospitalName, ' +

       REVERSE(STUFF(REVERSE(@SQLstr),1,2,'')) + ' ' + CHAR(10) +

       'INTO ##EDIInvoicesPivot ' + CHAR(10) +

       'FROM #EDIInvoices ' + CHAR(10) +

       'GROUP BY HospitalName'

    PRINT @SQLstr -- sanity check

    EXECUTE (@SQLstr)

    SELECT * FROM ##EDIInvoicesPivot

    DROP TABLE #EDIInvoices

    DROP TABLE ##EDIInvoicesPivot[/font]

    Statement generated:

    [font="Courier New"]

    SELECT HospitalName,

       MAX(CASE Purchaser WHEN 'BUPA' THEN Sales ELSE 0 END) AS [BUPA],

       MAX(CASE Purchaser WHEN 'Cornhill' THEN Sales ELSE 0 END) AS [Cornhill],

       MAX(CASE Purchaser WHEN 'HPA' THEN Sales ELSE 0 END) AS [HPA],

       MAX(CASE Purchaser WHEN 'Microsoft' THEN Sales ELSE 0 END) AS [Microsoft],

       MAX(CASE Purchaser WHEN 'NIH' THEN Sales ELSE 0 END) AS [NIH],

       MAX(CASE Purchaser WHEN 'RaigmoreOnly' THEN Sales ELSE 0 END) AS [RaigmoreOnly],

       MAX(CASE Purchaser WHEN 'YorkhillOnly' THEN Sales ELSE 0 END) AS [YorkhillOnly]

    INTO ##EDIInvoicesPivot

    FROM #EDIInvoices

    GROUP BY HospitalName[/font]

    Output:

    HospitalName BUPA Cornhill HPA Microsoft NIH RaigmoreOnly YorkhillOnly

    -------------- ------------ ------------ ------------ ------------ ------------ -------------- --------------

    Guy's 10001.0000 .0000 .0000 10002.0000 10003.0000 .0000 .0000

    Raigmore 30001.0000 30004.0000 30005.0000 30002.0000 30003.0000 30006.0000 .0000

    Tommy's 20001.0000 20004.0000 .0000 20002.0000 20003.0000 .0000 .0000

    Yorkhill 40001.0000 40004.0000 40005.0000 40002.0000 .0000 .0000 40007.0000

    Cheers

    ChrisM

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • There's nothing like seeing the latest go around of PMS (pivot madness syndrome). And it seems time after time, year after year, release after release there's a contest to see who is the maddest crosstab coder (hatter) of them all 🙂 Code and more code is a crutch for those who don't use abstractions and since a (dynamic) crosstab abstraction doesn't quite fit in with the sql server transaction log you'd think a wise user might look elsewhere for a solution. As if other software over the last 30 years has totally neglected the subject. If you can live with an over the counter med instead of a prescription for something heavy duty and $$ you can check out the Rac utility. While it will simplify the task, which is its intent, it will deprive the user of showing he is just as crazy as those who came 5 years before him. Of course this frees the user to go on to some other form of madness 🙂

    www.Rac4sql.net

    www.beyondsql.blogspot.com

  • Chris Morris (8/20/2008)


    Jeff Moden (8/19/2008)


    Chris Morris (8/19/2008)


    Really nice to read, Jeff, and absolutely spot on.

    I've had to do pivots in the last couple of weeks, with an unknown number of output (transposed) columns with unknown names, as in the following. It will be interesting to see your solution, you have a knack of coming up with "why the heck didn't I think of that?" solutions!

    Thanks, Chris... I really appreciate the feedback and the example. I'm gonna play with it. 🙂

    Say, would you mind a huge favor please? Your good example code is a bit wide and makes it so everyone has to scroll right to read everything. Could I trouble you to "break" the line that starts with "SET @SQLstr = 'SELECT HospitalName" a couple of times just before a plus sign or two just to narrow it up? Thanks a bunch.

    Sure!

    Heh... thanks, but I meant for you to edit your original post so it would bring the width down. Sorry I wasn't clear... thanks for trying. 🙂

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • rog pike (8/20/2008)


    There's nothing like seeing the latest go around of PMS (pivot madness syndrome). And it seems time after time, year after year, release after release there's a contest to see who is the maddest crosstab coder (hatter) of them all 🙂 Code and more code is a crutch for those who don't use abstractions and since a (dynamic) crosstab abstraction doesn't quite fit in with the sql server transaction log you'd think a wise user might look elsewhere for a solution. As if other software over the last 30 years has totally neglected the subject. If you can live with an over the counter med instead of a prescription for something heavy duty and $$ you can check out the Rac utility. While it will simplify the task, which is its intent, it will deprive the user of showing he is just as crazy as those who came 5 years before him. Of course this frees the user to go on to some other form of madness 🙂

    www.Rac4sql.net

    www.beyondsql.blogspot.com

    Still spamming I see Rog. 😛

    Ok... show us the code that you would do it in RAC with... please use the double aggregate method from the article. And, I don't mean a link... wer need to compare apples to apples and there's test data in the article all ready to go. Show us how to do the double aggregate cross tab using RAC code. Do it right (comments about how the code you write works and the code) and I promise I'll stop riding your donkey about spamming your product on us.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Anirban Paul (8/20/2008)


    Another Jewel in your crown Jeff. Great article. Wonderful...:)

    Thanks, Anirban. I appreciate it... 🙂

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • When one wants no NULLs in the output, is it not faster to put that in the base FROM "derived table", view, or query... the same as mentioned when converting dates to months or quarters? Why NULL check at the cross tab or PIVOT level? I do not know about speed, but it sure is easier to read not having ISNULL or COALESCE everywhere. I tend to lean on the idea that data cleansing should not be happening at the report, cube, or group level, and that these levels should merely roll up and present the data.

    Also, trying to compare "cross tab" to PIVOT seems arbitrary to me: like apples to oranges. PIVOT is highly useful where the result PIVOT column names are dynamic, whereas a "cross tab" to do the same would be... complicated... I mean, that is the whole point of having a PIVOT. But it sounds like you may probably cover that in a later section. The only reason in the examples given so far to specifically SELECT the fields in a PIVOT(s) is to demonstrate 1) COALESCE the output - see first paragraph, and 2) Alias the fields - if the column names are instead derived from a lookup table in the "derived table" FROM clause, or if PIVOTing GROUPed data to be used as your column names, hard aliasing goes away. For results that require all possible PIVOT columns in the output, even when column values have no data, a cartesian join against all the output columns solves that. For example, say I wanted to show a 120-day spread curve model on certain datapoint, but data was not logged or received every single day of the 120 days, I use a cartesian join against a temp (or fixed) dataset that numbers from 1 to 120, and then gather the PIVOT results to the days they land on (and handling the NULL values in the process). Bring on some graphical tool to display the data and the Execs go, "WOOOOOH... AHHHHH!" Do this for all datapoints at once and suddenly the CEO knows your first name... but I digress.

    Finally, the metion by another poster about MS ACCESS since 1992 is a fine one. As a user since MSA v2.0 and MS SQL Server since version 6... well, it is a fine mention, and completely off topic. 🙂

  • Wanted to see what the difference aggregating by scalar functions was to see how it would perform based on a native datetime so I ran

    PRINT '=============== "Normal" Cross Tab ==============='

    SET STATISTICS IO ON

    SET STATISTICS TIME ON

    SELECT Company,

    Year,

    SUM(CASE WHEN Quarter = 1 THEN Amount ELSE 0 END) AS Q1Amt,

    SUM(CASE WHEN Quarter = 1 THEN Quantity ELSE 0 END) AS Q1Qty,

    SUM(CASE WHEN Quarter = 2 THEN Amount ELSE 0 END) AS Q2Amt,

    SUM(CASE WHEN Quarter = 2 THEN Quantity ELSE 0 END) AS Q2Qty,

    SUM(CASE WHEN Quarter = 3 THEN Amount ELSE 0 END) AS Q3Amt,

    SUM(CASE WHEN Quarter = 3 THEN Quantity ELSE 0 END) AS Q3Qty,

    SUM(CASE WHEN Quarter = 4 THEN Amount ELSE 0 END) AS Q4Amt,

    SUM(CASE WHEN Quarter = 4 THEN Quantity ELSE 0 END) AS Q4Qty,

    SUM(Amount) AS TotalAmt,

    SUM(Quantity) AS TotalQty

    FROM #SomeTable3

    GROUP BY Company, Year

    ORDER BY Company, Year

    SET STATISTICS TIME OFF

    SET STATISTICS IO OFF

    PRINT '=============== "UGLY inline datefunction" Cross Tab ==============='

    SET STATISTICS IO ON

    SET STATISTICS TIME ON

    SELECT Company,

    Year(date),

    SUM(CASE WHEN month(date) between 1 and 3 THEN Amount ELSE 0 END) AS Q1Amt,

    SUM(CASE WHEN month(date) between 1 and 3 THEN Quantity ELSE 0 END) AS Q1Qty,

    SUM(CASE WHEN month(date) between 4 and 6 THEN Amount ELSE 0 END) AS Q2Amt,

    SUM(CASE WHEN month(date) between 4 and 6 THEN Quantity ELSE 0 END) AS Q2Qty,

    SUM(CASE WHEN month(date) between 7 and 9 THEN Amount ELSE 0 END) AS Q3Amt,

    SUM(CASE WHEN month(date) between 7 and 9 THEN Quantity ELSE 0 END) AS Q3Qty,

    SUM(CASE WHEN month(date) between 10 and 12 THEN Amount ELSE 0 END) AS Q4Amt,

    SUM(CASE WHEN month(date) between 10 and 12 THEN Quantity ELSE 0 END) AS Q4Qty,

    SUM(Amount) AS TotalAmt,

    SUM(Quantity) AS TotalQty

    FROM #SomeTable3

    GROUP BY Company, Year(date)

    ORDER BY Company, Year(date)

    SET STATISTICS TIME OFF

    SET STATISTICS IO OFF

    Which gave

    =============== "Normal" Cross Tab ===============

    CPU time = 2656 ms, elapsed time = 1506 ms.

    =============== "UGLY inline datefunction" Cross Tab ===============

    CPU time = 5016 ms, elapsed time = 2828 ms.

  • dphillips (8/20/2008)


    When one wants no NULLs in the output, is it not faster to put that in the base FROM "derived table", view, or query... the same as mentioned when converting dates to months or quarters? Why NULL check at the cross tab or PIVOT level? I do not know about speed, but it sure is easier to read not having ISNULL or COALESCE everywhere. I tend to lean on the idea that data cleansing should not be happening at the report, cube, or group level, and that these levels should merely roll up and present the data.

    Also, trying to compare "cross tab" to PIVOT seems arbitrary to me: like apples to oranges. PIVOT is highly useful where the result PIVOT column names are dynamic, whereas a "cross tab" to do the same would be... complicated... I mean, that is the whole point of having a PIVOT. But it sounds like you may probably cover that in a later section. The only reason in the examples given so far to specifically SELECT the fields in a PIVOT(s) is to demonstrate 1) COALESCE the output - see first paragraph, and 2) Alias the fields - if the column names are instead derived from a lookup table in the "derived table" FROM clause, or if PIVOTing GROUPed data to be used as your column names, hard aliasing goes away. For results that require all possible PIVOT columns in the output, even when column values have no data, a cartesian join against all the output columns solves that. For example, say I wanted to show a 120-day spread curve model on certain datapoint, but data was not logged or received every single day of the 120 days, I use a cartesian join against a temp (or fixed) dataset that numbers from 1 to 120, and then gather the PIVOT results to the days they land on (and handling the NULL values in the process). Bring on some graphical tool to display the data and the Execs go, "WOOOOOH... AHHHHH!" Do this for all datapoints at once and suddenly the CEO knows your first name... but I digress.

    Finally, the metion by another poster about MS ACCESS since 1992 is a fine one. As a user since MSA v2.0 and MS SQL Server since version 6... well, it is a fine mention, and completely off topic. 🙂

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • I think your response got lost, Jeff.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • rbarryyoung (8/21/2008)


    I think your response got lost, Jeff.

    I think Jeff liked that post he wanted to be able to find it easily in his list of posts.

  • Hi,

    Just wanted to say thanks for the great article - I myself was actually one of the noobies who posted in the forums about this sort of thing. Looking forward to the dynamic cross-tab article(s)!

    Matt

  • Well, it seems I have nominated myself for the virtual public whipping. I have used Crosstab AKA Pivot queries in MS Access for so many years, that when I learned MS had added PIVOT to SQL Svr 2005, I thought, "It's about time." I mean, they've always been wickedly simple to run together. For the last 3.5 years I've been devoting most of my work efforts on programming services in .NET hitting a number of servers including 2005 admin, etc. For the last 6 months I've filled a position working with SSIS. Just this week we've begun to dive into data warehousing for the reporting, so I am pretty new to that. I came upon this topic as I was about (NOT) to implement a report based on a crosstab of some datasources and their respective returns... such as I have done sooo many times in the past - with other tools. BzzzzdFzzzt. Negative. So that assumptive, "Yeah I can build you a report set like this in no time flat!" was yesterday. I have yet to leave work... literally 30 hours later and a lot of blood-shot eye searches of the net. I refuse to resort to Access as we want this on our SSRS web server... with the other deployed (and far simpler) reports.

    Ack. Some angry and bitter crow here. Why, oh why would they implement a command well known and awaited by so many and _not_ give it the same TRANSFORM functionality... ack... the whole point of a PIVOT...

    Well, I cannot blame MS or anyone for my complete ignorance on the subject of PIVOT capability in SQL Server 2005. Without the TRANSFORM counterpart, I see absolutely no reason to use it in it's current form. I would gladly hear anyone having an alternate opinion.

    I do find the crosstab work-arounds and variations on those work-arounds aplenty.

    I wanted to retract with Jeff Moden before posting this publicly, but his mailbox here is full.

    One thing I am highly curious about: what was it that may have intrigued you about my post Jeff, that you held off replying? Double checking Pivot info? Something about the cartesian join? Maybe the nukeing the nulls before they bubble up to the top layer? Curve Modelling? No time? Or just giving me time to dwell...

    I await your next article...

    My apologies to the community for my ignorance... I shall endeavor to be more careful.

Viewing 15 posts - 46 through 60 (of 243 total)

You must be logged in to reply to this topic. Login to reply