How to return a "start of group" column?

  • Suppose we have a table schema as

    create table Inventory(

    Name nvarchar(50) not null,

    Category nvarchar(50) not null,

    )

    then

    select Category, Name

    from Inventory

    order by Category ASC

    Can lead to a result like

    Category Name

    ---- ----

    C0 N0

    C0 N1

    C0 N2

    C1 N3

    C1 N4

    If what I want is a result like this

    Startof Category Name

    ---- ----

    C0 N0

    NULL N1

    NULL N2

    C1 N3

    NULL N4

    How should I do? Must I use a cursor?

    Thank you.

  • Hi

    You can check the row-number by category partition:

    DECLARE @Category TABLE (Category VARCHAR(50), Name VARCHAR(50))

    INSERT INTO @Category

    SELECT 'C0', 'N0'

    UNION SELECT 'C0', 'N1'

    UNION SELECT 'C0', 'N2'

    UNION SELECT 'C1', 'N0'

    UNION SELECT 'C1', 'N1'

    SELECT CASE WHEN ROW_NUMBER() OVER (PARTITION BY Category ORDER BY Name) = 1 THEN Category ELSE NULL END,

    Name

    FROM @Category

    Greets

    Flo

  • Thank you very much. It's gorgeous!

  • You're always welcome!

  • Generally this is more appropriately done on the front end, as it's usually a formatting/display issue.

    While Florian's solution works very nicely, in relational terms, the resulting rowset has three rows that have no category and two that do. It certainly can be done in the DB, often it's not the most appropriate place to do that kind of formatting.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Good answer flo - it does exactly what the OP asked for, and efficiently too, with one minor comment: The UNIONs should be UNION ALLs (constant scan instead of merge joins!) but I am sure that's just a typo. In any case, that part of the script was just re-creating the example input - so I'm just being a pedant really 😉

    I agree that formatting is often best done in the presentation (if indeed that is the requirement - my crystal ball is cloudy today); however, there are a million reasons why the hard-pressed database person might be asked to do this sort of thing, not all of them good, I suspect.

    Hey that's life, eh?

    /Paul

  • Paul White (4/4/2009)


    Good answer flo - it does exactly what the OP asked for, and efficiently too, with one minor comment: The UNIONs should be UNION ALLs (constant scan instead of merge joins!) but I am sure that's just a typo.

    No, it's not a typo, it's just an example. I've just been to lazy for the "ALL" 😀 but I will try thinking on it in future.

    so I'm just being a pedant really 😉

    In this case... yepp 😛

    I agree that formatting is often best done in the presentation (if indeed that is the requirement - my crystal ball is cloudy today); however, there are a million reasons why the hard-pressed database person might be asked to do this sort of thing, not all of them good, I suspect.

    I can completely confirm! Sure the formatting should be done in frond end but the world is not (yet) perfect... I have a - quiet ugly - inline function to format datetime values to many different styles. Yes, since I'm here I know I could redesign it with Calendar table but it does it's work and - as known - never change a running system. I don't really like this server side date formatting, but there are some business cases which require it.

    Greets

    Flo

  • Florian Reischl (4/4/2009)


    No, it's not a typo, it's just an example. I've just been to lazy for the "ALL" 😀 but I will try thinking on it in future.

    More coffee, Flo. 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • One thing I will suggest, if this does have to be done on the DB. Put an order by in place. As it currently stands, there's no guarantee that the rows will come out in the desired order. They probably will, but to be 100% sure all, put an appropriate order by

    SELECT CASE WHEN ROW_NUMBER() OVER (PARTITION BY Category ORDER BY Name) = 1 THEN Category ELSE NULL END AS StartOfCategory,

    Name

    FROM @Category

    ORDER BY Category, Name

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster (4/4/2009)


    Generally this is more appropriately done on the front end, as it's usually a formatting/display issue.

    While Florian's solution works very nicely, in relational terms, the resulting rowset has three rows that have no category and two that do. It certainly can be done in the DB, often it's not the most appropriate place to do that kind of formatting.

    I'm sorry that I havn't checked this thread these days:(

    I'm not under hard-pressure:) Currently, I just design my system to generate this "start of group" column in the client side! But actually I'm architecturing the system towards vNext and am very glad to put the right job in the right place. So I want to ask you to re-consider your opinion.

    I'm not agree with that the requirement/feature is merely something about formatting. If the returned set is a true set (unordered), then the consumer(client,middleware whatever) should first sort it in order to go to the next stage (keep a local variable, loop it, nullify the same group columns ). But if the sorting should be done in the db (we can use the index and we can do paging in order to return only necessary), I think this "start of group" column should be generated in the db also.

    I don't think some cells in this column are NULL while others are not will make the result table "un-relational".

    I'm very longing to hear your opinion.:-)

  • Soooo.... does that mean you're happy now?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Sheng (4/6/2009)


    I'm sorry that I havn't checked this thread these days:(

    I'm not under hard-pressure:) Currently, I just design my system to generate this "start of group" column in the client side! But actually I'm architecturing the system towards vNext and am very glad to put the right job in the right place. So I want to ask you to re-consider your opinion.

    I'm not agree with that the requirement/feature is merely something about formatting. If the returned set is a true set (unordered), then the consumer(client,middleware whatever) should first sort it in order to go to the next stage (keep a local variable, loop it, nullify the same group columns ). But if the sorting should be done in the db (we can use the index and we can do paging in order to return only necessary), I think this "start of group" column should be generated in the db also.

    I don't think some cells in this column are NULL while others are not will make the result table "un-relational".

    I'm very longing to hear your opinion.:-)

    I have no objection to doing sorting in the DB, that is one of it's strengths. I have no objection with generating row numbers in the database, that's one of it's strengths. However the formatting of the 'startofgroup' to be blank in some cases and to show values in other cases is a formatting issue. It should not be done in the DB and, quite frankly, client-side tools are often better at suppressing 'duplicates' to do this kind of formatting.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Jeff Moden (4/6/2009)


    Soooo.... does that mean you're happy now?

    ...... I can't follow you.:-) I just want to discuss if I've made a right decision about generating this column in DB. I'll only be happy if I know what's right in theory and practice (the truth), whether I was wrong or right. Thanks!

  • GilaMonster (4/7/2009)


    I have no objection to doing sorting in the DB, that is one of it's strengths. I have no objection with generating row numbers in the database, that's one of it's strengths. However the formatting of the 'startofgroup' to be blank in some cases and to show values in other cases is a formatting issue. It should not be done in the DB and, quite frankly, client-side tools are often better at suppressing 'duplicates' to do this kind of formatting.

    When it's done in the DB, we use the expressive (descriptive) logic to achieve our goal.

    When it's done in the client, we use procedure algorithm.

    Myself is inclined to the logical one, because it's not prone to bring errors when the logic is right.But being objectivly, should there be any one win OBVIOUSLY?

    I think I'm off the way somehow:hehe:

  • Sheng (4/7/2009)


    Jeff Moden (4/6/2009)


    Soooo.... does that mean you're happy now?

    ...... I can't follow you.:-) I just want to discuss if I've made a right decision about generating this column in DB. I'll only be happy if I know what's right in theory and practice (the truth), whether I was wrong or right. Thanks!

    If I understand all of this correctly, marking a "start of group" for paging in the table is a bit counter productive in the face of changing data. True, it can make life simpler but you have to remember to update the marks if data changes in the table that may affect the ultimate desired paging order. If the data is mostly static, then I agree... it may be worth it.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

Viewing 15 posts - 1 through 14 (of 14 total)

You must be logged in to reply to this topic. Login to reply