Grouping by min/max in a series

  • Yeah. I know.

    I wasn't sure about the amount of records to start with. I thought there might be only a few hundred of them.

    I tested your code on SQL Server 2005, and there was no need to go dynamic. It run equally fast!

    Good algorithm Jeff!

    But... MAX portion does not seem to work on SQL Server 2005.

    I had to drop the clustered index and recreate it descending.

    USE TempDB

    SET NOCOUNT ON

    IF OBJECT_ID('MyTable') IS NOT NULL

        DROP TABLE MyTable

    CREATE TABLE MyTable

                 (

                     Code INT IDENTITY(1,1),

                     Status CHAR(1) NOT NULL

                 )

    INSERT     MyTable

               (

                   Status

               )

    SELECT     TOP 100000 

               CAST(RAND(CAST(NEWID() AS VARBINARY)) * 3 AS INT) AS Status

    FROM       Master.dbo.SYSCOLUMNS AS sc1

    CROSS JOIN Master.dbo.SYSCOLUMNS AS sc2

    UPDATE MyTable

           SET Status = CASE STATUS

                           WHEN '0' THEN 'A'

                           WHEN '1' THEN 'S'

                           WHEN '2' THEN 'T'

                        END

    SELECT   TOP 100 *

    FROM     MyTable

    ORDER BY Code

    DECLARE @StartTime DATETIME

    SET @StartTime = CURRENT_TIMESTAMP

    SELECT   Code,

             Status

             CAST(NULL AS INT) AS MyMin,

             CAST(NULL AS INT) AS MyMax

    INTO     #MyHead

    FROM     MyTable

    ORDER BY Code,

             Status

    ALTER TABLE #MyHead

    ADD CONSTRAINT PK_MyHead_Min PRIMARY KEY CLUSTERED (Code, Status)

    DECLARE @PrevStatus CHAR(1),

            @PrevCode INT

    SELECT @PrevStatus = ' ',

           @PrevCode = 0

    UPDATE #MyHead

    SET    @PrevCode = MyMin = CASE Status

                                  WHEN @PrevStatus THEN @PrevCode

                                  ELSE Code

                               END

           @PrevStatus = Status

    FROM   #MyHead

    ALTER TABLE #MyHead

    DROP PK_MyHead_Min

    ALTER TABLE #MyHead

    ADD CONSTRAINT PK_MyHead_Max PRIMARY KEY CLUSTERED (Code DESC, Status DESC)

    SELECT @PrevStatus = ' ',

           @PrevCode = 0

    UPDATE #MyHead

    SET    @PrevCode = MyMax = CASE Status 

                                  WHEN @PrevStatus THEN @PrevCode

                                  ELSE Code 

                               END

           @PrevStatus = Status

    FROM   #MyHead

    SELECT   DISTINCT MyMin AS Min,

             MyMax AS Max,

             Status

    FROM     #MyHead

    ORDER BY MyMin

    DROP TABLE #MyHead

    PRINT DATEDIFF(ms,@StartTime,GETDATE())


    N 56°04'39.16"
    E 12°55'05.25"

  • Jeff,

    Thanks for taking the time to explain your algorithm.

     

  • Yes, the source table only has little over 400 rows, and the retructuring only needs to be done once a year. The table was also supposed to have unique 'codes', but I had to clean that up.

    Thank you ALL for goin at this.

    This truely is an addiction as I explained to my new intern. Once challenged, it's amazing the response, which proves it. Guess theres a little McGuiver in all of us.

     

  • Randy and Ken,

    Yeah, it's pretty much an addiction to me.   Thanks for the feedback... it's part of the addiction

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Peter,

    Ol' friend, thanks for the feedback.  I haven't been "blessed" with 2k5 yet, so I always wonder how these things work out there.

    Yes, you're correct.  Randy originally said that this was for something a bit less than 900 rows... you and I have been in this business long enough to know that that will change... so I always try to write with the idea that it could be a million rows a year from now.  It's part of that addiction Randy was talking about.

    Hadn't thought about dropping the clustered index and reforming it to eliminate the dynamic SQL although it didn't seem to add much overhead... kinda surprised that 2k5 didn't like the index for Max.  Really appreciate the rewrite you posted.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • The bug feature with clustered index versus non-clustered index and the technique with

    SELECT @t = @t + ';' + Name

    FROM   Table1

    was discussed some time ago on the other forum we usually meet

    Too bad UPDATE does not come with ORDER BY...

    I like the algorithm, it is VERY powerful and speedy.

    But must be used with caution, as seen with SQL Server 2005. Who knows when MS change the specs again?

    Until next time, Jeff!

    Kudos

     


    N 56°04'39.16"
    E 12°55'05.25"

  • Heh... yeah... I remember that discussion... It's a bloody shame that Microsoft keeps changing features to bugs especially on things like this... no wonder lot's of folks are hesitant to make the switch... they would have to do full regression testing and validation to check for junk like this... some new fault with TOP is another of those nasty bugs they introduced in 2k5... lot's of powerful code is suddenly making flushing sounds...

    Hmmmm.... UPDATE with an ORDER BY... wonder if a derived table would do the trick there... probably requires a TOP 100 PERCENT which, I understand, has one of the new bugs features in 2k5.

    [Edit]... You're usually pretty thorough when it comes to testing but I've gotta be sure... Did you try the WITH (INDEX) in 2k5, as well?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Yes. I did test with INDEX hint.

    In the code posted above, I omitted them becuase the bug feature of only following the clustered index anyway.

    At least in the UPDATE. For the SELECT part

    DECLARE @t VARCHAR

    SELECT @t = ISNULL(@t + ',', '') + CAST(c.Number AS VARCHAR)

    FROM (SELECT TOP 100 PERCENT Number FROM master..spt_values WHERE Name IS NULL AND Number BETWEEN 1 AND 9 ORDER BY Number) AS c

    works, regardless of clustered index.

     


    N 56°04'39.16"
    E 12°55'05.25"

Viewing 8 posts - 16 through 22 (of 22 total)

You must be logged in to reply to this topic. Login to reply