Find which rows have a column value that is less than the previous row's column value

  • SQL Kiwi (6/1/2012)


    Matt Miller (#4) (6/1/2012)


    A common stumbling block is to think there is some physical order to the table (there isn't).

    Of course there is. You just can't depend on results coming back in that physical order 🙂

    DECLARE @Example AS TABLE
    (
        ColValue integer NOT NULL
    );

    INSERT INTO @Example (ColValue)
    VALUES (23), (123), (233), (32), (45), (124), (32);

    SELECT
        e.*,
        file_num = CONVERT(integer, CONVERT(binary(2), REVERSE(SUBSTRING(%%PhysLoc%%, 5, 2)))),
        page_num = CONVERT(integer, CONVERT(binary(4), REVERSE(SUBSTRING(%%PhysLoc%%, 1, 4)))),
        slot_num = CONVERT(integer, CONVERT(binary(2), REVERSE(SUBSTRING(%%PhysLoc%%, 7, 2))))
    FROM @Example AS e
    ORDER BY
        file_num,
        page_num,
        slot_num
    OPTION (RECOMPILE);

    As an aside: if you have a healthy table (i.e., anything over a couple hundred records), you will want to avoid table variables. Table variables tend to play nasty tricks on execution plans (the estimated number of rows in a table variable stays pegged at 1), so the engine can't optimize for "large" table variables.

    Well, it can, a bit. See the example above (the estimated number of rows from the table variable in the actual plan is 7). If a statement-level recompile occurs (for whatever reason), the optimizer will see the run-time cardinality of the table variable. Table variables do not, however, currently support statistics (even on indexes). Where a logical query has cost-based physical execution choices that benefit from statistical information, temporary tables are a better choice. If a trivial plan is possible, creating statistics on the temporary table just adds overhead.
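    As a sketch of the distinction above (the data and the filter here are illustrative, not from the thread): a table variable only exposes its run-time row count to the optimizer on a recompile, while a temp table can also get auto-created column statistics.

    ```sql
    -- Table variable: without RECOMPILE the plan is built assuming 1 row;
    -- with it, the optimizer sees 7 rows but still has no histogram on ColValue.
    DECLARE @T AS TABLE (ColValue integer NOT NULL);
    INSERT INTO @T (ColValue) VALUES (23), (123), (233), (32), (45), (124), (32);
    SELECT ColValue FROM @T WHERE ColValue > 100 OPTION (RECOMPILE);

    -- Temp table: auto-created statistics on ColValue give the optimizer a
    -- histogram to cost alternatives with (at the price of extra overhead
    -- when a trivial plan would have sufficed).
    CREATE TABLE #T (ColValue integer NOT NULL);
    INSERT INTO #T (ColValue) VALUES (23), (123), (233), (32), (45), (124), (32);
    SELECT ColValue FROM #T WHERE ColValue > 100;
    DROP TABLE #T;
    ```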

    Oh - now THAT's a nice trick. I have a few things that interact with user-defined types (table valued procs, etc...) which might benefit from that.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Jeff-

    This really confused me for a while, until I assumed that you meant "can't". Is that assumption correct? If not, could you help me understand what you mean?

  • You could always upgrade to 2012 and use one of the new window functions 🙂

    http://msdn.microsoft.com/en-us/library/hh231256.aspx
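    On 2012, the new LAG function answers the thread's original question (rows whose value is lower than the previous row's) directly. A minimal sketch, assuming a table name and ordering key that are purely illustrative:

    ```sql
    -- dbo.SomeTable and its ID ordering column are hypothetical names.
    SELECT s.ID,
           s.ColValue
    FROM
    (
        SELECT ID,
               ColValue,
               PrevValue = LAG(ColValue) OVER (ORDER BY ID)
        FROM dbo.SomeTable
    ) AS s
    WHERE s.ColValue < s.PrevValue; -- the first row has a NULL PrevValue, so it is excluded
    ```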

    Jared
    CE - Microsoft

  • dwain.c (6/1/2012)


    jeffem (6/1/2012)


    Dwain, good job on accounting for the missing IDs! 🙂

    Just a note about the performance: I did a comparison, and using a table variable will give a bit of a performance advantage (15% or so, for reasonable numbers of records) to your method. If this were in an actual table (temp, or otherwise) with a good, covering index, then the execution plan between our two solutions is essentially identical.

    Can you post your comparison? I'm confused whether you're comparing actual run results (e.g., with Jeff's 1M row test harness) or just comparing execution plans. I'd do it myself, but I'm kinda tied up on something at the moment; if I don't see something posted, maybe I'll come back to it.

    I'm really just curious because I saw a subquery version like mine blast the heck out of an alternative something like yours (no comparison here intended of course) before, where the execution plans rated them about the same.

    I use table variables for convenience so I don't have temp tables littering my sandbox if I forget to drop them or something. I agree they don't always end up rating the same as using an actual table. And with good indexing, it could completely reverse the results.

    At first, I only compared the execution plans, which is all I intended to represent in my post. So, after reading your post, I did some comparisons.

    Scenarios: Unindexed; indexed by key only; fully covered index

    Results: Sorted; Unsorted

    Dataset: 2-column table with an int key column, and a CHAR(30) text column (populated with random characters from the NEWID() function)

    What I found really surprised me, but it makes sense in hindsight. With no indexes, the TOP clause chokes, as it has to sort by the key in order to produce the correct TOP 1 record. As a comparison, with 1M records in the table, the INNER JOIN returned 500,333 records in 0:05 (that REALLY surprised me), while I stopped the CTE query after 2:00, and it had produced 14,415 records. I cleared the cache, added an index on the key column, and then it took 0:05 for me to return the results and 0:04 for you to return yours, unsorted. With sorting, they were both at 0:07 (yours was still slightly faster, but they rounded to be the same whole number of seconds).

    Where they separated was when scaling to even larger datasets. It seems like (on my machine) the break-even point between the two, with this amount and these types of data, is around 900K records. So, the 1M test harness shows a very slight edge to yours. When I increased to 2M and 4M, the differences grew.

    For 2M, the unsorted results list was still about the same, with mine completing in 0:13, and yours in 0:14. But when sorting, yours completes in 0:18, and mine in 0:24.

    For 4M, my unsorted returned in 0:23, and yours in 0:18, and the sorted results returned in 0:43 and 0:18, respectively.

    So, not surprisingly, the INNER JOIN grows cumbersome on large sets of data, whereas the CTE handles it easily. But because of the sorting required by the TOP N, if the table isn't indexed, the CTE version doesn't really work at all.
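    For reference, the two shapes being compared are roughly these; neither is the exact code from the thread, and the names are illustrative:

    ```sql
    -- "CTE"/TOP 1 style: a correlated subquery fetches the previous row.
    -- Without an index on ID, the TOP (1) ... ORDER BY forces a sort per outer row.
    SELECT cur.ID, cur.ColValue
    FROM dbo.SomeTable AS cur
    WHERE cur.ColValue <
    (
        SELECT TOP (1) prev.ColValue
        FROM dbo.SomeTable AS prev
        WHERE prev.ID < cur.ID
        ORDER BY prev.ID DESC
    );

    -- INNER JOIN style: join each row to the adjacent key.
    -- As written this assumes gap-free IDs, unlike the gap-tolerant version in the thread.
    SELECT cur.ID, cur.ColValue
    FROM dbo.SomeTable AS cur
    INNER JOIN dbo.SomeTable AS prev
        ON prev.ID = cur.ID - 1
    WHERE cur.ColValue < prev.ColValue;
    ```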

  • SQLKnowItAll (6/4/2012)


    You could always upgrade to 2012 and use one of the new window functions 🙂

    http://msdn.microsoft.com/en-us/library/hh231256.aspx

    Just don't assume they will always be 'better':

    http://sqlblog.com/blogs/michael_zilberstein/archive/2012/03/14/42332.aspx

  • jeffem (6/4/2012)


    dwain.c (6/1/2012)



    I am glad you thoroughly checked. Sorry that I didn't have the time myself. Now you should be able to choose the one that is best for your case.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh!

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
    Since random numbers are too important to be left to chance, let's generate some!
    Learn to understand recursive CTEs by example.

  • SQL Kiwi (6/4/2012)


    SQLKnowItAll (6/4/2012)


    You could always upgrade to 2012 and use one of the new window functions 🙂

    http://msdn.microsoft.com/en-us/library/hh231256.aspx

    Just don't assume they will always be 'better':

    http://sqlblog.com/blogs/michael_zilberstein/archive/2012/03/14/42332.aspx

    Who said I assumed that? 🙂 It's just another option. (Of course, there are several other considerations to take into account before upgrading, and I'm not really suggesting an upgrade just to solve one query.)

    Jared
    CE - Microsoft
