Find which rows have a column value that is less than the previous row's column value

Question

Post reply

Find which rows have a column value that is less than the previous row's column value

Pianoman

SSC Veteran

Points: 232
More actions
May 31, 2012 at 9:50 am

#258948

Hi Guys,
I know this should be simple to code but i've hit a wall trying to produce something simple that will do the following:-
I need to traverse through a table to find which rows have a column value that is less than the previous row's column value
e.g.
ColValue
23
123
233
32 ----- Bingo! This value is less than previous value
45
124
32 ----Bingo! This value is less than previous value
and so on
Any ideas?
Thanks,
Pianoman

Viewing 15 posts - 1 through 15 (of 21 total)

You must be logged in to reply to this topic. Login to reply

ColdCoffee SSC-Dedicated Points: 39971 More actions · Answer 1

How about this?

DECLARE @tab TABLE

( ID INT , ColValue INT)

;

INSERT INTO @tab (ID, ColValue)

SELECT 1, 23

UNION ALL SELECT 2, 123

UNION ALL SELECT 3, 233

UNION ALL SELECT 4, 32

UNION ALL SELECT 5, 45

UNION ALL SELECT 6, 124

UNION ALL SELECT 7, 32

;

WITH rCTE AS

(

SELECT T.ID , T.ColValue

, 0 AS Indicator

FROM @tab T

WHERE T.ID = 1

UNION ALL

SELECT Base.ID , BASE.ColValue

, CASE WHEN BasE.ColValue < R.ColValue THEN 1

ELSE 0

END

FROM @tab Base

JOIN rCTE R

ON R.ID + 1 = Base.ID

)

SELECT *

FROM rCTE

WHERE Indicator = 1

jeffem Ten Centuries Points: 1222 More actions · Answer 2

If you want something simpler than ColdCoffee's solution, but not nearly as efficient (according to the estimated execution plan, unless you index the join and include the return values), using the same @tab table variable...

SELECT *

FROM @tab A

INNER JOIN @tab B

ON A.ID = B.ID - 1

WHERE A.ColValue > B.ColValue

Pianoman SSC Veteran Points: 232 More actions · Answer 3

Thanks ColdCoffee and Jeffem - Appreciate the replies. Will give these suggestions a try.

My original attempt to code this problem ended up being a ugly cursor - I knew the would be a more elegant way.

Hope to return the favour (SQL engineering more my forte.)

Cheers!

Luis Cazares SSC Guru Points: 183686 More actions · Answer 4

There can be a small problem with Jeffem and ColdCoffee's solution.

If you miss an ID value (most common scenario is when you delete a record and the identity does not use the value anymore)

For example:

INSERT INTO @tab (ID, ColValue)

SELECT 1, 23

UNION ALL SELECT 2, 123

UNION ALL SELECT 3, 233

UNION ALL SELECT 5, 32

UNION ALL SELECT 6, 45

UNION ALL SELECT 7, 124

UNION ALL SELECT 8, 32

;

ColdCoffee's solution won't show any value and jeffem's solution will only show one value.

I have the same problem and used something like this, but I'm concerned about the performance.

DECLARE @tab TABLE

( ID INT , ColValue INT)

;

INSERT INTO @tab (ID, ColValue)

SELECT 1, 23

UNION ALL SELECT 2, 123

UNION ALL SELECT 3, 233

UNION ALL SELECT 4, 32

UNION ALL SELECT 5, 45

UNION ALL SELECT 6, 124

UNION ALL SELECT 7, 32

;

WITH rCTE AS

(

SELECT A.ID, A.ColValue, MAX( B.ID) AS prev_ID

FROM @tab A

INNER JOIN @tab B

ON A.ID > B.ID

GROUP BY A.ColValue, A.ID

)

SELECT *

FROM rCTE A

INNER JOIN @tab B

ON A.prev_ID = B.ID

WHERE A.ColValue < B.ColValue

Luis C.
General Disclaimer:
Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

How to post data/code on a forum to get the best help: Option 1 / Option 2

jeffem Ten Centuries Points: 1222 More actions · Answer 5

A good point! It seems to me the best way to handle it might be to create a ROW_NUMBER(), to have perfectly sequential IDs, then use the solutions presented before. But that requires the data to be sortable exactly as the comparison is desired.

I more wanted to supply the logic without getting into those details, which was a bit irresponsible. But I think ROW_NUMBER() is the best way to go.

Dwain Camps SSC Guru Points: 86913 More actions · Answer 6

You can also do it with a subquery like this:

DECLARE @tab TABLE

( ID INT , ColValue INT)

;

INSERT INTO @tab (ID, ColValue)

SELECT 1, 23

UNION ALL SELECT 2, 123

UNION ALL SELECT 3, 233

UNION ALL SELECT 5, 32

UNION ALL SELECT 6, 45

UNION ALL SELECT 7, 124

UNION ALL SELECT 8, 32

;

;WITH CTE AS (

SELECT ID, ColValue

,(SELECT TOP 1 t2.ColValue

FROM @tab t2

WHERE t1.ID > t2.ID

ORDER BY ID DESC) PrevRow

FROM @tab t1

)

SELECT ID, ColValue

FROM CTE

WHERE 1=CASE WHEN PrevRow IS NULL THEN 0 WHEN PrevRow<ColValue THEN 0 ELSE 1 END

Just guessing mind you but because of the TOP clause in the subquery this should perform better than joining the table onto itself.

Note that this solution also handles the case of missing IDs.

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

Pianoman SSC Veteran Points: 232 More actions · Answer 7

Cheers Guys.

Jeff - Yup I thought about adding ROW_NUMBER() too for the same reason Luis made about missing ID's (thanks Luis)

Dwain - Cheers to your feedback. Nice to see lovely bits of code with no cursor in sight 🙂

thanks

Pianoman

jeffem Ten Centuries Points: 1222 More actions · Answer 8

Dwain, good job on accounting for the missing IDs! 🙂

Just a note about the performance: I did a comparison, and using a table variable will give a bit of a performance advantage (15% or so, for reasonable numbers of records) to your method. If this were in an actual table (temp, or otherwise) with a good, covering index, then the execution plan between our two solutions is essentially identical.

SQL Padawan Say Hey Kid Points: 712 More actions · Answer 9

Try this too: DECLARE @tab TABLE

( ID INT , ColValue INT)

;

INSERT INTO @tab (ID, ColValue)

SELECT 1, 23

UNION ALL SELECT 2, 123

UNION ALL SELECT 3, 233

UNION ALL SELECT 4, 32

UNION ALL SELECT 5, 45

UNION ALL SELECT 6, 124

UNION ALL SELECT 7, 32

SELECT t1.id,t1.colvalue

FROM @tab t1 LEFT JOIN @tab t2 ON t1.id = t2.id + 1 AND t1.colvalue < t2.colvalue

WHERE t2.id IS NOT NULL

Should be pretty good on performance. For some reason I'm currently in love with LEFT JOINS.

jeffem Ten Centuries Points: 1222 More actions · Answer 10

It doesn't hurt to do it with a left join, but if you use an inner join there, you don't need to require where t2.id is not null.

Matt Miller (4) SSC Guru Points: 124210 More actions · Answer 11

I'm surprised it hasn't been mentioned before, but I do hope you have some way to actually define the ordering within the table. A common stumbling block is to think there is some physical order to the table (there isn't). Assuming you DO, a CTE with Row_number() will perform fairly well, even on large sets. I'd think it would perform something with a correlated sub, but that will depend on existing indexes on the correlated table.

As an aside - if you have a healthy table (i.e anything over a couple hundred records), you will want to avoid table variables. Table variables tend to play nasty tricks on execution plans (estimated # of rows in a table variable stay pegged at 1), so the engine can't optimize for "large" table variables.

----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

Jeff Moden SSC Guru Points: 1000539 More actions · Answer 12

jeffem (5/31/2012)
But that requires the data to be sortable exactly as the comparison is desired

Heh.... if you can sort exactly as the comparison is desired, you're dead no matter what. 🙂

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
"Change is inevitable... change for the better is not".

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)
Intro to Tally Tables and Functions

Dwain Camps SSC Guru Points: 86913 More actions · Answer 13

jeffem (6/1/2012)
Dwain, good job on accounting for the missing IDs! 🙂
Just a note about the performance: I did a comparison, and using a table variable will give a bit of a performance advantage (15% or so, for reasonable numbers of records) to your method. If this were in an actual table (temp, or otherwise) with a good, covering index, then the execution plan between our two solutions is essentially identical.

Can you post your comparison? I'm confused whether you're comparing actual run results (e.g., with Jeff's 1M row test harness) or just comparing execution plans. I'd do it myself but I'm kinda tied up on something at the moment, but if I don't see something posted maybe I'll come back to it.

I'm really just curious because I saw a subquery version like mine blast the heck out of an alternative something like yours (no comparison here intended of course) before, where the execution plans rated them about the same.

I use table variables for convenience so I don't have temp tables littering my sandbox if I forget to drop them or something. I agree they don't always end up rating the same as using an actual table. And with good indexing, it could completely reverse the results.

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

Paul White SSC Guru Points: 150457 More actions · Answer 14

Matt Miller (#4) (6/1/2012)
A common stumbling block is to think there is some physical order to the table (there isn't).

Of course there is. You just can't depend on results coming back in that physical order 🙂

DECLARE @Example AS TABLE

(

ColValue integer NOT NULL

);

INSERT INTO @Example

(ColValue)

VALUES

(23),

(123),

(233),

(32),

(45),

(124),

(32);

SELECT

e.*,

file_num = CONVERT(integer,CONVERT(binary(2),REVERSE(SUBSTRING(%%PhysLoc%%,5,2)))),

page_num = CONVERT(integer,CONVERT(binary(4),REVERSE(SUBSTRING(%%PhysLoc%%,1,4)))),

slot_num = CONVERT(integer,CONVERT(binary(2),REVERSE(SUBSTRING(%%PhysLoc%%,7,2))))

FROM @Example AS e

ORDER BY

file_num,

page_num,

slot_num

OPTION (RECOMPILE);

As an aside - if you have a healthy table (i.e anything over a couple hundred records), you will want to avoid table variables. Table variables tend to play nasty tricks on execution plans (estimated # of rows in a table variable stay pegged at 1), so the engine can't optimize for "large" table variables.

Well it can a bit. See the example above (the number of estimated rows from the table variable in the actual plan is 7). If a statement-level recompile occurs (for whatever reason) the optimizer will see the run-time cardinality of the table variable. Table variables do not, however, currently support statistics (even on indexes). Where a logical query has cost-based physical execution choices that benefit from statistical information, temporary tables are a better choice. If a trivial plan is possible, creating statistics on the temporary table just adds overhead.

Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi