Windows Functions: Tell me when that changes.

  • j-1064772 (2/18/2016)


    The execution plans do not tell the whole story. I have seen some cases posted by Gila Monster where the execution plan was simplified but the load on the server was not improved (roughly the same execution time).

    Agreed, that's why I mention it as a possibility, not a certainty. To be sure about a performance improvement, we would need to generate sample data and run tests. I'm not sure that I could generate useful sample data without knowing the real behavior of this data, which is also the reason I didn't provide a test.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • davidawest00 (2/17/2016)


    SS will create ALL THE ROWS for a CTE before it processes the main query, whether it will need all those rows or not

    Not necessarily. The query below only does an index seek on the nc1 index of sysjobsteps when I ran it. Sure, if the CTE is complex enough, or returns enough of the rows in the table(s) it uses, it may be materialised. You can't generalise that to all cases, though.

    WITH Jobs AS (
        SELECT * FROM msdb.dbo.sysjobsteps
    )
    SELECT step_name
    FROM Jobs
    WHERE job_id = '22B6E268-E13D-497C-B582-0C0559E42382';

    John

  • John Mitchell-245523 (2/18/2016)


    davidawest00 (2/17/2016)


    SS will create ALL THE ROWS for a CTE before it processes the main query, whether it will need all those rows or not

    Not necessarily. The query below only does an index seek on the nc1 index of sysjobsteps when I ran it. Sure, if the CTE is complex enough, or returns enough of the rows in the table(s) it uses, it may be materialised. You can't generalise that to all cases, though.

    WITH Jobs AS (
        SELECT * FROM msdb.dbo.sysjobsteps
    )
    SELECT step_name
    FROM Jobs
    WHERE job_id = '22B6E268-E13D-497C-B582-0C0559E42382';

    John

    Just another example to demonstrate that we're telling SQL Server what we want, not how we want it to do it. There's a lot happening behind our code which is not interpreted left to right.

    CREATE TABLE #Example(
        Mystring char(5)
    );

    INSERT INTO #Example VALUES('a'), ('1');

    WITH CTE AS (
        SELECT *
        FROM #Example
        WHERE ISNUMERIC(Mystring) = 1 -- 'a' should be filtered out here...
    )
    SELECT *
    FROM CTE
    WHERE Mystring = 1; -- ...yet this predicate may be evaluated against 'a' first, raising a conversion error

    DROP TABLE #Example;

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Thanks for the article.

  • davidawest00 - Wednesday, February 17, 2016 10:35 AM

    I have encountered this precise scenario a dozen times. You can do the same thing with a correlated sub-query, and they usually scale better than any CTE because a CTE is just an in-line view, which SQL Server will generate first. In other words, SS will create ALL THE ROWS for a CTE before it processes the main query, whether it will need all those rows or not, and you can't put an index on a CTE. Slow and disk intensive. Bleh! Better to create 2 temp tables and index them: 1 for the current order and 1 for the previous order. Then join. Last time I did that the stored proc went from 2 hours to 20 minutes.

    Just to clarify, this is a myth. SQL Server does not "evaluate" a CTE in this manner at all. CTEs are simply a coding construct, similar to a derived table or a correlated subquery.
    There are some parallels with recursive CTEs, but that's a different story.
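
    To make that concrete, here is John's sysjobsteps query from earlier in the thread written both ways. This is only a minimal sketch, not a performance test, but the two forms are logically identical and the optimizer treats them the same way:

    -- CTE form (as posted above)...
    WITH Jobs AS (
        SELECT * FROM msdb.dbo.sysjobsteps
    )
    SELECT step_name
    FROM Jobs
    WHERE job_id = '22B6E268-E13D-497C-B582-0C0559E42382';

    -- ...and the equivalent derived-table form. Neither is "materialized first";
    -- both can compile to the same seek against sysjobsteps.
    SELECT step_name
    FROM (
        SELECT * FROM msdb.dbo.sysjobsteps
    ) AS Jobs
    WHERE job_id = '22B6E268-E13D-497C-B582-0C0559E42382';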

    "Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms." - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • You can also provide a Default value for the LEAD/LAG Window functions.

    In this case, it means not having to add the extra AND PREVIOUS_CODE IS NOT NULL to the WHERE clause.


    WITH cteSAMPLEDATA
    AS (
       SELECT ID, ORDER_ID, CODE, ORDER_DATE FROM
       (VALUES
       (1, 'ORDER1', 'DS', N'20151001'),
       (2, 'ORDER2', 'DS', N'20151001'),
       (5, 'ORDER2', 'DS', N'20151002'),
       (3, 'ORDER3', 'DS', N'20151001'),
       (6, 'ORDER3', 'AG', N'20151002'),
       (8, 'ORDER3', 'AG', N'20151003'),
       (4, 'ORDER4', 'DS', N'20151001'),
       (7, 'ORDER4', 'AG', N'20151002'),
       (9, 'ORDER4', 'DS', N'20151003')
       )
       AS MYTABLE(ID, ORDER_ID, CODE, ORDER_DATE)
    ), cteGET_MAX_DATE_AND_CODE
    AS (
    SELECT
       H1.ID,
       H1.ORDER_ID,
       H1.CODE,
       H1.ORDER_DATE,
       MAX(H1.ORDER_DATE) --Get the latest date for each ORDER
          OVER (PARTITION BY ORDER_ID)     AS MAX_DATE,
       LEAD(H1.CODE, 1, H1.CODE) --Get the code from the previous ORDER_DATE; the third argument defaults PREVIOUS_CODE to the current code instead of NULL.
          OVER (PARTITION BY ORDER_ID
            ORDER BY ORDER_DATE DESC)    AS PREVIOUS_CODE
    FROM cteSAMPLEDATA H1
    )
    SELECT
       ID, ORDER_ID, CODE, PREVIOUS_CODE, ORDER_DATE, MAX_DATE
    FROM cteGET_MAX_DATE_AND_CODE
    WHERE ORDER_DATE = MAX_DATE
    AND CODE != PREVIOUS_CODE -- thanks to the LEAD default above, no extra PREVIOUS_CODE IS NOT NULL check is needed
    ;

  • ChrisM@Work - Friday, March 17, 2017 2:54 AM

    davidawest00 - Wednesday, February 17, 2016 10:35 AM

    I have encountered this precise scenario a dozen times. You can do the same thing with a correlated sub-query, and they usually scale better than any CTE because a CTE is just an in-line view, which SQL Server will generate first. In other words, SS will create ALL THE ROWS for a CTE before it processes the main query, whether it will need all those rows or not, and you can't put an index on a CTE. Slow and disk intensive. Bleh! Better to create 2 temp tables and index them: 1 for the current order and 1 for the previous order. Then join. Last time I did that the stored proc went from 2 hours to 20 minutes.

    Just to clarify, this is a myth. SQL Server does not "evaluate" a CTE in this manner at all. CTEs are simply a coding construct, similar to a derived table or a correlated subquery.
    There are some parallels with recursive CTEs, but that's a different story.

    Amen to that!  I knew if I read down far enough, one of you heavy hitters would pick up on that.  Same thing with a View.  They usually aren't "materialized first" and cannot be treated as if they were a table.  They become a part of the overall code, just as an iTVF usually does.
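
    As a rough sketch of what "becoming part of the overall code" means for an iTVF, here is a hypothetical inline function (the function name is made up; the table is the sysjobsteps example from earlier in the thread). The body is a single SELECT that the optimizer expands into the calling query, just like a view or a CTE:

    -- Hypothetical inline table-valued function
    CREATE FUNCTION dbo.JobStepsFor (@job_id UNIQUEIDENTIFIER)
    RETURNS TABLE
    AS
    RETURN
        SELECT step_id, step_name
        FROM msdb.dbo.sysjobsteps
        WHERE job_id = @job_id;
    GO
    -- The call compiles as if the SELECT were written inline, so the filter can
    -- still use an index seek; nothing is "materialized first".
    SELECT step_name
    FROM dbo.JobStepsFor('22B6E268-E13D-497C-B582-0C0559E42382');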

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • adame - Thursday, February 18, 2016 1:38 AM

    Agreed, always look to use new features.

    Nah...you did good.  I agree that there have been some very useful and well thought out new features (ROW_NUMBER and APPLY are my all-time "new feature" favorites, so far) but a lot of the new features have been, ummmm.... no other word for it... "crap" or lead to misuse.  My three least favorite "new" features so far are the newer date and time (including DATETIME2, etc, because they disallow direct temporal math), PIVOT/UNPIVOT (pre-aggregated CROSS TABS are both easier and faster for what I do), and FORMAT (44 times slower than CONVERT) with incremental Recursive CTEs following as a very close second.
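
    For anyone who hasn't seen the "pre-aggregated CROSS TAB" alternative mentioned above, here is a minimal sketch using a made-up #Sales temp table. It produces the same shape of result as a PIVOT, but with plain aggregates over CASE expressions:

    -- Hypothetical data for illustration
    CREATE TABLE #Sales (ProductID INT NOT NULL, SaleQuarter TINYINT NOT NULL, Amount MONEY NOT NULL);
    INSERT INTO #Sales VALUES (1, 1, 100), (1, 2, 150), (2, 1, 80), (2, 4, 120);

    -- Cross tab: one pass over the data, one column per quarter
    SELECT  ProductID,
            SUM(CASE WHEN SaleQuarter = 1 THEN Amount ELSE 0 END) AS Q1,
            SUM(CASE WHEN SaleQuarter = 2 THEN Amount ELSE 0 END) AS Q2,
            SUM(CASE WHEN SaleQuarter = 3 THEN Amount ELSE 0 END) AS Q3,
            SUM(CASE WHEN SaleQuarter = 4 THEN Amount ELSE 0 END) AS Q4
    FROM    #Sales
    GROUP BY ProductID;

    DROP TABLE #Sales;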

    Heh... and yeah....  I definitely get the "legacy" stuff.  I had to use a crow-bar and jet-propelled pork chop launcher to get our company to finally upgrade from 2005 to 2012 and that was just last year.  We actually had the hardware for the migration setup and ready nearly 2 years before that.  A lot of people simply can't afford to do any upgrades and so I frequently shoot for the lowest common denominator (generally 2005) unless it just doesn't make sense to do so because of some huge performance gain... which usually ISN'T the case. 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Jeff Moden - Friday, March 17, 2017 9:14 AM

    adame - Thursday, February 18, 2016 1:38 AM

    Agreed, always look to use new features.

    Nah...you did good.  I agree that there have been some very useful and well thought out new features (ROW_NUMBER and APPLY are my all-time "new feature" favorites, so far) but a lot of the new features have been, ummmm.... no other word for it... "crap" or lead to misuse.  My three least favorite "new" features so far are the newer date and time (including DATETIME2, etc, because they disallow direct temporal math), PIVOT/UNPIVOT (pre-aggregated CROSS TABS are both easier and faster for what I do), and FORMAT (44 times slower than CONVERT) with incremental Recursive CTEs following as a very close second.

    Heh... and yeah....  I definitely get the "legacy" stuff.  I had to use a crow-bar and jet-propelled pork chop launcher to get our company to finally upgrade from 2005 to 2012 and that was just last year.  We actually had the hardware for the migration setup and ready nearly 2 years before that.  A lot of people simply can't afford to do any upgrades and so I frequently shoot for the lowest common denominator (generally 2005) unless it just doesn't make sense to do so because of some huge performance gain... which usually ISN'T the case. 😉

    Whatever did we do without ROW_NUMBER? Oh, I remember now - we ran stuff into a temp table, using the IDENTITY() function in the SELECT. Mimicking PARTITION BY was less trivial. Throw NTILE, SUM, COUNT etc into the mix and you've got a sea-change in the way we code. Magic 🙂
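
    For anyone who never had to do it the old way, here is a rough sketch (with made-up temp tables) of the pattern being described: IDENTITY() in a SELECT ... INTO to number the rows, then a join back to mimic PARTITION BY.

    -- Hypothetical source data
    CREATE TABLE #Source (ORDER_ID VARCHAR(10) NOT NULL, ORDER_DATE DATE NOT NULL, CODE CHAR(2) NOT NULL);
    INSERT INTO #Source VALUES ('ORDER3','20151001','DS'), ('ORDER3','20151002','AG'), ('ORDER4','20151001','DS'), ('ORDER4','20151002','AG');

    -- Pre-2005 row numbering: IDENTITY() is only allowed in SELECT ... INTO
    SELECT  IDENTITY(INT, 1, 1) AS RowNum, ORDER_ID, ORDER_DATE, CODE
    INTO    #Numbered
    FROM    #Source
    ORDER BY ORDER_ID, ORDER_DATE;  -- guaranteeing the insert order took extra care

    -- Mimicking PARTITION BY: subtract the minimum RowNum per ORDER_ID
    SELECT  n.ORDER_ID, n.ORDER_DATE, n.CODE,
            n.RowNum - m.MinRowNum + 1 AS RowNumInOrder
    FROM    #Numbered AS n
    JOIN   (SELECT ORDER_ID, MIN(RowNum) AS MinRowNum FROM #Numbered GROUP BY ORDER_ID) AS m
            ON m.ORDER_ID = n.ORDER_ID;

    DROP TABLE #Numbered, #Source;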

    "Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms." - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Nice article. I love window functions; they make for elegant code. They're my go-to analytical functions for year-on-year type queries.

  • Good example of LEAD. As with all things SQL, there are so many approaches to the problem. I do love window functions, and lead/lag are particularly useful. That said, I might have approached this specific problem differently. Mostly I throw this out there just for alternative thoughts, not as a criticism. 

    One of the things that struck me about the approach outlined is that, at the core of the requirement, we are only interested in the most recent two records for an order. With that in mind, I would have addressed that first, so that the comparison of last value to previous value only has to manage two records per order. You could go so far as to even remove orders with only one record. I would accomplish this with a CTE that uses either ROW_NUMBER or COUNT() OVER () to serially number each set of records over an orderId with desc on date. From that CTE, it's easy to throw a filter on the next select where row_number <= 2 and now you've got a record set that doesn't have any records that don't matter. If you really wanted to get fancy, you could filter them further for orders that have more than one record. This would mean the record set that you are ultimately scraping for last value to previous value comparison has precisely 2 records per order. No more and no less.

    From there a lead function could be used, or you could self join the table, or you could use the pivot operator and compare columns, or you could group on orderId and compare MAX(value) to MIN(value) - any of which I feel could read cleaner, with varying degrees of performance considerations. 

    For me, readability is my first consideration, then performance (assuming results are correct, of course). That said, this is what I visualized immediately (not tested)

    ; WITH ORDERCODES AS
    (
    SELECT
      ID
      , ORDER_ID
      , CODE
      , ORDER_DATE
      , REV_ORDER_SEQUENCE = ROW_NUMBER() OVER (PARTITION BY ORDER_ID ORDER BY ORDER_DATE DESC)
      , TOTAL_ORDER_RECORDS = COUNT(1) OVER (PARTITION BY ORDER_ID)
    FROM
      SAMPLEVALUES
    )
    , ORDERSINERROR AS
    (
    SELECT ORDER_ID
    FROM ORDERCODES OC
    WHERE
      OC.REV_ORDER_SEQUENCE <= 2
      AND OC.TOTAL_ORDER_RECORDS > 1
    GROUP BY ORDER_ID
    HAVING MAX(CODE) <> MIN(CODE)
    )

    SELECT OC.*
    FROM
    ORDERCODES OC INNER JOIN
    ORDERSINERROR OIE
      ON OIE.ORDER_ID = OC.ORDER_ID
      AND OC.REV_ORDER_SEQUENCE <= 2
    ORDER BY
     OC.ORDER_ID
     , OC.ORDER_DATE;

  • ChrisM@Work - Friday, March 17, 2017 10:37 AM

    Whatever did we do without ROW_NUMBER? Oh, I remember now - we ran stuff into a temp table, using the IDENTITY() function in the SELECT. Mimicking PARTITION BY was less trivial. Throw NTILE, SUM, COUNT etc into the mix and you've got a sea-change in the way we code. Magic 🙂

    Partition By was pretty easy to solve once you had things in the Temp Table.  I don't even have to tell you how because you already know.  The reminder is "QU". 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Jeff Moden - Saturday, March 18, 2017 3:31 PM

    ChrisM@Work - Friday, March 17, 2017 10:37 AM

    Whatever did we do without ROW_NUMBER? Oh, I remember now - we ran stuff into a temp table, using the IDENTITY() function in the SELECT. Mimicking PARTITION BY was less trivial. Throw NTILE, SUM, COUNT etc into the mix and you've got a sea-change in the way we code. Magic 🙂

    Partition By was pretty easy to solve once you had things in the Temp Table.  I don't even have to tell you how because you already know.  The reminder is "QU". 😉

    You know that I know that...
    Our BI team had a logic problem last week. I gave them three different solutions to play with, all learned here on ssc. The fastest was a QU solution which ran in moments. They were blown away.


    Low-hanging fruit picker and defender of the moggies

    For better assistance in answering your questions, please read this.


    Understanding and using APPLY, (I) and (II) Paul White

    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • ChrisM@home - Saturday, March 18, 2017 4:07 PM

    Jeff Moden - Saturday, March 18, 2017 3:31 PM

    ChrisM@Work - Friday, March 17, 2017 10:37 AM

    Whatever did we do without ROW_NUMBER? Oh, I remember now - we ran stuff into a temp table, using the IDENTITY() function in the SELECT. Mimicking PARTITION BY was less trivial. Throw NTILE, SUM, COUNT etc into the mix and you've got a sea-change in the way we code. Magic 🙂

    Partition By was pretty easy to solve once you had things in the Temp Table.  I don't even have to tell you how because you already know.  The reminder is "QU". 😉

    You know that I know that...
    Our BI team had a logic problem last week. I gave them three different solutions to play with, all learned here on ssc. The fastest was a QU solution which ran in moments. They were blown away.

    Could either of you point to an article or description of a "QU solution", please?

    bc

  • bc_ - Wednesday, March 22, 2017 12:43 PM

    ChrisM@home - Saturday, March 18, 2017 4:07 PM

    Jeff Moden - Saturday, March 18, 2017 3:31 PM

    ChrisM@Work - Friday, March 17, 2017 10:37 AM

    Whatever did we do without ROW_NUMBER? Oh, I remember now - we ran stuff into a temp table, using the IDENTITY() function in the SELECT. Mimicking PARTITION BY was less trivial. Throw NTILE, SUM, COUNT etc into the mix and you've got a sea-change in the way we code. Magic 🙂

    Partition By was pretty easy to solve once you had things in the Temp Table.  I don't even have to tell you how because you already know.  The reminder is "QU". 😉

    You know that I know that...
    Our BI team had a logic problem last week. I gave them three different solutions to play with, all learned here on ssc. The fastest was a QU solution which ran in moments. They were blown away.

    Could either of you point to an article or description of a "QU solution", please?

    Here's the article explaining it.
    http://qa.sqlservercentral.com/articles/T-SQL/68467/
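
    For the curious, the technique in that article ("QU" is short for the quirky update) carries running values between rows with local variables inside an UPDATE of a clustered temp table. Here is a very rough sketch of the general shape, using a made-up #Running table; the article's mandatory safety rules (clustered index in scan order, TABLOCKX, MAXDOP 1, an anchor/safety check, etc.) are what make it reliable, so read it before using this anywhere.

    -- Running total per AccountID via the quirky update (sketch only)
    CREATE TABLE #Running (
        AccountID    INT   NOT NULL,
        TranDate     DATE  NOT NULL,
        Amount       MONEY NOT NULL,
        RunningTotal MONEY NULL,
        PRIMARY KEY CLUSTERED (AccountID, TranDate)
    );

    INSERT INTO #Running (AccountID, TranDate, Amount)
    VALUES (1, '20170101', 10), (1, '20170102', 20), (2, '20170101', 5);

    DECLARE @RunningTotal MONEY = 0,
            @PrevAccount  INT   = -1;

    UPDATE  #Running
    SET     @RunningTotal = RunningTotal =
                CASE WHEN AccountID = @PrevAccount
                     THEN @RunningTotal + Amount
                     ELSE Amount
                END,
            @PrevAccount  = AccountID
    FROM    #Running WITH (TABLOCKX)
    OPTION  (MAXDOP 1);

    SELECT * FROM #Running ORDER BY AccountID, TranDate;
    DROP TABLE #Running;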

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
