Solving the Running Total and Ordinal Rank Problems (Rewritten)

  • Here's the non-Quirky Update code to do what you want. For those who think a CTE would be better, please read the comments in the code to find out why you might want to reconsider...

    --=====================================================================================================================
    -- Setup the demonstration. Nothing in this section is a part of the solution. It's just to create test data.
    --=====================================================================================================================
    --===== Do this demonstration in a nice safe place that everyone has.
    USE tempdb;

    --===== Conditionally drop all the test tables to make reruns easier
    IF OBJECT_ID('tempdb.dbo.tmp_conf_server','U') IS NOT NULL DROP TABLE tempdb.dbo.tmp_conf_server;

    --===== Create a simulation of a real table in tempdb
    CREATE TABLE dbo.tmp_conf_server
    (
        RecordedDateTime    DATETIME     NOT NULL,
        ServerName          VARCHAR(260) NOT NULL,
        LastRestarted       DATETIME     NOT NULL,
        TotalBatchRequests  BIGINT       NOT NULL,
        TotalDeadlocks      BIGINT       NOT NULL,
        TotalCompilations   BIGINT       NOT NULL,
        TotalReCompilations BIGINT       NOT NULL,
        CONSTRAINT PK_cls_RecordedDateTime_Servername__tmp_conf_server
            PRIMARY KEY CLUSTERED (RecordedDateTime ASC, ServerName ASC)
    );

    --===== Populate the simulated real table with test data
    INSERT INTO tempdb.dbo.tmp_conf_server
            (RecordedDateTime, ServerName, LastRestarted, TotalBatchRequests, TotalDeadlocks, TotalCompilations, TotalReCompilations)
    SELECT '2011-04-05 16:46:46.630','server1','2011-02-25 13:29:00.000',500,12,3050,5555 UNION ALL
    SELECT '2011-04-05 17:49:22.350','server1','2011-02-25 13:29:00.000',750,12,4000,6666 UNION ALL
    SELECT '2011-04-06 17:34:22.473','server1','2011-02-25 13:29:00.000',810,12,4075,7777 UNION ALL
    SELECT '2011-04-05 16:46:47.630','server2','2011-02-27 11:29:00.000',888,5 ,30 ,3333 UNION ALL
    SELECT '2011-04-05 17:49:23.350','server2','2011-02-27 11:29:00.000',891,6 ,40 ,5900 UNION ALL
    SELECT '2011-04-06 17:34:23.473','server2','2011-02-27 11:29:00.000',950,8 ,7000,5950;

    --=====================================================================================================================
    -- Solve the problem of getting a "delta" between the current and previous rows for the "INC" columns
    -- in a simplified manner and without a "Quirky Update".
    --=====================================================================================================================
    --===== Conditionally drop all the test tables to make reruns easier
    IF OBJECT_ID('tempdb..#tmp_conf_server_Inc','U') IS NOT NULL DROP TABLE #tmp_conf_server_Inc;

    --===== Create and populate the table we'll use to calculate the increments.
    --      Notice how ISNULL is used to make a NOT NULL column on the fly.
    SELECT RowNum = ISNULL(ROW_NUMBER() OVER (ORDER BY ServerName, RecordedDateTime),0),
           ServerName,
           RecordedDateTime,
           LastRestarted,
           TotalBatchRequests,
           TotalDeadlocks,
           TotalCompilations,
           TotalReCompilations,
           IncSeconds        = CAST(0 AS BIGINT),
           IncBatchRequests  = CAST(0 AS BIGINT),
           IncDeadlocks      = CAST(0 AS BIGINT),
           IncCompilations   = CAST(0 AS BIGINT),
           IncReCompilations = CAST(0 AS BIGINT)
      INTO #tmp_conf_server_Inc
      FROM dbo.tmp_conf_server;

    --===== Add a clustered index (as a PK in this case) NOT to do a "Quirky Update" but simply to make the code run faster.
    --      As a sidebar, NEVER name a constraint on a #Temp table or it will destroy the ability to do concurrent runs.
    ALTER TABLE #tmp_conf_server_Inc
        ADD PRIMARY KEY CLUSTERED (RowNum);

    --===== You could certainly do this without a Temp Table by using a CTE, but it would take twice as long because it
    --      would have to execute the CTE twice (a sketch of that variant follows this code).
    SELECT hi.ServerName,
           hi.RecordedDateTime,
           hi.LastRestarted,
           hi.TotalBatchRequests,
           hi.TotalDeadlocks,
           hi.TotalCompilations,
           hi.TotalReCompilations,
           IncSeconds        = CASE WHEN hi.ServerName = lo.ServerName THEN DATEDIFF(ss, lo.RecordedDateTime, hi.RecordedDateTime) ELSE NULL END,
           IncBatchRequests  = CASE WHEN hi.ServerName = lo.ServerName THEN hi.TotalBatchRequests  - lo.TotalBatchRequests  ELSE NULL END,
           IncDeadlocks      = CASE WHEN hi.ServerName = lo.ServerName THEN hi.TotalDeadlocks      - lo.TotalDeadlocks      ELSE NULL END,
           IncCompilations   = CASE WHEN hi.ServerName = lo.ServerName THEN hi.TotalCompilations   - lo.TotalCompilations   ELSE NULL END,
           IncReCompilations = CASE WHEN hi.ServerName = lo.ServerName THEN hi.TotalReCompilations - lo.TotalReCompilations ELSE NULL END
      FROM #tmp_conf_server_Inc lo
     RIGHT OUTER JOIN #tmp_conf_server_Inc hi ON lo.RowNum + 1 = hi.RowNum;
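    For comparison, the CTE variant mentioned in the comments above might look roughly like the sketch below (same test table, same column names; this is an illustration only, not part of the tested solution). Because the numbered row set has to be referenced twice, once as "lo" and once as "hi", the CTE is expanded and executed once for each reference, which is why the Temp Table version above tends to be roughly twice as fast.

    WITH cteNumbered AS
    (
    SELECT RowNum = ROW_NUMBER() OVER (ORDER BY ServerName, RecordedDateTime),
           ServerName, RecordedDateTime, LastRestarted,
           TotalBatchRequests, TotalDeadlocks, TotalCompilations, TotalReCompilations
      FROM dbo.tmp_conf_server
    )
    SELECT hi.ServerName,
           hi.RecordedDateTime,
           IncSeconds        = CASE WHEN hi.ServerName = lo.ServerName THEN DATEDIFF(ss, lo.RecordedDateTime, hi.RecordedDateTime) ELSE NULL END,
           IncBatchRequests  = CASE WHEN hi.ServerName = lo.ServerName THEN hi.TotalBatchRequests  - lo.TotalBatchRequests  ELSE NULL END,
           IncDeadlocks      = CASE WHEN hi.ServerName = lo.ServerName THEN hi.TotalDeadlocks      - lo.TotalDeadlocks      ELSE NULL END,
           IncCompilations   = CASE WHEN hi.ServerName = lo.ServerName THEN hi.TotalCompilations   - lo.TotalCompilations   ELSE NULL END,
           IncReCompilations = CASE WHEN hi.ServerName = lo.ServerName THEN hi.TotalReCompilations - lo.TotalReCompilations ELSE NULL END
      FROM cteNumbered lo
     RIGHT OUTER JOIN cteNumbered hi ON lo.RowNum + 1 = hi.RowNum;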

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Jeff Moden (4/8/2011)


    Also, be advised: you didn't add the required clustered index to the #Temp table, and that may be why your two-part code didn't work the way you expected. Still, and like I said before, a "Quirky Update" isn't needed for the simple chore of diffing two adjacent rows. I'll have some code for you for that soon.

    Nadrek,

    In case you're wondering, and if I'm not mistaken, the clustered index fields are in the wrong order.

  • They're in the incorrect order only on the source table. That's why he copies the rows to another table where he can apply a different clustered index.

    Still, you just don't need a QU for this exercise.

    --Jeff Moden



  • No, I'm not trolling; I'm advertising your article! The full context was:

    -- Quirky update!!! BEWARE!!! Do not use this for critical things!

    -- BEFORE USING, read http://qa.sqlservercentral.com/articles/T-SQL/68467/

    Jeff Moden (4/8/2011)


    Also, be advised: you didn't add the required clustered index to the #Temp table, and that may be why your two-part code didn't work the way you expected. Still, and like I said before, a "Quirky Update" isn't needed for the simple chore of diffing two adjacent rows. I'll have some code for you for that soon.

    What is the problem with creating the clustered index as the primary key in the CREATE TABLE #tmp_conf_server_Inc statement itself, which was:

    CONSTRAINT [PK_cls_RecordedDateTime_Servername__#tmp_conf_server_Inc] PRIMARY KEY CLUSTERED
    (
        [Anchor],
        [ServerName] ASC,
        [RecordedDateTime] ASC
    ) WITH (PAD_INDEX = ON, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90, DATA_COMPRESSION = PAGE) ON [PRIMARY]

    Was there another problem with my two-part set statement variant?

    I see the RowNum-based self-join; I'll take a look at how it performs with several hundred thousand rows when I get a chance.
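    As a side note on the named PRIMARY KEY above: the reason for the earlier warning about naming constraints on #Temp tables is that, unlike the #table name itself, a constraint name does not get a per-session uniquifier in tempdb, so two sessions running the same CREATE TABLE at the same time collide. A minimal sketch (the table and constraint names here are made up purely for illustration):

    --===== Session 1: works fine.
    CREATE TABLE #ConstraintDemo
    (
        SomeID INT NOT NULL,
        CONSTRAINT PK_ConstraintDemo PRIMARY KEY CLUSTERED (SomeID)
    );

    --===== Session 2: the same CREATE TABLE fails while Session 1's table still exists, because the
    --      constraint NAME must be unique in tempdb. The error is roughly:
    --      "There is already an object named 'PK_ConstraintDemo' in the database."

    --===== Leaving the constraint unnamed (or adding it unnamed, as in the ALTER TABLE earlier)
    --      lets SQL Server generate a unique name per session, so concurrent runs work.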

  • Nadrek (4/11/2011)


    No, I'm not trolling; I'm advertising your article! The full context was:

    -- Quirky update!!! BEWARE!!! Do not use this for critical things!

    -- BEFORE USING, read http://qa.sqlservercentral.com/articles/T-SQL/68467/

    "Do not use this for critical things" is an "advertisement"? Not where I come from. 😉

    And, yes, I use it for "critical" things.

    Anyway, sorry about the confusion on the clustered index. I got a bit cross-eyed on that because of the names of the tables. I'll take another look.

    --Jeff Moden



  • Jeff Moden (4/12/2011)


    "Do not use this for critical things" is an "advertisement"? Not where I come from. 😉

    And, yes, I use it for "critical" things.

    Anyway, sorry about the confusion on the clustered index. I got a bit cross-eyed on that because of the names of the tables. I'll take another look.

    Well, the link to the article is an advertisement :). The "do not use this for critical things" is much easier to say than "you have to follow all the rules; if you fail even one rule, you may, or may not, receive invalid results"; and I tend towards limited use of undocumented features for critical work; I'll use them more often in noncritical work.

  • Thanks to all the contributors to this article and its comments. To help some purists understand how Jeff's article is adding value: I work in a management consulting firm where we use SQL as a more powerful version of Excel - very far from production, with custom-coded analyses we use for just a few months in a specific server environment, with the analyses carefully sanity checked by humans before implementing anything, and we just have to be right on average and not 100% of the time. A decade ago, one of us discovered quirky update (probably from one of the poorly written articles back then), and it has enabled us to run sophisticated analysis on multi-billion row tables to drive hundreds of millions of dollars of measurable value for our clients. Without quirky update, we would often not be able to deliver our answers quickly enough to drive this value. (Although I'm excited to try Paul's SQLCLR solution now, especially to see if it works or can be adapted to work for partitioned tables.)

    Over the years, we discovered many of these rules (most importantly, the row-increment error checking that I think Paul and Tom posted here), but it has been very helpful to have them all in one place! For the last year we have been referring our new analysts to this article as a helpful checklist to accompany the careful sanity checking and sample testing they always do when writing new code, and we have been so grateful to have it.

    For the forthcoming rewrite, it might be helpful to note (in an appendix?) that if you set the value of the same variable multiple times, it is not predictable in which order these statements will be processed. I know that should be the default assumption, and it has been pointed out in some of the earlier posts in this conversation. Still, it's something we've found worth including in our internal training documents because it often trips up newcomers to the technique and costs them some debugging time to learn it the hard way.

  • Nadrek (4/12/2011)


    Jeff Moden (4/12/2011)


    "Do not use this for critical things" is an "advertisement"? Not where I come from. 😉

    And, yes, I use it for "critical" things.

    Anyway, sorry about the confusion on the clustered index. I got a bit cross-eyed on that because of the names of the tables. I'll take another look.

    Well, the link to the article is an advertisement :). The "do not use this for critical things" is much easier to say than "you have to follow all the rules; if you fail even one rule, you may, or may not, receive invalid results"; and I tend towards limited use of undocumented features for critical work; I'll use them more often in noncritical work.

    Sorry... I missed this... Maybe not in your country or language but that's not what's implied by such language in my country and the two statements have drastically different meanings. I actually take offense to the wording or would prefer that you not use it again and also change existing wording.

    --Jeff Moden



  • ethan.murray (5/11/2011)


    Thanks to all the contributors to this article and its comments. To help some purists understand how Jeff's article is adding value: I work in a management consulting firm where we use SQL as a more powerful version of Excel - very far from production, with custom-coded analyses we use for just a few months in a specific server environment, with the analyses carefully sanity checked by humans before implementing anything, and we just have to be right on average and not 100% of the time. A decade ago, one of us discovered quirky update (probably from one of the poorly written articles back then), and it has enabled us to run sophisticated analysis on multi-billion row tables to drive hundreds of millions of dollars of measurable value for our clients. Without quirky update, we would often not be able to deliver our answers quickly enough to drive this value. (Although I'm excited to try Paul's SQLCLR solution now, especially to see if it works or can be adapted to work for partitioned tables.)

    Over the years, we discovered many of these rules (most importantly, the row-increment error checking that I think Paul and Tom posted here), but it has been very helpful to have them all in one place! For the last year we have been referring our new analysts to this article as a helpful checklist to accompany the careful sanity checking and sample testing they always do when writing new code, and we have been so grateful to have it.

    For the forthcoming rewrite, it might be helpful to note (in an appendix?) that if you set the value of the same variable multiple times, it is not predictable in which order these statements will be processed. I know that should be the default assumption, and it has been pointed out in some of the earlier posts in this conversation. Still, it's something we've found worth including in our internal training documents because it often trips up newcomers to the technique and costs them some debugging time to learn it the hard way.

    Ethan... thank you very much for taking the time to write that up. I appreciate it very much.

    --Jeff Moden



  • Jeff Moden (5/11/2011)


    Sorry... I missed this... Maybe not in your country or language but that's not what's implied by such language in my country and the two statements have drastically different meanings. I actually take offense to the wording or would prefer that you not use it again and also change existing wording.

    My apologies; I did not mean any offense. I've edited my code and post with the more precise version.

    That said, while I do not need to use the quirky update, did anyone ever notice what mistake I made on the 2 part version, so I can avoid making it again?

  • Nadrek (5/12/2011)


    Jeff Moden (5/11/2011)


    Sorry... I missed this... Maybe not in your country or language but that's not what's implied by such language in my country and the two statements have drastically different meanings. I actually take offense to the wording or would prefer that you not use it again and also change existing wording.

    My apologies; I did not mean any offense. I've edited my code and post with the more precise version.

    That said, while I do not need to use the quirky update, did anyone ever notice what mistake I made on the 2 part version, so I can avoid making it again?

    Thank you for your consideration.

    I'll also go back and try to figure out your question. Thanks, Nadrek.

    --Jeff Moden



  • FYI:

    I just compared this Quirky Update code to the newest enhancements in the next version of SQL Server (code named "Denali") CTP3 and found that the Quirky Update performs far better than the new enhancements in Denali CTP3 currently do. (The Quirky Update was not modified to perform the safety check.)

    Denali CTP3 code:

    -- DENALI CTP3 Running Totals with the enhanced OVER clause.
    ;
    WITH cte AS
    (
    SELECT TransactionDetailID,
           [Date],
           AccountID,
           Amount,
           AccountRunningTotal = SUM(Amount) OVER (PARTITION BY AccountID ORDER BY Date, TransactionDetailID),
           AccountRunningCount = SUM(1)      OVER (PARTITION BY AccountID ORDER BY Date, TransactionDetailID),
           NCID
      FROM dbo.TransactionDetail
    )
    UPDATE td
       SET AccountRunningTotal = cte.AccountRunningTotal,
           AccountRunningCount = cte.AccountRunningCount
      FROM dbo.TransactionDetail td
      JOIN cte
        ON cte.TransactionDetailID = td.TransactionDetailID
    GO

    --===== Verify that the running total worked on the
    --      TransactionDetail table
    USE TEMPDB --Takes 10 seconds on my machine
    EXEC dbo.Verify
    GO

    Results (comparing Quirky Update to new features):

    (Not sure why the Quirky Update is showing zero writes - possible bug in Profiler with CTP3?)
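    Part of the gap may be the default window frame: when ORDER BY is used in the OVER clause without an explicit frame, the default is RANGE UNBOUNDED PRECEDING, which typically uses a slower on-disk worktable than an explicit ROWS frame. A sketch of the same update with ROWS spelled out, using the same TransactionDetail test table (the NewRunningTotal/NewRunningCount aliases are just illustrative names, and this has not been tested against CTP3, so treat it as something to verify):

    ;WITH cte AS
    (
    SELECT TransactionDetailID,
           AccountRunningTotal,
           AccountRunningCount,
           NewRunningTotal = SUM(Amount)  OVER (PARTITION BY AccountID ORDER BY [Date], TransactionDetailID
                                                ROWS UNBOUNDED PRECEDING),
           NewRunningCount = COUNT_BIG(*) OVER (PARTITION BY AccountID ORDER BY [Date], TransactionDetailID
                                                ROWS UNBOUNDED PRECEDING)
      FROM dbo.TransactionDetail
    )
    UPDATE cte
       SET AccountRunningTotal = NewRunningTotal,
           AccountRunningCount = NewRunningCount;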

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • It's not going to matter... even though I'm seriously disappointed in the performance (as you've measured), the new SUM() is faster than a While Loop, easier to write, does not require a table update, and it's "officially" supported.

    I normally don't go near the bleeding edge but I guess I'm going to have to load up Denali CTP3 and do some testing on my own.

    Thanks for the early feedback, Wayne. I really appreciate the time you put into this one.
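    For reference, the kind of row-by-row While Loop being compared against might look roughly like the following. This is a deliberately simplified sketch: it assumes the same dbo.TransactionDetail test table and a numeric Amount column, and it ignores the per-account partitioning to keep the loop short.

    DECLARE @ID    INT,
            @Total MONEY;

     SELECT @Total = 0,
            @ID    = MIN(TransactionDetailID)
       FROM dbo.TransactionDetail;

      WHILE @ID IS NOT NULL
      BEGIN
            -- One SELECT and one UPDATE per row is what makes this approach so slow.
            SELECT @Total = @Total + Amount
              FROM dbo.TransactionDetail
             WHERE TransactionDetailID = @ID;

            UPDATE dbo.TransactionDetail
               SET AccountRunningTotal = @Total
             WHERE TransactionDetailID = @ID;

            SELECT @ID = MIN(TransactionDetailID)
              FROM dbo.TransactionDetail
             WHERE TransactionDetailID > @ID;
        END;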

    --Jeff Moden



  • Wayne, you can avoid the join:

    UPDATE target
       SET AccountRunningTotal = art,
           AccountRunningCount = arc
      FROM
    (
        SELECT td.AccountRunningTotal,
               td.AccountRunningCount,
               art = SUM(td.Amount) OVER (PARTITION BY td.AccountID ORDER BY td.Date, td.TransactionDetailID),
               arc = COUNT_BIG(*)   OVER (PARTITION BY td.AccountID ORDER BY td.Date, td.TransactionDetailID)
          FROM dbo.TransactionDetail AS td
    ) AS target

  • That should run quite a bit faster.

    --Jeff Moden


