Solving the Running Total and Ordinal Rank Problems (Rewritten)

  • Hi Jeff,

    As indicated in my previous message, I somehow lost sight of this discussion. Sorry for my sudden disappearance from this topic.

    First, thank you for optimizing my alternative running totals code even further. I hope you don't mind me using the optimized version of this code (with attribution, of course) for future presentations, blog posts, articles, etc.

    When reading over all the new posts, I didn't find any reaction to one of the points I raised. It's about the claim that nobody has yet managed to break the quirky update. I did post at least two bits of code in this thread that did break it. You dismissed the first as using a JOIN; I objected to that, as an IN with a constant list rather than a subquery is not equivalent to a JOIN, but to a series of OR'ed conditions, and then posted another code snippet that breaks it without using IN or OR. If you ever responded to my objection or the last snippet, I overlooked the reaction - can you please tell me the page number to re-read? And if you never responded, then I hope you will do so this time.
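
    To make that point concrete, here is a minimal illustration (the literal values are arbitrary). An IN with a constant list is just shorthand for a chain of OR'ed equality tests - no join, and no second table, is involved:

    -- These two queries are exactly equivalent:
    SELECT * FROM dbo.TransactionDetail
    WHERE TransactionDetailID IN (12, 123, 1234);

    SELECT * FROM dbo.TransactionDetail
    WHERE TransactionDetailID = 12
       OR TransactionDetailID = 123
       OR TransactionDetailID = 1234;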

    For your convenience, here is the last code I posted again:

    DECLARE @Counter INT;
    SET @Counter = 0;

    UPDATE dbo.TransactionDetail
    SET @Counter = AccountRunningCount = @Counter + 1
    FROM dbo.TransactionDetail WITH (TABLOCKX)
    WHERE TransactionDetailID BETWEEN 120000 AND 120010
    OPTION (MAXDOP 1);
    GO

    --===== Select all the rows in order by the clustered index
    SELECT *
    FROM dbo.TransactionDetail
    WHERE TransactionDetailID BETWEEN 120000 AND 120010
    ORDER BY AccountID, Date, TransactionDetailID;


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Cursor is IMHO the best approach for this because you can put an ORDER BY on it - you have repeatable results, with syntax whose behaviour is well defined in the product manual (BOL).

    That particular use of UPDATE in this context is only asking for trouble.

    Yes, it might work like that now; but where in the documentation does it define that it will always process the rows in order?

    My two cents.

    Tony.

  • Hi Hugo,

    You overlooked my post, which addressed the 'problem'. It was quite long, so I'm quite shocked that you missed it 😉

    Extract: Paul White to Jeff Moden


    How much do you regret retiring the INDEX(0) specification present in your original article? That simple hint does away with Hugo's attempt to break it using a WHERE clause or an IN statement. In addition to that, this method has always been about persisting a running total for all records in the set. If you want to do a small subset (enough to invoke an unordered update) you'd probably use another method. That said, adding the INDEX(0) back in solves all those worries....

    DECLARE @Counter INTEGER;
    SET @Counter = 0;

    BEGIN TRANSACTION;

    UPDATE TD
    SET @Counter = AccountRunningCount = @Counter + 1
    FROM dbo.TransactionDetail TD WITH (TABLOCKX, INDEX(0))
    WHERE TransactionDetailID BETWEEN 120000 AND 120010 -- IN (12,123,1234,23,234,2345,34,345,3456)
    OPTION (MAXDOP 1);

    SELECT *
    FROM dbo.TransactionDetail
    WHERE TransactionDetailID BETWEEN 120000 AND 120010
    ORDER BY AccountID, Date, TransactionDetailID;

    ROLLBACK TRANSACTION;

    BTW, your RANK() solution is much more clever than I originally gave it credit for. My sincere apologies for that; I hadn't fully appreciated the subtleties until relatively recently.

    My only reservation about it is that its performance depends somewhat on the data. If the data is split into many groups - AccountIDs in Jeff's sample data - it absolutely flies. On the other end of the scale, where there is only one group, it behaves like an extremely slow cursor, and is in fact roundly beaten by same.
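
    For readers who have not seen Hugo's technique, here is a minimal sketch of the set-based iteration idea - an illustration of the principle, not Hugo's exact code, and the #Work table and column names are made up for the example. Each pass of the loop advances the running total by one row position in every account at once, which is why the run time is governed by the size of the largest group:

    -- Stage the rows with a per-account row number (illustrative names)
    SELECT AccountID, Amount,
           rn = ROW_NUMBER() OVER (PARTITION BY AccountID ORDER BY Date, TransactionDetailID),
           RunningTotal = CAST(0 AS MONEY)
    INTO #Work
    FROM dbo.TransactionDetail;

    CREATE UNIQUE CLUSTERED INDEX cx ON #Work (AccountID, rn);

    DECLARE @rn INT;
    SET @rn = 1;

    UPDATE #Work SET RunningTotal = Amount WHERE rn = 1; -- seed the first row of every account

    WHILE @@ROWCOUNT > 0
    BEGIN
        SET @rn = @rn + 1;

        -- one pass per row position, but all accounts advance together
        UPDATE W
        SET RunningTotal = P.RunningTotal + W.Amount
        FROM #Work AS W
        JOIN #Work AS P
          ON P.AccountID = W.AccountID
         AND P.rn = W.rn - 1
        WHERE W.rn = @rn;
    END;

    With many small groups the loop runs only a handful of times; with a single group it degenerates into one UPDATE per row, which is the cursor-like behaviour described above.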

    For running total calculations where the data is not written back to an existing column in the source, your method beats the quirky update by a small margin in my tests (test rig available). I am currently unable to produce a SQLCLR routine which beats either of these methods on their home turf, although I can get very, very close 🙂

    Paul

  • tony rogerson (2/4/2010)


    Cursor is IMHO the best approach for this because you can put an ORDER BY on it - you have repeatable results, with syntax whose behaviour is well defined in the product manual (BOL).

    That particular use of UPDATE in this context is only asking for trouble.

    Yes, it might work like that now; but where in the documentation does it define that it will always process the rows in order?

    My two cents.

    Tony.

    Thanks for the visit, Tony, and I appreciate the words of caution, which are similar to the cautions I gave in the article. And, yes, I agree it's an undocumented feature, as I also said in the article. If it ever breaks, then I too will use a cursor or maybe even a "CLR" as a workaround.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Relying on query hints to force a side-effect feature is just madness - you are relying on the behaviour of the engine in specific product versions (and builds).

    Seriously - don't use the side-effect feature of the UPDATE like this.

    It was originally put in the product to capture a value back into a local variable, so we don't have to use an UPDATE and a separate SELECT.
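
    That documented use looks like this - a hypothetical single-row example (dbo.Account, Balance, and AccountID are made-up names for illustration):

    DECLARE @NewBalance MONEY;

    -- Assign to the column and capture the new value in the variable, in one statement
    UPDATE dbo.Account
    SET @NewBalance = Balance = Balance + 100.00
    WHERE AccountID = 42;

    SELECT @NewBalance AS NewBalance; -- no separate SELECT against the table needed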

    Tony.

  • tony rogerson (2/4/2010)


    Cursor is IMHO the best approach for this because you can put an ORDER BY on it - you have repeatable results, with syntax whose behaviour is well defined in the product manual (BOL).

    *This is not a knee-jerk anti-cursor reaction*

    Tony, are you serious? Hugo's method beats the cursor method in 99.9% of cases. SQLCLR methods beat the cursor in 99.9% of cases. Quirky Update beats the cursor in 100% of cases but requires care. I'm sure there are other methods too that have far more merit than even the most keenly optimized LOCAL READ_ONLY FORWARD_ONLY STATIC cursor!

    Cursors do suck, and they discourage set-based thinking. As John McEnroe famously said: "You cannot be serious!"

    😛

    Paul

  • Jeff Moden (2/4/2010)


    ...I too will use [...] a "CLR"

    At last! 😆 😛 😀 😲

    And I have it in writing. Even if it did require extreme editing skills.

    Paul

  • Jeff Moden (2/4/2010)


    tony rogerson (2/4/2010)


    Cursor is IMHO the best approach for this because you can put an ORDER BY on it - you have repeatable results, with syntax whose behaviour is well defined in the product manual (BOL).

    That particular use of UPDATE in this context is only asking for trouble.

    Yes, it might work like that now; but where in the documentation does it define that it will always process the rows in order?

    My two cents.

    Tony.

    Thanks for the visit, Tony, and I appreciate the words of caution, which are similar to the cautions I gave in the article. And, yes, I agree it's an undocumented feature, as I also said in the article. If it ever breaks, then I too will use a cursor or maybe even a "CLR" as a workaround.

    Hi Jeff - I do respect the amount of work you have put into this - I know the effort involved, I've done it so many times myself.

    But to publicly recommend this method is, to be frank, just irresponsible.

    Unless you are going to go through every single version, SP, and CU of the product to make sure they have not changed something that breaks the undocumented feature you are relying on, then the fact that it works on one version, one build, is moot.

    I know I'll get flamed for posting this - but that is the level of problem this will cause! Not everybody is at your level; a lot of people will just use it.

    Also - the warning, as it stands, just says that other people have been bitching that this feature is undocumented, rather than stating up front: "never ever use this in a production environment, as you are relying on undocumented behaviour that may change right under your feet on the next service pack or cumulative update you apply".

    If you do that and put it in big red bold letters right at the start of your article then I'd be happy!

    Tony.

  • Not in my testing they don't!

    I've presented a number of times at UK user group meetings on running totals; I have an example that proves the opposite.

    The cursor was faster - it's all about how you write your code.

    Tony.

  • tony rogerson (2/4/2010)


    I've presented a number of times at UK user group meetings on running totals; I have an example that proves the opposite. The cursor was faster - it's all about how you write your code.

    Please share 😉

  • Hugo Kornelis (2/4/2010)


    Hi Jeff,

    As indicated in my previous message, I somehow lost sight of this discussion. Sorry for my sudden disappearance from this topic.

    First, thank you for optimizing my alternative running totals code even further. I hope you don't mind me using the optimized version of this code (with attribution, of course) for future presentations, blog posts, articles, etc.

    Absolutely not a problem. Thanks for asking. Glad I could make a contribution. Thanks, Hugo.

    When reading over all the new posts, I didn't find any reaction to one of the points I raised. It's about the claim that nobody has yet managed to break the quirky update. I did post at least two bits of code in this thread that did break it. You dismissed the first as using a JOIN; I objected to that, as an IN with a constant list rather than a subquery is not equivalent to a JOIN, but to a series of OR'ed conditions, and then posted another code snippet that breaks it without using IN or OR. If you ever responded to my objection or the last snippet, I overlooked the reaction - can you please tell me the page number to re-read? And if you never responded, then I hope you will do so this time.

    For your convenience, here is the last code I posted again:

    DECLARE @Counter INT;
    SET @Counter = 0;

    UPDATE dbo.TransactionDetail
    SET @Counter = AccountRunningCount = @Counter + 1
    FROM dbo.TransactionDetail WITH (TABLOCKX)
    WHERE TransactionDetailID BETWEEN 120000 AND 120010
    OPTION (MAXDOP 1);
    GO

    --===== Select all the rows in order by the clustered index
    SELECT *
    FROM dbo.TransactionDetail
    WHERE TransactionDetailID BETWEEN 120000 AND 120010
    ORDER BY AccountID, Date, TransactionDetailID;

    My apologies. You're correct. I didn't address that. I'll go back and look at your original post to make sure I get the context right.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • It's on my list to blog about, but I'm concentrating on my MSc at the moment. Here you are....

    If you hunt around, the presentation may be online - I did the talk at SQLBits in 2008; hence the USE below 🙂

    USE SQLBits20080301
    go

    /***
        A look at RUNNING totals

        Run these with the Actual Query Plan:
        SET STATISTICS IO ON
        SET STATISTICS TIME ON
    ****/

    create table #amts (
        id int not null primary key clustered,
        tran_date smalldatetime not null,
        amount decimal(28,2) not null
    )

    create index ncidx_tran_date on #amts( tran_date )

    truncate table #amts

    insert #amts ( id, tran_date, amount ) values( 1, '20080208', 10 )
    insert #amts ( id, tran_date, amount ) values( 2, '20080212', 10 )
    insert #amts ( id, tran_date, amount ) values( 3, '20080208', 10 )
    insert #amts ( id, tran_date, amount ) values( 4, '20080211', 10 )
    insert #amts ( id, tran_date, amount ) values( 5, '20080212', 10 )
    insert #amts ( id, tran_date, amount ) values( 6, '20080212', 10 )

    -- Wrong....
    select *,
        rolling = coalesce(
            ( select sum( amount )
              from #amts a2
              where a2.tran_date <= a1.tran_date
                and a2.id <> a1.id ), 0 )
            + amount
    from #amts a1
    order by tran_date, id

    -- Wrong....
    select *,
        rolling = coalesce(
            ( select sum( amount )
              from #amts a2
              where a2.tran_date <= a1.tran_date
                and a2.id < a1.id ), 0 )
            + amount
    from #amts a1
    order by tran_date, id

    -- Wrong....
    select *,
        rolling = coalesce(
            ( select sum( amount )
              from #amts a2
              where a2.tran_date <= a1.tran_date ), 0 )
            + amount
    from #amts a1
    order by tran_date, id

    -- Correct....
    select *,
        rolling = coalesce(
            ( select sum( amount )
              from #amts a2
              where a2.tran_date < a1.tran_date
                 or ( a2.tran_date = a1.tran_date
                      and a2.id < a1.id ) ), 0 )
            + amount
    from #amts a1
    order by tran_date, id
    go

    -- Without the sub-query...
    select a.*, dt.running_total + a.amount
    from (
        select a1.id, running_total = coalesce( sum( a2.amount ), 0 )
        from #amts a1
        left outer join #amts a2
            on a2.tran_date < a1.tran_date
            or ( a2.tran_date = a1.tran_date
                 and a2.id < a1.id )
        group by a1.id
    ) as dt
    inner join #amts a on a.id = dt.id
    order by tran_date, id

    /***
        A look at using a CTE to help
    ***/

    -- First we need a sequence
    -- Notice the Sequence Generator step
    with Trans ( row_no, id, tran_date, amount )
    as (
        select row_no = row_number() over( order by tran_date, id ), id, tran_date, amount
        from #amts
    )
    select *
    from Trans
    go

    -- Now the real query
    -- Notice 3!!! Sequence Generator steps - notice where the Filter step is
    with Trans ( row_no, id, tran_date, amount )
    as (
        select row_no = row_number() over( order by tran_date, id ), id, tran_date, amount
        from #amts
    )
    select t1.*,
        running_amount = coalesce(
            ( select sum( amount )
              from Trans t2
              where t2.row_no < t1.row_no ), 0 )
            + amount
    from Trans t1
    order by row_no
    go

    /************************
        On Real Data Volumes
    *************************/

    -- Rows in the limited data set - check IO
    select count(*) from transactions where account_id between 1000 and 1020
    select * from transactions where account_id between 1000 and 1020
    -- 588

    -- CTE way - row_number() approach
    -- Problematic - row_number() is calculated on the entire table rather than just the rows we want!
    -- Bug / optimiser shortfall?
    -- Sub-query with CTE - ROW_NUMBER() just corrects the out-of-order id (surrogate key)
    with Trans ( row_no, id, salesperson_id, tran_date, clear_date, amount, transaction_types_id, account_id )
    as (
        select row_no = row_number() over( order by account_id, tran_date, id ),
               id, salesperson_id, tran_date, clear_date, amount, transaction_types_id, account_id
        from Transactions
        where account_id between 1000 and 1020
    )
    select t1.*,
        running_amount = coalesce(
            ( select sum( amount )
              from Trans t2
              where t2.row_no < t1.row_no
                and t2.account_id = t1.account_id ), 0 )
            + amount
    from Trans t1
    order by account_id, row_no
    go

    -- Using a temporary table instead...
    select row_no = row_number() over( order by account_id, tran_date, id ),
           id, salesperson_id, tran_date, clear_date, amount, transaction_types_id, account_id
    into #t
    from Transactions
    where account_id between 1000 and 1020;

    create unique clustered index clidx on #t ( account_id, row_no )

    ;with Trans ( row_no, id, salesperson_id, tran_date, clear_date, amount, transaction_types_id, account_id )
    as (
        select *
        from #t
    )
    select t1.*,
        running_amount = coalesce(
            ( select sum( amount )
              from Trans t2
              where t2.row_no < t1.row_no
                and t2.account_id = t1.account_id ), 0 )
            + amount
    from Trans t1
    order by account_id, row_no
    go

    drop table #t

    /***
        The other methods on real data...
    ****/

    -- Just using a sub-query and a surrogate key (id column)
    select *,
        rolling = coalesce(
            ( select sum( amount )
              from Transactions a2
              where a2.account_id = a1.account_id
                and ( a2.tran_date < a1.tran_date
                      or ( a2.tran_date = a1.tran_date
                           and a2.id < a1.id ) ) ), 0 )
            + amount
    from Transactions a1
    where account_id between 1000 and 1020
    order by account_id, tran_date, id
    go

    -- Self-join approach
    select a.*, dt.running_total + a.amount
    from (
        select a1.id, running_total = coalesce( sum( a2.amount ), 0 )
        from Transactions a1
        left outer join Transactions a2
            on a2.account_id = a1.account_id
            and ( a2.tran_date < a1.tran_date
                  or ( a2.tran_date = a1.tran_date
                       and a2.id < a1.id ) )
        where a1.account_id between 1000 and 1020
        group by a1.id
    ) as dt
    inner join Transactions a on a.id = dt.id
    order by account_id, tran_date, id
    go

    -- Cursor - UPDATE method
    set nocount on

    create table #Trans(
        row_no int not null,
        account_id int not null,
        id int not null,
        amount decimal(28,2) not null,
        rolling_total decimal(28,2) not null,
        salesperson_id int not null,
        tran_date smalldatetime not null,
        clear_date smalldatetime not null,
        transaction_types_id int not null,
        primary key clustered( account_id, row_no )
    )

    insert #Trans ( row_no, account_id, id, amount, rolling_total,
                    salesperson_id, tran_date, clear_date, transaction_types_id )
    select row_no = row_number() over( partition by account_id order by tran_date, id ),
           account_id, id, amount, 0,
           salesperson_id, tran_date, clear_date, transaction_types_id
    from Transactions
    where account_id between 1000 and 5000

    declare trans_cur cursor forward_only for
        select account_id, row_no, amount
        from #Trans
        order by account_id, row_no
        for update of rolling_total

    declare @account_id int,
            @row_no int,
            @amount decimal(28,2),
            @rolling_total decimal(28,2)

    open trans_cur
    fetch next from trans_cur into @account_id, @row_no, @amount

    begin tran
    while @@fetch_status = 0
    begin
        if @row_no = 1 -- initialise on each partition
            set @rolling_total = @amount
        else
            set @rolling_total = @rolling_total + @amount

        update #Trans
        set rolling_total = @rolling_total
        where current of trans_cur

        fetch next from trans_cur into @account_id, @row_no, @amount
    end
    deallocate trans_cur
    commit tran

    select id, salesperson_id, tran_date, clear_date, amount, transaction_types_id, rolling_total
    from #Trans
    order by account_id, tran_date, id

    drop table #Trans
    go

    -- Cursor - INSERT method
    set nocount on

    create table #Trans(
        account_id int not null,
        id int not null,
        amount decimal(28,2) not null,
        rolling_total decimal(28,2) not null,
        salesperson_id int not null,
        tran_date smalldatetime not null,
        clear_date smalldatetime not null,
        transaction_types_id int not null
    )

    declare trans_cur cursor fast_forward for
        select account_id,
               row_no = row_number() over( partition by account_id order by tran_date, id ),
               id, amount, salesperson_id, tran_date, clear_date, transaction_types_id
        from Transactions
        where account_id between 1000 and 5000
        order by account_id, row_no

    declare @account_id int,
            @row_no int,
            @id int,
            @amount decimal(28,2),
            @salesperson_id int,
            @tran_date smalldatetime,
            @clear_date smalldatetime,
            @transaction_types_id int,
            @rolling_total decimal(28,2)

    open trans_cur
    fetch next from trans_cur
        into @account_id, @row_no, @id, @amount,
             @salesperson_id, @tran_date, @clear_date, @transaction_types_id

    begin tran
    while @@fetch_status = 0
    begin
        if @row_no = 1 -- initialise on each partition
            set @rolling_total = @amount
        else
            set @rolling_total = @rolling_total + @amount

        insert #Trans ( account_id, id, amount, rolling_total, salesperson_id, tran_date, clear_date, transaction_types_id )
        values( @account_id, @id, @amount, @rolling_total, @salesperson_id, @tran_date, @clear_date, @transaction_types_id )

        fetch next from trans_cur
            into @account_id, @row_no, @id, @amount,
                 @salesperson_id, @tran_date, @clear_date, @transaction_types_id
    end
    deallocate trans_cur
    commit tran

    select id, salesperson_id, tran_date, clear_date, amount, transaction_types_id, rolling_total
    from #Trans
    order by account_id, tran_date, id

    drop table #Trans
    go

    with Trans ( row_no, id, salesperson_id, tran_date, clear_date, amount, transaction_types_id, account_id )
    as (
        select row_no = row_number() over( order by account_id, tran_date, id ),
               id, salesperson_id, tran_date, clear_date, amount, transaction_types_id, account_id
        from Transactions
        where account_id between 1000 and 5000
    )
    select t1.*,
        running_amount = coalesce(
            ( select sum( amount )
              from Trans t2
              where t2.row_no < t1.row_no
                and t2.account_id = t1.account_id ), 0 )
            + amount
    from Trans t1
    order by account_id, row_no
    go

    The set-up script....

    drop table accounts_active
    drop table accounts_inactive
    drop table accounts
    drop table transactions
    drop table account_statuses
    drop table transaction_types

    create table account_statuses (
        id int not null identity constraint pk_account_statuses primary key clustered,
        status_code char(50) not null constraint uk_accounts_active_status_code unique,
        status_desc varchar(50) not null constraint uk_accounts_active_status_desc unique
    )

    insert account_statuses ( status_code, status_desc ) values( 'ONHLD', 'On Hold' )
    insert account_statuses ( status_code, status_desc ) values( 'ACTIV', 'Active' )
    insert account_statuses ( status_code, status_desc ) values( 'LAP3M', 'Lapsed 3 months' )
    insert account_statuses ( status_code, status_desc ) values( 'LAP6M', 'Lapsed 6 months' )
    insert account_statuses ( status_code, status_desc ) values( 'BADCR', 'Bad Credit' )
    insert account_statuses ( status_code, status_desc ) values( 'LAPSD', 'Lapsed >12 months' )
    insert account_statuses ( status_code, status_desc ) values( 'FRAUD', 'Fraud' )
    go

    create table transaction_types (
        id int not null identity constraint pk_transaction_types primary key clustered,
        transaction_type_code char(50) not null constraint uk_transaction_type_status_code unique,
        transaction_type_desc varchar(50) not null constraint uk_transaction_type_status_desc unique,
        sign_amount smallint not null check ( sign_amount in ( -1, 1 ) )
    )

    insert transaction_types ( transaction_type_code, transaction_type_desc, sign_amount ) values( 'PAYCC', 'Pay by Credit Card', 1 )
    insert transaction_types ( transaction_type_code, transaction_type_desc, sign_amount ) values( 'PAYCH', 'Pay by Cheque', 1 )
    insert transaction_types ( transaction_type_code, transaction_type_desc, sign_amount ) values( 'PAYMY', 'Pay by Cash', 1 )
    insert transaction_types ( transaction_type_code, transaction_type_desc, sign_amount ) values( 'REFUD', 'Refund', -1 )
    insert transaction_types ( transaction_type_code, transaction_type_desc, sign_amount ) values( 'ADJST', 'Adjustment', -1 )
    go

    create table accounts (
        id int not null identity constraint pk_accounts primary key clustered,
        first_name varchar(50) not null,
        last_name varchar(50) not null,
        dob smalldatetime not null
            constraint ck_accounts_dob check( dob = cast( convert( char(8), dob, 112 ) as smalldatetime ) ),
        account_no varchar(50) not null constraint uk_accounts unique
    )
    go

    insert accounts( first_name, last_name, dob, account_no )
    select FirstName, LastName,
           dateadd( month, floor( rand(cast(newid() as varbinary)) * 100 ), '19701004' ),
           right( '0000000000' + cast( contactid as varchar(50) ), 10 )
    from AdventureWorks.Person.Contact

    create table accounts_active (
        account_id int not null references accounts( id ),
        account_statuses_id int not null references account_statuses( id ),
        constraint pk_accounts_active primary key clustered( account_id )
    )
    go

    create table accounts_inactive (
        account_id int not null references accounts( id ),
        account_statuses_id int not null references account_statuses( id ),
        constraint pk_accounts_inactive primary key clustered( account_id )
    )
    go

    -- Scratch query, superseded by the INSERTs below (as written it is also
    -- invalid - ORDER BY is not allowed directly in UNION branches, which is
    -- why the INSERTs use derived tables):
    --select top 10000 id, 1 from accounts order by newid()
    --union all
    --select top 1000 id, 2 from account_statuses
    --where status_code in ( 'ONHLD', 'ACTIV' ) order by newid()

    insert accounts_active ( account_id, account_statuses_id )
    select *
    from (
        select top 10000 id, s = 1
        from accounts
        order by newid() ) as d

    insert accounts_active ( account_id, account_statuses_id )
    select id, s
    from (
        select top 1000 id, s = 2
        from accounts a
        where id not in ( select account_id from accounts_active )
        order by newid() ) as d

    insert accounts_inactive ( account_id, account_statuses_id )
    select id, s
    from (
        select top 1000 id, s = 6
        from accounts a
        where id not in ( select account_id from accounts_active )
          and id not in ( select account_id from accounts_inactive )
        order by newid() ) as d

    create table individual(
        id int not null identity constraint pk_individual primary key clustered,
        first_name varchar(50) not null,
        last_name varchar(50) not null
    )

    insert individual( first_name, last_name )
    select FirstName, LastName
    from AdventureWorks.Person.Contact

    drop table sales_commission
    drop function dbo.fn_check_sales_commission_window
    go

    create function dbo.fn_check_sales_commission_window (
        @sales_commission_id int,
        @individual_id int,
        @comm_start_date smalldatetime,
        @comm_end_date smalldatetime )
    returns varchar(3)
    as
    begin
        declare @status varchar(3)

        if exists (
            select *
            from sales_commission
            where individual_id = @individual_id
              and id <> @sales_commission_id
              and ( @comm_start_date between comm_start_date and coalesce( comm_end_date, @comm_start_date )
                    or @comm_end_date between comm_start_date and coalesce( comm_end_date, @comm_end_date ) )
            )
            set @status = 'BAD'
        else
            set @status = 'OK'

        return @status
    end
    go

    create table sales_commission (
        id int not null identity constraint pk_sales_commission primary key clustered,
        individual_id int not null references individual( id ),
        comm_rate decimal( 5, 2 ) not null,
        comm_start_date smalldatetime not null check( comm_start_date = cast( convert( char(8), comm_start_date, 112 ) as smalldatetime ) ),
        constraint ck_sales_commission_window check( dbo.fn_check_sales_commission_window( id, individual_id, comm_start_date, comm_end_date ) = 'OK' ),
        comm_end_date smalldatetime null check( comm_end_date is null or comm_end_date = cast( convert( char(8), comm_end_date, 112 ) as smalldatetime ) ),
        constraint uk_sales_commission unique ( individual_id, comm_start_date ) -- really do RI in a trigger to prevent window overlaps
    )
    go

    insert sales_commission( individual_id, comm_rate, comm_start_date, comm_end_date )
    select top 50 id, 20, '20070410', '20080110'
    from individual

    insert sales_commission( individual_id, comm_rate, comm_start_date, comm_end_date )
    select top 50 id, 20, '20080111', '20080210'
    from individual

    insert sales_commission( individual_id, comm_rate, comm_start_date, comm_end_date )
    select top 50 id, 10, '20080211', null
    from individual

    select * from sales_commission order by comm_start_date, comm_end_date

    create table transactions (
        id int not null identity constraint pk_transactions primary key clustered,
        salesperson_id int not null references individual( id ),
        tran_date smalldatetime not null
            constraint ck_transactions_tran_date check( tran_date = cast( convert( char(8), tran_date, 112 ) as smalldatetime ) ), -- force date only
        clear_date smalldatetime not null
            constraint ck_transactions_clear_date check( clear_date = cast( convert( char(8), clear_date, 112 ) as smalldatetime ) ), -- force date only
        amount decimal(28,2) not null check( amount > 0 ), -- adjustments done via tran_type
        transaction_types_id int not null references transaction_types( id ),
        account_id int not null references accounts( id )
    )
    go

    print 105 % 100 -- scratch check of the modulo used in the loop below

    set nocount on

    declare @i int
    set @i = 1
    declare @dt smalldatetime

    begin tran
    while @i <= 4000000
    begin
        set @dt = cast( convert( char(8), dateadd( day, -floor( rand() * 100 ), current_timestamp ), 112 ) as smalldatetime )

        if @i % 100 = 0 -- commit in batches of 100 rows
        begin
            commit tran
            begin tran
        end

        insert transactions ( salesperson_id, tran_date, clear_date, amount, transaction_types_id, account_id )
        select ( select top 1 individual_id from sales_commission order by newid() ),
               @dt,
               dateadd( day, 14, @dt ),
               ( select top 1 id from sales_commission order by newid() ),
               ( select top 1 id from transaction_types order by newid() ),
               ( select top 1 id from accounts order by newid() )

        set @i = @i + 1
    end
    commit tran

  • Tony, I know you're busy - hey, we all are - but I have to say (from a quick scan of your script) that it doesn't seem to feature any of:

    1. Hugo's set-based iteration method

    2. SQLCLR (procedure or TVF)

    3. Quirky Update

    My contention was (and is) that those three methods are all superior to a cursor. Yes, there are even slower ways (triangular joins, recursive CTEs) but that doesn't make the cursor good or cool 😛

    I know you have issues with ranking functions at the moment (which I also happen to disagree with you about, by the way), but leaving that aside for a moment: if your main problems with the Quirky Update are (a) the lack of a guarantee on update order, and (b) the fact that the behaviour might change at any moment, let me just say:

    1. This undocumented method has worked for at least ten years in all versions, service packs, cumulative updates, and what-have-you.

    2. Yes, you would need to test that the latest release hasn't broken it. Don't we usually do that anyway?

    3. Documented features have a history of breaking too.

    4. The increase in performance may be worth the small extra new-version testing that is required, for some folks.

    5. There is no law requiring people to use the method.

    Anyway, onto the thing about the ranking function. If you would feel happier with a slight tweak to the method that throws an error if a row tries to be quirky-updated out of sequence, consider this:

    DECLARE @PrevAccountID INT;
    DECLARE @AccountRunningTotal MONEY;
    DECLARE @AccountRunningCount INT;
    DECLARE @Sequence INT;

    SET @Sequence = 0;

    BEGIN TRAN;

    WITH Source
    AS (
        SELECT Sequence = ROW_NUMBER() OVER (ORDER BY TD.AccountID ASC, TD.Date ASC, TD.TransactionDetailID ASC),
               TD.AccountID,
               TD.AccountRunningCount,
               TD.AccountRunningTotal,
               TD.Amount,
               TD.Date,
               TD.NCID,
               TD.TransactionDetailID
        FROM dbo.TransactionDetail TD WITH (INDEX(0))
    )
    UPDATE Source WITH (TABLOCKX)
    SET @AccountRunningTotal = AccountRunningTotal =
            CASE
                WHEN AccountID = @PrevAccountID
                THEN @AccountRunningTotal + Amount
                ELSE Amount
            END,
        @AccountRunningCount = AccountRunningCount =
            CASE
                WHEN AccountID = @PrevAccountID
                THEN @AccountRunningCount + 1
                ELSE 1
            END,
        @Sequence =
            CASE
                WHEN Sequence = @Sequence + 1 THEN Sequence -- row is in the expected sequence
                ELSE 1/0 -- produce an error
            END,
        @PrevAccountID = AccountID
    OPTION (MAXDOP 1);

    SELECT *
    FROM dbo.TransactionDetail
    ORDER BY AccountID, Date, TransactionDetailID;

    ROLLBACK;

    If you run that (on the test rig in the article) you will see that all that is added is a Segment and Sequence Project for the ROW_NUMBER() - no sort is required of course. The majority of the execution time is the rollback 🙂

    The truly paranoid reader could extend this basic method using the OUTPUT clause of the UPDATE statement to write the PK, sequence, and running totals to another table. Only if no error was produced would the reader go on to change the data. Something along those lines anyway - I lack the time currently for a full implementation, and lack sufficient paranoia...
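
    For what it is worth, here is a bare-bones sketch of just the OUTPUT mechanics (the @Audit table is illustrative, and the no-op SET merely stands in for the real quirky update above):

    DECLARE @Audit TABLE
    (
        TransactionDetailID INT PRIMARY KEY,
        AccountRunningTotal MONEY,
        AccountRunningCount INT
    );

    UPDATE dbo.TransactionDetail WITH (TABLOCKX)
    SET AccountRunningCount = AccountRunningCount -- no-op placeholder
    OUTPUT inserted.TransactionDetailID,
           inserted.AccountRunningTotal,
           inserted.AccountRunningCount
    INTO @Audit (TransactionDetailID, AccountRunningTotal, AccountRunningCount)
    OPTION (MAXDOP 1);

    SELECT * FROM @Audit; -- the captured keys and totals, available for verification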

    Someone is also bound to notice that the above method also works on heaps - in fact on any table that either (a) has a clustered index in the Quirky Update order; or (b) is a heap. Non-clustered indexes make no never mind.

    Paul

  • Paul White (2/4/2010)


    1. This undocumented method has worked for at least ten years in all versions, service packs, cumulative updates, and what-have-you.

    2. Yes, you would need to test that the latest release hasn't broken it. Don't we usually do that anyway?

    3. Documented features have a history of breaking too.

    4. The increase in performance may be worth the small extra new-version testing that is required, for some folks.

    5. There is no law requiring people to use the method.

    Point 1 - I'm sure all the people who got caught out with TOP 100 PERCENT in a VIEW would disagree (see the sketch after these points). And yes, I know the feature has been there for a long time - I can't remember when I first came across it; probably 6.5, not sure it was in 4.21, possibly 6.0. TOP 100 PERCENT in a VIEW was introduced in 7.0? It's something like that...

    Point 2 - No, the majority of people do not do a full system test before upgrading to the next service pack or CU. That is a reality out in the field; the majority of people are not huge enterprise companies that can justify that resource and expense.

    Point 3 - Name a couple that weren't subsequently fixed.

    Point 4 - doh

    Point 5 - Yes, there is a law - it's the law of practicalities; there are many, many folk out there who are not full-time database developers or administrators and will just see and implement the "feature" rather than doing it properly.
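
    For anyone who missed that episode, a sketch from memory of the pattern meant in Point 1 (the view name is made up; the table is the one from the script above):

    -- SQL Server 2000-era attempt at an "ordered view"; the ORDER BY is only
    -- legal in the view because of the TOP 100 PERCENT
    CREATE VIEW dbo.OrderedTransactions
    AS
    SELECT TOP 100 PERCENT id, tran_date, amount
    FROM dbo.transactions
    ORDER BY tran_date;
    GO

    -- On 2000 this happened to come back ordered by tran_date; from 2005 on
    -- the optimizer is free to ignore the ORDER BY, so the order is undefined
    SELECT * FROM dbo.OrderedTransactions;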

    Anyway - I think we are at a point of complete disagreement on this.

    I'm on the side of stability, supportability, and maintainability - so, with this approach, are you honestly arguing that this undocumented feature is the correct method to use?

    Seriously now - not just to pick a side that disagrees with my point.

    Tony.

  • Paul White (2/4/2010)


    Hi Hugo,

    You overlooked my post which addressed the 'problem'. It was quite long, so I'm quite shocked that you missed it 😉

    Guilty as charged 😉 I didn't have the time to read all the new posts in this thread, so I mainly focused on finding Jeff's post, and then quick-scanning the others to see if they addressed my point.

    I did not repeat the test runs, but I have no doubt that the INDEX(0) hint will solve the issue, just as you suggested. However, I maintain that this actually reinforces my objections rather than weakening them.

    The first time Jeff wrote about this issue, the list of rules for using the "quirky update" was rather short - I believe he had three or four points on it at that time (and I believe that the INDEX(0) hint was still included then). That resulted in a discussion of proportions even bigger than this one, with several people posting code examples that broke his code, all without breaking any of the rules that were valid then.

    Now, Jeff has rewritten his article, and the list of rules has grown to 10 points. Okay, 9 if you don't count the "TEST" rule (which is valid for ALL code, not just the quirky update). Each of the new rules was apparently designed to remove one or more of the counter-examples that were posted to his original article. And yet, the method is still not fully reliable, as I have demonstrated by posting my code.

    Your reaction is to impose yet another rule, to counter my code. But I fail to see how this would increase anyone's faith in this method - a method that can only keep up its claim of reliability by adding more and more rules as counterexamples are found. I could try to find another example to break it, even with the INDEX(0) hint. I might find one. Maybe now, maybe only when the next service pack is released - who knows. And if I did, I have no doubt that you, or Jeff, or any other proponent of this method, would add yet another rule to the list. A list that, in my opinion, could just as easily be paraphrased and shortened to a single rule: "Unless it fails (which might happen under various circumstances, not all of which have been identified yet)". And indeed - a claim that this method works reliably "unless it fails" is one that I cannot contradict.

    So, if you want to build your production systems on a method that "works unless it doesn't", go ahead. But if you, or even Jeff, a SQL author who deservedly has a lot of respect in the SQL Server community, endorses it, then I think you are leading people with a lesser understanding into very dangerous territory. And I agree with Tony's comments that there should be a warning, in big red flashy letters, at the start of this article: "This method might fail under unexpected circumstances".


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/
