Solving the Running Total and Ordinal Rank Problems (Rewritten)

  • Hi Jeff,

    As indicated in my previous message, I somehow lost sight of this discussion. Sorry for my sudden disappearance from this topic.

    First, thank you for optimizing my alternative running totals code even further. I hope you don't mind me using the optimized version of this code (with attribution, of course) for future presentations, blog posts, articles, etc.

    When reading over all the new posts, I didn't find any reaction to one of the points I raised. It's about the claim that nobody has yet managed to break the quirky update. I did post at least two bits of code in this thread that did break it. You dismissed the first as using a JOIN; I objected to that, as an IN with a constant list rather than a subquery is not equivalent to a JOIN, but to a series of OR'ed conditions, and then posted another code snippet that breaks it without using IN or OR. If you ever responded to my objection or the last snippet, I overlooked the reaction - can you please tell me the page number to re-read? And if you never responded, then I hope you will do so this time.
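
    To make that point concrete, here is a minimal illustration (the literal values are arbitrary). An IN with a constant list is just shorthand for a chain of OR'ed equality tests - no join, and no second table, is involved:

    -- These two queries are exactly equivalent:
    SELECT * FROM dbo.TransactionDetail
    WHERE TransactionDetailID IN (12, 123, 1234);

    SELECT * FROM dbo.TransactionDetail
    WHERE TransactionDetailID = 12
       OR TransactionDetailID = 123
       OR TransactionDetailID = 1234;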

    For your convenience, here is the last code I posted again:

    DECLARE @Counter INT;
    SET @Counter = 0;

    UPDATE dbo.TransactionDetail
    SET @Counter = AccountRunningCount = @Counter + 1
    FROM dbo.TransactionDetail WITH (TABLOCKX)
    WHERE TransactionDetailID BETWEEN 120000 AND 120010
    OPTION (MAXDOP 1);
    GO

    --===== Select all the rows in order by the clustered index
    SELECT *
    FROM dbo.TransactionDetail
    WHERE TransactionDetailID BETWEEN 120000 AND 120010
    ORDER BY AccountID, Date, TransactionDetailID;


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Cursor is IMHO the best approach for this because you can put an ORDER BY on it - you have repeatable results, with syntax whose behaviour is well defined in the product manual (BOL).

    That particular use of UPDATE in this context is only asking for trouble.

    Yes, it might work like that now; but where in the documentation does it define that it will always process the rows in order?

    My two cents.

    Tony.

  • Hi Hugo,

    You overlooked my post, which addressed the 'problem'. It was quite long, so I'm quite shocked that you missed it 😉

    Extract: Paul White to Jeff Moden


    How much do you regret retiring the INDEX(0) specification present in your original article? That simple hint does away with Hugo's attempt to break it using a WHERE clause or an IN statement. In addition to that, this method has always been about persisting a running total for all records in the set. If you want to do a small subset (enough to invoke an unordered update) you'd probably use another method. That said, adding the INDEX(0) back in solves all those worries....

    DECLARE @Counter INTEGER;
    SET @Counter = 0;

    BEGIN TRANSACTION;

    UPDATE TD
    SET @Counter = AccountRunningCount = @Counter + 1
    FROM dbo.TransactionDetail TD WITH (TABLOCKX, INDEX(0))
    WHERE TransactionDetailID BETWEEN 120000 AND 120010 -- IN (12,123,1234,23,234,2345,34,345,3456)
    OPTION (MAXDOP 1);

    SELECT *
    FROM dbo.TransactionDetail
    WHERE TransactionDetailID BETWEEN 120000 AND 120010
    ORDER BY AccountID, Date, TransactionDetailID;

    ROLLBACK TRANSACTION;

    BTW, your RANK() solution is much more clever than I originally gave it credit for. My sincere apologies for that; I hadn't fully appreciated the subtleties until relatively recently.

    My only reservation about it is that its performance depends somewhat on the data. If the data is split into many groups - AccountIDs in Jeff's sample data - it absolutely flies. On the other end of the scale, where there is only one group, it behaves like an extremely slow cursor, and is in fact roundly beaten by same.
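
    For readers who have not seen Hugo's technique, here is a minimal sketch of the set-based iteration idea - an illustration of the principle, not Hugo's exact code, and the #Work table and column names are made up for the example. Each pass of the loop advances the running total by one row position in every account at once, which is why the run time is governed by the size of the largest group:

    -- Stage the rows with a per-account row number (illustrative names)
    SELECT AccountID, Amount,
           rn = ROW_NUMBER() OVER (PARTITION BY AccountID ORDER BY Date, TransactionDetailID),
           RunningTotal = CAST(0 AS MONEY)
    INTO #Work
    FROM dbo.TransactionDetail;

    CREATE UNIQUE CLUSTERED INDEX cx ON #Work (AccountID, rn);

    DECLARE @rn INT;
    SET @rn = 1;

    UPDATE #Work SET RunningTotal = Amount WHERE rn = 1; -- seed the first row of every account

    WHILE @@ROWCOUNT > 0
    BEGIN
        SET @rn = @rn + 1;

        -- one pass per row position, but all accounts advance together
        UPDATE W
        SET RunningTotal = P.RunningTotal + W.Amount
        FROM #Work AS W
        JOIN #Work AS P
          ON P.AccountID = W.AccountID
         AND P.rn = W.rn - 1
        WHERE W.rn = @rn;
    END;

    With many small groups the loop runs only a handful of times; with a single group it degenerates into one UPDATE per row, which is the cursor-like behaviour described above.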

    For running total calculations where the data is not written back to an existing column in the source, your method beats the quirky update by a small margin in my tests (test rig available). I am currently unable to produce a SQLCLR routine which beats either of these methods on their home turf, although I can get very, very close 🙂

    Paul

  • tony rogerson (2/4/2010)


    Cursor is IMHO the best approach for this because you can put an ORDER BY on it - you have repeatable results, with syntax whose behaviour is well defined in the product manual (BOL).

    That particular use of UPDATE in this context is only asking for trouble.

    Yes, it might work like that now; but where in the documentation does it define that it will always process the rows in order?

    My two cents.

    Tony.

    Thanks for the visit, Tony, and I appreciate the words of caution, which are similar to the cautions I gave in the article. And, yes, I agree it's an undocumented feature, as I also said in the article. If it ever breaks, then I too will use a cursor or maybe even a "CLR" as a workaround.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Relying on query hints to force a side-effect feature is just madness - you are relying on the behaviour of the engine in specific product versions (and builds).

    Seriously - don't use the side-effect feature of the UPDATE like this.

    It was originally put in the product to capture a value back into a local variable, so we don't have to use an UPDATE and a separate SELECT.
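
    That documented use looks like this - a hypothetical single-row example (dbo.Account, Balance, and AccountID are made-up names for illustration):

    DECLARE @NewBalance MONEY;

    -- Assign to the column and capture the new value in the variable, in one statement
    UPDATE dbo.Account
    SET @NewBalance = Balance = Balance + 100.00
    WHERE AccountID = 42;

    SELECT @NewBalance AS NewBalance; -- no separate SELECT against the table needed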

    Tony.

  • tony rogerson (2/4/2010)


    Cursor is IMHO the best approach for this because you can put an ORDER BY on it - you have repeatable results, with syntax whose behaviour is well defined in the product manual (BOL).

    *This is not a knee-jerk anti-cursor reaction*

    Tony, are you serious? Hugo's method beats the cursor method in 99.9% of cases. SQLCLR methods beat the cursor in 99.9% of cases. Quirky Update beats the cursor in 100% of cases but requires care. I'm sure there are other methods too that have far more merit than even the most keenly optimized LOCAL READ_ONLY FORWARD_ONLY STATIC cursor!

    Cursors do suck, and they discourage set-based thinking. As John McEnroe famously said: "You cannot be serious!"

    😛

    Paul

  • Jeff Moden (2/4/2010)


    ...I too will use [...] a "CLR"

    At last! 😆 😛 😀 😲

    And I have it in writing. Even if it did require extreme editing skills.

    Paul

  • Jeff Moden (2/4/2010)


    tony rogerson (2/4/2010)


    Cursor is IMHO the best approach for this because you can put an ORDER BY on it - you have repeatable results, with syntax whose behaviour is well defined in the product manual (BOL).

    That particular use of UPDATE in this context is only asking for trouble.

    Yes, it might work like that now; but where in the documentation does it define that it will always process the rows in order?

    My two cents.

    Tony.

    Thanks for the visit, Tony, and I appreciate the words of caution, which are similar to the cautions I gave in the article. And, yes, I agree it's an undocumented feature, as I also said in the article. If it ever breaks, then I too will use a cursor or maybe even a "CLR" as a workaround.

    Hi Jeff - I do respect the amount of work you have put into this - I know the effort involved, I've done it so many times myself.

    But to publicly recommend this method is, to be frank, just irresponsible.

    Unless you are going to go through every single version, SP, and CU of the product to make sure they have not changed something that breaks the undocumented feature you are relying on, then the fact that it works on one version, one build, is moot.

    I know I'll get flamed for posting this - but that is the level of problem this will cause! Not everybody is at your level; a lot of people will just use it.

    Also - the warning, as it stands, just says that other people have been bitching that this feature is undocumented, rather than stating up front: "never ever use this in a production environment, as you are relying on undocumented behaviour that may change right under your feet on the next service pack or cumulative update you apply".

    If you do that and put it in big red bold letters right at the start of your article then I'd be happy!

    Tony.

  • Not in my testing they don't!

    I've presented a number of times at UK user group meetings on running totals; I have an example that proves the opposite.

    The cursor was faster - it's all about how you write your code.

    Tony.

  • tony rogerson (2/4/2010)


    I've presented a number of times at UK user group meetings on running totals; I have an example that proves the opposite. The cursor was faster - it's all about how you write your code.

    Please share 😉

  • Hugo Kornelis (2/4/2010)


    Hi Jeff,

    As indicated in my previous message, I somehow lost sight of this discussion. Sorry for my sudden disappearance from this topic.

    First, thank you for optimizing my alternative running totals code even further. I hope you don't mind me using the optimized version of this code (with attribution, of course) for future presentations, blog posts, articles, etc.

    Absolutely not a problem. Thanks for asking. Glad I could make a contribution. Thanks, Hugo.

    When reading over all the new posts, I didn't find any reaction to one of the points I raised. It's about the claim that nobody has yet managed to break the quirky update. I did post at least two bits of code in this thread that did break it. You dismissed the first as using a JOIN; I objected to that, as an IN with a constant list rather than a subquery is not equivalent to a JOIN, but to a series of OR'ed conditions, and then posted another code snippet that breaks it without using IN or OR. If you ever responded to my objection or the last snippet, I overlooked the reaction - can you please tell me the page number to re-read? And if you never responded, then I hope you will do so this time.

    For your convenience, here is the last code I posted again:

    DECLARE @Counter INT;
    SET @Counter = 0;

    UPDATE dbo.TransactionDetail
    SET @Counter = AccountRunningCount = @Counter + 1
    FROM dbo.TransactionDetail WITH (TABLOCKX)
    WHERE TransactionDetailID BETWEEN 120000 AND 120010
    OPTION (MAXDOP 1);
    GO

    --===== Select all the rows in order by the clustered index
    SELECT *
    FROM dbo.TransactionDetail
    WHERE TransactionDetailID BETWEEN 120000 AND 120010
    ORDER BY AccountID, Date, TransactionDetailID;

    My apologies. You're correct. I didn't address that. I'll go back and look at your original post to make sure I get the context right.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • It's on my list to blog about, but I'm concentrating on my MSc at the moment. Here you are....

    If you hunt around, the presentation may be online - I did the talk at SQLBits in 2008; hence the USE below 🙂

    USE SQLBits20080301
    go

    /***
        A look at RUNNING totals

        Run these with the Actual Query Plan:
        SET STATISTICS IO ON
        SET STATISTICS TIME ON
    ****/

    create table #amts (
        id int not null primary key clustered,
        tran_date smalldatetime not null,
        amount decimal(28,2) not null
    )

    create index ncidx_tran_date on #amts( tran_date )

    truncate table #amts

    insert #amts ( id, tran_date, amount ) values( 1, '20080208', 10 )
    insert #amts ( id, tran_date, amount ) values( 2, '20080212', 10 )
    insert #amts ( id, tran_date, amount ) values( 3, '20080208', 10 )
    insert #amts ( id, tran_date, amount ) values( 4, '20080211', 10 )
    insert #amts ( id, tran_date, amount ) values( 5, '20080212', 10 )
    insert #amts ( id, tran_date, amount ) values( 6, '20080212', 10 )

    -- Wrong....
    select *,
        rolling = coalesce(
            ( select sum( amount )
              from #amts a2
              where a2.tran_date <= a1.tran_date
                and a2.id <> a1.id ), 0 )
            + amount
    from #amts a1
    order by tran_date, id

    -- Wrong....
    select *,
        rolling = coalesce(
            ( select sum( amount )
              from #amts a2
              where a2.tran_date <= a1.tran_date
                and a2.id < a1.id ), 0 )
            + amount
    from #amts a1
    order by tran_date, id

    -- Wrong....
    select *,
        rolling = coalesce(
            ( select sum( amount )
              from #amts a2
              where a2.tran_date <= a1.tran_date ), 0 )
            + amount
    from #amts a1
    order by tran_date, id

    -- Correct....
    select *,
        rolling = coalesce(
            ( select sum( amount )
              from #amts a2
              where a2.tran_date < a1.tran_date
                 or ( a2.tran_date = a1.tran_date
                      and a2.id < a1.id ) ), 0 )
            + amount
    from #amts a1
    order by tran_date, id
    go

    -- Without the sub-query...
    select a.*, dt.running_total + a.amount
    from (
        select a1.id, running_total = coalesce( sum( a2.amount ), 0 )
        from #amts a1
        left outer join #amts a2
            on a2.tran_date < a1.tran_date
            or ( a2.tran_date = a1.tran_date
                 and a2.id < a1.id )
        group by a1.id
    ) as dt
    inner join #amts a on a.id = dt.id
    order by tran_date, id

    /***
        A look at using a CTE to help
    ***/

    -- First we need a sequence
    -- Notice the Sequence Generator step
    with Trans ( row_no, id, tran_date, amount )
    as (
        select row_no = row_number() over( order by tran_date, id ), id, tran_date, amount
        from #amts
    )
    select *
    from Trans
    go

    -- Now the real query
    -- Notice 3!!! Sequence Generator steps - notice where the Filter step is
    with Trans ( row_no, id, tran_date, amount )
    as (
        select row_no = row_number() over( order by tran_date, id ), id, tran_date, amount
        from #amts
    )
    select t1.*,
        running_amount = coalesce(
            ( select sum( amount )
              from Trans t2
              where t2.row_no < t1.row_no ), 0 )
            + amount
    from Trans t1
    order by row_no
    go

    /************************
        On Real Data Volumes
    *************************/

    -- Rows in the limited data set - check IO
    select count(*) from transactions where account_id between 1000 and 1020
    select * from transactions where account_id between 1000 and 1020
    -- 588

    -- CTE way - row_number() approach
    -- Problematic - row_number() is calculated on the entire table rather than just the rows we want!
    -- Bug / optimiser shortfall?
    -- Sub-query with CTE - ROW_NUMBER() just corrects the out-of-order id (surrogate key)
    with Trans ( row_no, id, salesperson_id, tran_date, clear_date, amount, transaction_types_id, account_id )
    as (
        select row_no = row_number() over( order by account_id, tran_date, id ),
               id, salesperson_id, tran_date, clear_date, amount, transaction_types_id, account_id
        from Transactions
        where account_id between 1000 and 1020
    )
    select t1.*,
        running_amount = coalesce(
            ( select sum( amount )
              from Trans t2
              where t2.row_no < t1.row_no
                and t2.account_id = t1.account_id ), 0 )
            + amount
    from Trans t1
    order by account_id, row_no
    go

    -- Using a temporary table instead...
    select row_no = row_number() over( order by account_id, tran_date, id ),
           id, salesperson_id, tran_date, clear_date, amount, transaction_types_id, account_id
    into #t
    from Transactions
    where account_id between 1000 and 1020;

    create unique clustered index clidx on #t ( account_id, row_no )

    ;with Trans ( row_no, id, salesperson_id, tran_date, clear_date, amount, transaction_types_id, account_id )
    as (
        select *
        from #t
    )
    select t1.*,
        running_amount = coalesce(
            ( select sum( amount )
              from Trans t2
              where t2.row_no < t1.row_no
                and t2.account_id = t1.account_id ), 0 )
            + amount
    from Trans t1
    order by account_id, row_no
    go

    drop table #t

    /***
        The other methods on real data...
    ****/

    -- Just using a sub-query and a surrogate key (id column)
    select *,
        rolling = coalesce(
            ( select sum( amount )
              from Transactions a2
              where a2.account_id = a1.account_id
                and ( a2.tran_date < a1.tran_date
                      or ( a2.tran_date = a1.tran_date
                           and a2.id < a1.id ) ) ), 0 )
            + amount
    from Transactions a1
    where account_id between 1000 and 1020
    order by account_id, tran_date, id
    go

    -- Self-join approach
    select a.*, dt.running_total + a.amount
    from (
        select a1.id, running_total = coalesce( sum( a2.amount ), 0 )
        from Transactions a1
        left outer join Transactions a2
            on a2.account_id = a1.account_id
            and ( a2.tran_date < a1.tran_date
                  or ( a2.tran_date = a1.tran_date
                       and a2.id < a1.id ) )
        where a1.account_id between 1000 and 1020
        group by a1.id
    ) as dt
    inner join Transactions a on a.id = dt.id
    order by account_id, tran_date, id
    go

    -- Cursor - UPDATE method
    set nocount on

    create table #Trans(
        row_no int not null,
        account_id int not null,
        id int not null,
        amount decimal(28,2) not null,
        rolling_total decimal(28,2) not null,
        salesperson_id int not null,
        tran_date smalldatetime not null,
        clear_date smalldatetime not null,
        transaction_types_id int not null,
        primary key clustered( account_id, row_no )
    )

    insert #Trans ( row_no, account_id, id, amount, rolling_total,
                    salesperson_id, tran_date, clear_date, transaction_types_id )
    select row_no = row_number() over( partition by account_id order by tran_date, id ),
           account_id, id, amount, 0,
           salesperson_id, tran_date, clear_date, transaction_types_id
    from Transactions
    where account_id between 1000 and 5000

    declare trans_cur cursor forward_only for
        select account_id, row_no, amount
        from #Trans
        order by account_id, row_no
        for update of rolling_total

    declare @account_id int,
            @row_no int,
            @amount decimal(28,2),
            @rolling_total decimal(28,2)

    open trans_cur
    fetch next from trans_cur into @account_id, @row_no, @amount

    begin tran
    while @@fetch_status = 0
    begin
        if @row_no = 1 -- initialise on each partition
            set @rolling_total = @amount
        else
            set @rolling_total = @rolling_total + @amount

        update #Trans
        set rolling_total = @rolling_total
        where current of trans_cur

        fetch next from trans_cur into @account_id, @row_no, @amount
    end
    deallocate trans_cur
    commit tran

    select id, salesperson_id, tran_date, clear_date, amount, transaction_types_id, rolling_total
    from #Trans
    order by account_id, tran_date, id

    drop table #Trans
    go

    -- Cursor - INSERT method
    set nocount on

    create table #Trans(
        account_id int not null,
        id int not null,
        amount decimal(28,2) not null,
        rolling_total decimal(28,2) not null,
        salesperson_id int not null,
        tran_date smalldatetime not null,
        clear_date smalldatetime not null,
        transaction_types_id int not null
    )

    declare trans_cur cursor fast_forward for
        select account_id,
               row_no = row_number() over( partition by account_id order by tran_date, id ),
               id, amount, salesperson_id, tran_date, clear_date, transaction_types_id
        from Transactions
        where account_id between 1000 and 5000
        order by account_id, row_no

    declare @account_id int,
            @row_no int,
            @id int,
            @amount decimal(28,2),
            @salesperson_id int,
            @tran_date smalldatetime,
            @clear_date smalldatetime,
            @transaction_types_id int,
            @rolling_total decimal(28,2)

    open trans_cur
    fetch next from trans_cur
        into @account_id, @row_no, @id, @amount,
             @salesperson_id, @tran_date, @clear_date, @transaction_types_id

    begin tran
    while @@fetch_status = 0
    begin
        if @row_no = 1 -- initialise on each partition
            set @rolling_total = @amount
        else
            set @rolling_total = @rolling_total + @amount

        insert #Trans ( account_id, id, amount, rolling_total, salesperson_id, tran_date, clear_date, transaction_types_id )
        values( @account_id, @id, @amount, @rolling_total, @salesperson_id, @tran_date, @clear_date, @transaction_types_id )

        fetch next from trans_cur
            into @account_id, @row_no, @id, @amount,
                 @salesperson_id, @tran_date, @clear_date, @transaction_types_id
    end
    deallocate trans_cur
    commit tran

    select id, salesperson_id, tran_date, clear_date, amount, transaction_types_id, rolling_total
    from #Trans
    order by account_id, tran_date, id

    drop table #Trans
    go

    with Trans ( row_no, id, salesperson_id, tran_date, clear_date, amount, transaction_types_id, account_id )
    as (
        select row_no = row_number() over( order by account_id, tran_date, id ),
               id, salesperson_id, tran_date, clear_date, amount, transaction_types_id, account_id
        from Transactions
        where account_id between 1000 and 5000
    )
    select t1.*,
        running_amount = coalesce(
            ( select sum( amount )
              from Trans t2
              where t2.row_no < t1.row_no
                and t2.account_id = t1.account_id ), 0 )
            + amount
    from Trans t1
    order by account_id, row_no
    go

    The set-up script....

    drop table accounts_active
    drop table accounts_inactive
    drop table accounts
    drop table transactions
    drop table account_statuses
    drop table transaction_types

    create table account_statuses (
        id int not null identity constraint pk_account_statuses primary key clustered,
        status_code char(50) not null constraint uk_accounts_active_status_code unique,
        status_desc varchar(50) not null constraint uk_accounts_active_status_desc unique
    )

    insert account_statuses ( status_code, status_desc ) values( 'ONHLD', 'On Hold' )
    insert account_statuses ( status_code, status_desc ) values( 'ACTIV', 'Active' )
    insert account_statuses ( status_code, status_desc ) values( 'LAP3M', 'Lapsed 3 months' )
    insert account_statuses ( status_code, status_desc ) values( 'LAP6M', 'Lapsed 6 months' )
    insert account_statuses ( status_code, status_desc ) values( 'BADCR', 'Bad Credit' )
    insert account_statuses ( status_code, status_desc ) values( 'LAPSD', 'Lapsed >12 months' )
    insert account_statuses ( status_code, status_desc ) values( 'FRAUD', 'Fraud' )
    go

    create table transaction_types (
        id int not null identity constraint pk_transaction_types primary key clustered,
        transaction_type_code char(50) not null constraint uk_transaction_type_status_code unique,
        transaction_type_desc varchar(50) not null constraint uk_transaction_type_status_desc unique,
        sign_amount smallint not null check ( sign_amount in ( -1, 1 ) )
    )

    insert transaction_types ( transaction_type_code, transaction_type_desc, sign_amount ) values( 'PAYCC', 'Pay by Credit Card', 1 )
    insert transaction_types ( transaction_type_code, transaction_type_desc, sign_amount ) values( 'PAYCH', 'Pay by Cheque', 1 )
    insert transaction_types ( transaction_type_code, transaction_type_desc, sign_amount ) values( 'PAYMY', 'Pay by Cash', 1 )
    insert transaction_types ( transaction_type_code, transaction_type_desc, sign_amount ) values( 'REFUD', 'Refund', -1 )
    insert transaction_types ( transaction_type_code, transaction_type_desc, sign_amount ) values( 'ADJST', 'Adjustment', -1 )
    go

    create table accounts (
        id int not null identity constraint pk_accounts primary key clustered,
        first_name varchar(50) not null,
        last_name varchar(50) not null,
        dob smalldatetime not null
            constraint ck_accounts_dob check( dob = cast( convert( char(8), dob, 112 ) as smalldatetime ) ),
        account_no varchar(50) not null constraint uk_accounts unique
    )
    go

    insert accounts( first_name, last_name, dob, account_no )
    select FirstName, LastName,
           dateadd( month, floor( rand(cast(newid() as varbinary)) * 100 ), '19701004' ),
           right( '0000000000' + cast( contactid as varchar(50) ), 10 )
    from AdventureWorks.Person.Contact

    create table accounts_active (
        account_id int not null references accounts( id ),
        account_statuses_id int not null references account_statuses( id ),
        constraint pk_accounts_active primary key clustered( account_id )
    )
    go

    create table accounts_inactive (
        account_id int not null references accounts( id ),
        account_statuses_id int not null references account_statuses( id ),
        constraint pk_accounts_inactive primary key clustered( account_id )
    )
    go

    -- Scratch query, superseded by the INSERTs below (as written it is also
    -- invalid - ORDER BY is not allowed directly in UNION branches, which is
    -- why the INSERTs use derived tables):
    --select top 10000 id, 1 from accounts order by newid()
    --union all
    --select top 1000 id, 2 from account_statuses
    --where status_code in ( 'ONHLD', 'ACTIV' ) order by newid()

    insert accounts_active ( account_id, account_statuses_id )
    select *
    from (
        select top 10000 id, s = 1
        from accounts
        order by newid() ) as d

    insert accounts_active ( account_id, account_statuses_id )
    select id, s
    from (
        select top 1000 id, s = 2
        from accounts a
        where id not in ( select account_id from accounts_active )
        order by newid() ) as d

    insert accounts_inactive ( account_id, account_statuses_id )
    select id, s
    from (
        select top 1000 id, s = 6
        from accounts a
        where id not in ( select account_id from accounts_active )
          and id not in ( select account_id from accounts_inactive )
        order by newid() ) as d

    create table individual(
        id int not null identity constraint pk_individual primary key clustered,
        first_name varchar(50) not null,
        last_name varchar(50) not null
    )

    insert individual( first_name, last_name )
    select FirstName, LastName
    from AdventureWorks.Person.Contact

    drop table sales_commission
    drop function dbo.fn_check_sales_commission_window
    go

    create function dbo.fn_check_sales_commission_window (
        @sales_commission_id int,
        @individual_id int,
        @comm_start_date smalldatetime,
        @comm_end_date smalldatetime )
    returns varchar(3)
    as
    begin
        declare @status varchar(3)

        if exists (
            select *
            from sales_commission
            where individual_id = @individual_id
              and id <> @sales_commission_id
              and ( @comm_start_date between comm_start_date and coalesce( comm_end_date, @comm_start_date )
                    or @comm_end_date between comm_start_date and coalesce( comm_end_date, @comm_end_date ) )
            )
            set @status = 'BAD'
        else
            set @status = 'OK'

        return @status
    end
    go

    create table sales_commission (
        id int not null identity constraint pk_sales_commission primary key clustered,
        individual_id int not null references individual( id ),
        comm_rate decimal( 5, 2 ) not null,
        comm_start_date smalldatetime not null check( comm_start_date = cast( convert( char(8), comm_start_date, 112 ) as smalldatetime ) ),
        constraint ck_sales_commission_window check( dbo.fn_check_sales_commission_window( id, individual_id, comm_start_date, comm_end_date ) = 'OK' ),
        comm_end_date smalldatetime null check( comm_end_date is null or comm_end_date = cast( convert( char(8), comm_end_date, 112 ) as smalldatetime ) ),
        constraint uk_sales_commission unique ( individual_id, comm_start_date ) -- really do RI in a trigger to prevent window overlaps
    )
    go

    insert sales_commission( individual_id, comm_rate, comm_start_date, comm_end_date )
    select top 50 id, 20, '20070410', '20080110'
    from individual

    insert sales_commission( individual_id, comm_rate, comm_start_date, comm_end_date )
    select top 50 id, 20, '20080111', '20080210'
    from individual

    insert sales_commission( individual_id, comm_rate, comm_start_date, comm_end_date )
    select top 50 id, 10, '20080211', null
    from individual

    select * from sales_commission order by comm_start_date, comm_end_date

    create table transactions (
        id int not null identity constraint pk_transactions primary key clustered,
        salesperson_id int not null references individual( id ),
        tran_date smalldatetime not null
            constraint ck_transactions_tran_date check( tran_date = cast( convert( char(8), tran_date, 112 ) as smalldatetime ) ), -- force date only
        clear_date smalldatetime not null
            constraint ck_transactions_clear_date check( clear_date = cast( convert( char(8), clear_date, 112 ) as smalldatetime ) ), -- force date only
        amount decimal(28,2) not null check( amount > 0 ), -- adjustments done via tran_type
        transaction_types_id int not null references transaction_types( id ),
        account_id int not null references accounts( id )
    )
    go

    print 105 % 100 -- scratch check of the modulo used in the loop below

    set nocount on

    declare @i int
    set @i = 1
    declare @dt smalldatetime

    begin tran
    while @i <= 4000000
    begin
        set @dt = cast( convert( char(8), dateadd( day, -floor( rand() * 100 ), current_timestamp ), 112 ) as smalldatetime )

        if @i % 100 = 0 -- commit in batches of 100 rows
        begin
            commit tran
            begin tran
        end

        insert transactions ( salesperson_id, tran_date, clear_date, amount, transaction_types_id, account_id )
        select ( select top 1 individual_id from sales_commission order by newid() ),
               @dt,
               dateadd( day, 14, @dt ),
               ( select top 1 id from sales_commission order by newid() ),
               ( select top 1 id from transaction_types order by newid() ),
               ( select top 1 id from accounts order by newid() )

        set @i = @i + 1
    end
    commit tran

  • Tony, I know you're busy - hey, we all are - but I have to say (from a quick scan of your script) that it doesn't seem to feature any of:

    1. Hugo's set-based iteration method

    2. SQLCLR (procedure or TVF)

    3. Quirky Update

    My contention was (and is) that those three methods are all superior to a cursor. Yes, there are even slower ways (triangular joins, recursive CTEs) but that doesn't make the cursor good or cool 😛

    I know you have issues with ranking functions at the moment (which I also happen to disagree with you about, by the way), but leaving that aside for a moment: if your main problems with the Quirky Update are (a) the lack of a guarantee on update order, and (b) the fact that the behaviour might change at any moment, let me just say:

    1. This undocumented method has worked for at least ten years in all versions, service packs, cumulative updates, and what-have-you.

    2. Yes, you would need to test that the latest release hasn't broken it. Don't we usually do that anyway?

    3. Documented features have a history of breaking too.

    4. The increase in performance may be worth the small extra new-version testing that is required, for some folks.

    5. There is no law requiring people to use the method.

    Anyway, onto the thing about the ranking function. If you would feel happier with a slight tweak to the method that throws an error if a row tries to be quirky-updated out of sequence, consider this:

    DECLARE @PrevAccountID INT;
    DECLARE @AccountRunningTotal MONEY;
    DECLARE @AccountRunningCount INT;
    DECLARE @Sequence INT;

    SET @Sequence = 0;

    BEGIN TRAN;

    WITH Source
    AS (
        SELECT Sequence = ROW_NUMBER() OVER (ORDER BY TD.AccountID ASC, TD.Date ASC, TD.TransactionDetailID ASC),
               TD.AccountID,
               TD.AccountRunningCount,
               TD.AccountRunningTotal,
               TD.Amount,
               TD.Date,
               TD.NCID,
               TD.TransactionDetailID
        FROM dbo.TransactionDetail TD WITH (INDEX(0))
    )
    UPDATE Source WITH (TABLOCKX)
    SET @AccountRunningTotal = AccountRunningTotal =
            CASE
                WHEN AccountID = @PrevAccountID
                THEN @AccountRunningTotal + Amount
                ELSE Amount
            END,
        @AccountRunningCount = AccountRunningCount =
            CASE
                WHEN AccountID = @PrevAccountID
                THEN @AccountRunningCount + 1
                ELSE 1
            END,
        @Sequence =
            CASE
                WHEN Sequence = @Sequence + 1 THEN Sequence -- row is in the expected sequence
                ELSE 1/0 -- produce an error
            END,
        @PrevAccountID = AccountID
    OPTION (MAXDOP 1);

    SELECT *
    FROM dbo.TransactionDetail
    ORDER BY AccountID, Date, TransactionDetailID;

    ROLLBACK;

    If you run that (on the test rig in the article) you will see that all that is added is a Segment and Sequence Project for the ROW_NUMBER() - no sort is required of course. The majority of the execution time is the rollback 🙂

    The truly paranoid reader could extend this basic method using the OUTPUT clause of the UPDATE statement to write the PK, sequence, and running totals to another table. Only if no error was produced would the reader go on to change the data. Something along those lines anyway - I lack the time currently for a full implementation, and lack sufficient paranoia...
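
    For what it is worth, here is a bare-bones sketch of just the OUTPUT mechanics (the @Audit table is illustrative, and the no-op SET merely stands in for the real quirky update above):

    DECLARE @Audit TABLE
    (
        TransactionDetailID INT PRIMARY KEY,
        AccountRunningTotal MONEY,
        AccountRunningCount INT
    );

    UPDATE dbo.TransactionDetail WITH (TABLOCKX)
    SET AccountRunningCount = AccountRunningCount -- no-op placeholder
    OUTPUT inserted.TransactionDetailID,
           inserted.AccountRunningTotal,
           inserted.AccountRunningCount
    INTO @Audit (TransactionDetailID, AccountRunningTotal, AccountRunningCount)
    OPTION (MAXDOP 1);

    SELECT * FROM @Audit; -- the captured keys and totals, available for verification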

    Someone is also bound to notice that the above method also works on heaps - in fact on any table that either (a) has a clustered index in the Quirky Update order; or (b) is a heap. Non-clustered indexes make no never mind.

    Paul

  • Paul White (2/4/2010)


    1. This undocumented method has worked for at least ten years in all versions, service packs, cumulative updates, and what-have-you.

    2. Yes, you would need to test that the latest release hasn't broken it. Don't we usually do that anyway?

    3. Documented features have a history of breaking too.

    4. The increase in performance may be worth the small extra new-version testing that is required, for some folks.

    5. There is no law requiring people to use the method.

    Point 1 - I'm sure all the people who got caught out with TOP 100 PERCENT in a VIEW would disagree (see the sketch after these points). And yes, I know the feature has been there for a long time - I can't remember when I first came across it; probably 6.5, not sure it was in 4.21, possibly 6.0. TOP 100 PERCENT in a VIEW was introduced in 7.0? It's something like that...

    Point 2 - No, the majority of people do not do a full system test before upgrading to the next service pack or CU. That is a reality out in the field; the majority of people are not huge enterprise companies that can justify that resource and expense.

    Point 3 - Name a couple that weren't subsequently fixed.

    Point 4 - doh

    Point 5 - Yes, there is a law - it's the law of practicalities; there are many, many folk out there who are not full-time database developers or administrators and will just see and implement the "feature" rather than doing it properly.
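
    For anyone who missed that episode, a sketch from memory of the pattern meant in Point 1 (the view name is made up; the table is the one from the script above):

    -- SQL Server 2000-era attempt at an "ordered view"; the ORDER BY is only
    -- legal in the view because of the TOP 100 PERCENT
    CREATE VIEW dbo.OrderedTransactions
    AS
    SELECT TOP 100 PERCENT id, tran_date, amount
    FROM dbo.transactions
    ORDER BY tran_date;
    GO

    -- On 2000 this happened to come back ordered by tran_date; from 2005 on
    -- the optimizer is free to ignore the ORDER BY, so the order is undefined
    SELECT * FROM dbo.OrderedTransactions;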

    Anyway - I think we are at a point of complete disagreement on this.

    I'm on the side of stability, supportability, and maintainability - so, with this approach, are you honestly arguing that this undocumented feature is the correct method to use?

    Seriously now - not just to pick a side that disagrees with my point.

    Tony.

  • Paul White (2/4/2010)


    Hi Hugo,

    You overlooked my post which addressed the 'problem'. It was quite long, so I'm quite shocked that you missed it 😉

    Guilty as charged 😉 I didn't have the time to read all the new posts in this thread, so I mainly focused on finding Jeff's post, and then quick-scanning the others to see if they addressed my point.

    I did not repeat the test runs, but I have no doubt that the INDEX(0) hint will solve the issue, just as you suggested. However, I maintain that this actually reinforces my objections rather than weakening them.

    The first time Jeff wrote about this issue, the list of rules for using the "quirky update" was rather short - I believe he had three or four points on it at that time (and I believe that the INDEX(0) hint was still included then). That resulted in a discussion of proportions even bigger than this one, with several people posting code examples that broke his code, all without breaking any of the rules that were valid then.

    Now, Jeff has rewritten his article, and the list of rules has grown to 10 points. Okay, 9 if you don't count the "TEST" rule (which is valid for ALL code, not just the quirky update). Each of the new rules was apparently designed to remove one or more of the counter-examples that were posted to his original article. And yet, the method is still not fully reliable, as I have demonstrated by posting my code.

    Your reaction is to impose yet another rule, to counter my code. But I fail to see how this would increase anyone's faith in this method - a method that can only keep up its claim of reliability by adding more and more rules as counterexamples are found. I could try to find another example to break it, even with the INDEX(0) hint. I might find one. Maybe now, maybe only when the next service pack is released - who knows. And if I did, I have no doubt that you, or Jeff, or any other proponent of this method, would add yet another rule to the list. A list that, in my opinion, could just as easily be paraphrased and shortened to a single rule: "Unless it fails (which might happen under various circumstances, not all of which have been identified yet)". And indeed - a claim that this method works reliably "unless it fails" is one that I cannot contradict.

    So, if you want to build your production systems on a method that "works unless it doesn't", go ahead. But if you, or even Jeff, a SQL author who deservedly has a lot of respect in the SQL Server community, endorses it, then I think you are leading people with a lesser understanding into very dangerous territory. And I agree with Tony's comments that there should be a warning, in big red flashy letters, at the start of this article: "This method might fail under unexpected circumstances".


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/
