Update order of a table with a clustered index

  • Hello everybody,

    I got a TSQL question here. My dilemma is that I need to collapse a set of overlapping date ranges (within one day of each other). Obviously one can use cursors and join methods, but they proved to be lacking in performance.

    After scouring the SQLsphere, I came up with this sample algorithm/query using an update statement, which takes milliseconds to complete against 1M+ rows. The only concern I have is that in order for this query to work consistently, the update statement must traverse the table in the same order as the clustered index, because the algorithm uses data from the previous row to determine which date ranges to collapse in the current row, and vice-versa.

    So my question is: will the update statement consistently pass over the table in the same order as the clustered index (since, as we know, the data is physically stored in the same order as the clustered index)?

    I say consistently, because so far this is working against tables with millions of rows. But I'm concerned that because this method is not necessarily "sanctioned", what are the chances of it breaking in the future?

    Thanks in advance!!!!

    -- prepare sample data

    if object_id('tempdb..#test') is not null

    drop table #test

    create table #test(

    a varbinary(20),

    d1 datetime,

    d2 datetime,

    seq int)

    CREATE CLUSTERED INDEX [ic_test] ON #test

    (

    a ASC,

    d1 asc,

    d2 asc

    ) ON [PRIMARY]

    GO

    insert into #test

    select 2, '03/20/2010', '03/31/2010', null union all

    select 1, '03/9/2010', '03/13/2010', null union all

    select 1, '03/15/2010', '03/30/2010', null union all

    select 2, '03/20/2010', '03/31/2010', null union all

    select 1, '03/8/2010', '03/12/2010', null union all

    select 1, '03/12/2010', '03/13/2010', null union all

    select 2, '04/01/2010', '04/02/2010', null

    declare @DateFrom datetime, @DateTo datetime, @PrevDate datetime, @a varbinary(20), @seq int

    select top 1

    @seq = 0

    , @DateFrom = d1

    , @DateTo = d2

    , @a = a

    from #test

    order

    by a, d1

    update #test

    set @seq = case

    when @a = a and (d1 between @DateFrom and dateadd(day, 1, @DateTo) or @PrevDate between d1 and dateadd(day, 1, d2)) then @seq

    else @seq + 1

    end

    , @a = a

    , @DateFrom = d1

    , @DateTo = d2

    , @PrevDate = d1

    , seq = @seq

    option (maxdop 1)

    select * from #test option (maxdop 1)

    select a, min(d1), max(d2), seq from #test

    group by a, seq

    order by seq

  • You are basically using the "quirky update" which is explained in detail by Jeff Moden in this article, http://qa.sqlservercentral.com/articles/T-SQL/68467/. I think he addresses your concerns in the article.

  • Excellent! Thank you for pointing me in the right direction. This article does address all my concerns. I did a search on this great site, but I used search terms relevant to my situation. I would have never thought to search for "Quirky Update" 🙂

  • Langston Montgomery (12/29/2010)


    Excellent! Thank you for pointing me in the right direction. This article does address all my concerns. I did a search on this great site, but I used search terms relevant to my situation. I would have never thought to search for "Quirky Update" 🙂

    ... which is pretty much the issue of all search engines (including BOL, the SQL Server help system): as long as you know what to look for, they are really useful. But until you kow the right term to search for, it's a slightly different story....

    Regarding the quirky update: there has been another improvement to make it even more bullet proof: Some refer to it as SafetyCheck.

    IIRC, there's another thread around where the safety check method is discussed in detail. But I can't find it at the moment.

    All I can remember is: it will make the quirky update even more secure than the method described in the article Jack pointed you at.



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • I'd bet the SafetyCheck is in the discussion of Jeff's article which I linked to in my original post.

    It is definitely easier to find something if you know what you are looking for. I actually found it because I knew Jeff had written the article I was looking for so I found all the articles he has written using the Authors link on the left. Although a search for SQL Server Quirky Update returns a lot of rows.

  • After reading Jeff's full article on "Quirky Update", and by pure luck, I am following all the rules (including the Safety Check). Well, the MAXDOP 1 option just made sense to me naturally as I know parallelism can affect ordering. But the partitioning and TABLOCKX hint was not apparent to me. These gotchas and the others seem to mostly be solved by copying the data into a temp table, which is what I was already doing in the interest of modularity and code reuse. I guess coding from a "best-practices" standpoint has some advantages from time to time 🙂

    Thanks again for your help and dedication.

  • Langston Montgomery (12/29/2010)


    (since, as we know, the data is physically stored in the same order as the clustered index)?

    Myth, not true.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Jack Corbett (12/29/2010)


    I'd bet the SafetyCheck is in the discussion of Jeff's article which I linked to in my original post.

    ...

    Nope, not in the version I checked (will be published on 2011/03/04, so it's more than recent :-P).



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • This post (in the discussion of the article) references both the safety check and the modification to it, and includes the resultant code sample with both applied.

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • LutzM (12/30/2010)


    Jack Corbett (12/29/2010)


    I'd bet the SafetyCheck is in the discussion of Jeff's article which I linked to in my original post.

    ...

    Nope, not in the version I checked (will be published on 2011/03/04, so it's more than recent :-P).

    Jack, I apologize. I completely overlooked your reference to the discussion of the article... :blush:



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • LutzM (12/30/2010)


    LutzM (12/30/2010)


    Jack Corbett (12/29/2010)


    I'd bet the SafetyCheck is in the discussion of Jeff's article which I linked to in my original post.

    ...

    Nope, not in the version I checked (will be published on 2011/03/04, so it's more than recent :-P).

    Jack, I apologize. I completely overlooked your reference to the discussion of the article... :blush:

    No worries. If you read and processed everything I wrote you'd find all my mistakes...:w00t:

  • GilaMonster (12/30/2010)


    Langston Montgomery (12/29/2010)


    (since, as we know, the data is physically stored in the same order as the clustered index)?

    Myth, not true.

    Care to elaborate?

  • Langston Montgomery (12/30/2010)


    GilaMonster (12/30/2010)


    Langston Montgomery (12/29/2010)


    (since, as we know, the data is physically stored in the same order as the clustered index)?

    Myth, not true.

    Care to elaborate?

    Sure.

    The clustered index enforces the logical order of the data, the next and previous page pointers on each page provide a way to read the clustered index in the order of the clustering key (just as nonclustered indexes do for their key order). It does not in any way enforce the physical order. If the index is perfectly defragmented and the data file is not fragmented then the logical order will match the physical order of the data on disk, but that's a special case and far from normal.

    Indexes - logical order of data. There's many other things that influence the physical order.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster (12/30/2010)


    Sure.

    The clustered index enforces the logical order of the data, the next and previous page pointers on each page provide a way to read the clustered index in the order of the clustering key (just as nonclustered indexes do for their key order). It does not in any way enforce the physical order. If the index is perfectly defragmented and the data file is not fragmented then the logical order will match the physical order of the data on disk, but that's a special case and far from normal.

    Indexes - logical order of data. There's many other things that influence the physical order.

    Thanks for checking me on that Gila. I used the wrong language.

Viewing 14 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply