INSERT INTO SELECT WITH ORDER BY

  • Eugene Elutin (6/22/2010)


    The "quirky update" method is based on the assumption/fact that the simpe query is always retrieves or operates with data in the order of the clustered index by default.

    The 'quirky update' relies on the fact that SQL will always (in current versions) update rows in the order of the cluster providing there's no parallelism and the clustered index is forced (and a few more conditions). Under a whole load of conditions, the 'update in order' appears to hold. The Select in order however doesn't.

    I tend to agree with you about "but that's not a guarantee that it always will", however when fronted with Jeff Moden challenge to provide the example when it doesn't, it will be hard to do so.

    Even Jeff admits that SELECTs do not always return rows in order of the clustered index. It's the update that he argues about and that no one has yet been able to break on current versions of SQL.

    Do we have here any developer invloved in the development of SQL Server itself, who could put some light on this?

    One of the lead developers of the query optimiser perhaps? Check Conor Cunningham's bloc (Conor vs SQL), he's got posts on implied ordering.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Dave and Gail,

    I do totally agree with you on use of "ORDER BY" to ensure the order. Sorry, of cause data is stored logicaly and retrieved in the logical order.

    However,

    I have tried the Conor Cunningham's example and of cause it starts to use parallelism after 60000 rows inserted. But, as soon as I removed the WHERE condition from the query, plan reverted back to using only index scan and returning records in order of clustered index. Adding another 1,000,000 rows didn't changed anything as SQLServer is smart enough to understand that using parallelism will not make any change when returning all rows from table unconditionally (isn't not using WHERE clause, is one of the Jeff Moden rules for "quirky update").

    So, sample of simple unconditional select query where it doesn't work, anyone?

    I would love to see an example of unconditional joinless select query from a table with clustered index which returns the table data not in the index order. That whould be a killer for such discussions.

    Link to Conors vs SQL:

    http://blogs.msdn.com/b/conor_cunningham_msft/archive/2008/08/27/no-seatbelt-expecting-order-without-order-by.aspx

    _____________________________________________
    "The only true wisdom is in knowing you know nothing"
    "O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
    (So many miracle inventions provided by MS to us...)

    How to post your question to get the best and quick help[/url]

  • Eugene Elutin (6/22/2010)


    Sorry, of cause data is stored logicaly and retrieved in the logical order.

    There are conditions under which SQL can retrieve data in the physical order instead of the logical order (table lock, read uncommitted are the ones I know). If it does that and the index is fragmented, the data will be retrieved in the physical order, not the logical and, with fragmentation the two will not match.

    This, on my laptop, returns different orders, sometimes (not every time...). Yes, I've forced a table lock because SQL doesn't pick one automatically. Bear in mind that I'm running single user, single query on my laptop. On a busy server with heavier used databases, SQL may choose to do a table lock because of lock memory pressure.

    CREATE TABLE TestingOrder (

    id int identity,

    SomeGuid uniqueidentifier primary key, -- clustered index

    filler char(300) default ' '

    )

    insert into TestingOrder (SomeGuid)

    SELECT top (120000) NewID()

    FROM master.sys.columns a cross join master.sys.columns br

    update TestingOrder SET SomeGuid = newID() -- fragmentation central. 79% fragmentation after

    select * from TestingOrder with (tablock) option (maxdop 1) -- no parallelism, table lock

    select * from TestingOrder order by SomeGuid option (maxdop 1)

    drop table TestingOrder

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Attached is the execution plan, if anyone's interested. Both are simple clustered index scans, no other operators are present.

    The notable difference is the property of the clustered index scan Ordered. False in the first case, true in the second.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Thank You, Gail.

    It almost there. The reason of getting data not in order is using TABLOCK hint.

    But now I have another question, when would SQL Server place a TABLOCK on a simple uncoditional select? Default isolation level in SQL is READ COMMITTED isn't it? Beeing in this level, SQL uses shared locks when reading the data. TABLOCK is a hint which also tels SQL to place shared lock on the table. But it looks like it does something extra as removing this hint from your query will make results returned in order of the clustered index isn't it?

    Actually, Conor Cunningham's example also creates highly fragmented table as it inserts random numbers into the slustered index column, so if I would add WITH (TABLOCK) into there I would have the same picture.

    So, what about simple "SELECT Col1, Col2, ... FROM MyTable" with no "WHERE", no joins and no hints.

    Let's try it together, I really want to find the example where it is possible.

    _____________________________________________
    "The only true wisdom is in knowing you know nothing"
    "O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
    (So many miracle inventions provided by MS to us...)

    How to post your question to get the best and quick help[/url]

  • Eugene Elutin (6/22/2010)

    Let's try it together, I really want to find the example where it is possible

    Why (other than curiosity ) ? It seems a lot simpler to me to add an Order by clause regardless as to weather the current code base requires it or not, and be certain that your data will be ordered in ALL circumstances now and in the future.



    Clear Sky SQL
    My Blog[/url]

  • Dave Ballantyne (6/22/2010)


    Eugene Elutin (6/22/2010)

    Let's try it together, I really want to find the example where it is possible

    Why (other than curiosity ) ? It seems a lot simpler to me to add an Order by clause regardless as to weather the current code base requires it or not, and be certain that your data will be ordered in ALL circumstances now and in the future.

    Don't worry, I do aways use Order By myself and I don't propogate the opposite So, it is nothig other than curiosity.

    I want to find what would make SQL Server to use TABLOCK on a simple select. I am playing with memory allotments for SQL now, hopefully I will get some answers...

    _____________________________________________
    "The only true wisdom is in knowing you know nothing"
    "O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
    (So many miracle inventions provided by MS to us...)

    How to post your question to get the best and quick help[/url]

  • Attmept to set max memory to its minimum value of 16Mb caused SQL to stop executing any query on midium sized tables :w00t:.

    Increased the max memory allotment to 32Mb, made select query to run as a slug, BUT: data returned in the order of the clustered index (on a highly fragmented table) :cool:.

    M-da, what else to try?

    _____________________________________________
    "The only true wisdom is in knowing you know nothing"
    "O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
    (So many miracle inventions provided by MS to us...)

    How to post your question to get the best and quick help[/url]

  • As I said

    GilaMonster (6/22/2010)


    Yes, I've forced a table lock because SQL doesn't pick one automatically. Bear in mind that I'm running single user, single query on my laptop. On a busy server with heavier used databases, SQL may choose to do a table lock because of lock memory pressure.

    Yes, I have seen SQL do that on busier servers (take a tablock for a query without any hints present). It's not something I can easily replicate on a personal machine, hence the hint.

    p.s. simply limiting the size of the buffer pool is not enough to cause lock memory pressure. You also need more than one set of locks, ie lots of concurrent queries.

    p.p.s One test is not enough to conclude that the results are 'ordered'. I ran the posted code 5 times before I got a result set that was noticeably out of order.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Gail,

    I've run test more than 5 times (and I did few concurrent queries as well). Please believe me I've tried the best (on my machine and on aour dev server). But I failed. Actually, adding TABLOCK hint makes dissorder straight away even for morderately small tables.

    I do feel that it may be achievable, but it is hard to do.

    I guess I should stop with "it is not easily replicatable in practice but teoreticaly possible" conclusion.

    _____________________________________________
    "The only true wisdom is in knowing you know nothing"
    "O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
    (So many miracle inventions provided by MS to us...)

    How to post your question to get the best and quick help[/url]

  • Eugene Elutin (6/22/2010)


    I do feel that it may be achievable, but it is hard to do.

    On an idle/barely used machine, yes it it. You need to simulate enough load for SQL to switch to table locks.

    I guess I should stop with "it is not easily replicatable in practice but teoreticaly possible" conclusion.

    Again, on an idle machine. On busy servers it's not theoretical, I've seen it happen and I've seen more than one forum thread complaining about order.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Joe Celko (6/23/2010)


    ...

    This is one of many reasons that good SQL programmers never use the IDENTITY property (it is not really a column at all).

    ...

    Of cause!

    Only very bad SQL programmers use it. Now I realise its purpose! Actually, they (MS) added this feature to SQL Server only to help to identify good ones from bad ones.:-D

    Next time, on iterview, I will need to ask candidate only one question: Did you ever used IDENTITY property? The prefered answer would be "Never heard of it" :hehe:

    Would love to know, what other reasons from "many reasons" to never use it?

    _____________________________________________
    "The only true wisdom is in knowing you know nothing"
    "O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
    (So many miracle inventions provided by MS to us...)

    How to post your question to get the best and quick help[/url]

  • {edit - comment withdrawn}

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Joe Celko (6/23/2010)


    ...

    If you have some of my books, you can read a few thousand words on this topic. I don't have my usual "cut & paste" here right now, but Microsoft did not put it into SQL Server. Sybase did. it comes from an old 16-bit UNIX file system in the 1980's. UNIX treats files as one long string and assumes sequential processing to extent that string on the end. No parallelism, no set processing.

    ...

    Hi Joe, since when are you back to a dusty Texas? Or how you managed to get internet connection in your Borneo hut?

    Ops, it's not you? Sorry mate. Could you please advise on the following: What percentage of SQL functionality should not be used as it was introduced by Sybase back in 80's?

    Joe's rants regarding identity columns and cry "columns are not fields, rows are not records ..." are well known to the people who are in the industry for quite a while. Believe me, if the rest of committee would support all Joe's opinions, MS would remove identity functionality from SQL Server long ago (as well as many other which has no play in data model theories).

    Identity column is a perfect candidate for the surrogate primary key.

    It doesn't need to be attribute of an entity or has meaning in a data model.

    Your example does misslead a lot of SQL beginer developers. It is not the best idea to use VIN number (or Social Security Number or any other such number) as a priamry key of the database table. You better to have it as alternate natural key (you can even make it unique, if of cause it is unique, just remember that even Tax numbers potentially can be reused and GUID can be duplicated in theory!). Identity columns do not prevent parallel processing in SQL Server (or Oracle or any other RDBs where it is implemented).

    About "your" books: Trees and Hierarchies is not bad for ones who like to know a lot of teoretical aspects. Was it printed around 2005?. Actually, I am aware of M.J. Kamfonas ideas since yearly 90's. This Joe's book is good for students but not very helpfull for practical implementations.

    _____________________________________________
    "The only true wisdom is in knowing you know nothing"
    "O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
    (So many miracle inventions provided by MS to us...)

    How to post your question to get the best and quick help[/url]

  • Eugene Elutin (6/22/2010)


    I guess I should stop with "it is not easily replicatable in practice but teoreticaly possible" conclusion.

    SET NOCOUNT ON;

    GO

    CREATE TABLE dbo.Example

    (

    row_id INTEGER IDENTITY NOT NULL PRIMARY KEY CLUSTERED,

    data AS CONVERT(VARCHAR(11), row_id) PERSISTED NOT NULL UNIQUE

    );

    GO

    INSERT dbo.Example (data)

    DEFAULT VALUES;

    GO 10

    -- Comes back in data order

    SELECT E.row_id, E.data

    FROM dbo.Example E;

    GO

    -- Comes back in row_id order

    SELECT E.row_id, E.data

    FROM dbo.Example E

    ORDER BY E.row_id;

Viewing 15 posts - 16 through 30 (of 39 total)

You must be logged in to reply to this topic. Login to reply