Optimize hint has huge implications. Why?

  • I was going to get back involved here, but after what appears to have happened over the weekend and this morning, I'll pass.

    Nico, did Ninja use the best phrasing in the world? No. You have blown this all out of proportion, however. I would definitely recommend reading the myths article on temp tables vs. table vars, but other than that, I'll step aside. There are a few other things that would definitely help you here and could be shown, but I no longer think spending the time is worth it.

    I wish you well, sir. Hopefully when we next meet here you'll have calmed some. For starters... lay off the bold.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions | Forum Netiquette
    For index/tuning help, follow these directions. | Tally Tables

    Twitter: @AnyWayDBA

  • I'm pretty sure you meant

    I'm NOT trying to fight with you,

    😀

    Ah! the joys of not being f2f :hehe:.

  • nico van niekerk (12/5/2011)


    Grant Fritchey (12/5/2011)


    nico van niekerk (12/4/2011)


    ...Table variables are created in memory with the current batch as scope exclusively. #temp and other tables are explicitly created "on disk" as you put it, and they don't "spill" over onto disk as table variables do. That table variables can spill over onto disk is a true statement, but with a table of about a fixed 30K, one can safely assume that this table would not spill on to disk, which was a major consideration in this solution.

    Hey Nico,

    One point. Table variables get disk allocation out of the gate. Check this blog post. Use the check to see where the table variable is stored. I suspect you'll be surprised.

    The blog you cited states "Other than that, [temporary tables and table variables] will reside completely in memory or will swap out to the disk through tempdb, depending on their size." In my particular case study that I posted on SSC, I was very clear on where the table variable would reside because of its very small size: in memory, as your reference confirms. That was my only consideration. It had to be a table variable rather than a temporary table, not because of memory, but because of the overhead advantages a table variable has, such as reduced logging and no statistics maintenance.

    Incidentally, the very small size of the table also makes indexes on it a burden rather than a benefit. Now there might be a lure in that statement for some to go off on a tangent again.

    But the same size temp table would also stay in memory then. I was just talking about this specific statement:

    Table variables are created in memory with the current batch as scope exclusively. #temp and other tables are explicitly created "on disk" as you put it, and they don't "spill" over onto disk as table variables do.

    Table variables are created "on disk" in the same way as temporary tables. That's all I'm trying to say.
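
    A quick way to see this for oneself is to look in tempdb right after declaring a table variable. This is only a minimal sketch of the kind of check the linked post describes; the system-generated name and exact output are assumptions that can vary by version:

    -- Declare a table variable, then look at the most recently created
    -- user object in tempdb; the table variable shows up there with a
    -- system-generated name, just as a #temp table would.
    DECLARE @tv TABLE (id int NOT NULL);

    SELECT TOP (1) name, type_desc, create_date
    FROM   tempdb.sys.objects
    WHERE  type = 'U'
    ORDER BY create_date DESC;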

    ----------------------------------------------------
    The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood... Theodore Roosevelt
    The Scary DBA
    Author of: SQL Server 2017 Query Performance Tuning, 5th Edition and SQL Server Execution Plans, 3rd Edition
    Product Evangelist for Red Gate Software

  • Ninja's_RGR'us (12/5/2011)


    I'm pretty sure you meant

    I'm NOT trying to fight with you,

    😀

    Ah! the joys of not being f2f :hehe:.

    Oh crap!

    I edited that. Oops. I'm sure that didn't help at all.

    ----------------------------------------------------
    The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood... Theodore Roosevelt
    The Scary DBA
    Author of: SQL Server 2017 Query Performance Tuning, 5th Edition and SQL Server Execution Plans, 3rd Edition
    Product Evangelist for Red Gate Software

  • Ninja's_RGR'us (12/5/2011)


    There are probably 1000 myths about @t vs #t.

    The article I linked to lists all the pros & cons of each.

    http://qa.sqlservercentral.com/articles/Temporary+Tables/66720/

    No offense taken.

    We're good. Thanks for clarifying.

  • Grant Fritchey (12/5/2011)


    Ninja's_RGR'us (12/5/2011)


    I'm pretty sure you meant

    I'm NOT trying to fight with you,

    😀

    Ah! the joys of not being f2f :hehe:.

    Oh crap!

    I edited that. Oops. I'm sure that didn't help at all.

    Authors!, they always need a tech reivew! 😀

    Pre-dit. Yes I saw the typo before hitting psot. But it's funnier that way! :w00t:

  • Ninja's_RGR'us (12/5/2011)


    Grant Fritchey (12/5/2011)


    Ninja's_RGR'us (12/5/2011)


    I'm pretty sure you meant

    I'm NOT trying to fight with you,

    😀

    Ah! the joys of not being f2f :hehe:.

    Oh crap!

    I edited that. Oops. I'm sure that didn't help at all.

    Authors!, they always need a tech reivew! 😀

    Pre-dit. Yes I saw the typo before hitting psot. But it's funnier that way! :w00t:

    Oh gosh, don't say review. I'm through the roof with one of my editors. They're asking for crazy stuff. I'm just stopping work on their book for a while & going to work on the other two (which have appropriately mean, but fair, editors).

    ----------------------------------------------------
    The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood... Theodore Roosevelt
    The Scary DBA
    Author of: SQL Server 2017 Query Performance Tuning, 5th Edition and SQL Server Execution Plans, 3rd Edition
    Product Evangelist for Red Gate Software

  • Grant Fritchey (12/5/2011)


    Ninja's_RGR'us (12/5/2011)


    Grant Fritchey (12/5/2011)


    Ninja's_RGR'us (12/5/2011)


    I'm pretty sure you meant

    I'm NOT trying to fight with you,

    😀

    Ah! the joys of not being f2f :hehe:.

    Oh crap!

    I edited that. Oops. I'm sure that didn't help at all.

    Authors!, they always need a tech reivew! 😀

    Pre-dit. Yes I saw the typo before hitting psot. But it's funnier that way! :w00t:

    Oh gosh, don't say review. I'm through the roof with one of my editors. They're asking for crazy stuff. I'm just stopping work on their book for a while & going to work on the other two (which have appropriately mean, but fair, editors).

    What the heck can they be asking for that is SO outrageous?

  • Ninja's_RGR'us (12/5/2011)


    Grant Fritchey (12/5/2011)


    Ninja's_RGR'us (12/5/2011)


    Grant Fritchey (12/5/2011)


    Ninja's_RGR'us (12/5/2011)


    I'm pretty sure you meant

    I'm NOT trying to fight with you,

    😀

    Ah! the joys of not being f2f :hehe:.

    Oh crap!

    I edited that. Oops. I'm sure that didn't help at all.

    Authors!, they always need a tech reivew! 😀

    Pre-dit. Yes I saw the typo before hitting psot. But it's funnier that way! :w00t:

    Oh gosh, don't say review. I'm through the roof with one of my editors. They're asking for crazy stuff. I'm just stopping work on their book for a while & going to work on the other two (which have appropriately mean, but fair, editors).

    What the heck can they be asking for that is SO outrageous?

    Here you go. Draw a picture of the Configuration Manager. And no, not a picture of the GUI. Draw a picture of what it does.

    ----------------------------------------------------
    The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood... Theodore Roosevelt
    The Scary DBA
    Author of: SQL Server 2017 Query Performance Tuning, 5th Edition and SQL Server Execution Plans, 3rd Edition
    Product Evangelist for Red Gate Software

  • Nico, I didn't intend to insult you personally. I'm sure you are a bright guy like most everyone else who hangs out here.

    And yet, your statements were just wrong.

    one cannot create indexes on table definitions

    Clearly wrong. Authority was cited and examples of how to do it were posted.

    Its residence in memory makes it faster than an indexed table

    Also wrong, or to be kind, it might be correct under some circumstances. It should never be an absolute assertion. The optimizer may be crippled by use of table variables because of the lack of statistics kept on them. As Grant pointed out, statistics are used by the optimizer to choose optimal execution plans. You say you only had 2500 rows in it, but you don't know where the breaking point will come that will throw the optimizer into a poor choice. Is it at 10,000 rows? 5,000? 3,000?

    The danger with others reading blanket assertions like these is that they stick in people's minds and turn into superstitions. That's why statements get challenged. I say this with the utmost humility as I've had more than my own share of statements shot down. It made me paranoid. It made me start reading and testing until I understood thoroughly what was actually happening. It was a good thing.

    When a number of knowledgeable people disagree with you, and point you to where documentation differs from your statements, at least consider the possibility that you made a statement or two in error.
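
    To make the row-count question above concrete, here is a small side-by-side sketch; the 2,500-row load from sys.all_columns and the predicate are illustrative assumptions, not anything posted in this thread, and on newer versions deferred compilation changes the table-variable estimate:

    -- Load the same 2,500 rows into a table variable and a temp table,
    -- then compare the optimizer's estimated row counts in the actual plans.
    DECLARE @t TABLE (n int NOT NULL);
    INSERT INTO @t (n)
    SELECT TOP (2500) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
    FROM   sys.all_columns;

    CREATE TABLE #t (n int NOT NULL);
    INSERT INTO #t (n) SELECT n FROM @t;

    SELECT COUNT(*) FROM @t WHERE n > 2000;  -- no statistics: estimate is typically 1 row
    SELECT COUNT(*) FROM #t WHERE n > 2000;  -- auto-created statistics drive the estimate

    DROP TABLE #t;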

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • bteraberry (12/5/2011)


    Ninja's_RGR'us (12/5/2011)


    There are probably 1000 myths about @t vs #t.

    The article I linked to lists all the pros & cons of each.

    http://qa.sqlservercentral.com/articles/Temporary+Tables/66720/

    Good article, but I would highlight a point that could easily be overlooked in the body of it: "SQL cannot build statistics against a table variable, like it can against temporary tables." This isn't just one difference between the two; it's the main difference, and it should guide usage choices in the majority of cases. (Of course, there are always exceptions to every rule.)

    So the server inherently spends the overhead on the histogram for the temp table, which, if you need it, is great and otherwise is a waste. I've seen the choice of use be determined by so many irrelevant questions without this key issue being addressed. If you need the statistics then use a temp table. If you're going to scan the whole thing and use all the records in it anyway, save the overhead and stick to a table variable.

    Yes, exactly my point.

    Thanks.
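
    The quoted sentence about statistics can be seen directly in tempdb. A minimal sketch, assuming default auto-create-statistics settings; the _WA_Sys naming is just how auto-created statistics usually appear, not something from the article:

    -- A filtered query against a temp table auto-creates column statistics,
    -- which then show up in tempdb; there is no equivalent for a table variable.
    CREATE TABLE #t (n int NOT NULL);
    INSERT INTO #t (n) SELECT TOP (2500) column_id FROM sys.all_columns;

    SELECT COUNT(*) FROM #t WHERE n > 5;  -- triggers auto-created statistics

    SELECT name
    FROM   tempdb.sys.stats
    WHERE  object_id = OBJECT_ID('tempdb..#t');  -- typically lists a _WA_Sys_... entry

    DROP TABLE #t;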

  • Grant Fritchey (12/5/2011)


    nico van niekerk (12/2/2011)


    The original query's construct, somehow, allowed SS to rely on the statistics alone and not look at the query from a new perspective. The OPTIMIZE hint steered it in the right direction. The object of this exercise was to see if a query with a different construct would steer SS to look at the query rather than the statistics.

    Just some comments on the feedback: Pre-aggregation wasn't an option, although it's a good strategy of which I am very fond, but we were chasing an execution time of around 10 seconds and pre-aggregating 420 million rows would trash that expectation before we even got close to the resultset.

    The temp table was not #tmp but @Tmp, a table definition, and one cannot create indexes on table definitions. Its residence in memory makes it faster than an indexed table, which is why I favored it in the final solution, and because it is very narrow and only contains about 2,500 rows. My previous use of a table definition to host FileSummaries didn't change the query's construct and still left SS in the dark as to the intent of the query.

    Here is the solution that I came up with which runs in between 9 and 12 seconds without using OPTIMIZE FOR with the same statistics.

    DECLARE @Time datetime2;
    SET @Time = '2009-01-01';

    DECLARE @filesummaries TABLE
        (starttime datetime NULL, [endtime] datetime NULL, mshtq_in_cml bit NOT NULL);

    INSERT INTO @filesummaries ([starttime], [endtime], mshtq_in_cml)
    SELECT starttime, endtime, mshtq_in_cml
    FROM   filesummaries
    WHERE  mshtq_in_cml & 1 = 1
      AND  StartTime < @Time;

    SELECT COUNT(*) AS Occurences,
           CASE WHEN r.mshtq < 0 THEN CAST(r.mshtq/10 AS int)*10 - 5
                ELSE CAST(r.mshtq/10 AS int)*10 + 5 END AS BinMean
    FROM   rawimc r
           JOIN (SELECT [starttime], [endtime] FROM @filesummaries) f
             ON r.devicetimestamp BETWEEN f.[starttime] AND f.[endtime]
    GROUP BY CASE WHEN r.mshtq < 0 THEN CAST(r.mshtq/10 AS int)*10 - 5
                  ELSE CAST(r.mshtq/10 AS int)*10 + 5 END
    ORDER BY BinMean;

    My best guess is that the JOINs in the first query didn't give SS a clear indication of its intent and it reverted back to the statistics, while this query is less ambiguous.

    Thanks to everybody who contributed to this issue. It's much appreciated.

    Thanks

    Nico

    I'm not trying to fight with you, at all.

    I know, Grant. It's much appreciated.

    You refer several times to the optimizer relying too much on statistics and not looking enough at the intent of the query. I'm confused by what you're trying to say. The optimizer always looks at the query. It has to. And then it looks at the statistics of the objects in the query to make the decisions it makes regarding the objects included, how they're joined, scans vs. seeks, etc. This separation that you're placing between statistics and the query, as if one operates without the other, doesn't make sense to me.

    And you are exactly right; it didn't make any sense to me either, which precipitated this entire case study. That the query ran 9+ hours without the OPTIMIZE FOR hint and only 90 seconds with it, raised a red flag about the query construct or the statistics.

    I don't think you posted the new execution plan

    Correct. I'll rerun it and post it. I should have done it. Sorry. But since it took under 10 seconds to do a 9 hour run, I must've thought it didn't matter. Actually, I was so thrilled that the problem was solved, I didn't think about it. 😀

    From what I can see, reducing the data set by loading the table variable prior to running the main part of the query provides different filtering opportunities for the optimizer which resulted in faster execution.

    That was exactly the essence of my thinking when I really sat down with some quiet time and tried to figure out what would benefit the query optimizer the most: give it as few rows as possible to process.

    Just guessing without the exec plan. But, you might get even better performance with a temp table since it will have statistics. Because you're loading ~2500 rows and you're joining to the table variable, I think that it's possible that it's still hurting your performance.

    Because the table variable is so small, I figured that without the overhead of statistics or minimal disk I/O, I might gain some time. It's an interesting thought, but in very large databases (several billion rows per table) I always had best performance by reducing overhead and even shifting the overhead to smaller tables that I create on the fly. My perception was that it would work here, too, especially if I wanted the main table with 420 million rows to join with this small table. It was just an educated guess and certainly open to debate and improvement.

    Sorry if you think my statements here are trivial or off topic

    none of the above.

    Actually, your observations are very astute and I appreciate them. However, you can see why this particular problem piqued my interest and why I chose to share it with others.

    Thanks,

    Nico
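
    For anyone who lands on this thread without having used the hint it is named after, the general shape is sketched below; the table, columns, and the '2009-01-01' literal are borrowed from Nico's posted query purely as placeholders, not his actual production values:

    -- OPTION (OPTIMIZE FOR ...) tells the optimizer to build the plan as if the
    -- variable held the stated value, instead of treating it as an unknown.
    DECLARE @Time datetime2;
    SET @Time = '2009-01-01';

    SELECT COUNT(*)
    FROM   filesummaries
    WHERE  mshtq_in_cml & 1 = 1
      AND  StartTime < @Time
    OPTION (OPTIMIZE FOR (@Time = '2009-01-01'));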

  • Evil Kraig F (12/5/2011)


    I was going to get back involved here, but after what appears to have happened over the weekend and this morning, I'll pass.

    Nico, did Ninja use the best phrasing in the world? No. You have blown this all out of proportion, however. I would definitely recommend reading the myths article on temp tables vs. table vars, but other than that, I'll step aside. There are a few other things that would definitely help you here and could be shown, but I no longer think spending the time is worth it.

    I wish you well, sir. Hopefully when we next meet here you'll have calmed some. For starters... lay off the bold.

    Good advice. Typing the bold-tags became a drag.

    It's not Ninja's or others phrasing that bugged me, Kraig. It's their pursuit of trivial things while the essence of the case study was somehow not even on their radar screens. Truth be told, I was more frustrated and worried than bugged.

    No hard feelings.

  • nico van niekerk (12/5/2011)


    bteraberry (12/5/2011)


    Ninja's_RGR'us (12/5/2011)


    There are probably 1000 myths about @t vs #t.

    The article I linked to lists all the pros & cons of each.

    http://qa.sqlservercentral.com/articles/Temporary+Tables/66720/

    Good article, but I would highlight a point that could easily be overlooked in the body of it: "SQL cannot build statistics against a table variable, like it can against temporary tables." This isn't just one difference between the two; it's the main difference, and it should guide usage choices in the majority of cases. (Of course, there are always exceptions to every rule.)

    So the server inherently spends the overhead on the histogram for the temp table, which, if you need it, is great and otherwise is a waste. I've seen the choice of use be determined by so many irrelevant questions without this key issue being addressed. If you need the statistics then use a temp table. If you're going to scan the whole thing and use all the records in it anyway, save the overhead and stick to a table variable.

    Yes, exactly my point.

    Thanks.

    I'll humor this (as I agree, up to a point).

    So you get 2500 rows in that table?

    Let's call that 25 pages?

    How long does it take to build stats on 25 pages for 1-3 columns? That's easier to count in microseconds than in milliseconds, assuming any recent (last 10 years) hardware.

    That would be an interesting test to do side by side, to see what the real cost is of saving 1M records in either @t or #t.

    I'm sure there's a difference. The real point is how much difference (before starting to draw any sort of a conclusion).
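
    A rough harness for the side-by-side test described above might look like this; the cross-join row source and SET STATISTICS TIME are assumptions about how one would measure it, and the numbers will obviously depend on hardware:

    -- Time the same 1M-row load into a table variable and a temp table.
    SET STATISTICS TIME ON;

    DECLARE @t TABLE (n int NOT NULL);
    INSERT INTO @t (n)
    SELECT TOP (1000000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
    FROM   sys.all_columns a CROSS JOIN sys.all_columns b;

    CREATE TABLE #t (n int NOT NULL);
    INSERT INTO #t (n)
    SELECT TOP (1000000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
    FROM   sys.all_columns a CROSS JOIN sys.all_columns b;

    SET STATISTICS TIME OFF;
    DROP TABLE #t;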

  • Because the table variable is so small, I figured that without the overhead of statistics or minimal disk I/O, I might gain some time. It's an interesting thought, but in very large databases (several billion rows per table) I always had best performance by reducing overhead and even shifting the overhead to smaller tables that I create on the fly. My perception was that it would work here, too, especially if I wanted the main table with 420 million rows to join with this small table. It was just an educated guess and certainly open to debate and improvement.

    We're not talking about @t vs #t anymore. This is divide and conquer. A very different concept that has more than proven its worth.

    I don't work with anything as large as 420M rows. But from all the tests I've done, stats have never proven to be bad or detrimental to performance. They either helped the query or cost only a few ms to create with no apparent gain. That being said, I know they help protect against bad plans, so that 1-2 ms is, IMHO, well worth it.

    That being said, I don't work on queries that run 1M times a day, nor on 10 GB datasets. So my view might be skewed because of that.
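
    For completeness, the temp-table flavour of the divide-and-conquer step Grant floated earlier would look roughly like the sketch below; the index choice and column types are assumptions lifted from Nico's posted query, and whether it actually beats the table variable here is exactly the untested question:

    -- Same pre-filter as @filesummaries, but into a temp table so the join
    -- to rawimc gets real statistics (and, optionally, an index).
    DECLARE @Time datetime2;
    SET @Time = '2009-01-01';

    CREATE TABLE #filesummaries
        (starttime datetime NULL, [endtime] datetime NULL, mshtq_in_cml bit NOT NULL);

    INSERT INTO #filesummaries (starttime, [endtime], mshtq_in_cml)
    SELECT starttime, endtime, mshtq_in_cml
    FROM   filesummaries
    WHERE  mshtq_in_cml & 1 = 1
      AND  StartTime < @Time;

    CREATE INDEX ix_times ON #filesummaries (starttime, [endtime]);

    -- ...then run the COUNT(*)/BinMean query exactly as posted, swapping
    -- @filesummaries for #filesummaries.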
