Trying to speed up this 'GetWorkingDays' function.

  • Jason A. Long - Sunday, October 29, 2017 3:15 AM

    Okay... where to begin???

    First... I haven't been putting this off... the exact opposite, actually. I've been sucked in... It's the posting of updates, getting a full night's sleep, and asking for help that have been put off for too long... 

    Heh... that's one of the things that I admire about you.  You get sucked in on trying to understand things as much as I do.  That and you're as sleep deprived as I am because of your intellectual curiosity. 😉

    If the roles were reversed and I was the one who'd stepped up and delivered a package of that caliber to someone else... at the very least, I'd want SOME feedback and some test results. No excuses; I should have been back here with my preliminary results on Wednesday. Long story short, I saw something I've never seen before and went down the rabbit hole after it... I can only hope that you can accept my apology. I really am sorry.


    Oh, don't be sorry.  If you're going to analyze something so strange, it takes a whole lot of time. 

    An explanation is in order...

    It's 4:30 AM so it won't be a great explanation, but here goes...

    #1... The function you supplied is fast... It's really fast... When I put it up against the last few that I'd been working with, it blew their doors off: anywhere from 0.9-1.3 secs vs 4.5-5.5 secs... and that's the part that's been throwing me for a loop.

    I've seen my fair share of bad estimates. Sometimes it takes longer than I'd like to figure it out, but I get there and usually learn something new along the way...

    What sucked me in wasn't the fact that your function was crushing mine... It's the fact that it shouldn't have.

    I'm making the assumption that you think that because of the cost estimates and the "% of Batch" values on the Actual Execution Plan.  Before I comment on that, let's see what else you're talking about...

    I can't tell you how much time I've killed staring at execution plan node values... looking for bad estimates, mismatches between estimated and actual rows... anything...

    And that's what's weird... the actuals and estimates were consistently a spot-on match...
     

    Heh... I know, huh?  I found out a long time ago that this is one of the silent killers in performance tuning that frequently cause people to select the slower code rather than the faster code.

    The Estimated and Actual Execution plans are fantastic for seeing what is going on but even the Actual Execution plan is fraught with estimates when it comes to measuring cost, which a lot of people interpret to have a direct correlation with duration.  While that is a frequently exhibited trait, the cost estimations aren't always accurate because... they're estimates. 

    Someone like Grant Fritchey or Gail Shaw or Paul White could explain it a whole lot better but, in this case, I believe this disparity between cost and duration comes down to one thing and only one thing... the optimizer overestimates the cost of the two singleton lookups per row against a table that resides in memory (because of the reads they generate) compared to the cost of building the inline lookup table and essentially doing a CROSS JOIN against it, which is mostly CPU (and not much of that!) and potentially zero reads.
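
    If you want to eyeball that cost-vs-duration disparity for yourself, here's a minimal sketch using the plan cache DMVs. The LIKE filter is just an illustrative way to find the statement, and remember that the subtree cost is an estimate even in an "actual" plan:

    WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
    SELECT  est_subtree_cost  = qp.query_plan.value('(//StmtSimple/@StatementSubTreeCost)[1]', 'float')  -- estimated
           ,avg_elapsed_ms    = qs.total_elapsed_time / 1000.0 / qs.execution_count                      -- measured
           ,avg_logical_reads = qs.total_logical_reads / qs.execution_count                              -- measured
           ,statement_text    = st.text
    FROM sys.dm_exec_query_stats qs
    CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) qp
    CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) st
    WHERE st.text LIKE N'%GetWorkingDays%';  -- illustrative filter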

    Still, I don't know for sure.  I only know how it performs in comparison to other methods and haven't had the time to figure out the exact reason.  I'm just a "user"  for this particular problem. 😉

    Let me start by saying that I make no claims about having mastered the art of reading execution plans... but I've never seen the optimizer make a 100% accurate prediction about row counts and be so far off when it comes to duration. So, I figured it should be easy enough to figure out a 3-node trivial plan... No clues... I wanted to write it off as yours going parallel and mine not... but nothing was adding up.

    Fast forward... a lot... I've got DBCC SETCPUWEIGHT(1000); & DBCC SETIOWEIGHT(1); in play, the damned things are going parallel, and I see "something"...
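
    For anyone following along, the general pattern looks like this. Fair warning: these are undocumented DBCC commands, so treat this as a dev-box-only sketch and put the weights back when you're done...

    DBCC SETCPUWEIGHT(1000);  -- tell the costing model CPU is 1000x more expensive than default
    DBCC SETIOWEIGHT(1);      -- leave the I/O weight at its default

    -- ... run the query here and capture the (now parallel) actual plan ...

    DBCC SETCPUWEIGHT(1);     -- restore the default weights
    DBCC SETIOWEIGHT(1);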

    All of the actual numbers add up to roughly 1/4 second, until it hits the Gather Streams... and then it just stops, hangs out, drinks a beer.
    So, where I'm at now... I started dumping sys.dm_os_wait_stats into a table with a timestamp and comparing the changes...

    capture_dt                     wait_type   c_waiting_tasks_count   c_wait_time_sec
    2017-10-29 00:04:47.4415917    CXPACKET    NULL                    NULL
    2017-10-29 00:51:42.3217744    CXPACKET    58                      116.832000
    2017-10-29 01:05:48.6912441    CXPACKET    55                      122.180000
    2017-10-29 01:21:27.8460532    CXPACKET    55                      116.422000
    2017-10-29 01:30:08.0868170    CXPACKET    54                      117.491000

    So... 2 mins per test round of CXPACKET waits...
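
    If anyone wants to replicate the capture, it's nothing fancy... something along these lines (the table and column names here are illustrative rather than my exact script):

    -- Snapshot table for the diff-over-time approach.
    IF OBJECT_ID(N'dbo.WaitStatsSnapshot', N'U') IS NULL
        CREATE TABLE dbo.WaitStatsSnapshot
        (
            capture_dt          DATETIME2(7) NOT NULL,
            wait_type           NVARCHAR(60) NOT NULL,
            waiting_tasks_count BIGINT       NOT NULL,
            wait_time_ms        BIGINT       NOT NULL
        );

    -- Take a snapshot before and after each test round.
    INSERT dbo.WaitStatsSnapshot (capture_dt, wait_type, waiting_tasks_count, wait_time_ms)
    SELECT SYSDATETIME(), wait_type, waiting_tasks_count, wait_time_ms
    FROM sys.dm_os_wait_stats
    WHERE wait_type = N'CXPACKET';

    -- Diff consecutive snapshots to get the per-round deltas shown above.
    SELECT  capture_dt
           ,wait_type
           ,c_waiting_tasks_count = waiting_tasks_count - LAG(waiting_tasks_count) OVER (PARTITION BY wait_type ORDER BY capture_dt)
           ,c_wait_time_sec       = (wait_time_ms - LAG(wait_time_ms) OVER (PARTITION BY wait_type ORDER BY capture_dt)) / 1000.0
    FROM dbo.WaitStatsSnapshot
    ORDER BY capture_dt;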

    The problem is, the more research I do, the more this sounds like a symptom rather than the actual problem... meaning I'm still in the dark about the actual problem.

    I typically don't delve into waits of any nature for the very reason you mention.  It may be a fault on my part but, unless I've proven a hardware problem, my take on waits is that they are a symptom of the code rather than a cause of slower code.  Again, I'm more of just a "user" in that aspect but when people get into a list of waits, etc, etc, my eyes glaze over and my mind wanders to the code itself.  That's why I'll frequently respond to people posting about performance problems they think they've detected using waits and how to fix the waits to get better performance, etc, etc, ad infinitum with a simple "You need to fix the code.  See the 2nd link in my signature if you want help there".  😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Heh... that's one of the things that I admire about you. You get sucked in on trying to understand things as much as I do. That and you're as sleep deprived as I am because of your intellectual curiosity. 😉


    I appreciate your understanding... The funny thing is, I always know, from the start, that it's a "monkey trap"... If you're not familiar with the term, do a quick search... and you'll know immediately why I call it that.

    Oh, don't be sorry. If you're going to analyze something so strange, it takes a whole lot of time.


    You're not kidding... This thing is really giving me fits. In a way, I'm glad though... #1: It's been a solid reminder that me "thinking" something is true doesn't make it true. SQL Server has made it very clear that it doesn't care what I think, nor does it seem interested in my very persuasive arguments.  #2: It's providing the motivation to take a deeper dive into a few areas that were already on my "things you need to learn" list... chief among them Extended Events and, a close 2nd, making better/smarter use of the DMVs.

    I'm making the assumption that you think that because of the cost estimates and the "% of Batch" values on the Actual Execution Plan. Before I comment on that, let's see what else you're talking about...


    I'll admit that I do look at those percentages, but the only time I find them to actually be useful is when doing an "apples to apples" comparison, i.e. two queries that are essentially doing the same thing, touching the same tables and returning the same data... That said, I mostly use them as a rough gauge to see if the optimizer is lying to me. If there is a strong correlation between the percentages and the execution times, I have higher confidence in the execution plan's accuracy than I do when they are (or at least appear to be) completely disconnected.

    For the most part, I try to use Grant's "fat pipes" method of working from left to right, following the path of the fattest pipe and paying attention to changes in pipe diameter (row counts) as I move across to the right and down...

     It's the fact that it shouldn't have.

     ...is based on simple math. On one hand, you have a query that involves only a single scan of a million-row table and a Compute Scalar. According to the "actual" plan, it's doing a total of just 3,599 reads (per Sentry Plan Explorer) for the entire execution. Additionally, it assigns 92% of the total plan cost to the large table scan and reports that it took only 154 ms...

    On the other hand, you have another query that is doing the exact same scan, plus two left nested loop joins (plus a 3rd nested loop to bring it all back together) and its own Compute Scalar (admittedly not even close to being the same compute scalar), with a grand total of 4,003,645 reads.

    I've attached both plans and a SentryOne Plan Explorer session. Odds are you'll see what I'm not seeing... I hope so, anyway... because when I look at these side by side (ignoring total duration, of course), I can't find anything that would lead me to expect the actual outcomes to be what they are. 

    I think THAT is the part that has me wrapped around the axle on this... If you showed me both plans and told me both served the same purpose and produced the same results, I'd throw it in the "no brainer" pile and ask why you even bothered asking... And that thought alone scares the hell out of me. How many times have I encountered this before and simply not noticed?

    Heh... I know, huh? I found out a long time ago that this is one of the silent killers in performance tuning that frequently cause people to select the slower code rather than the faster code.

    The Estimated and Actual Execution plans are fantastic for seeing what is going on but even the Actual Execution plan is fraught with estimates when it comes to measuring cost, which a lot of people interpret to have a direct correlation with duration. While that is a frequently exhibited trait, the cost estimations aren't always accurate because... they're estimates.

    Someone like Grant Fritchey or Gail Shaw or Paul White could explain it a whole lot better but, in this case, I believe this disparity between cost and duration comes down to one thing and only one thing... the optimizer overestimates the cost of the two singleton lookups per row against a table that resides in memory (because of the reads they generate) compared to the cost of building the inline lookup table and essentially doing a CROSS JOIN against it, which is mostly CPU (and not much of that!) and potentially zero reads.

    Still, I don't know for sure. I only know how it performs in comparison to other methods and haven't had the time to figure out the exact reason. I'm just a "user" for this particular problem. 😉


    I don't know how you do it or what you see that other people don't, but your "dark arts" stuff is the real deal...

    Speaking of which... I did take the time to play around with a few variations on your idea... The most successful version involved making two new, identical tables with just the DTSerial and WorkDayNumber columns... I then added clustered indexes with "ALLOW_PAGE_LOCKS = OFF, ALLOW_ROW_LOCKS = OFF"... That yielded a decent improvement (~0.25 secs on average). I haven't tried to figure out if it's the separate tables or the absence of shared locks that it seems to like (my current $ is on the lack of locks) but... as always... more testing to find out for sure. Plus, I'd like to create a multi-session scenario where a few dozen jobs are running the function at the same time. 
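
    The experiment itself is about as simple as it gets... this is the basic shape (a sketch only; the table names match the STATISTICS IO output later in the thread, but the index name here is illustrative):

    -- One narrow copy of the calendar data per lookup; the index disallows row & page locks.
    CREATE TABLE dbo.Calendar_DTSerial_WorkDayNumber_Begin
    (
        DTSerial      INT NOT NULL,
        WorkDayNumber INT NOT NULL
    );
    CREATE UNIQUE CLUSTERED INDEX cx_Calendar_Begin  -- illustrative name
        ON dbo.Calendar_DTSerial_WorkDayNumber_Begin (DTSerial)
        WITH (ALLOW_ROW_LOCKS = OFF, ALLOW_PAGE_LOCKS = OFF);
    -- ...and an identical dbo.Calendar_DTSerial_WorkDayNumber_End table for the second lookup.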

    I typically don't delve into waits of any nature for the very reason you mention. It may be a fault on my part but, unless I've proven a hardware problem, my take on waits is that they are a symptom of the code rather than a cause of slower code. Again, I'm more of just a "user" in that aspect but when people get into a list of waits, etc, etc, my eyes glaze over and my mind wanders to the code itself. That's why I'll frequently respond to people posting about performance problems they think they've detected using waits and how to fix the waits to get better performance, etc, etc, ad infinitum with a simple "You need to fix the code. See the 2nd link in my signature if you want help there". 😉


    I get what you're saying and you get no argument from me. As I see it... the execution plans, the wait stats, or whatever measure we're using... they're all just tools. The more tools you have, the more options you have for fixing the things that break.
    If you're my mechanic and you have the ability to simply listen to the motor and know exactly what's wrong... awesome! If you need $30K worth of diagnostic equipment to do the same thing... awesome! I don't care which tool was used as long as the car gets fixed... 

    That's the goal... fix the code. Unfortunately, it would seem that I'm lacking the necessary tools for this particular job... Heck, at this point, I'm just trying to figure out which tools I need... and after I figure that out, I'll still have to learn to use them.
    I suppose it's a good thing that I like new tools. 😀

    *** New wrinkle... I think I just found the problem while trying to upload the execution plans... I suspect that building a 3-node, 7MB execution plan might be a factor... I really do need to be punched in the face...

  • Jason A. Long - Sunday, October 29, 2017 5:41 PM

    *** New wrinkle... I think I just found the problem while trying to upload the execution plans... I suspect that building a 3-node, 7MB execution plan might be a factor... I really do need to be punched in the face...

    Sometimes it doesn't matter where the hints come from...as long as they arrive.  I've noticed that some of the ugliest, nastiest-looking plans sometimes make me take another look and find the problem.  So the question becomes: What's going on in there that made it 7 MB in size?

  • Ed Wagner - Sunday, October 29, 2017 6:49 PM

    Jason A. Long - Sunday, October 29, 2017 5:41 PM

    *** New wrinkle... I think I just found the problem while trying to upload the execution plans... I suspect that building a 3-node, 7MB execution plan might be a factor... I really do need to be punched in the face...

    Sometimes it doesn't matter where the hints come from...as long as they arrive.  I've noticed that some of the ugliest, nastiest-looking plans sometimes make me take another look and find the problem.  So the question becomes: What's going on in there that made it 7 MB in size?

    I don't want to say for certain... but looking at the XML itself, it certainly appears to be the 2 big CASE expressions.

    Of course, this is also the output captured by using SET STATISTICS XML ON; ... not the "Ctrl+M" version or what you can get from the DMVs.
    I'll check those tomorrow and see if the actual plans really are that big or if it's just in how I was capturing them.
    Of course, the two 27KB plans were captured at the exact same time in the exact same way... 
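
    When I do hit the DMVs, something like this should tell me whether the cached plan really is that big (size_in_bytes comes from sys.dm_exec_cached_plans; the LIKE filter is just an illustrative way to find the statement):

    SELECT  cp.objtype
           ,cp.usecounts
           ,plan_kb = cp.size_in_bytes / 1024  -- size of the compiled plan in cache
           ,st.text
    FROM sys.dm_exec_cached_plans cp
    CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) st
    WHERE st.text LIKE N'%GetWorkingDays%';  -- illustrative filter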

    Just one more reason for me to despise XML...

  • Jason A. Long - Sunday, October 29, 2017 5:41 PM

    I don't know how you do it or what you see that other people don't, but your "dark arts" stuff is the real deal...

    Heh... it's because I KISS (Keep It Super Simple) problems (that, and the ability to quickly build shedloads of random constrained data to try things with).  I don't let my own or other people's previous failures keep me from trying different things.  It's kind of like the old saying about the humble Bumble Bee.  Engineers say its wings are too small to fly its massive body.  The Bumble Bee doesn't know that and so it flies.

    To coin a phrase, "I know it's just a nail... All I have to do is find the right hammer to use", which sometimes does look like a screwdriver or wings that are too small to fly. 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Jeff Moden - Sunday, October 29, 2017 7:50 PM

    Jason A. Long - Sunday, October 29, 2017 5:41 PM

    I don't know how you do it or what you see that other people don't, but your "dark arts" stuff is the real deal...

    Heh... it's because I KISS (Keep It Super Simple) problems (that, and the ability to quickly build shedloads of random constrained data to try things with).  I don't let my own or other people's previous failures keep me from trying different things.  It's kind of like the old saying about the humble Bumble Bee.  Engineers say its wings are too small to fly its massive body.  The Bumble Bee doesn't know that and so it flies.

    To coin a phrase, "I know it's just a nail... All I have to do is find the right hammer to use", which sometimes does look like a screwdriver or wings that are too small to fly. 😉

    Well... judging by your work... You've got one hell of an assortment of hammers to choose from... πŸ˜‰

  • The important part is to NOT use hammers that look like this... 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Jeff Moden - Monday, October 30, 2017 8:23 AM

    The important part is to NOT use hammers that look like this... 😉

    Reminds me of my first true love... 😀

  • By the way... for those interested in batch runs, the inline code I wrote avoids the Cartesian Product altogether and only reads what's necessary from the Calendar table.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Jeff Moden - Wednesday, November 8, 2017 10:26 PM

    By the way... for those interested in batch runs, the inline code I wrote avoids the Cartesian Product altogether and only reads what's necessary from the Calendar table.

    Jeff - I'm actually in the process of getting your function ready for production. I've got a couple of minor variations I want to try out.
    Also, I've been meaning to get the code together for the "dual clustered index / no row or page locks" version. 

    I don't know if you use Sentry Plan Explorer, but here are the original and the variant going head to head... plus their individual execution plans.
    I'll get the SQL together tomorrow and get it added as well.

  • Thanks for the follow-up, Jason.  This has been one of those interesting threads.  Maybe I missed something in the conversation, but what do you mean by "dual clustered index" version?

  • Ed Wagner - Thursday, November 9, 2017 5:36 AM

    Thanks for the follow-up, Jason.  This has been one of those interesting threads.  Maybe I missed something in the conversation, but what do you mean by "dual clustered index" version?

    If you look at Jeff's original function, you'll see that it's two SELECTs making two separate equality joins to the calendar table. My ideas / questions were: a) would the optimizer do anything differently if the two SELECTs were each given their own copy of the table? and b) since updates to a calendar table are so rare, would denying row & page locks at the index level (preventing SQL Server from taking row or page level shared locks) speed things up? 
    One of them is showing a benefit, but I haven't dug in to see which one it was.
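
    In code form, the variant keeps the exact shape of Jeff's iTVF but points each lookup at its own table... something like this (a sketch only; the real GetBusinessDays_DCI isn't posted in this thread, and the table shapes are the ones described above):

    CREATE FUNCTION dbo.GetBusinessDays_DCI
        (@beg_dt DATETIME, @end_dt DATETIME)
    RETURNS TABLE WITH SCHEMABINDING AS
    RETURN
        SELECT BusinessDays = e.WorkDayNumber - b.WorkDayNumber
        FROM dbo.Calendar_DTSerial_WorkDayNumber_Begin b
        CROSS APPLY (
            SELECT WorkDayNumber
            FROM dbo.Calendar_DTSerial_WorkDayNumber_End
            WHERE DTSerial = DATEDIFF(dd, 0, @end_dt)
        ) e
        WHERE b.DTSerial = DATEDIFF(dd, 0, @beg_dt);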

  • Jason A. Long - Wednesday, November 8, 2017 11:55 PM

    Jeff Moden - Wednesday, November 8, 2017 10:26 PM

    By the way... for those interested in batch runs, the inline code I wrote avoids the Cartesian Product altogether and only reads what's necessary from the Calendar table.

    Jeff - I'm actually in the process of getting your function ready for production. I've got a couple of minor variations I want to try out.
    Also, I've been meaning to get the code together for the "dual clustered index / no row or page locks" version. 

    I don't know if you use Sentry Plan Explorer, but here are the original and the variant going head to head... plus their individual execution plans.
    I'll get the SQL together tomorrow and get it added as well.

    Here are the scripts for the "DCI" version of Jeff's design...

    SET NOCOUNT ON;
    GO
    DBCC FLUSHPROCINDB(16) WITH NO_INFOMSGS;  -- undocumented: flush cached plans for this database only (db_id 16 here)
    DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS;   -- empty the buffer pool so the first run is a cold cache test
    PRINT(CONCAT(CHAR(13), CHAR(10), N'   wait a moment...', CHAR(13), CHAR(10)));
    WAITFOR DELAY '00:00:01';
    GO
    -- SET STATISTICS XML ON;
    GO
    DECLARE @_clock_start DATETIME2(7) = SYSDATETIME(), @_test_name NVARCHAR(70) = N'GetBusinessDays_DCI';
    PRINT(CONCAT(N'╔', REPLICATE(N'═', 148), N'╗
    ║', LEFT(CONCAT(N'    Start Time: ', @_clock_start, N'   Test Name: ', ISNULL(NULLIF(@_test_name, N' '), N'Test Query 1'),
    REPLICATE(N' ', 100)), 148), N'║', CHAR(13), CHAR(10), N'╚', REPLICATE(N'═', 148), N'╝')); SET STATISTICS IO ON;
    -- ____________________________________________________________________________________________________________________________________________
    -- ⇩⇩⇩⇩⇩⇩ place tsql here ⇩⇩⇩⇩⇩⇩⇩⇩⇩⇩⇩⇩⇩⇩⇩⇩⇩⇩⇩⇩⇩⇩⇩⇩⇩⇩⇩⇩⇩⇩⇩⇩⇩⇩⇩⇩⇩⇩⇩⇩⇩⇩⇩⇩⇩⇩⇩⇩⇩⇩⇩⇩⇩⇩⇩⇩⇩⇩⇩⇩
    DECLARE @_data_dump_int INT;  -- dump target: assigning to a variable keeps SSMS rendering time out of the measurement
    SELECT
        @_data_dump_int = dci.BusinessDays
    FROM
        dbo.TestData td
        CROSS APPLY dbo.GetBusinessDays_DCI(td.beg_dt, td.end_dt) dci;

    -- ⇧⇧⇧⇧⇧⇧ place tsql here ⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧
    -- ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾
    DECLARE @_clock_stop DATETIME2(7) = SYSDATETIME(); SET STATISTICS IO OFF; PRINT(CONCAT(N'╔', REPLICATE(N'═', 148), N'╗
    ║', LEFT(STUFF(CONCAT(N'    Finish Time: ', @_clock_stop, N'   Duration: ', DATEDIFF(mcs, @_clock_start, @_clock_stop)
    / 1000000.0, N' secs.  ', DATEDIFF(mcs, @_clock_start, @_clock_stop) / 1000.0, N' ms.', REPLICATE(N' ', 100)), 76, 3, N''), 148), N'║
    ╚', REPLICATE(N'═', 148), N'╝'));
    GO
    -- SET STATISTICS XML OFF;
    GO

    DBCC FLUSHPROCINDB(16) WITH NO_INFOMSGS;
    DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS;
    PRINT(CONCAT(CHAR(13), CHAR(10), N'   wait a moment...', CHAR(13), CHAR(10)));
    WAITFOR DELAY '00:00:01';
    GO
    -- SET STATISTICS XML ON;
    GO
    DECLARE @_clock_start DATETIME2(7) = SYSDATETIME(), @_test_name NVARCHAR(70) = N'';
    PRINT(CONCAT(N'╔', REPLICATE(N'═', 148), N'╗
    ║', LEFT(CONCAT(N'    Start Time: ', @_clock_start, N'   Test Name: ', ISNULL(NULLIF(@_test_name, N' '), N'Test Query 2'),
    REPLICATE(N' ', 100)), 148), N'║', CHAR(13), CHAR(10), N'╚', REPLICATE(N'═', 148), N'╝')); SET STATISTICS IO ON;
    -- ____________________________________________________________________________________________________________________________________________
    -- ⇩⇩⇩⇩⇩⇩ place tsql here ⇩⇩⇩⇩⇩⇩⇩⇩⇩⇩⇩⇩⇩⇩⇩⇩⇩⇩⇩⇩⇩⇩⇩⇩⇩⇩⇩⇩⇩⇩⇩⇩⇩⇩⇩⇩⇩⇩⇩⇩⇩⇩⇩⇩⇩⇩⇩⇩⇩⇩⇩⇩⇩⇩⇩⇩⇩⇩⇩⇩
    DECLARE @_data_dump_int INT;
    SELECT
        @_data_dump_int = gbd.BusinessDays
    FROM
        dbo.TestData td
        CROSS APPLY dbo.GetBusinessDays(td.beg_dt, td.end_dt) gbd;

    -- ⇧⇧⇧⇧⇧⇧ place tsql here ⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧⇧
    -- ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾
    DECLARE @_clock_stop DATETIME2(7) = SYSDATETIME(); SET STATISTICS IO OFF; PRINT(CONCAT(N'╔', REPLICATE(N'═', 148), N'╗
    ║', LEFT(STUFF(CONCAT(N'    Finish Time: ', @_clock_stop, N'   Duration: ', DATEDIFF(mcs, @_clock_start, @_clock_stop)
    / 1000000.0, N' secs.  ', DATEDIFF(mcs, @_clock_start, @_clock_stop) / 1000.0, N' ms.', REPLICATE(N' ', 100)), 76, 3, N''), 148), N'║
    ╚', REPLICATE(N'═', 148), N'╝'));
    GO
    -- SET STATISTICS XML OFF;
    GO

    Cold cache results...
       wait a moment...
    ╔════════════════════════════════════════════════════════════════════════════════════════════════╗
    ║    Start Time: 2017-11-10 03:02:23.9570082   Test Name: GetBusinessDays_DCI                     ║
    ╚════════════════════════════════════════════════════════════════════════════════════════════════╝
    Table 'Calendar_DTSerial_WorkDayNumber_Begin'. Scan count 0, logical reads 2000000, physical reads 5, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Calendar_DTSerial_WorkDayNumber_End'. Scan count 0, logical reads 2000000, physical reads 5, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'TestData'. Scan count 5, logical reads 3645, physical reads 1, read-ahead reads 3593, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    ╔════════════════════════════════════════════════════════════════════════════════════════════════╗
    ║    Finish Time: 2017-11-10 03:02:25.1220728   Duration: 1.165064 secs.  1165.064000 ms.         ║
    ╚════════════════════════════════════════════════════════════════════════════════════════════════╝

       wait a moment...
    ╔════════════════════════════════════════════════════════════════════════════════════════════════╗
    ║    Start Time: 2017-11-10 03:02:26.9281669   Test Name: Test Query 2                            ║
    ╚════════════════════════════════════════════════════════════════════════════════════════════════╝
    Table 'Calendar_JM'. Scan count 0, logical reads 4000000, physical reads 5, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'TestData'. Scan count 5, logical reads 3645, physical reads 1, read-ahead reads 3593, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    ╔════════════════════════════════════════════════════════════════════════════════════════════════╗
    ║    Finish Time: 2017-11-10 03:02:28.2082467   Duration: 1.280080 secs.  1280.080000 ms.         ║
    ╚════════════════════════════════════════════════════════════════════════════════════════════════╝

    Warm cache results...
    ╔════════════════════════════════════════════════════════════════════════════════════════════════╗
    ║    Start Time: 2017-11-10 03:05:26.2064342   Test Name: GetBusinessDays_DCI                     ║
    ╚════════════════════════════════════════════════════════════════════════════════════════════════╝
    Table 'Calendar_DTSerial_WorkDayNumber_Begin'. Scan count 0, logical reads 2000000, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Calendar_DTSerial_WorkDayNumber_End'. Scan count 0, logical reads 2000000, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'TestData'. Scan count 5, logical reads 3645, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    ╔════════════════════════════════════════════════════════════════════════════════════════════════╗
    ║    Finish Time: 2017-11-10 03:05:26.8604721   Duration: 0.654038 secs.  654.038000 ms.          ║
    ╚════════════════════════════════════════════════════════════════════════════════════════════════╝
    ╔════════════════════════════════════════════════════════════════════════════════════════════════╗
    ║    Start Time: 2017-11-10 03:05:27.1734870   Test Name: Test Query 2                            ║
    ╚════════════════════════════════════════════════════════════════════════════════════════════════╝
    Table 'Calendar_JM'. Scan count 0, logical reads 4000000, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'TestData'. Scan count 5, logical reads 3645, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
    ╔════════════════════════════════════════════════════════════════════════════════════════════════╗
    ║    Finish Time: 2017-11-10 03:05:28.0085340   Duration: 0.835047 secs.  835.047000 ms.          ║
    ╚════════════════════════════════════════════════════════════════════════════════════════════════╝

  • Jeff Moden - Sunday, October 22, 2017 11:44 AM

    ...

    With that in mind, I'll suggest that the function should be relegated only to single row use.  If you have to resolve Business Days for a whole table, the following approach (which is very similar to the code in the function) runs in about half the time and uses 3 orders of magnitude fewer reads because there is no implied CROSS JOIN despite appearances.

    ...

    Here's how to encapsulate your code into a function which exhibits the same superquick performance characteristics as the original:
    CREATE FUNCTION [dbo].[IF_GetWorkingDays]
        (@beg_dt DATETIME, @end_dt DATETIME)
    RETURNS TABLE WITH SCHEMABINDING AS
    RETURN
        SELECT BusinessDays = c2.WorkDayNumber - c1.WorkDayNumber
        FROM dbo.Calendar c1
        CROSS APPLY (
            SELECT WorkDayNumber
            FROM dbo.Calendar
            WHERE DTSerial = DATEDIFF(dd, 0, @end_dt)
        ) c2
        WHERE c1.DTSerial = DATEDIFF(dd, 0, @beg_dt);
    Here's the usage:

    DECLARE @beg_dt DATETIME, @end_dt DATETIME, @BusinessDays INT;  -- declared here so the snippet runs standalone

    SELECT  @beg_dt       = td.beg_dt
           ,@end_dt       = td.end_dt
           ,@BusinessDays = x.BusinessDays
    FROM dbo.TestData td
    CROSS APPLY [dbo].[IF_GetWorkingDays](td.beg_dt, td.end_dt) x;

    The plan for this is virtually identical to the original query with the two correlated subqueries in the SELECT list.
    Whilst messing around with Jeff's awesome test harness, I noticed a vast improvement when the datatypes were exactly matched to [date] throughout, i.e. the start and end dates in the target table were [date] and the DT in the calendar table was [date]: execution time dropped to about 60 ms for the original query (the one with the two correlated subqueries in the SELECT list).
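
    To make that concrete, here's the shape of the type-matched query (a sketch; the DT column name is assumed, and the point is simply that beg_dt, end_dt and DT are all [date], so the lookups need no CONVERT_IMPLICIT):

    -- All three columns are [date]; the two correlated lookups seek without any implicit conversion.
    SELECT  BusinessDays = (SELECT c2.WorkDayNumber FROM dbo.Calendar c2 WHERE c2.DT = td.end_dt)
                         - (SELECT c1.WorkDayNumber FROM dbo.Calendar c1 WHERE c1.DT = td.beg_dt)
    FROM dbo.TestData td;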

    β€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Awesome.  I bloody well missed the Implicit Conversion problem. :blush::blush::blush:  Thanks for the follow-up, the function modification, and the retesting, Chris.

    Now... let's all get together and co-author an article on this bad boy. :discuss:

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions
