proc long running after plan cache

  • First off. What triggers a plan re-cache? I always thought it was simply a timing thing. Say, if it hasn't been called in N number of minutes or hasn’t been cached in N number of days.

    The reason I ask is because I have a proc that calls 10 other procs. None of which have input params. I'm trying to analyze why every now and again it takes ten times longer than it should. I'm storing off every statistic under the sun for all stored procedure calls over time so I can see when the issue occurs. Just don’t know why yet. Today I happened to notice that these long run time happen to be right after the procs exe plan is cached. I looked around and most sites say I should use recompile. I don’t think that will solve my issue. I suspect that would make every run longer

    I'm going to attempt to attach the stats related to the giving proc in question. Each record represents one of the long running examples of the proc running long. You will notice the last_elapsed_time is greater than 900000000 microseconds. A normal run is around 200000000 microseconds.

  • Plans can be aged out of cache, how long that takes is dependant on memory pressure, among other things. Plans also can be removed due to data changes on tables they refer to (via stats updates), or table definition changes, procedure definition changes, index changes, some server config changes, database restores.

    That said, a compile is not likely to cause a 10x increase in duration, compiles are usually fast. Could be data caching too (every entry in your spreadsheet has physical reads > 0, but I have nothing to compare it to). Could also be an inappropriate plan, bad parameter sniffing, that kind of thing, can't tell without the plans and/or the procedure's definition.

    If you have access to Pluralsight (or if you don't, are willing to take a trial membership), I have a course on bad parameter sniffing that may help you with that. Alternately, if you're happy posting more information here, we may be able to help pinpoint the problem.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Attaching last 1000 calls good and bad. I highlighted a few that cause a timeout who’s last elapsed time was over 900000000. I don’t see anything that stands out.

    I thought param sniffing was from local variables or input params. These procs have neither. This stored proc calls 8 other stored procs. Each of them is doing an insert then update to a table. In the interim I’m going to write off the records affected by each update and insert to see if I can find a pattern. I would have thought I would have seen a spike in reads or writes if that was the issues thought

  • Yeah, parameter sniffing is parameters or variables. Bad plans is still possible, though less likely.

    I suspect you're going to have to look at the plans, see if they're different at different times, look at the waits the queries incur, you might be getting lock waits, you might be getting IO waits, etc.

    There's not enough information in the query stats DMV do make a diagnosis, you're going to need to drill in in more detail. Especially since the ones you highlighted as timing out had lower than average CPU time and lower than average logical and physical reads.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • That was my though so I moved the data and log file on their own drives. I had several other dbs that were on the same data and log drive as this one that were pegging the io at times. Still getting the same timeouts. The db in question is sales force data. Physical drives by the way.

    I'm storing the sp whoisactive stats and not seeing any blocking. I think I may be storing wait stats over time. I need to check

  • Changing cost threshold for parallelism could also invalidate plans.

    Let us see the code and/or plan.

    If anything, I am guessing the query may not be optimum.

    I have seen option(recompile) work wonders as it resolves and simplifies the query logic based on the parameters provided.

    Since your query takes ages to execute and executes once an hour, an option(recompile) may not be the worst thing to try.

    Note, this is not sp_recompile, it is an option put at the end of the query.

    Many physical reads per execution, which could mean that the high duration is due to having to hit disk.

    Probably bad advice, but the best I can do with extremely limited flying blind information.

    Catch-all queries done right [/url]
    Gail Shaw's Performance Blog[/url]

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply