Execution plan behaving strangely

  • Been working on a problem for the last 2 weeks where an about twice a day one page in our application which usually takes about half a second can take up to 6 seconds. It's a .Net web app running connection pooling with SQL server authenication.

    Been watching syscacheobjects and the query has a single execution plan with the usecount increasing with each call with about 900 calls per day. The usecount does not reset when I execute the stored procedure using with recompile. Not sure thats right but that may be a clue.

    By using SET STATISTICS IO On, I see something strange. When running normally the results look like this:

    Table 'SystemDefault'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Workflow'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Person'. Scan count 1, logical reads 3, physical reads 1, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Message'. Scan count 5, logical reads 29, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'WorkflowStep'. Scan count 0, logical reads 124, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'WorkflowStepRole'. Scan count 2, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Order'. Scan count 2, logical reads 2058, physical reads 0, read-ahead reads 57, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Request'. Scan count 2, logical reads 1663, physical reads 0, read-ahead reads 184, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Organization'. Scan count 2, logical reads 47, physical reads 0, read-ahead reads 44, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Attachment'. Scan count 1, logical reads 5, physical reads 0, read-ahead reads 485, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Service'. Scan count 4, logical reads 8, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    When the query is misbehaving the results look like this:

    Table 'SystemDefault'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Workflow'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Person'. Scan count 1, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Message'. Scan count 5, logical reads 17, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Service'. Scan count 10, logical reads 20, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Attachment'. Scan count 5, logical reads 25, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Organization'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'WorkflowStep'. Scan count 0, logical reads 188220, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'WorkflowStepRole'. Scan count 295732, logical reads 1182928, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Order'. Scan count 73332, logical reads 1034504, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Request'. Scan count 2, logical reads 110370, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    The scan count and logical reads go crazy on the last 4 tables. Also seen it go crazy on just the workflowstep table and normal on Order and Request.

    Workflowstep is a table with 31 row and average size of 234 bytes. Can you have scan count of 0 and logical reads of 188220 ?

    WorkflowStepRole is a table with 279 rows, To scan that table end to end it's 4 to 5 reads which is consistent with the results when the query is behaving and the size of the table/clustered index reported by sql. The numbers above appear to suggest it scan the 4 pages 295732 times.

    The order table has 149100 rows, average row size of 604 and uses 11784 pages. Request table has 36669 rows, average row size of 308 and uses 1528 pages.

    When things are going well profiler shows about 600cpu, 4000 reads and 700ms duration. When things are going bad, profiler shows 8300 cpu, 2,510,000 reads and 8000ms duration.

    The execution plan is not changing as the single row in syscacheobjects and syscacheobjects.usecount is showing the same execution plan being used. The issue appears to be in the execution plan.

    Once it starts it does not clear itself until I do something. The problem can be cleared by using DBCC FREEPROCCACHE or by updating statistics on one or more tables. DBCC FREEPROCCACHE Will obviously force the recompile but updating statistics does not seem to alter the syscacheobjects.usecount. Executing the stored procedure WITH RECOMPILE does not seem to alter the syscacheobjects.usecount either. I would expect to reset the usecount. Perhaps it works out the plan and decides its the same as the one already there so it does not update it.

    issue does not appear at any specific time of day so I don't think it relates to busy server although I usually find it first thing in the morning where the server has been idle for some time. The statistics are updated every night at 1am and can come on the problem an hour later. Auto Create and Auto Update Statistics are turned on.

    I'm not able to recreate the problem but I've had profiler running and seen where at 2:03:03 the query ran perfectly and at 2:03:17 the query went south. I've reviewed every event log and sql log I could find and there is nothing crazy going on. The server is often practically idle when it happens although it will happen during busy times also.

    SQL 2005 64 bit running on Windows 2003, 4 processors, 12 GB RAM. DB was originally a SQL2000 db and moved to SQL2005, running happy for several months with this issue arrising only recently. Seens to be only this one stored procedure although it could be others. This is one of the most frequently used queries and also the one that has to work the hardest.

    Anyone seen this type of activity before? Any suggestions on where to look next?

     

  • does the stored procedure have parameters? Maybe its using an innapropriate (cached) execution plan for specific parameters. e.g. wider date range?

    How about using a bigger sample when gathering statistics?

  • There procedure has parameters and I've made a few changes trying to split it out into separate queries but this is one of those catchall type queries which allows the user to find rows based on a combination of fields. There is a single execution plan created and this plan could work perfectly 500 times an then the same plan (does not recompile) running the same paramater values goes south and never recovers.

    I'll try a full scan statistics update. Can't hurt. Thanks for the feedback.

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply