Increasing CPUs hurts Perf

  • Eddie Wuerch (7/4/2011)


    A CXPACKET wait is experienced by a thread in a parallel query after it has finished its portion of the work, and is waiting for the other threads to complete theirs.

With CXPACKET waits as high as you're seeing, I would be concerned that the statistics may be out of date and the optimizer is splitting the work between the threads unevenly. The result is a single thread performing a disproportionate amount of the work while the other threads, already finished with theirs, sit and log CXPACKET waits.

    You can check the age of statistics on indexes with the STATS_DATE() function:

SELECT STATS_DATE(object_id, index_id), OBJECT_NAME(object_id), name
FROM sys.indexes

Apply filters to limit the results to the big tables that are hit most in those queries to see specifics (WHERE object_id = OBJECT_ID('dbo.YourTableName')).

    -Eddie

Another [maybe easier] test is to do an update stats with FULLSCAN on all tables and reset the wait stats to see if the waits for that day look any better. I'm sure it will help, but since you already tried that option it won't be the end-all-be-all tune-up for you.
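A rough sketch of that test (assuming a FULLSCAN on every table is affordable and that clearing the server-wide wait stats is acceptable):

-- update stats on every user table with a full scan (sp_MSforeachtable is undocumented but handy)
EXEC sp_MSforeachtable 'UPDATE STATISTICS ? WITH FULLSCAN;';

-- reset the cumulative wait stats
DBCC SQLPERF ('sys.dm_os_wait_stats', CLEAR);

-- later in the day, check what share of the waits is CXPACKET
SELECT wait_type,
       wait_time_ms,
       100.0 * wait_time_ms / SUM(wait_time_ms) OVER () AS pct_of_total
FROM sys.dm_os_wait_stats
WHERE wait_time_ms > 0
ORDER BY wait_time_ms DESC;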

  • Thanks to both of you, but stats are in good shape, we run a full update weekly on all indexes, plus we ran a midweek one last week as well, but didn't help.

  • repicurus (7/4/2011)


    ...abbreviated...

    I would check to see if HYPER-THREADING (not "Hyper-V") is enabled in the BIOS for these servers.

    If so - disable it.

    ...abbreviated...

I have disabled Hyper-Threading on a server suffering from parallelism issues before; it reduced the effective CPU count but actually improved SQL Server performance. That said, this is an Opteron machine, so disabling Hyper-Threading would not apply here.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • as2higpark (7/4/2011)


    Thanks to both of you, but stats are in good shape, we run a full update weekly on all indexes, plus we ran a midweek one last week as well, but didn't help.

Just something else to keep in mind: stats can become stale in as little as 15 minutes, or even after a single mass insert / update.

    So once weekly + once in a great while might not be enough for 100% of the stats you have.

This is a very beta version of a stats analysis script I started working on. It should show you the stats along with the estimated change %. You can filter on pretty much anything you want. I'm working on adding data types and data distribution to the mix to pick out the best candidates for early stats updates.

Yes, I know about the old objects, but rowmodctr is not in the newer views afaik.

SELECT
    O.name AS ObjName
    , s.name AS StatsName
    , i.indid AS IndexID
    , STATS_DATE(i.id, i.indid) AS StatsDate
    , dtRows.rows
    , CASE WHEN i.rowmodctr > dtRows.rows THEN dtRows.rows ELSE i.rowmodctr END AS RowsModified
    , CONVERT(DECIMAL(18,1), CASE WHEN i.rowmodctr > dtRows.rows THEN dtRows.rows ELSE i.rowmodctr END * 100.0 / dtRows.rows) AS Percentage_RowsModified
    , dtRows.data_pages
FROM
    sys.sysindexes i
    INNER JOIN sys.objects O
        ON i.id = O.object_id
    INNER JOIN (
        SELECT
            OBJECT_NAME(object_id) AS ObjectName
            , object_id
            , SUM(rows) AS rows
            , SUM(data_pages) AS data_pages
        FROM
            sys.partitions p
            INNER JOIN sys.allocation_units a
                ON p.partition_id = a.container_id
        WHERE
            index_id IN (0, 1)
        GROUP BY
            object_id
        HAVING
            SUM(rows) > 0
    ) dtRows
        ON O.object_id = dtRows.object_id
    INNER JOIN sys.stats s
        ON i.id = s.object_id
        AND i.indid = s.stats_id
WHERE
    i.indid > 0
    AND i.rowmodctr > 0
    AND O.is_ms_shipped = 0
    AND s.no_recompute = 0
    AND dtRows.data_pages > 1000 --just to flush out the small stuff... making the assumption that it may matter less
--33401
ORDER BY
    Percentage_RowsModified DESC, dtRows.rows DESC

  • Thanks Ninja, I just added that one to my list of queries to try.

    I ran this on the problem DB and no stats are more than 0.3% modified, so I think I am good for now, but I will check this throughout the day.

  • as2higpark (7/5/2011)


    Thanks Ninja, I just added that one to my list of queries to try.

    I ran this on the problem DB and no stats are more than 0.3% modified, so I think I am good for now, but I will check this throughout the day.

Well, like I said, this is a food-for-thought, pre-beta version.

The problem is that even 0.3% can be enough if the leading column of the stats is a datetime with a default value of GETDATE() and you frequently ask for today's transactions.

The optimizer will wrongly estimate 0 rows (or a verrrrry low number) and plan accordingly, while in fact you may have thousands of rows, which would require a completely different plan. A single regular query could spike the CPUs for maybe just half a second more than it needs to, but if that query runs 500 times an hour, that could explain all your problems.
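A minimal illustration of that pattern, with hypothetical table and column names:

-- Hypothetical table: OrderDate is populated with GETDATE() on insert,
-- so new rows always land beyond the last value the statistics know about.
SELECT o.OrderID
FROM dbo.Orders AS o
WHERE o.OrderDate >= DATEADD(DAY, DATEDIFF(DAY, 0, GETDATE()), 0);  -- "today's transactions"
-- If the stats were last updated yesterday, everything inserted today sits past the end of
-- the histogram, the estimate comes out at ~1 row, and the plan is built for that tiny number.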

    Here's the full story on that one.

    http://sqlinthewild.co.za/index.php/2011/03/22/statistics-row-estimations-and-the-ascending-date-column/

I'm working with an ERP that has over 30K stats and 140K+ columns in the system, so I need to get this done for myself. I'm just not finished with this project!

  • Thanks Ninja,

I'll try to keep an eye on the tables that have been causing the most problems.

  • as2higpark (7/5/2011)


    Thanks Ninja, I just added that one to my list of queries to try.

    I ran this on the problem DB and no stats are more than 0.3% modified, so I think I am good for now, but I will check this throughout the day.

0.3% of what, 1M rows? That would mean 3K new rows. If the server expects only 0 to 100 rows but finds 3,000, that's enough to completely screw up the plan.

P.S. The server will kick in auto-update stats only at around 200K modifications... which is way too late.
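For reference, the math behind that 200K figure (the standard SQL Server 2005/2008 auto-update threshold, not something spelled out in this thread):

threshold = 500 rows + 20% of the table's rows = 500 + 0.20 * 1,000,000 = 200,500 modifications on a 1M-row table before the automatic update fires.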

Ok, so I have been monitoring this for about 90 minutes and am still seeing the same issues. Performance has not moved, still about 200% higher than on the previous machine, and around 73% of the waits are CXPACKET.

I have set the Cost Threshold for Parallelism to 40, but it doesn't seem to have had any effect. I am wondering if I might be hurt by the number of ad hoc queries that I am seeing. Below is the breakdown of my plan cache.

Cache Store Type | Total Num Of Plans | Total Size | Ave Use Count | Use count = 1 (size) | Use count = 1

    Adhoc 7388674.58368652253.1147154334

    Prepared 101781548.3671878326.79687591

    Trigger 2928.0468754911.0390624

    Proc 10926.24218764580.4218754

    Thanks for any suggestions.
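For reference, a minimal sketch of how those two parallelism-related settings are usually inspected and changed (the value 40 is just the one mentioned above):

EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;

-- raise the cost threshold so only genuinely expensive queries go parallel
EXEC sp_configure 'cost threshold for parallelism', 40;
RECONFIGURE;

-- display the current MAXDOP setting for comparison
EXEC sp_configure 'max degree of parallelism';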

  • From Jonathan's blog (link posted way earlier in the thread).

    What does this return?

SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
WITH XMLNAMESPACES
(DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT
    query_plan AS CompleteQueryPlan,
    n.value('(@StatementText)[1]', 'VARCHAR(4000)') AS StatementText,
    n.value('(@StatementOptmLevel)[1]', 'VARCHAR(25)') AS StatementOptimizationLevel,
    n.value('(@StatementSubTreeCost)[1]', 'VARCHAR(128)') AS StatementSubTreeCost,
    n.query('.') AS ParallelSubTreeXML,
    ecp.usecounts,
    ecp.size_in_bytes
FROM sys.dm_exec_cached_plans AS ecp
CROSS APPLY sys.dm_exec_query_plan(plan_handle) AS eqp
CROSS APPLY query_plan.nodes('/ShowPlanXML/BatchSequence/Batch/Statements/StmtSimple') AS qn(n)
WHERE n.query('.').exist('//RelOp[@PhysicalOp="Parallelism"]') = 1

Have you run the traces to track the high-use queries by CPU? Could be missing indexes or just flat-out bad plans as well...
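If running traces is too heavy, here's a rough sketch of the same idea against the plan cache DMVs (the numbers are cumulative since each plan was cached, so treat it as an approximation):

-- top 20 statements by total CPU since they were cached
SELECT TOP (20)
       qs.total_worker_time / 1000 AS total_cpu_ms,
       qs.execution_count,
       qs.total_worker_time / qs.execution_count / 1000 AS avg_cpu_ms,
       SUBSTRING(st.text, (qs.statement_start_offset / 2) + 1,
                 ((CASE qs.statement_end_offset WHEN -1 THEN DATALENGTH(st.text)
                        ELSE qs.statement_end_offset END
                   - qs.statement_start_offset) / 2) + 1) AS statement_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.total_worker_time DESC;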

P.S. I'm toying with this on my system as well, and I've found that the percentages are very fragile right after resetting the wait stats to 0.

I just had to run 1-2 fairly small reports and the CXPACKETs went from 0 to 80% in no time flat. I'm personally giving this a day before reconsidering.

I also had another project on hold that calculates the CPU time used to generate the execution plan. AFAIK (and please correct me if I'm wrong) that process is single-threaded, so if you do that very often during the day and you went from 3 GHz to 2.x GHz, then you possibly shot yourself pretty hard in the foot (but it shouldn't explain a 200% difference all by itself).

If I can ever get that to you (days / weeks from now), be warned that that trace is really expensive to run (it requires tracking actual stats + the actual plan), so I'm not even sure you could really use it if the CPUs are already pegged.

So, forgive my ignorance, but that query returns 750 rows. The overwhelming majority (>90%) are the same query, which also shows up as the highest CPU offender. It is shown below (edits made to remove specificity), but I am stuck with this one for now at least, because it is buried in our source code, which cannot be edited quickly.

    Any ideas on how to deal with this offender or am I left to try to convince the developers to change it?

select tb1.uwHID as col_0_0_
from TABLE1 tb1
    left outer join TABLE2 tb2
        on tb1.uwHID = tb2.uwHID
where (tb1.uwHID = 49845
    or tb2.customText2 = '6487114')
    and tb1.isDeleted = 0
  • as2higpark (7/5/2011)


So, forgive my ignorance, but that query returns 750 rows. The overwhelming majority (>90%) are the same query, which also shows up as the highest CPU offender. It is shown below (edits made to remove specificity), but I am stuck with this one for now at least, because it is buried in our source code, which cannot be edited quickly.

    Any ideas on how to deal with this offender or am I left to try to convince the developers to change it?

select tb1.uwHID as col_0_0_
from TABLE1 tb1
    left outer join TABLE2 tb2
        on tb1.uwHID = tb2.uwHID
where (tb1.uwHID = 49845
    or tb2.customText2 = '6487114')
    and tb1.isDeleted = 0

    Please post actual execution plan along with the keys and indexes for all tables involved (in case those are views).

    We really need the actual plan, not estimated.

    You can attach it as a file.

    TIA.

  • Thanks Ninja.

Attached as a .txt (small changes made to DB, schema, and table names for anonymity). If that doesn't work, I can send it directly.

This is an estimated plan. I need the REAL plan. That's where all the nuggets are found.

    I also need the index definitions to really help out.

I understand you want to obfuscate data; the metadata has little real value in the world (unless you invented the formula for cold fusion and put it in the column names).

I understand if you want / have to obfuscate the column names, but I need the real plan (run the query from SSMS with the actual execution plan enabled, then save that plan as a .sqlplan file and upload it).
