Determining cost threshold for parallelism

  • I have this query to find the cost of plans that use parallelism, but not sure how to proceed as far as setting the cost threshold?? QA and prod are still at the default of 5, but I find 984 plans in production with cost related to parallelism greater than 50.

    SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

    WITH XMLNAMESPACES

    (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')

    SELECT top (150)

    query_plan AS CompleteQueryPlan,

    n.value('(@StatementText)[1]', 'VARCHAR(4000)') AS StatementText,

    n.value('(@StatementOptmLevel)[1]', 'VARCHAR(25)') AS StatementOptimizationLevel,

    n.value('(@StatementSubTreeCost)[1]', 'VARCHAR(128)') AS StatementSubTreeCost,

    n.query('.') AS ParallelSubTreeXML,

    ecp.usecounts,

    ecp.size_in_bytes

    FROM sys.dm_exec_cached_plans AS ecp

    CROSS APPLY sys.dm_exec_query_plan(plan_handle) AS eqp

    CROSS APPLY query_plan.nodes('/ShowPlanXML/BatchSequence/Batch/Statements/StmtSimple') AS qn(n)

    WHERE n.query('.').exist('//RelOp[@PhysicalOp="Parallelism"]') = 1

    and cast(n.value('(@StatementSubTreeCost)[1]', 'VARCHAR(128)') as float)>50

    order by cast(n.value('(@StatementSubTreeCost)[1]', 'VARCHAR(128)') as float) desc

  • The default of 5 for cost threshold for parallelism is almost universally too low. I don't know of a formula for setting a higher value, but I know pretty much every machine in existence needs to adjust it. Same for MAXDOP's default of 0. Quite a few bad defaults in a SQL Server instance, come to think of it!! :w00t:

    I usually go with 15 for an OLTP box and 40 for an OLAP box for CTFP. Physical core count on each NUMA node as a max for MAXDOP. You can override the latter with a query option, which is nice when you want a particular query to run faster (or SLOWER!!) than it otherwise might given the current configuration.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Thanks Guru. The trick is "proving" to management that we need to increase the value, AND that it won't hurt anything. I have a number of QA environments, just not sure about the proving part. ( maxdop is already at 8 ) Also the prod load is a mix of oltp / olap (report type queries ) if I had to guess I'd say 75% OLTP

    This is our production setup: 4 numa nodes

    number_of_physical_cpus - 4number_of_cores_per_cpu-12 total_number_of_cores- 48 number_of_virtual_cpus 64 cpu_category x64

    QA: 281632x64

    TheSQLGuru (4/12/2016)


    The default of 5 for cost threshold for parallelism is almost universally too low. I don't know of a formula for setting a higher value, but I know pretty much every machine in existence needs to adjust it. Same for MAXDOP's default of 0. Quite a few bad defaults in a SQL Server instance, come to think of it!! :w00t:

    I usually go with 15 for an OLTP box and 40 for an OLAP box for CTFP. Physical core count on each NUMA node as a max for MAXDOP. You can override the latter with a query option, which is nice when you want a particular query to run faster (or SLOWER!!) than it otherwise might given the current configuration.

  • TheSQLGuru (4/12/2016)


    The default of 5 for cost threshold for parallelism is almost universally too low. I don't know of a formula for setting a higher value, but I know pretty much every machine in existence needs to adjust it. Same for MAXDOP's default of 0. Quite a few bad defaults in a SQL Server instance, come to think of it!! :w00t:

    I usually go with 15 for an OLTP box and 40 for an OLAP box for CTFP. Physical core count on each NUMA node as a max for MAXDOP. You can override the latter with a query option, which is nice when you want a particular query to run faster (or SLOWER!!) than it otherwise might given the current configuration.

    Agreed that 5 is very low, find 15 quite low to for OLTP, how did you come to that conclusion? Normally would have OLTP boxes higher than OLAP given the OLAP results are cached. Also the difference between VMs and Tins is something one must factor into the equation.

    😎

  • We don't run production on VMs ( yet ). The query came from here:

    [/url]

  • Eirikur Eiriksson (4/12/2016)


    TheSQLGuru (4/12/2016)


    The default of 5 for cost threshold for parallelism is almost universally too low. I don't know of a formula for setting a higher value, but I know pretty much every machine in existence needs to adjust it. Same for MAXDOP's default of 0. Quite a few bad defaults in a SQL Server instance, come to think of it!! :w00t:

    I usually go with 15 for an OLTP box and 40 for an OLAP box for CTFP. Physical core count on each NUMA node as a max for MAXDOP. You can override the latter with a query option, which is nice when you want a particular query to run faster (or SLOWER!!) than it otherwise might given the current configuration.

    Agreed that 5 is very low, find 15 quite low to for OLTP, how did you come to that conclusion? Normally would have OLTP boxes higher than OLAP given the OLAP results are cached. Also the difference between VMs and Tins is something one must factor into the equation.

    😎

    I almost never want my OLAP queries cached if that is what you are saying.

    And I rarely have a client with a properly indexed (or coded) app, so some of those OLTP queries really do need parallelism and also pretty much everyone runs a "mixed-mode" server where they do reports on same system (usually same database #FACEPALM). 🙂

    So no magic formula - just a number I pulled out of my butt. 😀 I do monitor for (and teach my clients how to monitor for) issues with too much or too little parallelism regardless of their settings.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Sounds like, based on our hardware and oltp/reporting mix in the same database, I can just try CTFP 25 in QA and see how it goes. Of course, QA never gets anywhere the kind of load prod does.

    I'll start by getting a count of plans using parallelism, with cost above 50 and usecount <30

    SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

    WITH XMLNAMESPACES

    (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')

    SELECT count(*)

    FROM sys.dm_exec_cached_plans AS ecp

    CROSS APPLY sys.dm_exec_query_plan(plan_handle) AS eqp

    CROSS APPLY query_plan.nodes('/ShowPlanXML/BatchSequence/Batch/Statements/StmtSimple') AS qn(n)

    WHERE n.query('.').exist('//RelOp[@PhysicalOp="Parallelism"]') = 1

    and cast(n.value('(@StatementSubTreeCost)[1]', 'VARCHAR(128)') as float)>50

    and ecp.usecounts<30

    SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

    WITH XMLNAMESPACES

    (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')

    SELECT

    query_plan AS CompleteQueryPlan,

    n.value('(@StatementText)[1]', 'VARCHAR(4000)') AS StatementText,

    n.value('(@StatementOptmLevel)[1]', 'VARCHAR(25)') AS StatementOptimizationLevel,

    n.value('(@StatementSubTreeCost)[1]', 'VARCHAR(128)') AS StatementSubTreeCost,

    n.query('.') AS ParallelSubTreeXML,

    ecp.usecounts,

    ecp.size_in_bytes

    FROM sys.dm_exec_cached_plans AS ecp

    CROSS APPLY sys.dm_exec_query_plan(plan_handle) AS eqp

    CROSS APPLY query_plan.nodes('/ShowPlanXML/BatchSequence/Batch/Statements/StmtSimple') AS qn(n)

    WHERE n.query('.').exist('//RelOp[@PhysicalOp="Parallelism"]') = 1

    and cast(n.value('(@StatementSubTreeCost)[1]', 'VARCHAR(128)') as float)>50

    and ecp.usecounts<30

    order by cast(n.value('(@StatementSubTreeCost)[1]', 'VARCHAR(128)') as float) desc

  • To prove the need to raise cost threshold for parallelism ( CTP ), I took a query from the prod plan cache with a cost of 35. It has been using the maxdop of 8. 187 records retrieved in about a second.

    With the query hint option ( maxdop 1 ), it takes about the same amount of time and also shows StatementSubTreeCost="35.122"

    DegreeOfParallelism="0" NonParallelPlanReason="MaxDOPSetToOne"

    So this doesn't appear to be proving any need to raise CTP ??

    I can try some with a cost of ~10 I suppose. With cxpacket wait at 50% of waits, I thought we needed adjustments, but I realize not all cxpacket wait is bad.

  • I did a post on this a couple of years ago here. It really is a good guideline, I have found.

    Jared
    CE - Microsoft

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply