Composite indexes; performance

  • Jonathan AC Roberts - Thursday, November 8, 2018 10:36 AM

    Jeff Moden - Thursday, November 8, 2018 10:25 AM

    Jonathan AC Roberts - Thursday, November 8, 2018 10:15 AM

    Jeff Moden - Thursday, November 8, 2018 10:12 AM

    AER - Thursday, November 8, 2018 10:05 AM

    Thanks for all your replies.
    I know that is not the simple question; that's why I was asking help from DBAs.

    I am myself is a production DBA and know what happens in theory but wanted some good practical advice on having 1 or 2 indexes in my case.
    So when I mentioned that both queries are heavily used I really meant it. And when I mentioned the columns they are all used in the order that I placed them in my question.

    I have scripts to identify the needed indexes, but not sure if there are good scripts out there, that would tell you that any particular index degrades the performance.

    That's why I was asking if I need both indexes.

    I kind of agree with John's answer.
    Any additional thoughts? Maybe anybody can point to the good script that will show the indexes that harm the database performance?

    Thank you, again

    Yes... having scripts to identify the needed indexes sometimes don't tell the whole story.  What does the ACTUAL EXCECUTION PLAN state for these two queries insofar as whether or not the indexes are being used effectively (that key lookup thing and seek/scan thing again) and what is the actual performance measurements for reads, CPU, writes, and, most importantly, duration?

    You could run some test calls with the index with columns 1,2 existing and after dropping it.
    Use this at the top of your code:SET STATISTICS IO, TIME ON
    Then look at the results. It may well be that there is not enough of a performance difference to justify two indexes.

    Just don't use SET STATISTICS if any scalar or mTVF functions are involved.  Either will cause SET STATISTICS to report absolute horror.  Please see the following article for demonstrable proof of that.
    http://qa.sqlservercentral.com/articles/T-SQL/91724/

    I'm also not sure how any XML functionality will affect SET STATISTICS because I try to avoid XML like the plague.

    Even if you have a scalar function or mTVF and you are comparing tests both with and without the index you should still see if there is a difference as they both would be slowed by the same amount by the set statistics.

    The trouble is that SET STATISTICS skews things so badly that the differences will usually not be apparent.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • In your shoes I'd begin by gathering information about index usage. This should help you do that. Holler if you require assistance in interpreting the results:


    /* Indexing Queries
    Nicked from Glenn Berry with minor mods.
    If you want to study a particular group of tables,
    enter their names in the INSERT statement and set @UseFilter = 1
    */

    SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
    -- Use the filter table or not
    DECLARE @UseFilter BIT = 0
    IF OBJECT_ID('tempdb..#TableList') IS NOT NULL DROP TABLE #TableList; CREATE TABLE #TableList (FQN VARCHAR(100));
    INSERT INTO #TableList (FQN) VALUES
     ('[Database].[Schema].[Table1]'),
     ('[Database].[Schema].[Table2]');

    -- (Query 57) (Missing Indexes) Missing Indexes for current database by Index Advantage 
    SELECT DISTINCT
     [FQN] = mid.[statement],
     [Table Rows] = FORMAT(p.rows,'###,###,###'),
     mid.equality_columns, mid.inequality_columns, mid.included_columns,
     migs.user_seeks,
     migs.last_user_seek,
     migs.unique_compiles,
     [index_advantage] = CONVERT(decimal(18,2), user_seeks * avg_total_user_cost * (avg_user_impact * 0.01)),
     migs.avg_total_user_cost,
     migs.avg_user_impact,
     [Table Name] = OBJECT_NAME(mid.[object_id])
    FROM sys.dm_db_missing_index_group_stats AS migs
    INNER JOIN sys.dm_db_missing_index_groups AS mig
     ON migs.group_handle = mig.index_group_handle
    INNER JOIN sys.dm_db_missing_index_details AS mid
     ON mig.index_handle = mid.index_handle
    OUTER APPLY (
     SELECT [rows] = SUM(p.[rows])
     FROM sys.partitions AS p
     WHERE p.[object_id] = mid.[object_id]
    ) p
    WHERE mid.database_id = DB_ID()
     AND (@UseFilter = 0 OR mid.[statement] IN (SELECT FQN FROM #TableList))
    ORDER BY index_advantage DESC
    OPTION (RECOMPILE);

    --- (Query 67) (Overall Index Usage - Writes) Index Read/Write stats (all user tables in current DB) ordered by Writes 
    SELECT *
    FROM (
     SELECT
      x.[FQN],
      [Table Rows] = FORMAT(p.[Table Rows],'###,###,###'),
      [Index Name] = i.[name],
      [FillFactor] = i.fill_factor,
      [Writes] = s.user_updates,
      [Reads] = (s.user_seeks + s.user_scans + s.user_lookups),
      [IndexType] = i.[type_desc], 
      i.filter_definition,
      s.last_user_update
     FROM sys.indexes AS i
     LEFT JOIN sys.dm_db_index_usage_stats AS s
         ON i.[object_id] = s.[object_id]
         AND i.index_id = s.index_id
     OUTER APPLY (
      SELECT SUM(p.rows) AS [Table Rows]
      FROM sys.partitions AS p
      WHERE p.[object_id] = i.[object_id]
     ) p
     CROSS APPLY (
      SELECT [FQN] = '[' + DB_NAME(DB_ID()) + '].[' + OBJECT_SCHEMA_NAME(i.[object_id]) + '].[' + OBJECT_NAME(i.[object_id]) + ']'
     ) x
     WHERE OBJECTPROPERTY(i.[object_id],'IsUserTable') = 1
      AND (@UseFilter = 0 OR x.FQN IN (SELECT FQN FROM #TableList))
    ) d
    ORDER BY [FQN], [Index Name], [Reads]/ISNULL(NULLIF(Writes*1.0,0),1) DESC, Writes -- good ones at the top
    OPTION (RECOMPILE);

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • You also want to know what indexes are currently used by those stored procedures. If their plans are cached then you can use this:


    /*
    ChrisM 2013-2016
    Get stored procedure execution stats from sys.dm_exec_procedure_stats,
    augment with useful information. Filters can be applied in two places
    to reduce the size of the result set.
    - see when something was last executed
    - see how often something is executed
    - find out how expensive something is
    - identify the most expensive statements in a stored procedure
    - check the plan for tuning opportunities
    */

    SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
    IF OBJECT_ID('tempdb..#Store') IS NOT NULL DROP TABLE #Store;
    SELECT
     ps.database_id,
     DBName = DB_NAME(ps.database_id),
     x.SPname,
     x.[SQL Statement],
     ps.plan_handle,
     st.creation_time,
     st.last_execution_time,
     st.execution_count,    
     t.AVG_elapsed_time,
     t.AVG_worker_time,
     t.total_elapsed_time,
     t.total_worker_time,
     st.total_physical_reads,
     st.total_logical_reads,
     st.total_logical_writes,
     total_elapsed_sp_time = DATEADD(millisecond,SUM(st.total_elapsed_time) OVER(PARTITION BY x.SPname)/1000,CAST('00:00' AS TIME(3)))
    INTO #Store
    FROM sys.dm_exec_procedure_stats ps
    INNER JOIN sys.dm_exec_query_stats st
           ON st.plan_handle = ps.plan_handle
    CROSS APPLY (
     SELECT
      total_elapsed_time = DATEADD(millisecond,st.total_elapsed_time/1000,CAST('00:00' AS TIME(3))),
      total_worker_time = DATEADD(millisecond,st.total_worker_time/1000,CAST('00:00' AS TIME(3))),
      AVG_elapsed_time = DATEADD(millisecond,(st.total_elapsed_time/st.execution_count)/1000,CAST('00:00' AS TIME(3))),
      AVG_worker_time = DATEADD(millisecond,(st.total_worker_time/st.execution_count)/1000,CAST('00:00' AS TIME(3)))
    ) t
    CROSS APPLY sys.dm_exec_sql_text(st.sql_handle) qt
    CROSS APPLY (
     SELECT
      [SPname] = object_schema_name(ps.object_id, ps.database_id) + N'.' + object_name(ps.object_id, ps.database_id),
      [SQL Statement] = substring(qt.[text],(st.statement_start_offset+2)/2,
                         (CASE WHEN st.statement_end_offset = -1 THEN len(convert(nvarchar(max),qt.[text]))*2
                         ELSE st.statement_end_offset END - st.statement_start_offset) /2 )
    ) x
    WHERE 1 = 1
     --AND ps.database_id IN (8,14,15)
     --AND x.[SQL Statement] LIKE '%switching%'
     --AND st.last_execution_time > '2018-03-20 11:50:00.000'
     --AND (x.SPname LIKE '%RecalculateSpend%')
    OPTION (RECOMPILE);

    WITH xmlnamespaces (default 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
    SELECT
     s.database_id,
     s.DBName,
     s.SPname,
     s.[SQL Statement],
     s.creation_time,
     s.last_execution_time,
     s.execution_count,    
     s.AVG_elapsed_time,
     s.AVG_worker_time,
     s.total_elapsed_time,
     s.total_worker_time,
     s.total_physical_reads,
     s.total_logical_reads,
     s.total_logical_writes,
     total_elapsed_sp_time,
     qp.query_plan
    FROM #Store s
    OUTER APPLY sys.dm_exec_query_plan(s.plan_handle) AS qp
     --WHERE s.[SQL Statement] LIKE '%partitionbymonthfunction%'
     --WHERE s.last_execution_time > '2018-06-13 01:58:00.000' and s.last_execution_time < '2018-06-13 03:05:00.000'
    ORDER BY total_elapsed_sp_time DESC,
     avg_elapsed_time DESC
    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • To make the task a bit shorter, you could do the proverbial "Hail Mary" play... disable (not drop) the shorter index and simply see what happens.  As they say, "One good test is worth a thousand expert opinions".  Just make sure that it's not a UNIQUE index, which may have some other tentacles for other things.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Everybody,
    Thanks for your thorough and valid advices.
    Your help is greatly appreciated.
    Alex

  • https://sqlinthewild.co.za/index.php/2009/01/19/index-columns-selectivity-and-equality-predicates/

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • AER - Thursday, November 8, 2018 10:05 AM

    Thanks for all your replies.
    I know that is not the simple question; that's why I was asking help from DBAs.

    I am myself is a production DBA and know what happens in theory but wanted some good practical advice on having 1 or 2 indexes in my case.
    So when I mentioned that both queries are heavily used I really meant it. And when I mentioned the columns they are all used in the order that I placed them in my question.

    I have scripts to identify the needed indexes, but not sure if there are good scripts out there, that would tell you that any particular index degrades the performance.

    That's why I was asking if I need both indexes.

    I kind of agree with John's answer.
    Any additional thoughts? Maybe anybody can point to the good script that will show the indexes that harm the database performance?

    Thank you, again

    I see a lot of great suggestions over here!
    I would also suggest you to use Brent Ozar's sp_BlitzCache https://www.brentozar.com/blitzcache/ 
    it is supplying you heap of useful info including exec plans.
    goodluck

  • Jeff Moden - Thursday, November 8, 2018 10:43 AM

    The trouble is that SET STATISTICS skews things so badly that the differences will usually not be apparent.

    100% agreement with Jeff on this. I only use SET STATISTICS IO when I'm worried about individual object IO behavior. Otherwise, I always use extended events to capture query performance. Based on lots and lots of experimentation, it has the lowest observer overhead (depending on what you're capturing, the system, the transactions, your filtering, etc.).

    ----------------------------------------------------The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood... Theodore RooseveltThe Scary DBAAuthor of: SQL Server 2017 Query Performance Tuning, 5th Edition and SQL Server Execution Plans, 3rd EditionProduct Evangelist for Red Gate Software

  • Grant Fritchey - Monday, November 12, 2018 5:52 AM

    Jeff Moden - Thursday, November 8, 2018 10:43 AM

    The trouble is that SET STATISTICS skews things so badly that the differences will usually not be apparent.

    100% agreement with Jeff on this. I only use SET STATISTICS IO when I'm worried about individual object IO behavior. Otherwise, I always use extended events to capture query performance. Based on lots and lots of experimentation, it has the lowest observer overhead (depending on what you're capturing, the system, the transactions, your filtering, etc.).

    Just to be clear, I've not given up on SET STATISTICS TIME, IO.  It works great (especially for performance comparisons of two or more sets of code to compare) if you don't have the likes of Scalar or mTVF functions in the mix.  There may be other things that also cause such a skew but I've not found them yet.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

Viewing 9 posts - 16 through 23 (of 23 total)

You must be logged in to reply to this topic. Login to reply