Composite indexes; performance

Question

Composite indexes; performance

Viewing 9 posts - 16 through 23 (of 23 total)

You must be logged in to reply to this topic. Login to reply

Jeff Moden SSC Guru Points: 1000539 More actions · Answer 1

Jonathan AC Roberts - Thursday, November 8, 2018 10:36 AM
Jeff Moden - Thursday, November 8, 2018 10:25 AM
Jonathan AC Roberts - Thursday, November 8, 2018 10:15 AM
Jeff Moden - Thursday, November 8, 2018 10:12 AM
AER - Thursday, November 8, 2018 10:05 AM
Thanks for all your replies.
I know that is not the simple question; that's why I was asking help from DBAs.
I am myself is a production DBA and know what happens in theory but wanted some good practical advice on having 1 or 2 indexes in my case.
So when I mentioned that both queries are heavily used I really meant it. And when I mentioned the columns they are all used in the order that I placed them in my question.
I have scripts to identify the needed indexes, but not sure if there are good scripts out there, that would tell you that any particular index degrades the performance.
That's why I was asking if I need both indexes.
I kind of agree with John's answer.
Any additional thoughts? Maybe anybody can point to the good script that will show the indexes that harm the database performance?
Thank you, again
Yes... having scripts to identify the needed indexes sometimes don't tell the whole story. What does the ACTUAL EXCECUTION PLAN state for these two queries insofar as whether or not the indexes are being used effectively (that key lookup thing and seek/scan thing again) and what is the actual performance measurements for reads, CPU, writes, and, most importantly, duration?
You could run some test calls with the index with columns 1,2 existing and after dropping it.
Use this at the top of your code:SET STATISTICS IO, TIME ON
Then look at the results. It may well be that there is not enough of a performance difference to justify two indexes.
Just don't use SET STATISTICS if any scalar or mTVF functions are involved. Either will cause SET STATISTICS to report absolute horror. Please see the following article for demonstrable proof of that.
http://qa.sqlservercentral.com/articles/T-SQL/91724/
I'm also not sure how any XML functionality will affect SET STATISTICS because I try to avoid XML like the plague.
Even if you have a scalar function or mTVF and you are comparing tests both with and without the index you should still see if there is a difference as they both would be slowed by the same amount by the set statistics.

The trouble is that SET STATISTICS skews things so badly that the differences will usually not be apparent.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
"Change is inevitable... change for the better is not".

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)
Intro to Tally Tables and Functions

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 2

In your shoes I'd begin by gathering information about index usage. This should help you do that. Holler if you require assistance in interpreting the results:

/* Indexing Queries Nicked from Glenn Berry with minor mods. If you want to study a particular group of tables, enter their names in the INSERT statement and set @UseFilter = 1 */

SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED

-- Use the filter table or not
DECLARE @UseFilter BIT = 0

IF OBJECT_ID('tempdb..#TableList') IS NOT NULL DROP TABLE #TableList; CREATE TABLE #TableList (FQN VARCHAR(100));
INSERT INTO #TableList (FQN) VALUES
('[Database].[Schema].[Table1]'),
('[Database].[Schema].[Table2]');

-- (Query 57) (Missing Indexes) Missing Indexes for current database by Index Advantage
SELECT DISTINCT
[FQN] = mid.[statement],
[Table Rows] = FORMAT(p.rows,'###,###,###'),
mid.equality_columns, mid.inequality_columns, mid.included_columns,
migs.user_seeks,
migs.last_user_seek,
migs.unique_compiles,
[index_advantage] = CONVERT(decimal(18,2), user_seeks * avg_total_user_cost * (avg_user_impact * 0.01)),
migs.avg_total_user_cost,
migs.avg_user_impact,
[Table Name] = OBJECT_NAME(mid.[object_id])
FROM sys.dm_db_missing_index_group_stats AS migs
INNER JOIN sys.dm_db_missing_index_groups AS mig
ON migs.group_handle = mig.index_group_handle
INNER JOIN sys.dm_db_missing_index_details AS mid
ON mig.index_handle = mid.index_handle
OUTER APPLY (
SELECT [rows] = SUM(p.[rows])
FROM sys.partitions AS p
WHERE p.[object_id] = mid.[object_id]
) p
WHERE mid.database_id = DB_ID()
AND (@UseFilter = 0 OR mid.[statement] IN (SELECT FQN FROM #TableList))
ORDER BY index_advantage DESC
OPTION (RECOMPILE);

--- (Query 67) (Overall Index Usage - Writes) Index Read/Write stats (all user tables in current DB) ordered by Writes
SELECT *
FROM (
SELECT
  x.[FQN],
  [Table Rows] = FORMAT(p.[Table Rows],'###,###,###'),
  [Index Name] = i.[name],
  [FillFactor] = i.fill_factor,
  [Writes] = s.user_updates,
  [Reads] = (s.user_seeks + s.user_scans + s.user_lookups),
  [IndexType] = i.[type_desc],
  i.filter_definition,
  s.last_user_update
FROM sys.indexes AS i
LEFT JOIN sys.dm_db_index_usage_stats AS s
     ON i.[object_id] = s.[object_id]
     AND i.index_id = s.index_id
OUTER APPLY (
  SELECT SUM(p.rows) AS [Table Rows]
  FROM sys.partitions AS p
  WHERE p.[object_id] = i.[object_id]
) p
CROSS APPLY (
  SELECT [FQN] = '[' + DB_NAME(DB_ID()) + '].[' + OBJECT_SCHEMA_NAME(i.[object_id]) + '].[' + OBJECT_NAME(i.[object_id]) + ']'
) x
WHERE OBJECTPROPERTY(i.[object_id],'IsUserTable') = 1
  AND (@UseFilter = 0 OR x.FQN IN (SELECT FQN FROM #TableList))
) d
ORDER BY [FQN], [Index Name], [Reads]/ISNULL(NULLIF(Writes*1.0,0),1) DESC, Writes -- good ones at the top
OPTION (RECOMPILE);

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 3

You also want to know what indexes are currently used by those stored procedures. If their plans are cached then you can use this:

/* ChrisM 2013-2016 Get stored procedure execution stats from sys.dm_exec_procedure_stats, augment with useful information. Filters can be applied in two places to reduce the size of the result set. - see when something was last executed - see how often something is executed - find out how expensive something is - identify the most expensive statements in a stored procedure - check the plan for tuning opportunities */

SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

IF OBJECT_ID('tempdb..#Store') IS NOT NULL DROP TABLE #Store;
SELECT
ps.database_id,
DBName = DB_NAME(ps.database_id),
x.SPname,
x.[SQL Statement],
ps.plan_handle,
st.creation_time,
st.last_execution_time,
st.execution_count,
t.AVG_elapsed_time,
t.AVG_worker_time,
t.total_elapsed_time,
t.total_worker_time,
st.total_physical_reads,
st.total_logical_reads,
st.total_logical_writes,
total_elapsed_sp_time = DATEADD(millisecond,SUM(st.total_elapsed_time) OVER(PARTITION BY x.SPname)/1000,CAST('00:00' AS TIME(3)))
INTO #Store
FROM sys.dm_exec_procedure_stats ps
INNER JOIN sys.dm_exec_query_stats st
       ON st.plan_handle = ps.plan_handle
CROSS APPLY (
SELECT
  total_elapsed_time = DATEADD(millisecond,st.total_elapsed_time/1000,CAST('00:00' AS TIME(3))),
  total_worker_time = DATEADD(millisecond,st.total_worker_time/1000,CAST('00:00' AS TIME(3))),
  AVG_elapsed_time = DATEADD(millisecond,(st.total_elapsed_time/st.execution_count)/1000,CAST('00:00' AS TIME(3))),
  AVG_worker_time = DATEADD(millisecond,(st.total_worker_time/st.execution_count)/1000,CAST('00:00' AS TIME(3)))
) t
CROSS APPLY sys.dm_exec_sql_text(st.sql_handle) qt
CROSS APPLY (
SELECT
  [SPname] = object_schema_name(ps.object_id, ps.database_id) + N'.' + object_name(ps.object_id, ps.database_id),
  [SQL Statement] = substring(qt.[text],(st.statement_start_offset+2)/2,
                     (CASE WHEN st.statement_end_offset = -1 THEN len(convert(nvarchar(max),qt.[text]))*2
                     ELSE st.statement_end_offset END - st.statement_start_offset) /2 )
) x
WHERE 1 = 1
--AND ps.database_id IN (8,14,15)
--AND x.[SQL Statement] LIKE '%switching%'
--AND st.last_execution_time > '2018-03-20 11:50:00.000'
--AND (x.SPname LIKE '%RecalculateSpend%')
OPTION (RECOMPILE);

WITH xmlnamespaces (default 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT
s.database_id,
s.DBName,
s.SPname,
s.[SQL Statement],
s.creation_time,
s.last_execution_time,
s.execution_count,
s.AVG_elapsed_time,
s.AVG_worker_time,
s.total_elapsed_time,
s.total_worker_time,
s.total_physical_reads,
s.total_logical_reads,
s.total_logical_writes,
total_elapsed_sp_time,
qp.query_plan
FROM #Store s
OUTER APPLY sys.dm_exec_query_plan(s.plan_handle) AS qp
--WHERE s.[SQL Statement] LIKE '%partitionbymonthfunction%'
--WHERE s.last_execution_time > '2018-06-13 01:58:00.000' and s.last_execution_time < '2018-06-13 03:05:00.000'
ORDER BY total_elapsed_sp_time DESC,
avg_elapsed_time DESC

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Jeff Moden SSC Guru Points: 1000539 More actions · Answer 4

To make the task a bit shorter, you could do the proverbial "Hail Mary" play... disable (not drop) the shorter index and simply see what happens. As they say, "One good test is worth a thousand expert opinions". Just make sure that it's not a UNIQUE index, which may have some other tentacles for other things.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
"Change is inevitable... change for the better is not".

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)
Intro to Tally Tables and Functions

AER SSCrazy Points: 2700 More actions · Answer 5

Everybody,
Thanks for your thorough and valid advices.
Your help is greatly appreciated.
Alex

Gail Shaw SSC Guru Points: 1004494 More actions · Answer 6

https://sqlinthewild.co.za/index.php/2009/01/19/index-columns-selectivity-and-equality-predicates/

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

1974lg Old Hand Points: 345 More actions · Answer 7

AER - Thursday, November 8, 2018 10:05 AM
Thanks for all your replies.
I know that is not the simple question; that's why I was asking help from DBAs.
I am myself is a production DBA and know what happens in theory but wanted some good practical advice on having 1 or 2 indexes in my case.
So when I mentioned that both queries are heavily used I really meant it. And when I mentioned the columns they are all used in the order that I placed them in my question.
I have scripts to identify the needed indexes, but not sure if there are good scripts out there, that would tell you that any particular index degrades the performance.
That's why I was asking if I need both indexes.
I kind of agree with John's answer.
Any additional thoughts? Maybe anybody can point to the good script that will show the indexes that harm the database performance?
Thank you, again

I see a lot of great suggestions over here!
I would also suggest you to use Brent Ozar's sp_BlitzCache https://www.brentozar.com/blitzcache/
it is supplying you heap of useful info including exec plans.
goodluck

Grant Fritchey SSC Guru Points: 397721 More actions · Answer 8

Jeff Moden - Thursday, November 8, 2018 10:43 AM
The trouble is that SET STATISTICS skews things so badly that the differences will usually not be apparent.

100% agreement with Jeff on this. I only use SET STATISTICS IO when I'm worried about individual object IO behavior. Otherwise, I always use extended events to capture query performance. Based on lots and lots of experimentation, it has the lowest observer overhead (depending on what you're capturing, the system, the transactions, your filtering, etc.).

----------------------------------------------------The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood... Theodore RooseveltThe Scary DBAAuthor of: SQL Server 2017 Query Performance Tuning, 5th Edition and SQL Server Execution Plans, 3rd EditionProduct Evangelist for Red Gate Software

Jeff Moden SSC Guru Points: 1000539 More actions · Answer 9

Grant Fritchey - Monday, November 12, 2018 5:52 AM
Jeff Moden - Thursday, November 8, 2018 10:43 AM
The trouble is that SET STATISTICS skews things so badly that the differences will usually not be apparent.
100% agreement with Jeff on this. I only use SET STATISTICS IO when I'm worried about individual object IO behavior. Otherwise, I always use extended events to capture query performance. Based on lots and lots of experimentation, it has the lowest observer overhead (depending on what you're capturing, the system, the transactions, your filtering, etc.).

Just to be clear, I've not given up on SET STATISTICS TIME, IO. It works great (especially for performance comparisons of two or more sets of code to compare) if you don't have the likes of Scalar or mTVF functions in the mix. There may be other things that also cause such a skew but I've not found them yet.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
"Change is inevitable... change for the better is not".

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)
Intro to Tally Tables and Functions