Performance in joining annual totals to a monthy subset for a report

Question

Performance in joining annual totals to a monthy subset for a report

Oscar Janson

Old Hand

Points: 341
More actions
May 3, 2012 at 10:05 am

#255265

Hi all!
For performance purposes I'm trying to improve a view used to attach to a resultset of daily totals for a report.
It contains some salary information, which is somewhat more complex than the example I'm putting down below, but the idea should be somewhat clear. My goal is to reduce the amount of executions where needed, and end up with a faster end result.
First table declarations and population.
IF EXISTS(SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME = 'DailyTotal' )
DROP TABLE DailyTotal
GO
IF EXISTS(SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME = 'AnnualTotal' )
DROP VIEW AnnualTotal
GO
CREATE TABLE DailyTotal (
DT_Date DATETIME,
DT_Employee INT,
DT_Work DECIMAL(11,4),
DT_Vacation DECIMAL(11,4),
DT_Vacation_Added DECIMAL(11,4)
)
GO
INSERT INTO DailyTotal
SELECT '2012-01-01', 1, 0, 0, 100 UNION ALL
SELECT '2012-01-02', 1, 8, 0, 0 UNION ALL
SELECT '2012-01-03', 1, 8, 0, 0 UNION ALL
SELECT '2012-01-04', 1, 8, 0, 0 UNION ALL
SELECT '2012-01-05', 1, 8, 0, 0 UNION ALL
SELECT '2012-01-06', 1, 0, 0, 0 UNION ALL
SELECT '2012-01-07', 1, 0, 0, 0 UNION ALL
SELECT '2012-01-08', 1, 0, 8, 0 UNION ALL
SELECT '2012-01-09', 1, 0, 8, 0 UNION ALL
SELECT '2012-01-10', 1, 0, 8, 0 UNION ALL
SELECT '2012-01-11', 1, 0, 8, 0 UNION ALL
SELECT '2012-01-12', 1, 0, 8, 0 UNION ALL
SELECT '2012-01-13', 1, 0, 0, 0 UNION ALL
SELECT '2012-01-14', 1, 0, 0, 0 UNION ALL
SELECT '2012-01-15', 1, 8, 0, 0 UNION ALL
SELECT '2012-01-16', 1, 8, 0, 0
GO
CREATE VIEW AnnualTotal AS SELECT
DATEPART(YEAR, DT_Date) AS AT_Year,
DT_Employee AS AT_Employee,
SUM(DT_Work) AS AT_Work,
SUM(DT_Vacation) AS AT_Vacation,
SUM(DT_Vacation_Added) AS AT_Vacation_Added
FROM DailyTotal
GROUP BY DATEPART(YEAR, DT_Date), DT_Employee
GO
The report will display the data as above, and supply a running total. Of course I can have the report do it itself, but there's usually only a month worth of details per page per employee, and the running total also contains data of the previous five years to handle left over vacation.
Thus, I join a view with these totals and display them at group break. I do so with a query like this:
SELECT DT.*, AT.* FROM DailyTotal DT
INNER JOIN AnnualTotal AT ON AT.AT_Year = DATEPART(YEAR, DT.DT_Date) AND AT.AT_Employee = DT.DT_Employee
When I do the above, (bigger result set, view contains some table valued functions etc.) just retrieving the table of 5000 records takes 2 seconds, just retrieving the view takes about 5 seconds. With the join, it takes 20 seconds. Of course, my real life table does contain an index on ID and Date and employee. When I do a SELECT INTO on a test table, my total joined query only requires 2 seconds to return. I've considered indexed views, but there's way too many things I do need that aren't allowed with that.
Can anyone of you think of anything else? Thanks in advance!
Kind regards,
Oscar Janson

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply

GSquared SSC Guru Points: 260824 More actions · Answer 1

Have you looked at the execution plans and compared them?

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

Oscar Janson Old Hand Points: 341 More actions · Answer 2

Hi GSquared,

The main costs in both execution plans are in the joins. They both do a clustered index scan on the primary key, which is the usual autoincremented ID field. There is a non-clustered index on the date field on which the actual join is done, but it appears to be used in neither.

Cost percentage when joining on the view is 31% on index scan on the ID, 16% cost on a employee table (which we use for display purposes).

Takes 20 seconds.

Cost percentage when joining on the test table is 68% on index scan on the ID.

Takes 2 seconds.

Apart from the fact that I find it strange my date index isn't taken (or at least, I don't see a glaringly obvious reason why it shouldn't) I find it hard to tell why one and the same lookup takes a factor 10 in difference. I'm willing to bet a shiny quarter that the cause might be somewhere deeper.

GSquared SSC Guru Points: 260824 More actions · Answer 3

Please post both execution plans. You can upload the sqlplan files using the Attachments options for posts.

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

Oscar Janson Old Hand Points: 341 More actions · Answer 4

Here they are. The view has grown in between, and so the execution plan has changed somewhat, but the resulting amount of time is still the same.

vicdileo SSC Veteran Points: 293 More actions · Answer 5

Hi Oscar,

Thanks for providing the query plans.

Could you also provide

1) The full DDL of the view,

2) Plus the full DDL of the tables referenced by the view?

Please include all indexes and primary keys as well.

3) Also, I noticed that many of the estimated vs. actual rowcounts seem to be off. This might be because update statistics is not being run frequently enough on your tables. When were your statistics last updated? The queries in this link can tell you:

http://stackoverflow.com/questions/970632/when-were-index-statistics-last-updated

It might be interesting to re-run the query right after updating the statistics to see if there's any improvement.

4) Also, if your view is defined as

SELECT A.*, B.*

FROM table_a A, table_b B

and your tables have lots of columns, then the query will, of course, return all columns from both tables. The query optimizer may actually be choosing the most efficient query plan in order to give you all of those columns you asked for. Try re-defining your view to select only the columns you need from the base tables. Doing so might require a great deal less I/O and the optimizer can then choose a more efficient plan for your query.

Thanks,

- victor di leo

Oscar Janson Old Hand Points: 341 More actions · Answer 6

Hi Victor,

Providing a full DDL might be a bit problematic, as there is several pages of fields in there, so I'll try to keep it short while still providing useful information.

Asking only one field with the same FROM and WHERE clauses in the view appears drastically reduces execution time, so it appears at least part of the problem is in there.

Unfortunately though, I need all the fields in the view, and most of the fields in the table.

My view:

ALTER VIEW dbo.mutkop_opbouw_totalen AS

SELECT W.WCode Wkm_Code, W.Aktief Aktief, ISNULL(CJ.Cao_Code, '') Cao_Code,ISNULL(CJ.Jaar, 0) Jaartal, ISNULL(CJ.IngangsDatum, 0) CJ_Ingangsdatum,

ISNULL(CJ.Werkdagen_PJ, 0) Cao_Dagen_DitJaar, ISNULL(ST.MaxDatum, 0) Shf_Datum,

ISNULL(dbo.GetNormHoursCAO(CJ.Cao_Code, CJ.Jaar, CJ.IngangsDatum) * (ST.Perc_Voltijd / 100.00), 0) Norm_Hours,

ISNULL(GVP.Vak0, 0) Vak0,

ISNULL(GVP.Vak_opbouw0, 0) Vak_Opbouw_w0,

ISNULL(GVP.Vak_opbouw_bw0, 0) Vak_Opbouw_bw0,

/*

couple of pages of GVP fields, then a bunch of sum fields.

*/

SUM(ISNULL(MK.Laat_Vrg,0)) Laat_Vrg,

SUM(ISNULL(MK.Afwezig,0)) Afwezig,

ISNULL(DR.DayStart, 0) DayStart, ISNULL(DR.DayStop, 0) DayStop

FROM dbo.WerkNm W

LEFT OUTER JOIN dbo.CaoReg CR ON CR.CCode = W.Cao_Code

LEFT OUTER JOIN dbo.CaoJaar CJ ON CJ.Cao_Code = W.Cao_Code

INNER JOIN (SELECT Cao_Code, Jaar, MAX(IngangsDatum) IngangsDatum

FROM dbo.CaoJaar

WHERE IngangsDatum <= dbo.dt2cldate(GETDATE())

GROUP BY Cao_Code, Jaar) CJMaxDat

ON CJMaxDat.Cao_Code = CJ.Cao_Code AND

CJMaxDat.Jaar = CJ.Jaar AND

CJMaxDat.IngangsDatum = CJ.IngangsDatum

LEFT OUTER JOIN (SELECT LS.Wkm_Code, LS.MaxDatum, ST1.Perc_Voltijd FROM (

SELECT Wkm_Code, Max(Datum) MaxDatum FROM dbo.ShfTbl

WHERE Datum <= dbo.dt2cldate(GETDATE())

GROUP BY Wkm_Code) LS

INNER JOIN dbo.ShfTbl ST1 ON ST1.Wkm_Code = LS.Wkm_Code AND ST1.Datum = LS.MaxDatum) ST ON ST.Wkm_Code = W.WCode

LEFT OUTER JOIN dbo.GetVakantiePools() GVP ON GVP.Wkm = W.WCode

LEFT OUTER JOIN (SELECT CJ1.Cao_Code,Jaar, dbo.dt2cldate(dbo.GetFirstDate(CJ1.Jaar, CASE WHEN CJ1.Periode = 'M' THEN 0 ELSE 1 END )) DayStart,

dbo.dt2cldate(dbo.GetFirstDate(CJ1.Jaar + 1, CASE WHEN CJ1.Periode = 'M' THEN 0 ELSE 1 END )) - 1 DayStop

FROM dbo.CaoJaar CJ1 ) DR ON DR.Cao_Code = W.Cao_Code AND Dr.Jaar = CJ.Jaar

LEFT OUTER JOIN dbo.MutKop MK ON MK.Wkm_Code = W.WCode AND MK.Datum >= DR.DayStart AND MK.Datum <= DR.DayStop

--WHERE W.WCode = '1004'

GROUP BY W.WCode, W.Aktief, CJ.Cao_Code, CJ.Jaar, CJ.IngangsDatum, CJ.Werkdagen_PJ, ST.MaxDatum, ST.Perc_Voltijd,

GVP.Vak0, GVP.VakHst0, GVP.VakAct0, GVP.VakPla0, GVP.Vak_bw0, GVP.VakHst_bw0, GVP.VakAct_bw0, GVP.VakPla_bw0, GVP.Vak_Saldo0, GVP.Vak_bw_Saldo0, GVP.Vak_bij0, GVP.Vak_Opbouw0, GVP.Vak_bij_bw0, GVP.Vak_Opbouw_bw0, GVP.Vak_bw_Saldo0, GVP.Vak_Opbouw_w_Saldo0, GVP.Vak_Opbouw_bw_Saldo0,

GVP.Vak1, GVP.VakHst1, GVP.VakAct1, GVP.VakPla1, GVP.Vak_bw1, GVP.VakHst_bw1, GVP.VakAct_bw1, GVP.VakPla_bw1, GVP.Vak_Saldo1, GVP.Vak_bw_Saldo1, GVP.Vak_bij1, GVP.Vak_Opbouw1, GVP.Vak_bij_bw1, GVP.Vak_Opbouw_bw1, GVP.Vak_bw_Saldo1, GVP.Vak_Opbouw_w_Saldo1, GVP.Vak_Opbouw_bw_Saldo1,

GVP.Vak2, GVP.VakHst2, GVP.VakAct2, GVP.VakPla2, GVP.Vak_bw2, GVP.VakHst_bw2, GVP.VakAct_bw2, GVP.VakPla_bw2, GVP.Vak_Saldo2, GVP.Vak_bw_Saldo2, GVP.Vak_bij2, GVP.Vak_Opbouw2, GVP.Vak_bij_bw2, GVP.Vak_Opbouw_bw2, GVP.Vak_bw_Saldo2, GVP.Vak_Opbouw_w_Saldo2, GVP.Vak_Opbouw_bw_Saldo2,

GVP.Vak3, GVP.VakHst3, GVP.VakAct3, GVP.VakPla3, GVP.Vak_bw3, GVP.VakHst_bw3, GVP.VakAct_bw3, GVP.VakPla_bw3, GVP.Vak_Saldo3, GVP.Vak_bw_Saldo3, GVP.Vak_bij3, GVP.Vak_Opbouw3, GVP.Vak_bij_bw3, GVP.Vak_Opbouw_bw3, GVP.Vak_bw_Saldo3, GVP.Vak_Opbouw_w_Saldo3, GVP.Vak_Opbouw_bw_Saldo3,

GVP.Vak4, GVP.VakHst4, GVP.VakAct4, GVP.VakPla4, GVP.Vak_bw4, GVP.VakHst_bw4, GVP.VakAct_bw4, GVP.VakPla_bw4, GVP.Vak_Saldo4, GVP.Vak_bw_Saldo4, GVP.Vak_bij4, GVP.Vak_Opbouw4, GVP.Vak_bij_bw4, GVP.Vak_Opbouw_bw4, GVP.Vak_bw_Saldo4, GVP.Vak_Opbouw_w_Saldo4, GVP.Vak_Opbouw_bw_Saldo4,

GVP.Vak5, GVP.VakHst5, GVP.VakAct5, GVP.VakPla5, GVP.Vak_bw5, GVP.VakHst_bw5, GVP.VakAct_bw5, GVP.VakPla_bw5, GVP.Vak_Saldo5, GVP.Vak_bw_Saldo5, GVP.Vak_bij5, GVP.Vak_Opbouw5, GVP.Vak_bij_bw5, GVP.Vak_Opbouw_bw5, GVP.Vak_bw_Saldo5, GVP.Vak_Opbouw_w_Saldo5, GVP.Vak_Opbouw_bw_Saldo5,

DR.DayStart, DR.DayStop

--ORDER BY W.WCode

My primary table on which to report:

CREATE TABLE [dbo].[MutKop](

[ID] [int] NOT NULL,

[HstFlag] [smallint] NULL,

[Wkm_Code] [varchar](20) NULL,

[Datum] [int] NULL,

[WeekNr] [tinyint] NULL,

[Afd_Code] [varchar](30) NULL,

[Cao_Code] [varchar](8) NULL,

[Fun_Code] [varchar](8) NULL,

[Grp_Code] [varchar](8) NULL,

[Utz_Code] [varchar](8) NULL,

[Dgr_Code] [varchar](8) NULL,

[Dgr_Uren] [int] NULL,

[Dgr_Code_Org] [varchar](8) NULL,

[Controleur] [varchar](8) NULL,

[Akkoord_Dat] [int] NULL,

[Ingeklokt] [int] NULL,

[Aanvang] [int] NULL,

[Einde] [int] NULL,

[Uitgeklokt] [int] NULL,

[Onderh_Afw] [int] NULL,

[Aanwezig] [decimal](11, 4) NULL,

[Afwezig] [decimal](11, 4) NULL,

[Laat_Vrg] [decimal](11, 4) NULL,

[UrenCU] [decimal](11, 4) NULL,

[UrenCU_Betaald] [decimal](11, 4) NULL,

[TvTCU] [decimal](11, 4) NULL,

[Overw1] [decimal](11, 4) NULL,

/*

Bunch of other fields like these.

*/

[Vak_Jaar] [int] NULL,

[Flex_Bij] [decimal](11, 4) NULL,

[Flex_Af] [decimal](11, 4) NULL,

CONSTRAINT [Muk_ID_Key] PRIMARY KEY CLUSTERED

(

[ID] ASC

)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY],

CONSTRAINT [Muk_Wkm_Key] UNIQUE NONCLUSTERED

(

[Wkm_Code] ASC,

[Datum] ASC

)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 95) ON [PRIMARY]

) ON [PRIMARY]

All other indexes on the MutKop table:

CREATE NONCLUSTERED INDEX [Muk_Afd_Key] ON [dbo].[MutKop]

(

[Afd_Code] ASC,

[Wkm_Code] ASC,

[Datum] ASC

)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

GO

CREATE NONCLUSTERED INDEX [Muk_AfwZik_ID_Key] ON [dbo].[MutKop]

(

[AfwZik_ID] ASC

)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

GO

CREATE NONCLUSTERED INDEX [Muk_Datum_Correctie_Key] ON [dbo].[MutKop]

(

[Datum] ASC,

[Wkm_Code] ASC,

[CorrectionIndex] ASC

)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

GO

CREATE NONCLUSTERED INDEX [Muk_Datum_Key] ON [dbo].[MutKop]

(

[Datum] ASC,

[Wkm_Code] ASC

)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

GO

CREATE NONCLUSTERED INDEX [Muk_Hafd_key] ON [dbo].[MutKop]

(

[HstFlag] ASC,

[Afd_Code] ASC,

[Wkm_Code] ASC,

[Datum] ASC

)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

GO

CREATE NONCLUSTERED INDEX [Muk_HDatum_Key] ON [dbo].[MutKop]

(

[HstFlag] ASC,

[Datum] ASC,

[Wkm_Code] ASC

)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

GO

CREATE NONCLUSTERED INDEX [Muk_Hwkm_key] ON [dbo].[MutKop]

(

[HstFlag] ASC,

[Wkm_Code] ASC,

[Datum] ASC

)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

GO

The end query I'm testing with:

SELECT MK.*, MOT.*

FROM MutKop MK

INNER JOIN mutkop_opbouw_totalen MOT ON

--INNER JOIN mutkop_opbouw_totalen_Solid MOT ON

MOT.Wkm_Code = MK.Wkm_Code AND MOT.DayStart <= MK.Datum AND MOT.DayStop >= MK.Datum

I've also taken a look at the statistics and updated them. Last update was about a month ago on my main table, but updating them didn't appear to change much.

That should (more or less) answer whatever you wanted to know. If I missed anything, or if you need additional clarification, please let me know.

Thank you for your time so far!

Kind regards,

Oscar

vicdileo SSC Veteran Points: 293 More actions · Answer 7

That's a lot of columns... which brings me back to my previous post, where I mentioned SELECT A.*, B.*...

I found a well-written post which explains this - please scroll down to the section titled, "Return Only the Rows and Columns Needed":

http://msdn.microsoft.com/en-us/library/ff647793.aspx

Do you really need all, or even most, of those columns?

I'd be curious to know if performance improves if you select only the columns you need.

As the author explains, with SELECT * you are more likely to get a table scan. As a rule of thumb, whenever you select > 20% of the data from a table it often becomes more efficient to to a table scan (or a clustered index scan, which is bascially the same thing).

Hopefully you only need a fraction of those columns and can thus reduce the volume of data you are retrieving.

Oscar Janson Old Hand Points: 341 More actions · Answer 8

Hello Victor,

Sadly enough, yes I need all of those columns. The view has been specifically built for a salary report, and there's a lot of fields for vacation, overtime types, absence, etc. We need to show those on a daily basis to monitor which employees have been for example been sick very often in a month, and we need to reflect the annual totals to do something similar. The end result is a landscape report with a lot of columns and a summary field below.

ScottPletcher SSC Guru Points: 100262 More actions · Answer 9

Can anyone of you think of anything else?

Yes, cluster the table by date and/or employee; choose the column(s) and the order based on your other query needs.

Typically the clustered index on id is useless, unless you uncharacteristically do a lot of large joins based on id. Add a non-clus index on id if you think you'll need it.

SQL DBA,SQL Server MVP(07, 08, 09) "Money can't buy you happiness." Maybe so, but it can make your unhappiness a LOT more comfortable!

Evil Kraig F SSC Guru Points: 100851 More actions · Answer 10

The keys to this being properly 'thinned' is werknum table and the DR alias... which includes this lovely inline function to obfuscate any chance at using an index:

dbo.GetFirstDate(CJ1.Jaar, CASE WHEN CJ1.Periode = 'M' THEN 0 ELSE 1 END ))

It's from an Outer Join though so it's not an amazing problem, it's just not helping.

You're pretty hosed with that as one of your components in your where clause. The incomplete indexing list is not helping, we'd need to see the schema for everything inclusive all the way to the base tables.

There's no real way for us to help optimize this from this side, I fear, at least not in a reasonable amount of time, and especially not without all the schema and underlying definitions. You'll have to explore each one of the pieces.

That said:

The majority of the time this query is taking is in 3 table spools and the joining between them. This occassionally indicates to me that using temp tables as staging points for the query where those spools come into play have helped performance because usually you restrict the data in your own temp tables much further and more effectively than a 'bulk load' from the database that gets restricted later.

However, there's nothing in the with_view plan that shows where those spools are being formed from. If I had to guess it was probably something to do with the GetFirstDate function however. Get that cleaner and you'll help everything downstream in this. Remember that function is being called twice for *every row*.

- Craig Farrell

Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA