Performance optimization of UNIOn

Question

Performance optimization of UNIOn

Viewing 15 posts - 16 through 29 (of 29 total)

You must be logged in to reply to this topic. Login to reply

Gail Shaw SSC Guru Points: 1004494 More actions · Answer 1

Is that the only index?

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Mohammed Imtiaz SSC Veteran Points: 254 More actions · Answer 2

Mohammed Imtiaz

SSC Veteran

Points: 254

December 10, 2008 at 2:32 am

#910169

yes..

Gail Shaw SSC Guru Points: 1004494 More actions · Answer 3

mohd.imtiaz (12/10/2008)
yes..

Why did you delete the post with the table and index definitions? Please post them again. I can't suggest indexes without having the table definition at the very least.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Mohammed Imtiaz SSC Veteran Points: 254 More actions · Answer 4

Hi,

is there any other way to write the following query??

SELECT

'D_Account' AS SourceName

, 'Dimension' AS SourceType

, Fact.Account_MemberId_Label AS MissingLabel

, Fact.Datasource_MemberId_Label AS DataSource

, Fact.Scenario_MemberId_Label AS Scenario

, sum(Fact.[Value]) AS [Value]

FROM dbo.COI_OSB_WINDOWS_FactData Fact

WHERE NOT EXISTS(SELECT D.Label FROM dbo.D_Account D

WHERE Fact.Account_MemberId_Label = D.Label)

AND Fact.Account_MemberId_Label != 'None'

GROUP BY Fact.Account_MemberId_Label, Fact.Datasource_MemberId_Label, Fact.Scenario_MemberId_Label

Mohammed Imtiaz SSC Veteran Points: 254 More actions · Answer 5

USE [COI_AppDB]

GO

/****** Object: Table [dbo].[COI_OSB_WINDOWS_FactData] Script Date: 12/10/2008 19:23:16 ******/

SET ANSI_NULLS ON

GO

SET QUOTED_IDENTIFIER ON

GO

CREATE TABLE [dbo].[COI_OSB_WINDOWS_FactData](

[Scenario_MemberId_Label] [nvarchar](40) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,

[Time_Month_Label] [nvarchar](40) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,

[Allocation_MemberId_Label] [nvarchar](40) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,

[Channel_MemberId_Label] [nvarchar](40) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,

[Currencytype_MemberId_Label] [nvarchar](40) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,

[Datasource_MemberId_Label] [nvarchar](40) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,

[Entity_MemberId_Label] [nvarchar](40) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,

[Functionexec_MemberId_Label] [nvarchar](40) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,

[InternalOrder_MemberId_Label] [nvarchar](40) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,

[PricingLevel_MemberId_Label] [nvarchar](40) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,

[Product_MemberId_Label] [nvarchar](40) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,

[Revsum_MemberId_Label] [nvarchar](40) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,

[RevType_MemberId_Label] [nvarchar](40) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,

[SAPCompany_MemberId_Label] [nvarchar](40) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,

[SECFuncArea_MemberId_Label] [nvarchar](40) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,

[Version_MemberId_Label] [nvarchar](40) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,

[Account_MemberId_Label] [nvarchar](40) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,

[COIFlag_MemberId_Label] [nvarchar](40) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,

[OEMCustomerType_MemberId_Label] [nvarchar](40) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,

[CalcAllocation_MemberId_Label] [nvarchar](40) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,

[SaleType_MemberId_Label] [nvarchar](40) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,

[COIFilter_MemberId_Label] [nvarchar](40) COLLATE SQL_Latin1_General_CP1_CI_AS NULL CONSTRAINT [DF_COI_OSB_WINDOWS_FactData_COIFilter_MemberId_Label] DEFAULT (N'None'),

[Time_Month] [bigint] NULL,

[Value] [float] NOT NULL,

[RowId] [bigint] IDENTITY(1,1) NOT NULL,

[CreateDatetime] [datetime] NOT NULL CONSTRAINT [DF_COI_OSB_WINDOWS_FactData_CreateDatetime] DEFAULT (getdate())

) ON [PRIMARY]

*******************************************************

USE [COI_AppDB]

GO

/****** Object: Index [Index_DataSource_Scenario] Script Date: 12/10/2008 19:25:26 ******/

CREATE NONCLUSTERED INDEX [Index_DataSource_Scenario] ON [dbo].[COI_OSB_WINDOWS_FactData]

(

[Datasource_MemberId_Label] ASC,

[Scenario_MemberId_Label] ASC

)WITH (SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF) ON [PRIMARY]

*******************************************

is there any other way to write the given query

SELECT

'D_Account' AS SourceName

, 'Dimension' AS SourceType

, Fact.Account_MemberId_Label AS MissingLabel

, Fact.Datasource_MemberId_Label AS DataSource

, Fact.Scenario_MemberId_Label AS Scenario

--, sum(Fact.[Value]) AS [Value]

FROM dbo.COI_OSB_WINDOWS_FactData Fact

WHERE NOT EXISTS(SELECT D.Label FROM dbo.D_Account D

WHERE Fact.Account_MemberId_Label = D.Label)

AND Fact.Account_MemberId_Label != 'None'

GROUP BY Fact.Account_MemberId_Label, Fact.Datasource_MemberId_Label, Fact.Scenario_MemberId_Label

arup chakraborty Hall of Fame Points: 3640 More actions · Answer 6

Dear Adi,

It was my notion that while I am using "SELECT *", it returns whole resultset. Naturally, taking more system resources rather "SELECT 1" because it returns only 1, though the executionpan is same for the both.

Hence, "SELECT 1" gives a performance boost.

I have tested it with a table user, which has more than 500 records.

From Profiler, it gives 0(CPU), 18 (READ), 0(WRITE) , 443 (DURATION) for "SELECT *"

AND 0(CPU), 2 (READ), 0(WRITE) , 1 (DURATION) for "SELECT 1"

Gail Shaw SSC Guru Points: 1004494 More actions · Answer 7

arup_kc (12/10/2008)
Dear Adi,
It was my notion that while I am using "SELECT *", it returns whole resultset. Naturally, taking more system resources rather "SELECT 1" because it returns only 1, though the executionpan is same for the both.

Selects within an exists don't return resultsets at all. They're just used for matching rows. Hence there's no difference what it returns.

As I mentioned, during the parsing phase of a query execution, all columns are stripped out of the select within the exists. By the time the query reaches the query optimiser and query processor, there are no columns referred to in the exists at all.

I have tested it with a table user, which has more than 500 records.
From Profiler, it gives 0(CPU), 18 (READ), 0(WRITE) , 443 (DURATION) for "SELECT *"
AND 0(CPU), 2 (READ), 0(WRITE) , 1 (DURATION) for "SELECT 1"

Post your test please. Without seeing what you've done, I can't verify it.

See the test code that I posted yesterday. All variants within the exists did the same reads in the same time. Where does your test differ from mine?

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

SwePeso SSC-Dedicated Points: 39725 More actions · Answer 8

Haven't anyone noticed that column alias SOURCENAME has different values for the two UNIONed queries?

Change UNION to UNION ALL, because the two queries will never return same record values.

N 56°04'39.16"
E 12°55'05.25"

arup chakraborty Hall of Fame Points: 3640 More actions · Answer 9

Dear Gail and Adi,

Here is my testing code on SQL Server 2005

set statistics io on

go

set statistics time on

go

DBCC DROPCLEANBUFFERS

--EXISTS for all columns

begin

if exists(select * from users)

print '*'

end

--EXISTS for 1

begin

if exists(select 1 from users)

print '1'

end

And the result is:

SQL Server parse and compile time:

CPU time = 0 ms, elapsed time = 4 ms.

DBCC execution completed. If DBCC printed error messages, contact your system administrator.

SQL Server Execution Times:

CPU time = 0 ms, elapsed time = 1 ms.

Table 'Users'. Scan count 1, logical reads 2, physical reads 1, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:

CPU time = 0 ms, elapsed time = 17 ms.

*

SQL Server Execution Times:

CPU time = 0 ms, elapsed time = 1 ms.

Table 'Users'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:

CPU time = 0 ms, elapsed time = 1 ms.

1

SQL Server Execution Times:

CPU time = 0 ms, elapsed time = 1 ms.

SQL Server parse and compile time:

CPU time = 0 ms, elapsed time = 1 ms.

It's showing that it takes 17 ms while using "SELECT *" but only 1 ms while using "SELECT 1"

Gail Shaw SSC Guru Points: 1004494 More actions · Answer 10

Peso (12/11/2008)
Haven't anyone noticed that column alias SOURCENAME has different values for the two UNIONed queries?
Change UNION to UNION ALL, because the two queries will never return same record values.

Sure. See my reply back on page 1 of the thread

Since there's no way that the resultsets can overlap (there's a literal string that's different in each one), use UNION ALL instead of union. Union means concatenate, sort resulting rows and eliminate duplicates. Union all just concatenates rows.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Gail Shaw SSC Guru Points: 1004494 More actions · Answer 11

arup_kc (12/11/2008)
It's showing that it takes 17 ms while using "SELECT *" but only 1 ms while using "SELECT 1"

Well, yes. For the first one, the data has to be read from disk (as shown by the two physical reads) because you dropped clean buffers. he second one doesn't have a drop clean buffers before it, and hence is reading the data from memory, which is significanly faster.

In other words, the initial conditions for the two queries are different. Physical reads (which only the first has to do) are significantly slower than logical reads (which is all the second has to do)

Put a drop clean buffers before the EXISTS (select 1) and then run it again.

btw, when dealing with elapsed times under 50ms or so, other processes running on the machine can affect the results. It's difficult to draw meaningfull conclusions about such small changes in elapsed time, especially when the values are very small.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

arup chakraborty Hall of Fame Points: 3640 More actions · Answer 12

Hi Gail,

I have done the same as per ur advice. Now scan count, logical read, physical read is same for both, but "SELECT *" is taking 12 ms but "SELECT 1" is taking 1 ms.

SQL Server parse and compile time:

CPU time = 0 ms, elapsed time = 1 ms.

SQL Server Execution Times:

CPU time = 0 ms, elapsed time = 1 ms.

SQL Server parse and compile time:

CPU time = 0 ms, elapsed time = 1 ms.

SQL Server Execution Times:

CPU time = 0 ms, elapsed time = 1 ms.

SQL Server parse and compile time:

CPU time = 0 ms, elapsed time = 5 ms.

DBCC execution completed. If DBCC printed error messages, contact your system administrator.

SQL Server Execution Times:

CPU time = 0 ms, elapsed time = 1 ms.

Table 'Users'. Scan count 1, logical reads 2, physical reads 1, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:

CPU time = 0 ms, elapsed time = 12 ms.

*

SQL Server Execution Times:

CPU time = 0 ms, elapsed time = 1 ms.

DBCC execution completed. If DBCC printed error messages, contact your system administrator.

SQL Server Execution Times:

CPU time = 0 ms, elapsed time = 1 ms.

Table 'Users'. Scan count 1, logical reads 2, physical reads 1, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:

CPU time = 0 ms, elapsed time = 1 ms.

1

SQL Server Execution Times:

CPU time = 0 ms, elapsed time = 1 ms.

SQL Server parse and compile time:

CPU time = 0 ms, elapsed time = 1 ms.

Gail Shaw SSC Guru Points: 1004494 More actions · Answer 13

I don't have your users table, and hence cannot reproduce your test. I tried one using similar queries and the temp table that I specified for an earlier test.

I ran each 10 times to test. Every single execution came back

CPU time = 0 ms, elapsed time = 0 ms.

These times are far too small to make any judgement about. Seriously, any other process running on the machine will interfear.

If you want to test this, test it similar to the way I did it, with 2 tables and the exists in the where clause of a query. That will give times meaningful enough to compare.

And be sure to post the creation and population of the table as well, otherwise it can't be verified.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Gail Shaw SSC Guru Points: 1004494 More actions · Answer 14

Back to the original problem.

There are two things causing the slowness.

First, the use of union. If you change all to union all, you'll get the same result (the resultsets can't overlap), much faster.

Second is the lack of indexes. This one's harder to fix. Each of the queries that's been unioned needs a different index to run quickly. If you can afford to create 24 nonclustered indexes, this query will be a lot faster. Problem is, that's going to slow down insert/updates/deletes.

Is this query likely to be run regularly, or is this a once-off? If it's a regular, consider creating indexes like the following

create index idx_Account_MemberId_Label on [COI_OSB_WINDOWS_FactData] (Account_MemberId_Label, Datasource_MemberId_Label, Scenario_MemberId_Label) INCLUDE (Value)

That will make the query that references Account fast. For the others, have different leading columns, but keep the rest the same.

If you do go that route, test, test, test and make sure that the data changes haven't been severely impacted.

You should also give some thought to a clustered index. Perhaps on the identity column, I can't say for sure.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

arup chakraborty Hall of Fame Points: 3640 More actions · Answer 15

HI Gail,

I have tested with your script and here is the result:

SQL Server parse and compile time:

CPU time = 62 ms, elapsed time = 85 ms.

(157580 row(s) affected)

Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table '#TestingExists1_____________________________________________________________________________________________________000000000005'. Scan count 1, logical reads 1863, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table '#testingExists2_____________________________________________________________________________________________________000000000007'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:

CPU time = 828 ms, elapsed time = 6553 ms.

(157580 row(s) affected)

Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table '#TestingExists1_____________________________________________________________________________________________________000000000005'. Scan count 1, logical reads 1863, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table '#testingExists2_____________________________________________________________________________________________________000000000007'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:

CPU time = 907 ms, elapsed time = 6959 ms.

Still "SELECT *" takes 6959 ms but "SELECT somestring" takes 6553 ms.

I have tested with SQL Server 2005 Standard edition.