Performance optimization of UNIOn

  • Is that the only index?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • yes..

  • mohd.imtiaz (12/10/2008)


    yes..

    Why did you delete the post with the table and index definitions? Please post them again. I can't suggest indexes without having the table definition at the very least.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Hi,

    is there any other way to write the following query??

    SELECT

    'D_Account' AS SourceName

    , 'Dimension' AS SourceType

    , Fact.Account_MemberId_Label AS MissingLabel

    , Fact.Datasource_MemberId_Label AS DataSource

    , Fact.Scenario_MemberId_Label AS Scenario

    , sum(Fact.[Value]) AS [Value]

    FROM dbo.COI_OSB_WINDOWS_FactData Fact

    WHERE NOT EXISTS(SELECT D.Label FROM dbo.D_Account D

    WHERE Fact.Account_MemberId_Label = D.Label)

    AND Fact.Account_MemberId_Label != 'None'

    GROUP BY Fact.Account_MemberId_Label, Fact.Datasource_MemberId_Label, Fact.Scenario_MemberId_Label

  • USE [COI_AppDB]

    GO

    /****** Object: Table [dbo].[COI_OSB_WINDOWS_FactData] Script Date: 12/10/2008 19:23:16 ******/

    SET ANSI_NULLS ON

    GO

    SET QUOTED_IDENTIFIER ON

    GO

    CREATE TABLE [dbo].[COI_OSB_WINDOWS_FactData](

    [Scenario_MemberId_Label] [nvarchar](40) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,

    [Time_Month_Label] [nvarchar](40) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,

    [Allocation_MemberId_Label] [nvarchar](40) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,

    [Channel_MemberId_Label] [nvarchar](40) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,

    [Currencytype_MemberId_Label] [nvarchar](40) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,

    [Datasource_MemberId_Label] [nvarchar](40) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,

    [Entity_MemberId_Label] [nvarchar](40) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,

    [Functionexec_MemberId_Label] [nvarchar](40) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,

    [InternalOrder_MemberId_Label] [nvarchar](40) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,

    [PricingLevel_MemberId_Label] [nvarchar](40) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,

    [Product_MemberId_Label] [nvarchar](40) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,

    [Revsum_MemberId_Label] [nvarchar](40) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,

    [RevType_MemberId_Label] [nvarchar](40) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,

    [SAPCompany_MemberId_Label] [nvarchar](40) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,

    [SECFuncArea_MemberId_Label] [nvarchar](40) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,

    [Version_MemberId_Label] [nvarchar](40) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,

    [Account_MemberId_Label] [nvarchar](40) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,

    [COIFlag_MemberId_Label] [nvarchar](40) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,

    [OEMCustomerType_MemberId_Label] [nvarchar](40) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,

    [CalcAllocation_MemberId_Label] [nvarchar](40) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,

    [SaleType_MemberId_Label] [nvarchar](40) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,

    [COIFilter_MemberId_Label] [nvarchar](40) COLLATE SQL_Latin1_General_CP1_CI_AS NULL CONSTRAINT [DF_COI_OSB_WINDOWS_FactData_COIFilter_MemberId_Label] DEFAULT (N'None'),

    [Time_Month] [bigint] NULL,

    [Value] [float] NOT NULL,

    [RowId] [bigint] IDENTITY(1,1) NOT NULL,

    [CreateDatetime] [datetime] NOT NULL CONSTRAINT [DF_COI_OSB_WINDOWS_FactData_CreateDatetime] DEFAULT (getdate())

    ) ON [PRIMARY]

    *******************************************************

    USE [COI_AppDB]

    GO

    /****** Object: Index [Index_DataSource_Scenario] Script Date: 12/10/2008 19:25:26 ******/

    CREATE NONCLUSTERED INDEX [Index_DataSource_Scenario] ON [dbo].[COI_OSB_WINDOWS_FactData]

    (

    [Datasource_MemberId_Label] ASC,

    [Scenario_MemberId_Label] ASC

    )WITH (SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF) ON [PRIMARY]

    *******************************************

    is there any other way to write the given query

    SELECT

    'D_Account' AS SourceName

    , 'Dimension' AS SourceType

    , Fact.Account_MemberId_Label AS MissingLabel

    , Fact.Datasource_MemberId_Label AS DataSource

    , Fact.Scenario_MemberId_Label AS Scenario

    --, sum(Fact.[Value]) AS [Value]

    FROM dbo.COI_OSB_WINDOWS_FactData Fact

    WHERE NOT EXISTS(SELECT D.Label FROM dbo.D_Account D

    WHERE Fact.Account_MemberId_Label = D.Label)

    AND Fact.Account_MemberId_Label != 'None'

    GROUP BY Fact.Account_MemberId_Label, Fact.Datasource_MemberId_Label, Fact.Scenario_MemberId_Label

  • Dear Adi,

    It was my notion that while I am using "SELECT *", it returns whole resultset. Naturally, taking more system resources rather "SELECT 1" because it returns only 1, though the executionpan is same for the both.

    Hence, "SELECT 1" gives a performance boost.

    I have tested it with a table user, which has more than 500 records.

    From Profiler, it gives 0(CPU), 18 (READ), 0(WRITE) , 443 (DURATION) for "SELECT *"

    AND 0(CPU), 2 (READ), 0(WRITE) , 1 (DURATION) for "SELECT 1"

  • arup_kc (12/10/2008)


    Dear Adi,

    It was my notion that while I am using "SELECT *", it returns whole resultset. Naturally, taking more system resources rather "SELECT 1" because it returns only 1, though the executionpan is same for the both.

    Selects within an exists don't return resultsets at all. They're just used for matching rows. Hence there's no difference what it returns.

    As I mentioned, during the parsing phase of a query execution, all columns are stripped out of the select within the exists. By the time the query reaches the query optimiser and query processor, there are no columns referred to in the exists at all.

    I have tested it with a table user, which has more than 500 records.

    From Profiler, it gives 0(CPU), 18 (READ), 0(WRITE) , 443 (DURATION) for "SELECT *"

    AND 0(CPU), 2 (READ), 0(WRITE) , 1 (DURATION) for "SELECT 1"

    Post your test please. Without seeing what you've done, I can't verify it.

    See the test code that I posted yesterday. All variants within the exists did the same reads in the same time. Where does your test differ from mine?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Haven't anyone noticed that column alias SOURCENAME has different values for the two UNIONed queries?

    Change UNION to UNION ALL, because the two queries will never return same record values.


    N 56°04'39.16"
    E 12°55'05.25"

  • Dear Gail and Adi,

    Here is my testing code on SQL Server 2005

    set statistics io on

    go

    set statistics time on

    go

    DBCC DROPCLEANBUFFERS

    --EXISTS for all columns

    begin

    if exists(select * from users)

    print '*'

    end

    --EXISTS for 1

    begin

    if exists(select 1 from users)

    print '1'

    end

    And the result is:

    SQL Server parse and compile time:

    CPU time = 0 ms, elapsed time = 4 ms.

    DBCC execution completed. If DBCC printed error messages, contact your system administrator.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 1 ms.

    Table 'Users'. Scan count 1, logical reads 2, physical reads 1, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 17 ms.

    *

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 1 ms.

    Table 'Users'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 1 ms.

    1

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 1 ms.

    SQL Server parse and compile time:

    CPU time = 0 ms, elapsed time = 1 ms.

    It's showing that it takes 17 ms while using "SELECT *" but only 1 ms while using "SELECT 1"

  • Peso (12/11/2008)


    Haven't anyone noticed that column alias SOURCENAME has different values for the two UNIONed queries?

    Change UNION to UNION ALL, because the two queries will never return same record values.

    Sure. See my reply back on page 1 of the thread

    Since there's no way that the resultsets can overlap (there's a literal string that's different in each one), use UNION ALL instead of union. Union means concatenate, sort resulting rows and eliminate duplicates. Union all just concatenates rows.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • arup_kc (12/11/2008)


    It's showing that it takes 17 ms while using "SELECT *" but only 1 ms while using "SELECT 1"

    Well, yes. For the first one, the data has to be read from disk (as shown by the two physical reads) because you dropped clean buffers. he second one doesn't have a drop clean buffers before it, and hence is reading the data from memory, which is significanly faster.

    In other words, the initial conditions for the two queries are different. Physical reads (which only the first has to do) are significantly slower than logical reads (which is all the second has to do)

    Put a drop clean buffers before the EXISTS (select 1) and then run it again.

    btw, when dealing with elapsed times under 50ms or so, other processes running on the machine can affect the results. It's difficult to draw meaningfull conclusions about such small changes in elapsed time, especially when the values are very small.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Hi Gail,

    I have done the same as per ur advice. Now scan count, logical read, physical read is same for both, but "SELECT *" is taking 12 ms but "SELECT 1" is taking 1 ms.

    SQL Server parse and compile time:

    CPU time = 0 ms, elapsed time = 1 ms.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 1 ms.

    SQL Server parse and compile time:

    CPU time = 0 ms, elapsed time = 1 ms.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 1 ms.

    SQL Server parse and compile time:

    CPU time = 0 ms, elapsed time = 5 ms.

    DBCC execution completed. If DBCC printed error messages, contact your system administrator.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 1 ms.

    Table 'Users'. Scan count 1, logical reads 2, physical reads 1, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 12 ms.

    *

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 1 ms.

    DBCC execution completed. If DBCC printed error messages, contact your system administrator.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 1 ms.

    Table 'Users'. Scan count 1, logical reads 2, physical reads 1, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 1 ms.

    1

    SQL Server Execution Times:

    CPU time = 0 ms, elapsed time = 1 ms.

    SQL Server parse and compile time:

    CPU time = 0 ms, elapsed time = 1 ms.

  • I don't have your users table, and hence cannot reproduce your test. I tried one using similar queries and the temp table that I specified for an earlier test.

    I ran each 10 times to test. Every single execution came back

    CPU time = 0 ms, elapsed time = 0 ms.

    These times are far too small to make any judgement about. Seriously, any other process running on the machine will interfear.

    If you want to test this, test it similar to the way I did it, with 2 tables and the exists in the where clause of a query. That will give times meaningful enough to compare.

    And be sure to post the creation and population of the table as well, otherwise it can't be verified.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Back to the original problem.

    There are two things causing the slowness.

    First, the use of union. If you change all to union all, you'll get the same result (the resultsets can't overlap), much faster.

    Second is the lack of indexes. This one's harder to fix. Each of the queries that's been unioned needs a different index to run quickly. If you can afford to create 24 nonclustered indexes, this query will be a lot faster. Problem is, that's going to slow down insert/updates/deletes.

    Is this query likely to be run regularly, or is this a once-off? If it's a regular, consider creating indexes like the following

    create index idx_Account_MemberId_Label on [COI_OSB_WINDOWS_FactData] (Account_MemberId_Label, Datasource_MemberId_Label, Scenario_MemberId_Label) INCLUDE (Value)

    That will make the query that references Account fast. For the others, have different leading columns, but keep the rest the same.

    If you do go that route, test, test, test and make sure that the data changes haven't been severely impacted.

    You should also give some thought to a clustered index. Perhaps on the identity column, I can't say for sure.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • HI Gail,

    I have tested with your script and here is the result:

    SQL Server parse and compile time:

    CPU time = 62 ms, elapsed time = 85 ms.

    (157580 row(s) affected)

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table '#TestingExists1_____________________________________________________________________________________________________000000000005'. Scan count 1, logical reads 1863, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table '#testingExists2_____________________________________________________________________________________________________000000000007'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 828 ms, elapsed time = 6553 ms.

    (157580 row(s) affected)

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table '#TestingExists1_____________________________________________________________________________________________________000000000005'. Scan count 1, logical reads 1863, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table '#testingExists2_____________________________________________________________________________________________________000000000007'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 907 ms, elapsed time = 6959 ms.

    Still "SELECT *" takes 6959 ms but "SELECT somestring" takes 6553 ms.

    I have tested with SQL Server 2005 Standard edition.

Viewing 15 posts - 16 through 29 (of 29 total)

You must be logged in to reply to this topic. Login to reply