Why do these similar queries run at such a different speed?

  • This one is bugging me. I can't figure out why one query runs extremely slowly while the other is lightning fast.

    Any suggestions would be much appreciated!

    The queried table has about 1.8 million rows. The inner SELECT in both queries returns about 50,000 rows and, run by itself, finishes instantly in both cases.

    First off, the fast one. It runs in about 1-4 seconds:

    SELECT b.TransactionType_FK, b.TransactionVersion_FK, b.Department_FK, b.Account_FK,
           b.CounterpartCompany_FK, b.Project_FK, b.Company_FK, b.Currency_FK, b.Year, b.Month,
           SUM(f.Amount)
    FROM Fact2012Base.Posting f
    INNER JOIN
    (
        SELECT DISTINCT tr.TransactionType_FK, innerP.TransactionVersion_FK, Department_FK,
               Account_FK, CounterpartCompany_FK, Project_FK, Company_FK, Currency_FK, Year, Month
        FROM Fact2012Base.Posting innerP
        INNER JOIN DimensionBase.[Transaction] tr
               ON tr.TransactionVersion_FK = innerP.TransactionVersion_FK
              AND tr.TransactionType_FK = @TransactionType_FK
        WHERE innerP.Handled = 0
          AND innerP.SourceSystem_FK = @SourceSystem_FK
          AND innerP.Year = @Year
    ) b
           ON b.TransactionVersion_FK = f.TransactionVersion_FK
          AND b.Department_FK = f.Department_FK
          AND b.Account_FK = f.Account_FK
          AND ISNULL(b.CounterpartCompany_FK, 0) = ISNULL(f.CounterpartCompany_FK, 0)
          AND ISNULL(b.Project_FK, 0) = ISNULL(f.Project_FK, 0)
          AND b.Company_FK = f.Company_FK
          AND b.Currency_FK = f.Currency_FK
          AND b.Year = f.Year
          AND b.Month = f.Month
    GROUP BY b.TransactionType_FK, b.TransactionVersion_FK, b.Department_FK, b.Account_FK,
             b.CounterpartCompany_FK, b.Project_FK, b.Company_FK, b.Currency_FK, b.Year, b.Month

    The second one is a variation of the first; basically, it groups by more columns. This one is extremely slow. It used to run in 7 minutes, but after changing the isnull() comparisons to a different form it now runs in about 2.5 minutes. Indexes are listed at the bottom; I have added all the indexes suggested by the query optimizer.

    SELECT b.SourceSystem_FK, b.TransactionVersion_FK, b.TransactionType_FK, b.Company_FK,
           b.Department_FK, b.Account_FK, b.Currency_FK, b.Supplier_FK, b.Customer_FK, b.Vessel_FK,
           b.CounterpartCompany_FK, b.SFICode_FK, b.Project_FK, b.ProductGroupGeoTeam_FK,
           b.CurrencyOriginal_FK, b.DynamicsDocumentType_FK, b.StaffCategory_FK, b.Year, b.Month,
           SUM(f.Amount)
    FROM Fact2012Base.Posting f
    INNER JOIN
    (
        SELECT DISTINCT SourceSystem_FK, innerP.TransactionVersion_FK, TransactionType_FK,
               Company_FK, Department_FK, Account_FK, Currency_FK, Supplier_FK, Customer_FK,
               Vessel_FK, CounterpartCompany_FK, SFICode_FK, Project_FK, ProductGroupGeoTeam_FK,
               CurrencyOriginal_FK, DynamicsDocumentType_FK, StaffCategory_FK, Year, Month
        FROM Fact2012Base.Posting innerP
        INNER JOIN DimensionBase.[Transaction] tr
               ON tr.TransactionVersion_FK = innerP.TransactionVersion_FK
              AND tr.TransactionType_FK = @TransactionType_FK
        WHERE innerP.Handled = 0
          AND innerP.SourceSystem_FK = @SourceSystem_FK
          AND innerP.Year = @Year
    ) b
           ON b.SourceSystem_FK = f.SourceSystem_FK
          AND b.TransactionVersion_FK = f.TransactionVersion_FK
          AND b.Year = f.Year
          AND b.Month = f.Month
          AND b.Company_FK = f.Company_FK
          AND b.Department_FK = f.Department_FK
          AND b.Account_FK = f.Account_FK
          AND b.Currency_FK = f.Currency_FK
          AND (b.Supplier_FK = f.Supplier_FK OR (b.Supplier_FK IS NULL AND f.Supplier_FK IS NULL))
          AND (b.Customer_FK = f.Customer_FK OR (b.Customer_FK IS NULL AND f.Customer_FK IS NULL))
          AND (b.Vessel_FK = f.Vessel_FK OR (b.Vessel_FK IS NULL AND f.Vessel_FK IS NULL))
          AND (b.CounterpartCompany_FK = f.CounterpartCompany_FK OR (b.CounterpartCompany_FK IS NULL AND f.CounterpartCompany_FK IS NULL))
          AND (b.SFICode_FK = f.SFICode_FK OR (b.SFICode_FK IS NULL AND f.SFICode_FK IS NULL))
          AND (b.Project_FK = f.Project_FK OR (b.Project_FK IS NULL AND f.Project_FK IS NULL))
          AND (b.ProductGroupGeoTeam_FK = f.ProductGroupGeoTeam_FK OR (b.ProductGroupGeoTeam_FK IS NULL AND f.ProductGroupGeoTeam_FK IS NULL))
          AND (b.CurrencyOriginal_FK = f.CurrencyOriginal_FK OR (b.CurrencyOriginal_FK IS NULL AND f.CurrencyOriginal_FK IS NULL))
          AND (b.DynamicsDocumentType_FK = f.DynamicsDocumentType_FK OR (b.DynamicsDocumentType_FK IS NULL AND f.DynamicsDocumentType_FK IS NULL))
          AND (b.StaffCategory_FK = f.StaffCategory_FK OR (b.StaffCategory_FK IS NULL AND f.StaffCategory_FK IS NULL))
    GROUP BY b.SourceSystem_FK, b.TransactionVersion_FK, b.TransactionType_FK, b.Company_FK,
             b.Department_FK, b.Account_FK, b.Currency_FK, b.Supplier_FK, b.Customer_FK, b.Vessel_FK,
             b.CounterpartCompany_FK, b.SFICode_FK, b.Project_FK, b.ProductGroupGeoTeam_FK,
             b.CurrencyOriginal_FK, b.DynamicsDocumentType_FK, b.StaffCategory_FK, b.Year, b.Month

    Indexes:

    CREATE NONCLUSTERED INDEX [IX_SourceYearHandled] ON [Fact2012Base].[Posting]
    (
        [SourceSystem_FK] ASC,
        [Year] ASC,
        [Handled] ASC
    )
    INCLUDE ([TransactionVersion_FK], [Company_FK], [Department_FK], [Account_FK], [Currency_FK],
             [Supplier_FK], [Customer_FK], [Vessel_FK], [CounterpartCompany_FK], [SFICode_FK],
             [Project_FK], [ProductGroupGeoteam_FK], [CurrencyOriginal_FK], [DynamicsDocumentType_FK],
             [Month], [StaffCategory_FK])
    GO

    CREATE NONCLUSTERED INDEX [IX_Source_Year] ON [Fact2012Base].[Posting]
    (
        [SourceSystem_FK] ASC,
        [Year] ASC
    )
    INCLUDE ([TransactionVersion_FK], [Company_FK], [Department_FK], [Account_FK], [Currency_FK],
             [Supplier_FK], [Customer_FK], [Vessel_FK], [CounterpartCompany_FK], [SFICode_FK],
             [Project_FK], [ProductGroupGeoteam_FK], [Amount], [CurrencyOriginal_FK],
             [DynamicsDocumentType_FK], [Month], [StaffCategory_FK])
    GO

    CREATE NONCLUSTERED INDEX [IX_Source] ON [Fact2012Base].[Posting]
    (
        [SourceSystem_FK] ASC
    )
    INCLUDE ([TransactionVersion_FK], [Year], [Company_FK], [Department_FK], [Account_FK],
             [Currency_FK], [Supplier_FK], [Customer_FK], [Vessel_FK], [CounterpartCompany_FK],
             [SFICode_FK], [Project_FK], [ProductGroupGeoteam_FK], [Amount], [CurrencyOriginal_FK],
             [DynamicsDocumentType_FK], [DynamicsDocumentID], [LinkToBaswareDocument], [Description],
             [Month], [StaffCategory_FK])
    GO

    CREATE NONCLUSTERED INDEX [IX_Hierarchy] ON [Fact2012Base].[Posting]
    (
        [TransactionVersion_FK] ASC,
        [Department_FK] ASC,
        [SourceSystem_FK] ASC,
        [Year] ASC,
        [Amount] ASC
    )
    INCLUDE ([Company_FK], [Account_FK], [Currency_FK], [Vessel_FK], [CounterpartCompany_FK],
             [SFICode_FK], [Project_FK], [ProductGroupGeoteam_FK])

  • Second time today I've recommended Gail's blog; I should be on commission!

    Anyway, try this blog post, which explains the issue you're experiencing.


    Forever trying to learn
    My blog - http://www.cadavre.co.uk/
    For better, quicker answers on T-SQL questions, click on the following...http://www.sqlservercentral.com/articles/Best+Practices/61537/
    For better, quicker answers on SQL Server performance related questions, click on the following...http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • Thanks for your input.

    However, I can't see how the blog post relates to my queries.

    The input parameters are always provided (never NULL), so there is not much use for RECOMPILE or dynamic SQL. The NULL handling is only there because multiple columns can contain NULL values, and those NULLs need to compare as equal to other NULLs.
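
    (As an aside, SQL Server also offers a NULL-safe equality test via EXISTS/INTERSECT: INTERSECT treats two NULLs as equal, so it gives the same semantics as the isnull() wrapping or the explicit IS NULL pairs, without applying a function to the joined columns. A minimal, self-contained sketch of the behaviour:)

    ```sql
    -- INTERSECT considers two NULLs equal, so this predicate is
    -- equivalent to (@a = @b OR (@a IS NULL AND @b IS NULL)).
    DECLARE @a int = NULL, @b int = NULL;

    IF EXISTS (SELECT @a INTERSECT SELECT @b)
        PRINT 'match';   -- both values are NULL, so this prints
    ```

    (Each `(b.X = f.X OR (b.X IS NULL AND f.X IS NULL))` predicate in the slow query could then be written as `EXISTS (SELECT b.X INTERSECT SELECT f.X)`.)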

  • erikpoker (2/7/2012)



    My apologies, I only glanced at your queries and assumed they were catch-all queries.

    OK, can you attach the actual execute plans please?



  • I've added the execution plan of the slow-running query.

  • Can you attach the .sqlplan? An image is no good.



  • My apologies. Attached you will find the actual execution plans of the slow and fast queries.

  • erikpoker (2/7/2012)



    No problem.

    In the slow running query, your statistics look out of date (e.g. you have a hash match operator that is estimating 7 rows but the actual number of rows is 26,633,556 - that costs you nearly 35% of the query). They look out of date in the fast query as well, but it isn't having to deal with as much data.

    I'll be honest, I'm not exactly a query plan expert (one of the topics on my "to-do" list), so other than updating your statistics I'm out of ideas. I'll see if I can drum up any interest from the experts for you.



  • Cadavre (2/7/2012)


    Thanks! That would be great 🙂

    I just updated the statistics, but the timing and the query plan are still exactly the same. The actual row count on the hash match is still approx. 26 million, against an estimate of 9 rows.

    But why on earth would the hash match's actual row count be 26 million? The inner query should return 50,000 rows, and the table has only about 2 million rows...

  • erikpoker (2/7/2012)



    Well, it looks like your DISTINCT isn't being applied until after the data is joined, which explains the massive amount of extra data.



  • The main difference I see is how the distinct sort is performed.

    In the "fast" plan, the distinct sort is performed before the hash join and operates on 163,508 rows.

    In the "slow" plan, the distinct sort is performed after the hash join and has to work on 26,633,556 rows.

    A big difference.

    -- Gianluca Sartori

  • Cadavre (2/7/2012)



    Well, it looks like your DISTINCT isn't being applied until after the data is joined, which explains the massive amount of extra data.

    Uh, you beat me to it. :blush:

    -- Gianluca Sartori

  • OK. Well, that explains a lot.

    So the natural follow-up would be:

    What did I do wrong in the second query to get this behaviour, and how can I change it?

  • Another interesting fact:

    If I run just the inner query with DISTINCT, I get 54,940 rows.

    Without DISTINCT, I get 163,508 rows.

    I'm no expert, but from these numbers I suspect there must be something else we are missing. Even though the DISTINCT runs later, neither figure is anywhere near 26 million rows...

  • If you look at the properties of the SELECT operator, you'll notice that the query plan is timing out. I would move the inner query into a temp table if possible and join to it. This would also give you the option of creating indexes on the temp table as needed to improve performance.
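
    A sketch of that temp-table approach, shaped like the slow query (table, column, and parameter names are taken from the thread; the temp-table name, the index, and the shortened column lists are illustrative, and the full SELECT/GROUP BY lists and join predicates from the original query would go in their place):

    ```sql
    -- Step 1: materialize the deduplicated inner result first, so the
    -- DISTINCT operates on the ~55k-row set instead of running after
    -- the big join. (Column list shortened for brevity.)
    SELECT DISTINCT innerP.SourceSystem_FK, innerP.TransactionVersion_FK,
           tr.TransactionType_FK, innerP.Year, innerP.Month
    INTO #b
    FROM Fact2012Base.Posting innerP
    INNER JOIN DimensionBase.[Transaction] tr
           ON tr.TransactionVersion_FK = innerP.TransactionVersion_FK
          AND tr.TransactionType_FK = @TransactionType_FK
    WHERE innerP.Handled = 0
      AND innerP.SourceSystem_FK = @SourceSystem_FK
      AND innerP.Year = @Year;

    -- Step 2 (optional): index the temp table on the leading join columns.
    CREATE CLUSTERED INDEX ix_b
        ON #b (SourceSystem_FK, TransactionVersion_FK, Year, Month);

    -- Step 3: run the outer aggregate against the temp table, keeping
    -- the same join predicates and GROUP BY as before.
    SELECT b.SourceSystem_FK, b.TransactionVersion_FK, b.Year, b.Month,
           SUM(f.Amount) AS Amount
    FROM Fact2012Base.Posting f
    INNER JOIN #b b
           ON b.SourceSystem_FK = f.SourceSystem_FK
          AND b.TransactionVersion_FK = f.TransactionVersion_FK
          AND b.Year = f.Year
          AND b.Month = f.Month
    GROUP BY b.SourceSystem_FK, b.TransactionVersion_FK, b.Year, b.Month;
    ```

    This may also explain the 26-million-row intermediate: with the DISTINCT deferred until after the join, 163,508 pre-DISTINCT rows each matching roughly 160 fact rows on average multiply out to about 26 million rows, whereas deduplicating into #b first caps the join input at ~55k rows.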
