Need Help with this massive report

  • We have a report that the developer wrote and he has no idea what he did to untune the database. The query that ran 10 minutes is not taking 800 minutes. Here is the sql statement. I have indexed the query to its max. The maximum cost is at the bold section of the query below. Is there a better way to write this query? I am new to SQL Tuning and am clueless as to what should be the next steps. I tried to add a few additional indexes and it only increased the overall cost and included an expensive HASH MATCH/RIGHT ANTI SEMI JOIN. I would appreciate any help.

    Thank you

     

     

    select DISTINCT

    A.HH_ORIGIN_LOC ,

    A.BUSINESS_UNIT ,

    A.SO_ID ,

    B.NAME ,

    A.HH_COMP_DATE 

    from PS_RF_SO_HDR A, PS_RB_PERSON B, PS_HH_SOBILL_LINE C, PS_RF_SOLN_ASSIGN G

    where A.CUST_PERSON_ID = B.PERSON_ID AND

    A.SO_ID NOT IN (Select distinct(H.SO_ID) from PS_HH_NARDA_HIST H

                    where (H.HH_NARDA_STATUS in ('SB', 'RJ', 'RS')) and H.HH_NARDA_NUM <> ' ' and H.DTTM_STAMP < '2004-05-03') AND

    A.BUSINESS_UNIT = C.BUSINESS_UNIT AND

    A.SO_ID = C.SO_ID AND

    C.HH_BILL_TO IN ('M','E') AND

    A.SO_DATE >= '2002-JAN-01' AND

    A.SO_STATUS IN ('C','D','X') AND

    A.HH_COMP_DATE >= '2004-04-01' AND

    A.HH_COMP_DATE < '2004-05-01' AND

    A.HH_COMP_DATE <= '2004-05-03' AND

    G.BUSINESS_UNIT = A.BUSINESS_UNIT AND

    G.SO_ID = A.SO_ID AND

    G.SO_LINE = (Select max(E.SO_LINE) from PS_RF_SOLN_ASSIGN E

                where E.BUSINESS_UNIT = A.BUSINESS_UNIT and E.SO_ID = A.SO_ID

                and E.PERSON_ID <> '' and E.PRIMARY_FLAG = 'Y') AND

    G.PERSON_ID NOT IN (Select PERSON_ID from PS_RB_WORK_STORLOC where STORLOC_TYPE = '00' AND BUSINESS_UNIT_IN <> ' ')

    AND G.PERSON_ID <> ' '

    AND G.PRIMARY_FLAG = 'Y'

    order by A.HH_ORIGIN_LOC, A.SO_ID

  • Have you updated the stats?

    Check the Auto Update Stats option under Options - if it isn't set, then execute sp_updatestats in the database, from QA. Might take a long time running though, so run it at a quiet time.

  • Yes Auto updates have been turned on. So that is not the issue. I was able to reduce the cost a tiny bit by using the BETWEEN parameter for the Date range search and also by removing one of the IN and replacing it by EXISTS.

    select DISTINCT

    A.HH_ORIGIN_LOC ,

    A.BUSINESS_UNIT ,

    A.SO_ID ,

    B.NAME ,

    A.HH_COMP_DATE 

    from PS_RF_SO_HDR A, PS_RB_PERSON B, PS_HH_SOBILL_LINE C, PS_RF_SOLN_ASSIGN G

    where A.CUST_PERSON_ID = B.PERSON_ID AND

    A.SO_ID NOT IN (Select distinct(H.SO_ID) from PS_HH_NARDA_HIST H

                    where (H.HH_NARDA_STATUS in ('SB', 'RJ', 'RS')) and H.HH_NARDA_NUM <> ' ' and H.DTTM_STAMP < '2004-05-03') AND

    A.BUSINESS_UNIT = C.BUSINESS_UNIT AND

    A.SO_ID = C.SO_ID AND

    C.HH_BILL_TO IN ('M','E') AND

    A.SO_DATE >= '2002-JAN-01' AND

    EXISTS (SELECT A.SO_STATUS FROM PS_RF_SO_HDR A WHERE A.SO_STATUS IN ('C','D','X')) AND

    A.HH_COMP_DATE BETWEEN '2004-04-01' AND '2004-05-03'  AND

    G.BUSINESS_UNIT = A.BUSINESS_UNIT AND

    G.SO_ID = A.SO_ID AND

    G.SO_LINE = (Select max(E.SO_LINE) from PS_RF_SOLN_ASSIGN E

                where E.BUSINESS_UNIT = A.BUSINESS_UNIT and E.SO_ID = A.SO_ID

                and E.PERSON_ID IS NOT NULL and E.PRIMARY_FLAG = 'Y') AND

    G.PERSON_ID NOT IN (Select PERSON_ID from PS_RB_WORK_STORLOC where STORLOC_TYPE = '00' AND BUSINESS_UNIT_IN <> ' ')

    AND G.PERSON_ID <> ' '

    AND G.PRIMARY_FLAG = 'Y'

    order by A.HH_ORIGIN_LOC, A.SO_ID

    I definitely think that the solution is in rewriting the statement.

    Thank you for your help.

     

  • 1. You need to change your join types. Microsoft recomend that you use the explicit JOIN command, in fact Microsoft declare that the join syntax you are using returns inconsistant results.

    Transact-SQL Joins

    In earlier versions of Microsoft® SQL Server™ 2000, left and right outer join conditions were specified in the WHERE clause using the *= and =* operators. In some cases, this syntax results in an ambiguous query that can be interpreted in more than one way. SQL-92 compliant outer joins are specified in the FROM clause and do not result in this ambiguity. Because the SQL-92 syntax is more precise, detailed information about using the old Transact-SQL outer join syntax in the WHERE clause is not included with this release. The syntax may not be supported in a future version of SQL Server. Any statements using the Transact-SQL outer joins should be changed to use the SQL-92 syntax.

    The SQL-92 standard does support the specification of inner joins in either the FROM or WHERE clause. Inner joins specified in the WHERE clause do not have the same problems with ambiguity as the Transact-SQL outer join syntax.

    2. I'd recode the sub queries back in to the main query and use a left or right join

    regards

    Jeremy

     

     

  • Also...

    "A.SO_ID NOT IN (Select distinct(H.SO_ID) from PS_HH_NARDA_HIST H

                    where (H.HH_NARDA_STATUS in ('SB', 'RJ', 'RS')) and H.HH_NARDA_NUM <> ' ' and H.DTTM_STAMP < '2004-05-03') AND"

     

    The DISTINCT portion of this is unnecessary from a performance point of view...in that it requires a sort to produce the sub-list....one might as well be NOT IN (1,1,1,1,1,1,2) as NOT IN (1,2).

    Also I'd support the statements encouraging your developer to rewrite and use the new standard-join formats....

     

    You can also use some index hints....can you post your 'estimated query plan'?...and potentially the ddl of the tables used in the query....

    Your develoiper also uses a lot of "<>" which can prevent indices being used....any chance these portions could be reworked/restated (to get the same result...but allowing the query processor to use indices)

  • picking up on the previous comment. using comparisons like <> can cause table scans even when a suitable index is available.

    If table is a big table, I'd also look at (if possible) creating a covered index to enhance performance (put all of the fields in the select statement from table A in the same index along with any fields from a in WHERE conditions [) . This enables the query to bypass the table and retrieve all of the data from the index (as long as its not a clustered index)

    If the data volumes are large I'd also consider rewriting as a stored proc putting the subqueries into temporary tables first (even putting the H.HH_NARDA_STATUS in ('SB', 'RJ', 'RS') in a temp table and joining to it is faster than using IN for large data volumes.

     

     

  • picking up on the previous comment. using comparisons like <> can cause table scans even when a suitable index is available.

    If table is a big table, I'd also look at (if possible) creating a covered index to enhance performance (put all of the fields in the select statement from table A in the same index along with any fields from a in WHERE conditions [) . This enables the query to bypass the table and retrieve all of the data from the index (as long as its not a clustered index)

    If the data volumes are large I'd also consider rewriting as a stored proc putting the subqueries into temporary tables first (even putting the H.HH_NARDA_STATUS in ('SB', 'RJ', 'RS') in a temp table and joining to it is faster than using IN for large data volumes.

     

     

  • A couple of potential solutions. Sorry if there are any typos etc, but I obviously have nothing to test this against.

    First of all, replacing all but one of the subqueries with proper joins gives :

    select distinct

    A.HH_ORIGIN_LOC ,

    A.BUSINESS_UNIT ,

    A.SO_ID ,

    B.NAME ,

    A.HH_COMP_DATE

    from

    ps_rf_so_hdr as a

    join ps_rb_person as b on b.person_id = a.cust_person_id

    join ps_hh_sobill_line as c on c.so_id = a.so_id and c.business_unit = a.business_unit

    join ps_rf_soln_assign as g on g.so_id = a.so_id and g.business_unit = a.business_unit

    left join ps_rb_work_storloc as h on h.person_id = g.person_id and h.storloc_type = '00' and h.business_unit ' '

    left join ps_hh_narda_hist as i on i.so_id = a.so_id and i.hh_narda_status in ('SB', 'RJ', 'RS') and i.hh_narda_num ' ' and i.dttm_stamp = '2002-jan-01' and

    a.hh_comp_date between '2004-04-01' and '2004-05-03' and

    c.hh_bill_to in ('M','E') and

    g.person_id ' ' and

    g.prinmary_flag = 'Y' and

    h.person_id is null and

    i.so_id is null and

    G.SO_LINE = (Select max(E.SO_LINE) from PS_RF_SOLN_ASSIGN E

    where E.BUSINESS_UNIT = A.BUSINESS_UNIT and E.SO_ID = A.SO_ID

    and E.PERSON_ID IS NOT NULL and E.PRIMARY_FLAG = 'Y')

    order by

    a.hh_origin_loc,

    a.so_id

    I'm concerned that the remaining subquery might hinder performance, so the following retrieves the max() values first. You'd have to try both to see which is best for you, since it will depend on the size of your tables.

    select

    so_id,

    business_unit,

    max(so_line) 'so_line'

    into

    #so_tab

    from

    ps_rf_soln assign

    where

    person_id is not null and primary_flag = 'Y'

    group by

    so_id, business_unit

    select distinct

    A.HH_ORIGIN_LOC ,

    A.BUSINESS_UNIT ,

    A.SO_ID ,

    B.NAME ,

    A.HH_COMP_DATE

    from

    ps_rf_so_hdr as a

    join ps_rb_person as b on b.person_id = a.cust_person_id

    join ps_hh_sobill_line as c on c.so_id = a.so_id and c.business_unit = a.business_unit

    join ps_rf_soln_assign as g on g.so_id = a.so_id and g.business_unit = a.business_unit

    left join ps_rb_work_storloc as h on h.person_id = g.person_id and h.storloc_type = '00' and h.business_unit ' '

    left join ps_hh_narda_hist as i on i.so_id = a.so_id and i.hh_narda_status in ('SB', 'RJ', 'RS') and i.hh_narda_num ' ' and i.dttm_stamp = '2002-jan-01' and

    a.hh_comp_date between '2004-04-01' and '2004-05-03' and

    c.hh_bill_to in ('M','E') and

    g.person_id ' ' and

    g.prinmary_flag = 'Y' and

    h.person_id is null and

    i.so_id is null and

    G.SO_LINE = (Select max(E.SO_LINE) from PS_RF_SOLN_ASSIGN E

    where E.BUSINESS_UNIT = A.BUSINESS_UNIT and E.SO_ID = A.SO_ID

    and E.PERSON_ID IS NOT NULL and E.PRIMARY_FLAG = 'Y')

    order by

    a.hh_origin_loc,

    a.so_id

    Regards

    Rob

  • Sorry, the second query should read as :

    select

    so_id,

    business_unit,

    max(so_line) 'so_line'

    into

    #so_tab

    from

    ps_rf_soln assign

    where

    person_id is not null and primary_flag = 'Y'

    group by

    so_id, business_unit

    select distinct

    A.HH_ORIGIN_LOC ,

    A.BUSINESS_UNIT ,

    A.SO_ID ,

    B.NAME ,

    A.HH_COMP_DATE

    from

    ps_rf_so_hdr as a

    join ps_rb_person as b on b.person_id = a.cust_person_id

    join ps_hh_sobill_line as c on c.so_id = a.so_id and c.business_unit = a.business_unit

    join ps_rf_soln_assign as g on g.so_id = a.so_id and g.business_unit = a.business_unit

    left join ps_rb_work_storloc as h on h.person_id = g.person_id and h.storloc_type = '00' and h.business_unit ' '

    left join ps_hh_narda_hist as i on i.so_id = a.so_id and i.hh_narda_status in ('SB', 'RJ', 'RS') and i.hh_narda_num ' ' and i.dttm_stamp = '2002-jan-01' and

    a.hh_comp_date between '2004-04-01' and '2004-05-03' and

    c.hh_bill_to in ('M','E') and

    g.person_id ' ' and

    g.prinmary_flag = 'Y' and

    h.person_id is null and

    i.so_id is null and

    order by

    a.hh_origin_loc,

    a.so_id

    Regards

    Rob

  • Thank you everybody for your response, I am going to read all the comments very carefully and in a methodical manner try all of the above. I will post all results here, including the explain plan etc for each try and the original plan.

    Sincerely,

     

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply