Can I make it efficient?

  • I have a table that has about 60 columns and carries 40 million records.

    I have to create a report summary table out of this table, with a requirement that calls for a GROUP BY on 30 of these columns and a SUM on 25 of them.

    Something like:

    SELECT a, b, c, ...30SuchCols...,
           SUM(x), SUM(y), SUM(z) ...25SuchCols...
    FROM BigTable
    GROUP BY a, b, c, ...30SuchCols...

    It runs for about 10-12 hours, which is completely unacceptable.

    Is there any trick/method to do this in an alternate faster way?

    I was thinking of breaking the table into a few pieces by some criteria, running the GROUP BY on those fragments, and then merging the result sets. Any other ideas?

    Thanks....
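    The split-and-merge idea above can be sketched roughly as below. This is only a sketch: BigTable and the abbreviated column lists are the placeholders from the post, and the slicing column SliceDate and the @SliceStart/@SliceEnd variables are assumptions, not from the original. It merges cleanly because a SUM of partial SUMs equals the overall SUM; a COUNT(DISTINCT ...) would not merge this way.

    ```sql
    -- Aggregate one slice of the big table at a time (slicing column assumed),
    -- collecting each partial result in a staging table. The first slice uses
    -- SELECT ... INTO; later slices would INSERT INTO #Partial instead.
    SELECT a, b, c, -- ...30SuchCols...
           SUM(x) AS x, SUM(y) AS y, SUM(z) AS z -- ...25SuchCols...
    INTO #Partial
    FROM BigTable
    WHERE SliceDate >= @SliceStart AND SliceDate < @SliceEnd
    GROUP BY a, b, c; -- ...30SuchCols...

    -- After all slices are loaded, merge the partial results:
    SELECT a, b, c, -- ...30SuchCols...
           SUM(x) AS x, SUM(y) AS y, SUM(z) AS z -- ...25SuchCols...
    FROM #Partial
    GROUP BY a, b, c; -- ...30SuchCols...
    ```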

  • Give us more info.

    It sounds like the query is big, so save the actual query as a text file, and show us the estimated execution plan, saved as a .sqlplan file. Attach them to this thread so we can look at them.

    With that information, we should be able to help you diagnose your performance issues and how to fix them.

    Performance questions like this really require absolute details in order to really help; general answers can't fix it.

    I've got plenty of million-row tables, and I've never had one run for 12 hours; even my whopper queries that have to touch every row run in minutes at most.

    You might have something as simple as a triangle or Cartesian join, or no indexes; any number of things could be diagnosed instantly with that .sqlplan.

    Lowell


    --Help us help you! If you post a question, make sure you include a CREATE TABLE... statement and an INSERT INTO... statement to give the volunteers here representative data. With that and your description of the problem, we can provide a tested, verifiable solution to your question! Asking the question the right way gets you a tested answer the fastest way possible!

  • Here is a query that I need to run every day for some reports, and it is taking over 2 hours. Can someone help me optimize it?

    Copied below is the query, followed by the plan.

    Query

    SELECT DROP_DATE,
           PRODUCT,
           OFFER_TYPE,
           OFFER,
           TF_NUM,
           SPR_NBR,
           CAMPAIGN_CODE,
           segment_match,
           segment,
           version_segment,
           version_market,
           version_format_comp,
           version_message_comp,
           version_provider_comp,
           TEST_GROUP_DESC,
           CTL_REGION,
           GM_MARKET,
           CFT_PRS_ACTIVE,
           COUNT(DISTINCT WILSVC) AS DIST_LEADS,
           COUNT(WILSVC) AS LEADS,
           COUNT(DISTINCT cv_key_num) AS ASSIGNED_CALLS,
           SUM(ADD_AL_INTENT_CALLS) AS ADD_AL_INTENT_CALLS,
           SUM(ADD_PKG_INTENT_CALLS) AS ADD_PKG_INTENT_CALLS,
           SUM(ADD_HSI_INTENT_CALLS) AS ADD_HSI_INTENT_CALLS,
           SUM(ADD_DISH_INTENT_CALLS) AS ADD_DISH_INTENT_CALLS,
           SUM(OFFICIAL_HSI_SALE_CALLS) AS OFFICIAL_HSI_SALE_CALLS,
           SUM(OFFICIAL_DISH_SALE_CALLS) AS OFFICIAL_DISH_SALE_CALLS,
           SUM(ADD_AL_INTENT_LEADS) AS ADD_AL_INTENT_LEADS,
           SUM(ADD_PKG_INTENT_LEADS) AS ADD_PKG_INTENT_LEADS,
           SUM(OFFICIAL_HSI_SALE_LEADS) AS OFFICIAL_HSI_SALE_LEADS,
           SUM(OFFICIAL_DISH_SALE_LEADS) AS OFFICIAL_DISH_SALE_LEADS,
           SUM(CFT_ANY_PKG_SALE) AS CFT_ANY_PKG_SALE,
           SUM(CFT_QUAL_PKG_SALE) AS CFT_QUAL_PKG_SALE,
           SUM(CFT_HSI_SALE) AS CFT_HSI_SALE,
           SUM(CFT_DISH_SALE) AS CFT_DISH_SALE,
           SUM(CFT_ULD_SALE) AS CFT_ULD_SALE
    INTO LEADS_CVA_XREF_TABLE_RPT
    FROM LEADS_CVA_XREF_ALLDTLS WITH (INDEX(0))
    GROUP BY DROP_DATE,
             PRODUCT,
             OFFER_TYPE,
             OFFER,
             TF_NUM,
             SPR_NBR,
             CAMPAIGN_CODE,
             segment_match,
             segment,
             version_segment,
             version_market,
             version_format_comp,
             version_message_comp,
             version_provider_comp,
             TEST_GROUP_DESC,
             CTL_REGION,
             GM_MARKET,
             CFT_PRS_ACTIVE

    --------------------------------

    Plan

    StmtText

    |--Table Insert(OBJECT:([LEADS_CVA_XREF_TABLE_RPT]), SET:([LEADS_CVA_XREF_TABLE_RPT].[CFT_ULD_SALE]=[Expr1019], [LEADS_CVA_XREF_TABLE_RPT].[CFT_DISH_SALE]=[Expr1018], [LEADS_CVA_XREF_TABLE_RPT].[CFT_HSI_SALE]=[Expr1017], [LEADS_CVA_XREF_TABLE_RPT].[CFT_Q

    |--Top(ROWCOUNT est 0)

    |--Compute Scalar(DEFINE:([LEADS_CVA_XREF_ALLDTLS].[DROP_DATE]=[LEADS_CVA_XREF_ALLDTLS].[DROP_DATE], [LEADS_CVA_XREF_ALLDTLS].[PRODUCT]=[LEADS_CVA_XREF_ALLDTLS].[PRODUCT], [LEADS_CVA_XREF_ALLDTLS].[OFFER_TYPE]=[LEADS_CVA_XREF_ALLDTLS].[OFFER_TY

    |--Parallelism(Gather Streams)

    |--Hash Match(Inner Join, HASH:([LEADS_CVA_XREF_ALLDTLS].[DROP_DATE], [LEADS_CVA_XREF_ALLDTLS].[PRODUCT], [LEADS_CVA_XREF_ALLDTLS].[OFFER_TYPE], [LEADS_CVA_XREF_ALLDTLS].[OFFER], [LEADS_CVA_XREF_ALLDTLS].[TF_NUM], [LEADS_CVA_XREF_ALLD

    |--Merge Join(Inner Join, MANY-TO-MANY MERGE:([LEADS_CVA_XREF_ALLDTLS].[DROP_DATE], [LEADS_CVA_XREF_ALLDTLS].[PRODUCT], [LEADS_CVA_XREF_ALLDTLS].[OFFER_TYPE], [LEADS_CVA_XREF_ALLDTLS].[OFFER], [LEADS_CVA_XREF_ALLDTLS].[TF_NUM], [

    | |--Parallelism(Repartition Streams, PARTITION COLUMNS:([LEADS_CVA_XREF_ALLDTLS].[DROP_DATE], [LEADS_CVA_XREF_ALLDTLS].[PRODUCT], [LEADS_CVA_XREF_ALLDTLS].[OFFER_TYPE], [LEADS_CVA_XREF_ALLDTLS].[OFFER], [LEADS_CVA_XREF_ALLDTL

    | | |--Compute Scalar(DEFINE:([LEADS_CVA_XREF_ALLDTLS].[DROP_DATE]=[LEADS_CVA_XREF_ALLDTLS].[DROP_DATE], [LEADS_CVA_XREF_ALLDTLS].[PRODUCT]=[LEADS_CVA_XREF_ALLDTLS].[PRODUCT], [LEADS_CVA_XREF_ALLDTLS].[OFFER_TYPE]=[LEADS_CV

    | | |--Compute Scalar(DEFINE:([Expr1002]=Convert([Expr1148])))

    | | |--Stream Aggregate(GROUP BY:([LEADS_CVA_XREF_ALLDTLS].[DROP_DATE], [LEADS_CVA_XREF_ALLDTLS].[PRODUCT], [LEADS_CVA_XREF_ALLDTLS].[OFFER_TYPE], [LEADS_CVA_XREF_ALLDTLS].[OFFER], [LEADS_CVA_XREF_ALLDTLS].[TF_NUM

    | | |--Parallelism(Repartition Streams, PARTITION COLUMNS:([LEADS_CVA_XREF_ALLDTLS].[DROP_DATE], [LEADS_CVA_XREF_ALLDTLS].[PRODUCT], [LEADS_CVA_XREF_ALLDTLS].[OFFER_TYPE], [LEADS_CVA_XREF_ALLDTLS].[OFFER], [L

    | | |--Sort(DISTINCT ORDER BY:([LEADS_CVA_XREF_ALLDTLS].[DROP_DATE] ASC, [LEADS_CVA_XREF_ALLDTLS].[PRODUCT] ASC, [LEADS_CVA_XREF_ALLDTLS].[OFFER_TYPE] ASC, [LEADS_CVA_XREF_ALLDTLS].[OFFER] ASC, [LEADS_CV

    | | |--Parallelism(Repartition Streams, PARTITION COLUMNS:([LEADS_CVA_XREF_ALLDTLS].[DROP_DATE], [LEADS_CVA_XREF_ALLDTLS].[PRODUCT], [LEADS_CVA_XREF_ALLDTLS].[OFFER_TYPE], [LEADS_CVA_XREF_ALLDTLS].[

    | | |--Table Scan(OBJECT:([SALES_REPORTING].[dbo].[LEADS_CVA_XREF_ALLDTLS]))

    | |--Parallelism(Repartition Streams, PARTITION COLUMNS:([LEADS_CVA_XREF_ALLDTLS].[DROP_DATE], [LEADS_CVA_XREF_ALLDTLS].[PRODUCT], [LEADS_CVA_XREF_ALLDTLS].[OFFER_TYPE], [LEADS_CVA_XREF_ALLDTLS].[OFFER], [LEADS_CVA_XREF_ALLDTL

    | |--Compute Scalar(DEFINE:([LEADS_CVA_XREF_ALLDTLS].[DROP_DATE]=[LEADS_CVA_XREF_ALLDTLS].[DROP_DATE], [LEADS_CVA_XREF_ALLDTLS].[PRODUCT]=[LEADS_CVA_XREF_ALLDTLS].[PRODUCT], [LEADS_CVA_XREF_ALLDTLS].[OFFER_TYPE]=[LEADS_CV

    | |--Compute Scalar(DEFINE:([Expr1004]=Convert([Expr1149])))

    | |--Stream Aggregate(GROUP BY:([LEADS_CVA_XREF_ALLDTLS].[DROP_DATE], [LEADS_CVA_XREF_ALLDTLS].[PRODUCT], [LEADS_CVA_XREF_ALLDTLS].[OFFER_TYPE], [LEADS_CVA_XREF_ALLDTLS].[OFFER], [LEADS_CVA_XREF_ALLDTLS].[TF_NUM

    | |--Parallelism(Repartition Streams, PARTITION COLUMNS:([LEADS_CVA_XREF_ALLDTLS].[DROP_DATE], [LEADS_CVA_XREF_ALLDTLS].[PRODUCT], [LEADS_CVA_XREF_ALLDTLS].[OFFER_TYPE], [LEADS_CVA_XREF_ALLDTLS].[OFFER], [L

    | |--Sort(DISTINCT ORDER BY:([LEADS_CVA_XREF_ALLDTLS].[DROP_DATE] ASC, [LEADS_CVA_XREF_ALLDTLS].[PRODUCT] ASC, [LEADS_CVA_XREF_ALLDTLS].[OFFER_TYPE] ASC, [LEADS_CVA_XREF_ALLDTLS].[OFFER] ASC, [LEADS_CV

    | |--Parallelism(Repartition Streams, PARTITION COLUMNS:([LEADS_CVA_XREF_ALLDTLS].[DROP_DATE], [LEADS_CVA_XREF_ALLDTLS].[PRODUCT], [LEADS_CVA_XREF_ALLDTLS].[OFFER_TYPE], [LEADS_CVA_XREF_ALLDTLS].[

    | |--Table Scan(OBJECT:([SALES_REPORTING].[dbo].[LEADS_CVA_XREF_ALLDTLS]))

    |--Parallelism(Repartition Streams, PARTITION COLUMNS:([LEADS_CVA_XREF_ALLDTLS].[DROP_DATE], [LEADS_CVA_XREF_ALLDTLS].[PRODUCT], [LEADS_CVA_XREF_ALLDTLS].[OFFER_TYPE], [LEADS_CVA_XREF_ALLDTLS].[OFFER], [LEADS_CVA_XREF_ALLDTLS].[T

    |--Compute Scalar(DEFINE:([LEADS_CVA_XREF_ALLDTLS].[DROP_DATE]=[LEADS_CVA_XREF_ALLDTLS].[DROP_DATE], [LEADS_CVA_XREF_ALLDTLS].[PRODUCT]=[LEADS_CVA_XREF_ALLDTLS].[PRODUCT], [LEADS_CVA_XREF_ALLDTLS].[OFFER_TYPE]=[LEADS_CVA_XRE

    |--Compute Scalar(DEFINE:([Expr1003]=Convert([Expr1150]), [Expr1005]=If ([Expr1151]=0) then NULL else [Expr1152], [Expr1006]=If ([Expr1153]=0) then NULL else [Expr1154], [Expr1007]=If ([Expr1155]=0) then NULL else [Expr

    |--Stream Aggregate(GROUP BY:([LEADS_CVA_XREF_ALLDTLS].[DROP_DATE], [LEADS_CVA_XREF_ALLDTLS].[PRODUCT], [LEADS_CVA_XREF_ALLDTLS].[OFFER_TYPE], [LEADS_CVA_XREF_ALLDTLS].[OFFER], [LEADS_CVA_XREF_ALLDTLS].[TF_NUM], [L

    |--Sort(ORDER BY:([LEADS_CVA_XREF_ALLDTLS].[DROP_DATE] ASC, [LEADS_CVA_XREF_ALLDTLS].[PRODUCT] ASC, [LEADS_CVA_XREF_ALLDTLS].[OFFER_TYPE] ASC, [LEADS_CVA_XREF_ALLDTLS].[OFFER] ASC, [LEADS_CVA_XREF_ALLDTLS].[TF

    |--Parallelism(Repartition Streams, PARTITION COLUMNS:([LEADS_CVA_XREF_ALLDTLS].[DROP_DATE], [LEADS_CVA_XREF_ALLDTLS].[PRODUCT], [LEADS_CVA_XREF_ALLDTLS].[OFFER_TYPE], [LEADS_CVA_XREF_ALLDTLS].[OFFER], [L

    |--Table Scan(OBJECT:([SALES_REPORTING].[dbo].[LEADS_CVA_XREF_ALLDTLS]))

  • 1) Buy more CPUs.

    2) Buy more RAM.

    3) Put TempDB on SSDs (or at least a VERY fast IO subsystem), because it is getting CRUSHED by the sorting/grouping.

    4) Do file IO stall analysis to determine whether you need faster IO for the main database.

    5) Do wait stats analysis to determine whether you need to adjust the degree of parallelism to reduce CXPACKET waits.

    6) Best recommendation: hire a performance tuning professional to assist you. A couple of hours should do it for this one.

    By the way, what is the total size of the data, and what hardware are you on?
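    For items 4 and 5, a rough starting point (a sketch, assuming SQL Server 2005 or later, where these DMVs are available; interpreting the numbers is still up to you):

    ```sql
    -- 4) IO stalls per database file; large stall times point at slow storage.
    SELECT DB_NAME(vfs.database_id) AS db_name,
           vfs.file_id,
           vfs.io_stall_read_ms,
           vfs.io_stall_write_ms,
           vfs.num_of_reads,
           vfs.num_of_writes
    FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
    ORDER BY vfs.io_stall_read_ms + vfs.io_stall_write_ms DESC;

    -- 5) Cumulative wait stats since startup; check whether CXPACKET dominates.
    SELECT wait_type, waiting_tasks_count, wait_time_ms, signal_wait_time_ms
    FROM sys.dm_os_wait_stats
    ORDER BY wait_time_ms DESC;
    ```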

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Since you are dumping this data into a report table, why not do the grouping at a smaller level, say weekly or monthly, and store that data? Then you can report off the semi-pre-aggregated table. You can do this on a daily basis: as step 1, delete the rows for the affected time period, then load only that period's data.

    For example, every day you reload that week/month/period/year's aggregate data.

    This should reduce the load, and it should work as long as you are not making regular changes to older data.
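    A minimal sketch of that incremental load, based on the query posted earlier. Since DROP_DATE is already a grouping column there, slicing by a DROP_DATE range re-aggregates exactly, even for the COUNT(DISTINCT ...) columns. The @PeriodStart/@PeriodEnd variables are assumptions, and the elided comments stand in for the full column lists from the original query.

    ```sql
    -- Step 1: remove the affected period's rows from the report table.
    DELETE FROM LEADS_CVA_XREF_TABLE_RPT
    WHERE DROP_DATE >= @PeriodStart AND DROP_DATE < @PeriodEnd;

    -- Step 2: re-aggregate only that period and append it.
    INSERT INTO LEADS_CVA_XREF_TABLE_RPT
    SELECT DROP_DATE, PRODUCT, -- ...remaining grouping columns...
           COUNT(DISTINCT WILSVC) AS DIST_LEADS, -- ...remaining aggregates...
           SUM(ADD_AL_INTENT_CALLS) AS ADD_AL_INTENT_CALLS
    FROM LEADS_CVA_XREF_ALLDTLS
    WHERE DROP_DATE >= @PeriodStart AND DROP_DATE < @PeriodEnd
    GROUP BY DROP_DATE, PRODUCT; -- ...remaining grouping columns...
    ```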

  • Since you don't have a WHERE clause, you're pretty much doomed to scan the whole table every time the report is called.

    All the millions of rows.

    If your requirement is to aggregate the whole history in every report, then there's not much you can do to speed it up. Reading data still takes some time in this imperfect world.

    _____________
    Code for TallyGenerator
