Can I make it efficient?

  • I have a table that has about 60 columns and carries 40 million records.

    I have to create a report summary table out of this table, with a requirement that calls for a GROUP BY on 30 of these columns and a SUM on 25 of them.

    Something like:

    SELECT a, b, c, ...30SuchCols...,
           SUM(x), SUM(y), SUM(z) ...25SuchCols...
    FROM BigTable
    GROUP BY a, b, c, ...30SuchCols...

    It runs for about 10-12 hours, which is completely unacceptable.

    Is there any trick/method to do this in an alternate faster way?

    I was thinking of breaking the table into a few pieces by some criteria, running the GROUP BY on those fragments, and then merging the result sets. Any other ideas?

    Thanks....
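    The split-and-merge idea above can be sketched roughly as below. This is only a sketch: BigTable and the abbreviated column lists are the placeholders from the post, and the slicing column SliceDate and the @SliceStart/@SliceEnd variables are assumptions, not from the original. It merges cleanly because a SUM of partial SUMs equals the overall SUM; a COUNT(DISTINCT ...) would not merge this way.

    ```sql
    -- Aggregate one slice of the big table at a time (slicing column assumed),
    -- collecting each partial result in a staging table. The first slice uses
    -- SELECT ... INTO; later slices would INSERT INTO #Partial instead.
    SELECT a, b, c, -- ...30SuchCols...
           SUM(x) AS x, SUM(y) AS y, SUM(z) AS z -- ...25SuchCols...
    INTO #Partial
    FROM BigTable
    WHERE SliceDate >= @SliceStart AND SliceDate < @SliceEnd
    GROUP BY a, b, c; -- ...30SuchCols...

    -- After all slices are loaded, merge the partial results:
    SELECT a, b, c, -- ...30SuchCols...
           SUM(x) AS x, SUM(y) AS y, SUM(z) AS z -- ...25SuchCols...
    FROM #Partial
    GROUP BY a, b, c; -- ...30SuchCols...
    ```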

  • Give us more info.

    It sounds like the query is big, so save the actual query as a text file, and show us the estimated execution plan, saved as a .sqlplan file. Attach them to this thread so we can look at them.

    With that information, we should be able to help you diagnose your performance issues and how to fix them.

    Performance questions like this really require absolute details in order to really help; general answers can't fix it.

    I've got plenty of million-row tables, and I've never had one run for 12 hours; even my whopper queries that have to touch every row run in minutes at most.

    You might have something as simple as a triangle or Cartesian join, or no indexes; any number of things could be diagnosed instantly with that .sqlplan.

    Lowell


    --Help us help you! If you post a question, make sure you include a CREATE TABLE... statement and an INSERT INTO... statement to give the volunteers here representative data. With that and your description of the problem, we can provide a tested, verifiable solution to your question! Asking the question the right way gets you a tested answer the fastest way possible!

  • Here is a query that I need to run every day for some reports, and it is taking over 2 hours. Can someone help me optimize it?

    Copied below is the query, followed by the plan.

    Query

    SELECT DROP_DATE,
           PRODUCT,
           OFFER_TYPE,
           OFFER,
           TF_NUM,
           SPR_NBR,
           CAMPAIGN_CODE,
           segment_match,
           segment,
           version_segment,
           version_market,
           version_format_comp,
           version_message_comp,
           version_provider_comp,
           TEST_GROUP_DESC,
           CTL_REGION,
           GM_MARKET,
           CFT_PRS_ACTIVE,
           COUNT(DISTINCT WILSVC) AS DIST_LEADS,
           COUNT(WILSVC) AS LEADS,
           COUNT(DISTINCT cv_key_num) AS ASSIGNED_CALLS,
           SUM(ADD_AL_INTENT_CALLS) AS ADD_AL_INTENT_CALLS,
           SUM(ADD_PKG_INTENT_CALLS) AS ADD_PKG_INTENT_CALLS,
           SUM(ADD_HSI_INTENT_CALLS) AS ADD_HSI_INTENT_CALLS,
           SUM(ADD_DISH_INTENT_CALLS) AS ADD_DISH_INTENT_CALLS,
           SUM(OFFICIAL_HSI_SALE_CALLS) AS OFFICIAL_HSI_SALE_CALLS,
           SUM(OFFICIAL_DISH_SALE_CALLS) AS OFFICIAL_DISH_SALE_CALLS,
           SUM(ADD_AL_INTENT_LEADS) AS ADD_AL_INTENT_LEADS,
           SUM(ADD_PKG_INTENT_LEADS) AS ADD_PKG_INTENT_LEADS,
           SUM(OFFICIAL_HSI_SALE_LEADS) AS OFFICIAL_HSI_SALE_LEADS,
           SUM(OFFICIAL_DISH_SALE_LEADS) AS OFFICIAL_DISH_SALE_LEADS,
           SUM(CFT_ANY_PKG_SALE) AS CFT_ANY_PKG_SALE,
           SUM(CFT_QUAL_PKG_SALE) AS CFT_QUAL_PKG_SALE,
           SUM(CFT_HSI_SALE) AS CFT_HSI_SALE,
           SUM(CFT_DISH_SALE) AS CFT_DISH_SALE,
           SUM(CFT_ULD_SALE) AS CFT_ULD_SALE
    INTO LEADS_CVA_XREF_TABLE_RPT
    FROM LEADS_CVA_XREF_ALLDTLS WITH (INDEX(0))
    GROUP BY DROP_DATE,
             PRODUCT,
             OFFER_TYPE,
             OFFER,
             TF_NUM,
             SPR_NBR,
             CAMPAIGN_CODE,
             segment_match,
             segment,
             version_segment,
             version_market,
             version_format_comp,
             version_message_comp,
             version_provider_comp,
             TEST_GROUP_DESC,
             CTL_REGION,
             GM_MARKET,
             CFT_PRS_ACTIVE

    --------------------------------

    Plan

    StmtText

    |--Table Insert(OBJECT:([LEADS_CVA_XREF_TABLE_RPT]), SET:([LEADS_CVA_XREF_TABLE_RPT].[CFT_ULD_SALE]=[Expr1019], [LEADS_CVA_XREF_TABLE_RPT].[CFT_DISH_SALE]=[Expr1018], [LEADS_CVA_XREF_TABLE_RPT].[CFT_HSI_SALE]=[Expr1017], [LEADS_CVA_XREF_TABLE_RPT].[CFT_Q

    |--Top(ROWCOUNT est 0)

    |--Compute Scalar(DEFINE:([LEADS_CVA_XREF_ALLDTLS].[DROP_DATE]=[LEADS_CVA_XREF_ALLDTLS].[DROP_DATE], [LEADS_CVA_XREF_ALLDTLS].[PRODUCT]=[LEADS_CVA_XREF_ALLDTLS].[PRODUCT], [LEADS_CVA_XREF_ALLDTLS].[OFFER_TYPE]=[LEADS_CVA_XREF_ALLDTLS].[OFFER_TY

    |--Parallelism(Gather Streams)

    |--Hash Match(Inner Join, HASH:([LEADS_CVA_XREF_ALLDTLS].[DROP_DATE], [LEADS_CVA_XREF_ALLDTLS].[PRODUCT], [LEADS_CVA_XREF_ALLDTLS].[OFFER_TYPE], [LEADS_CVA_XREF_ALLDTLS].[OFFER], [LEADS_CVA_XREF_ALLDTLS].[TF_NUM], [LEADS_CVA_XREF_ALLD

    |--Merge Join(Inner Join, MANY-TO-MANY MERGE:([LEADS_CVA_XREF_ALLDTLS].[DROP_DATE], [LEADS_CVA_XREF_ALLDTLS].[PRODUCT], [LEADS_CVA_XREF_ALLDTLS].[OFFER_TYPE], [LEADS_CVA_XREF_ALLDTLS].[OFFER], [LEADS_CVA_XREF_ALLDTLS].[TF_NUM], [

    | |--Parallelism(Repartition Streams, PARTITION COLUMNS:([LEADS_CVA_XREF_ALLDTLS].[DROP_DATE], [LEADS_CVA_XREF_ALLDTLS].[PRODUCT], [LEADS_CVA_XREF_ALLDTLS].[OFFER_TYPE], [LEADS_CVA_XREF_ALLDTLS].[OFFER], [LEADS_CVA_XREF_ALLDTL

    | | |--Compute Scalar(DEFINE:([LEADS_CVA_XREF_ALLDTLS].[DROP_DATE]=[LEADS_CVA_XREF_ALLDTLS].[DROP_DATE], [LEADS_CVA_XREF_ALLDTLS].[PRODUCT]=[LEADS_CVA_XREF_ALLDTLS].[PRODUCT], [LEADS_CVA_XREF_ALLDTLS].[OFFER_TYPE]=[LEADS_CV

    | | |--Compute Scalar(DEFINE:([Expr1002]=Convert([Expr1148])))

    | | |--Stream Aggregate(GROUP BY:([LEADS_CVA_XREF_ALLDTLS].[DROP_DATE], [LEADS_CVA_XREF_ALLDTLS].[PRODUCT], [LEADS_CVA_XREF_ALLDTLS].[OFFER_TYPE], [LEADS_CVA_XREF_ALLDTLS].[OFFER], [LEADS_CVA_XREF_ALLDTLS].[TF_NUM

    | | |--Parallelism(Repartition Streams, PARTITION COLUMNS:([LEADS_CVA_XREF_ALLDTLS].[DROP_DATE], [LEADS_CVA_XREF_ALLDTLS].[PRODUCT], [LEADS_CVA_XREF_ALLDTLS].[OFFER_TYPE], [LEADS_CVA_XREF_ALLDTLS].[OFFER], [L

    | | |--Sort(DISTINCT ORDER BY:([LEADS_CVA_XREF_ALLDTLS].[DROP_DATE] ASC, [LEADS_CVA_XREF_ALLDTLS].[PRODUCT] ASC, [LEADS_CVA_XREF_ALLDTLS].[OFFER_TYPE] ASC, [LEADS_CVA_XREF_ALLDTLS].[OFFER] ASC, [LEADS_CV

    | | |--Parallelism(Repartition Streams, PARTITION COLUMNS:([LEADS_CVA_XREF_ALLDTLS].[DROP_DATE], [LEADS_CVA_XREF_ALLDTLS].[PRODUCT], [LEADS_CVA_XREF_ALLDTLS].[OFFER_TYPE], [LEADS_CVA_XREF_ALLDTLS].[

    | | |--Table Scan(OBJECT:([SALES_REPORTING].[dbo].[LEADS_CVA_XREF_ALLDTLS]))

    | |--Parallelism(Repartition Streams, PARTITION COLUMNS:([LEADS_CVA_XREF_ALLDTLS].[DROP_DATE], [LEADS_CVA_XREF_ALLDTLS].[PRODUCT], [LEADS_CVA_XREF_ALLDTLS].[OFFER_TYPE], [LEADS_CVA_XREF_ALLDTLS].[OFFER], [LEADS_CVA_XREF_ALLDTL

    | |--Compute Scalar(DEFINE:([LEADS_CVA_XREF_ALLDTLS].[DROP_DATE]=[LEADS_CVA_XREF_ALLDTLS].[DROP_DATE], [LEADS_CVA_XREF_ALLDTLS].[PRODUCT]=[LEADS_CVA_XREF_ALLDTLS].[PRODUCT], [LEADS_CVA_XREF_ALLDTLS].[OFFER_TYPE]=[LEADS_CV

    | |--Compute Scalar(DEFINE:([Expr1004]=Convert([Expr1149])))

    | |--Stream Aggregate(GROUP BY:([LEADS_CVA_XREF_ALLDTLS].[DROP_DATE], [LEADS_CVA_XREF_ALLDTLS].[PRODUCT], [LEADS_CVA_XREF_ALLDTLS].[OFFER_TYPE], [LEADS_CVA_XREF_ALLDTLS].[OFFER], [LEADS_CVA_XREF_ALLDTLS].[TF_NUM

    | |--Parallelism(Repartition Streams, PARTITION COLUMNS:([LEADS_CVA_XREF_ALLDTLS].[DROP_DATE], [LEADS_CVA_XREF_ALLDTLS].[PRODUCT], [LEADS_CVA_XREF_ALLDTLS].[OFFER_TYPE], [LEADS_CVA_XREF_ALLDTLS].[OFFER], [L

    | |--Sort(DISTINCT ORDER BY:([LEADS_CVA_XREF_ALLDTLS].[DROP_DATE] ASC, [LEADS_CVA_XREF_ALLDTLS].[PRODUCT] ASC, [LEADS_CVA_XREF_ALLDTLS].[OFFER_TYPE] ASC, [LEADS_CVA_XREF_ALLDTLS].[OFFER] ASC, [LEADS_CV

    | |--Parallelism(Repartition Streams, PARTITION COLUMNS:([LEADS_CVA_XREF_ALLDTLS].[DROP_DATE], [LEADS_CVA_XREF_ALLDTLS].[PRODUCT], [LEADS_CVA_XREF_ALLDTLS].[OFFER_TYPE], [LEADS_CVA_XREF_ALLDTLS].[

    | |--Table Scan(OBJECT:([SALES_REPORTING].[dbo].[LEADS_CVA_XREF_ALLDTLS]))

    |--Parallelism(Repartition Streams, PARTITION COLUMNS:([LEADS_CVA_XREF_ALLDTLS].[DROP_DATE], [LEADS_CVA_XREF_ALLDTLS].[PRODUCT], [LEADS_CVA_XREF_ALLDTLS].[OFFER_TYPE], [LEADS_CVA_XREF_ALLDTLS].[OFFER], [LEADS_CVA_XREF_ALLDTLS].[T

    |--Compute Scalar(DEFINE:([LEADS_CVA_XREF_ALLDTLS].[DROP_DATE]=[LEADS_CVA_XREF_ALLDTLS].[DROP_DATE], [LEADS_CVA_XREF_ALLDTLS].[PRODUCT]=[LEADS_CVA_XREF_ALLDTLS].[PRODUCT], [LEADS_CVA_XREF_ALLDTLS].[OFFER_TYPE]=[LEADS_CVA_XRE

    |--Compute Scalar(DEFINE:([Expr1003]=Convert([Expr1150]), [Expr1005]=If ([Expr1151]=0) then NULL else [Expr1152], [Expr1006]=If ([Expr1153]=0) then NULL else [Expr1154], [Expr1007]=If ([Expr1155]=0) then NULL else [Expr

    |--Stream Aggregate(GROUP BY:([LEADS_CVA_XREF_ALLDTLS].[DROP_DATE], [LEADS_CVA_XREF_ALLDTLS].[PRODUCT], [LEADS_CVA_XREF_ALLDTLS].[OFFER_TYPE], [LEADS_CVA_XREF_ALLDTLS].[OFFER], [LEADS_CVA_XREF_ALLDTLS].[TF_NUM], [L

    |--Sort(ORDER BY:([LEADS_CVA_XREF_ALLDTLS].[DROP_DATE] ASC, [LEADS_CVA_XREF_ALLDTLS].[PRODUCT] ASC, [LEADS_CVA_XREF_ALLDTLS].[OFFER_TYPE] ASC, [LEADS_CVA_XREF_ALLDTLS].[OFFER] ASC, [LEADS_CVA_XREF_ALLDTLS].[TF

    |--Parallelism(Repartition Streams, PARTITION COLUMNS:([LEADS_CVA_XREF_ALLDTLS].[DROP_DATE], [LEADS_CVA_XREF_ALLDTLS].[PRODUCT], [LEADS_CVA_XREF_ALLDTLS].[OFFER_TYPE], [LEADS_CVA_XREF_ALLDTLS].[OFFER], [L

    |--Table Scan(OBJECT:([SALES_REPORTING].[dbo].[LEADS_CVA_XREF_ALLDTLS]))

  • 1) Buy more CPUs.

    2) Buy more RAM.

    3) Put TempDB on SSDs (or at least a VERY fast IO subsystem), because it is getting CRUSHED by the sorting/grouping.

    4) Do file IO stall analysis to determine whether you need faster IO for the main database.

    5) Do wait stats analysis to determine whether you need to adjust the degree of parallelism to reduce CXPACKET waits.

    6) Best recommendation: hire a performance tuning professional to assist you. A couple of hours should do it for this one.

    By the way, what is the total size of the data, and what hardware are you on?
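    For items 4 and 5, a rough starting point (a sketch, assuming SQL Server 2005 or later, where these DMVs are available; interpreting the numbers is still up to you):

    ```sql
    -- 4) IO stalls per database file; large stall times point at slow storage.
    SELECT DB_NAME(vfs.database_id) AS db_name,
           vfs.file_id,
           vfs.io_stall_read_ms,
           vfs.io_stall_write_ms,
           vfs.num_of_reads,
           vfs.num_of_writes
    FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
    ORDER BY vfs.io_stall_read_ms + vfs.io_stall_write_ms DESC;

    -- 5) Cumulative wait stats since startup; check whether CXPACKET dominates.
    SELECT wait_type, waiting_tasks_count, wait_time_ms, signal_wait_time_ms
    FROM sys.dm_os_wait_stats
    ORDER BY wait_time_ms DESC;
    ```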

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Since you are dumping this data into a report table, why not do the grouping at a smaller level, say weekly or monthly, and store that data? Then you can report off the semi-pre-aggregated table. You can do this on a daily basis: as step 1, delete the rows for the affected time period, then load only that period's data.

    For example, every day you reload that week/month/period/year's aggregate data.

    This should reduce the load, and it should work as long as you are not making regular changes to older data.
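    A minimal sketch of that incremental load, based on the query posted earlier. Since DROP_DATE is already a grouping column there, slicing by a DROP_DATE range re-aggregates exactly, even for the COUNT(DISTINCT ...) columns. The @PeriodStart/@PeriodEnd variables are assumptions, and the elided comments stand in for the full column lists from the original query.

    ```sql
    -- Step 1: remove the affected period's rows from the report table.
    DELETE FROM LEADS_CVA_XREF_TABLE_RPT
    WHERE DROP_DATE >= @PeriodStart AND DROP_DATE < @PeriodEnd;

    -- Step 2: re-aggregate only that period and append it.
    INSERT INTO LEADS_CVA_XREF_TABLE_RPT
    SELECT DROP_DATE, PRODUCT, -- ...remaining grouping columns...
           COUNT(DISTINCT WILSVC) AS DIST_LEADS, -- ...remaining aggregates...
           SUM(ADD_AL_INTENT_CALLS) AS ADD_AL_INTENT_CALLS
    FROM LEADS_CVA_XREF_ALLDTLS
    WHERE DROP_DATE >= @PeriodStart AND DROP_DATE < @PeriodEnd
    GROUP BY DROP_DATE, PRODUCT; -- ...remaining grouping columns...
    ```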

  • Since you don't have a WHERE clause, you're pretty much doomed to scan the whole table every time the report is called.

    All the millions of rows.

    If your requirement is to aggregate the whole history in every report, then there's not much you can do to speed it up. Reading data still takes some time in this imperfect world.

    _____________
    Code for TallyGenerator
