Limit Resources to UGLY query

  • It really sounds like Analysis Services is the way to go here. It is REALLY good at pre-aggregating/pre-computing, and it allows for updates to values too.

    Sometimes we really can't make things run faster - there is just too much data. 🙁 In such cases we simply fall back to the "throw more hardware at it" solution.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Hi Bob...

    Just confirming... how many rows does the following update?

    4) The _Change fields would be the difference between the day before and the current day for the specific type. This can be quite painful.

    ;WITH X (RN, MainSeriesID, theDate, [Weekly_Type1]) AS
    (
        SELECT ROW_NUMBER() OVER (ORDER BY MainSeriesID, theDate) AS RN,
               MainSeriesID, theDate, [Weekly_Type1]
          FROM #ComplexFilter
         WHERE WEEKLY = 1
    )
    UPDATE #TempTable
       SET Weekly_Type1_Change =
           CASE
               WHEN Newer.Weekly_Type1 > Older.Weekly_Type1 THEN 1
               WHEN Newer.Weekly_Type1 < Older.Weekly_Type1 THEN -1
               ELSE 0
           END
      FROM #TempTable
     INNER JOIN X AS Newer
        ON Newer.MainSeriesID = #TempTable.MainSeriesID
       AND Newer.theDate = #TempTable.theDate
     INNER JOIN X AS Older
        ON Older.RN + 1 = Newer.RN
       AND Older.MainSeriesID = Newer.MainSeriesID

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • This part of it can vary widely. In most cases it will update 100% of the rows extracted to the temp table. The temp table will hold anywhere between 500K rows (that takes a few seconds at most) and 20M rows (that update takes a few minutes to run).

    Also, if they are looking at daily, weekly and monthly series, I have to run three virtually identical versions of that update. Obviously the weekly rows would be roughly 1 in 5 and the monthly roughly 1 in 20. If you had just weekly and monthly it would be different.

    Now, to allow me to go back and see, for example, that the delta for daily_type1 and weekly_type1 both exceeded 5, I need to backfill the weekly value for all days.

  • Thanks, Bob...

    First, updating 500K or even a million rows in a single update probably isn't a problem... 20 million is a problem. If you do some testing, I think you'll find a million-row update only takes about 2.8 minutes (at roughly 6,000 rows per second, 1,000,000 ÷ 6,000 ≈ 167 seconds). But if you try 20 million rows, instead of taking only 20 times longer, it can take 200 or even 2,000 times longer! Why? Hell, I dunno... all I know is that's what happens in the tests I've done. Some of it is because the LDF may need to grow. Some of it may be because TempDB needs to grow. Some of it might be because the swap file comes into play for updates that big. I don't know for sure, though.

    Anyway, you need to split the update into more manageable chunks of a quarter million or so rows.

    The other thing is, updates with a join can be a bugger... if you do it wrong, one row gets updated and then the join has to be 100% recalculated, over and over and over. Guess how much time that little gem will take? And, I'm thinking that's what's happened in your code... you're updating columns that appear in the CTE... I just can't help thinking that's a bad idea.

    Getting ready to go to a movie with my sweetie... if I think of it, I'll try to write an example of how you might be able to do this so it runs in seconds instead of hours...
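
    For illustration, here's a minimal sketch (not from the thread) of one way to sidestep that re-evaluated join: materialize the ranked rows once into an indexed temp table and join on keys only. Table and column names are assumed from the code posted earlier.

    -- Materialize the ranking once so the two join references in the UPDATE
    -- don't re-evaluate the windowed query (a sketch; names assumed).
    SELECT ROW_NUMBER() OVER (ORDER BY MainSeriesID, theDate) AS RN,
           MainSeriesID, theDate, Weekly_Type1
      INTO #Ranked
      FROM #ComplexFilter
     WHERE WEEKLY = 1

    CREATE CLUSTERED INDEX IX_Ranked ON #Ranked (MainSeriesID, RN)

    UPDATE t
       SET Weekly_Type1_Change = CASE WHEN n.Weekly_Type1 > o.Weekly_Type1 THEN 1
                                      WHEN n.Weekly_Type1 < o.Weekly_Type1 THEN -1
                                      ELSE 0
                                 END
      FROM #TempTable AS t
     INNER JOIN #Ranked AS n
        ON n.MainSeriesID = t.MainSeriesID
       AND n.theDate = t.theDate
     INNER JOIN #Ranked AS o
        ON o.MainSeriesID = n.MainSeriesID
       AND o.RN = n.RN - 1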

    --Jeff Moden



  • I've had great luck with batching updates into smaller chunks. One long update might take an hour, but going with quarters (1/4) each time, I might get all 4 to run in 10 minutes.
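
    A minimal sketch of that chunking pattern (not from the thread; it assumes the target column starts out NULL so finished rows drop out of the filter, and it uses UPDATE TOP, which needs SQL Server 2005 or later):

    WHILE 1 = 1
    BEGIN
        -- Update an arbitrary 250K not-yet-processed rows per pass.
        UPDATE TOP (250000) t
           SET t.Weekly_Type1_Change = 0    -- the real delta logic goes here
          FROM #TempTable AS t
         WHERE t.Weekly_Type1_Change IS NULL

        IF @@ROWCOUNT = 0 BREAK             -- nothing left to update
    END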

  • I should be able to do some tests. I'll start with something like the following.

    DECLARE @Ctr INT

    SELECT @Ctr = 1

    WHILE (@Ctr > 0)
    BEGIN
        ;WITH X (RN, MainSeriesID, theDate, [Weekly_Type1]) AS
        (
            -- TOP without an ORDER BY picks an arbitrary 250K qualifying rows;
            -- the loop only ends once updated rows stop qualifying here.
            SELECT TOP (250000)
                   ROW_NUMBER() OVER (ORDER BY MainSeriesID, theDate) AS RN,
                   MainSeriesID, theDate, [Weekly_Type1]
              FROM #ComplexFilter
             WHERE WEEKLY = 1
               AND Weekly_Type1_Change IS NULL
        )
        UPDATE #TempTable
           SET Weekly_Type1_Change =
               CASE
                   WHEN Newer.Weekly_Type1 > Older.Weekly_Type1 THEN 1
                   WHEN Newer.Weekly_Type1 < Older.Weekly_Type1 THEN -1
                   ELSE 0
               END
          FROM #TempTable
         INNER JOIN X AS Newer
            ON Newer.MainSeriesID = #TempTable.MainSeriesID
           AND Newer.theDate = #TempTable.theDate
         INNER JOIN X AS Older
            ON Older.RN + 1 = Newer.RN
           AND Older.MainSeriesID = Newer.MainSeriesID

        SELECT @Ctr = @@ROWCOUNT
    END

  • Without indexes this has no hope of performing better. In my other tests the indexes were of no help because they slowed the insert down significantly, costing more than they gained, which is why I removed them. This, however, might still work. I'll update as soon as I can get a better test.
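
    One possible middle ground, sketched below (not from the thread; the key columns are an assumption): load the temp table with no indexes so the insert stays fast, then build a single index just before the update so the joins can use it.

    -- Build the index only after the bulk insert has finished.
    CREATE CLUSTERED INDEX IX_TempTable_Series_Date
        ON #TempTable (MainSeriesID, theDate)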

  • Formatting in this new site is less than adequate.

    I tried to post the table, but that didn't work so well.

    Short answer: adding a PK to the process for a 3.3M row result set slowed the insert to about 103% of the no-index time, while the chunked update improved to about 97% of the time taken to update all rows at once...

    However, after several runs I found there is enough background load on my dev system to cause performance swings of as much as an additional 2%.

    End result: on average, the multiple updates still performed worse overall than the single batch update without indexes.

    FYI, the BIG update WITH AN INDEX took 111% of the time of the small (chunked) updates. So there does appear to be a penalty for big updates when indexes are present, but in this case going without worked better.

    So Jeff, you were right, but in this case I have the option to NOT use an index, and it did perform better.

    Oh, by the way, the performance:

    Insert of 3.3M rows:
      Best run (no index):     270 seconds
      Worst run (with index):  283 seconds

    Update:
      200K chunks:       166 seconds
      All (no index):    173 seconds
      All (with index):  185 seconds

  • Jeff Gray (9/26/2007)


    Another thing you might try is to use the READ UNCOMMITTED transaction isolation level. That will reduce overhead to some degree.

    I have done about 10 runs and it looks like this does help. The average improvement appears to be somewhere around 6%.
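
    For reference, applying Jeff Gray's suggestion is one statement per session; the trade-off is that dirty reads become possible while it is in effect:

    -- Reads in this session stop taking shared locks (dirty reads possible).
    SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED

    -- ... run the chunked update loop here ...

    -- Put the default back when finished.
    SET TRANSACTION ISOLATION LEVEL READ COMMITTED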

  • I've had to deal with very long running updates and processor-intensive stored procedures that have been able to bring my dual quad-core with 8 GB to its knees.

    If off-hours processing is not available, I've been able to run an intensive process on a separate box that has a copy of the production database as a hot spare.

    I have also written a specially designed process that throws a bunch of jobs on the job queue and divides the work up amongst all of the jobs. I have one such job that spawns 8 separate jobs. I can get 24 hours of work done in 2. The premise here is to break the workload up into smaller, more efficient chunks.
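
    A hedged sketch of that job-spawning idea (the job names here are hypothetical, and each job must already exist in SQL Server Agent, pointed at its own slice of the key range):

    -- Kick off the worker jobs; sp_start_job returns immediately, so the
    -- jobs run in parallel, each covering its own key range.
    EXEC msdb.dbo.sp_start_job @job_name = N'Backfill_Worker_1'
    EXEC msdb.dbo.sp_start_job @job_name = N'Backfill_Worker_2'
    EXEC msdb.dbo.sp_start_job @job_name = N'Backfill_Worker_3'
    -- ... and so on up to the eighth worker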

    So there are ways to divide & conquer to get it done.

    Kurt

    DBA

    RHWI, Inc

    Poughkeepsie, NY

    Kurt W. Zimmerman
    SR DBA
    Lefrak Organization
    New York, NY

    http://www.linkedin.com/in/kurtwzimmerman

  • Kurt W. Zimmerman (10/1/2007)


    I've had to deal with very long running updates and processor-intensive stored procedures that have been able to bring my dual quad-core with 8 GB to its knees.

    My word... Don't you think someone needs to fix that :blink:? Maybe a rewrite? 😉

    --Jeff Moden



  • Bob Fazio (10/1/2007)


    Formatting in this new site is less than adequate.

    I absolutely agree!

    Insert of 3.3M rows:
      Best run (no index):     270 seconds
      Worst run (with index):  283 seconds

    Update:
      200K chunks:       166 seconds
      All (no index):    173 seconds
      All (with index):  185 seconds

    Not sure what's going on... I've got a single-CPU 1.8 GHz box with IDE drives and 2 GB of RAM running Developer's Edition of SQL Server 2000 SP4...

    I inserted 3.3 million rows into a new table with a clustered PK in only 42 seconds.

    I inserted 3.3 million "interleaved" rows (according to PK) in 156 seconds.

    I inserted 3.3 million non-interleaved rows in 72 seconds.

    Something else is (maybe dreadfully) wrong but I just don't have SQL Server 2k5 to figure out what. It may be that the datatypes for the columns in the various tables/CTE aren't the same as they are in the target tables... perhaps there are some triggers on the tables (if they're not temp tables, of course)... perhaps you have missing or damaged stats... perhaps the indexes are fragged... perhaps, etc... dunno for sure.

    Also, I updated 3.3 million rows (single batch, NOT the primary key) in only 26 seconds. Yeah, it was only a single column, but still... something else is going on and I'm just not sure what it is...

    Here's the code I tested with (in the absence of your actual data)...

    --===== Create and populate a 3,333,333 row test table.
         -- Column "RowNum" has unique values from 0 to 33,333,320 in steps of 10
         -- Column "SomeInt" has a range of 1 to 50,000 non-unique numbers
         -- Column "SomeString" has a range of "AA" to "ZZ" non-unique 2 character strings
         -- Column "SomeNumber" has a range of 0.0000 to 99.9999 non-unique numbers
         -- Column "SomeDate" has a range of >=01/01/2000 and <01/01/2010 non-unique date/times
         -- Takes a bit over 3 minutes to build this initial table
     SELECT TOP 3333333
            RowNum     = IDENTITY(INT,0,10),
            SomeInt    = CAST(RAND(CAST(NEWID() AS VARBINARY))*50000+1 AS INT),
            SomeString = CHAR(STR(RAND(CAST(NEWID() AS VARBINARY))*25+65))
                       + CHAR(STR(RAND(CAST(NEWID() AS VARBINARY))*25+65)),
            SomeCSV    = CAST('Part01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part10' AS VARCHAR(80)),
            SomeNumber = CAST(RAND(CAST(NEWID() AS VARBINARY))*100 AS MONEY),
            SomeDate   = CAST(RAND(CAST(NEWID() AS VARBINARY))*3653.0+36524.0 AS DATETIME)
       INTO dbo.JBMTest
       FROM Master.dbo.SysColumns sc1,
            Master.dbo.SysColumns sc2 --Lack of join criteria makes this a CROSS-JOIN

    --===== A table is not properly formed unless a Primary Key has been assigned
     ALTER TABLE dbo.JBMTest
           ADD PRIMARY KEY CLUSTERED (RowNum)
    GO

    --===== Create a "target" table with a Clustered PK same as the source table
     CREATE TABLE [dbo].[JBMTest1] (
            [RowNum]     [int] NOT NULL,
            [SomeInt]    [int] NULL,
            [SomeString] [char] (2) NULL,
            [SomeCSV]    [varchar] (80) NULL,
            [SomeNumber] [money] NULL,
            [SomeDate]   [datetime] NULL,
            PRIMARY KEY CLUSTERED ([RowNum])
            )
    GO

    --===== Straight insert, no data in table
     INSERT INTO jbmTest1 --(50 seconds)
            (RowNum, SomeInt, SomeString, SomeCSV, SomeNumber, SomeDate)
     SELECT RowNum, SomeInt, SomeString, SomeCSV, SomeNumber, SomeDate
       FROM jbmTest

    --===== Insert data 100% interleaved in the clustered index
     INSERT INTO jbmTest1 --(156 seconds)
            (RowNum, SomeInt, SomeString, SomeCSV, SomeNumber, SomeDate)
     SELECT RowNum+5, SomeInt, SomeString, SomeCSV, SomeNumber, SomeDate
       FROM jbmTest

    --===== Insert data into the "end" of the table (non-interleaved)
     INSERT INTO jbmTest1 --(72 seconds)
            (RowNum, SomeInt, SomeString, SomeCSV, SomeNumber, SomeDate)
     SELECT RowNum+33333326, SomeInt, SomeString, SomeCSV, SomeNumber, SomeDate
       FROM jbmTest

    --===== Table size is almost 10 million rows now. Update 3.3 million of them
         -- (26 seconds)
     UPDATE jbmTest1
        SET SomeDate = 0
      WHERE RowNum <= 33333326/2

    Like I said previously, I dunno for sure what the problem is and can't check because I don't have 2k5... but now you have some code you can check your box with.
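
    For what it's worth, two of the possibilities listed above (stale statistics and fragmented indexes) can be checked quickly; a sketch against the test table above, assuming SQL Server 2005 or later:

    -- Rebuild statistics on the target table with a full scan.
    UPDATE STATISTICS dbo.JBMTest1 WITH FULLSCAN

    -- Check fragmentation of the clustered index (SQL Server 2005 and later).
    SELECT index_id, avg_fragmentation_in_percent, page_count
      FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID(N'dbo.JBMTest1'), NULL, NULL, 'LIMITED')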

    --Jeff Moden



  • My results on Express SP2

    --===== Straight insert, no data in table
     INSERT INTO jbmTest1 --(120 seconds)
            (RowNum, SomeInt, SomeString, SomeCSV, SomeNumber, SomeDate)
     SELECT RowNum, SomeInt, SomeString, SomeCSV, SomeNumber, SomeDate
       FROM jbmTest

    --===== Insert data 100% interleaved in the clustered index
     INSERT INTO jbmTest1 --(79 seconds)
            (RowNum, SomeInt, SomeString, SomeCSV, SomeNumber, SomeDate)
     SELECT RowNum+5, SomeInt, SomeString, SomeCSV, SomeNumber, SomeDate
       FROM jbmTest

    --===== Insert data into the "end" of the table (non-interleaved)
     INSERT INTO jbmTest1 --(129 seconds)
            (RowNum, SomeInt, SomeString, SomeCSV, SomeNumber, SomeDate)
     SELECT RowNum+33333326, SomeInt, SomeString, SomeCSV, SomeNumber, SomeDate
       FROM jbmTest

    --===== Table size is almost 10 million rows now. Update 3.3 million of them
         -- (90 seconds)
     UPDATE jbmTest1
        SET SomeDate = 0
      WHERE RowNum <= 33333326/2

  • When I get some time tomorrow, I'll try your tests. But this goes back to the initial point of this post.

    The insert is the pivot of the 1 BILLION row table (actually a union of two 250-million-row tables and a view over a 500-million-row table)... so ignore the insert's total time; most of it comes from the query, not the insert. The index does impact the insert performance, though.

    As for the update: the example I gave, and the one you tested, was for one column. The test run I reported was actually updating 4 columns (Type1, 2, 3 and 4) in one pass, using CASE statements.

  • Jeff,

    The update I am doing calculates the delta from row N-1 to row N and stores it as an INT (1 for rising, 0 for same, -1 for falling). That is why this is hard/painful.
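
    As an aside for anyone reading this later: the thread predates it, but on SQL Server 2012 and up the row N-1 to row N delta can be written without the self-join by using LAG(). A sketch, with the column names assumed from the earlier posts:

    ;WITH X AS
    (
        SELECT MainSeriesID, theDate, Weekly_Type1,
               LAG(Weekly_Type1) OVER (PARTITION BY MainSeriesID
                                           ORDER BY theDate) AS PrevWeekly_Type1
          FROM #ComplexFilter
         WHERE WEEKLY = 1
    )
    UPDATE t
       SET Weekly_Type1_Change = SIGN(x.Weekly_Type1 - x.PrevWeekly_Type1)  -- 1, 0 or -1
      FROM #TempTable AS t
     INNER JOIN X AS x
        ON x.MainSeriesID = t.MainSeriesID
       AND x.theDate = t.theDate
     WHERE x.PrevWeekly_Type1 IS NOT NULL   -- first row of each series has no prior value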
