Limit Resources to UGLY query

  • It really sounds like Analysis Services is the way to go here. It is REALLY good at pre-aggregating/pre-computing, and it allows for updates to values too.

    Sometimes we really can't make things run faster - there is just too much data. 🙁 In such cases we simply fall back to the "throw more hardware at it" solution.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Hi Bob...

    Just confirming... how many rows does the following update?

    4) The _Change fields would be the difference between the day before and the current day for the specific type. This can be quite painful.

    ;WITH X (RN, MainSeriesID, theDate, [Weekly_Type1]) AS
    (
        SELECT ROW_NUMBER() OVER (ORDER BY MainSeriesID, theDate) AS RN,
               MainSeriesID, theDate, [Weekly_Type1]
          FROM #ComplexFilter
         WHERE WEEKLY = 1
    )
    UPDATE #TempTable
       SET Weekly_Type1_Change =
           CASE
               WHEN Newer.Weekly_Type1 > Older.Weekly_Type1 THEN 1
               WHEN Newer.Weekly_Type1 < Older.Weekly_Type1 THEN -1
               ELSE 0
           END
      FROM #TempTable
     INNER JOIN X AS Newer
        ON Newer.MainSeriesID = #TempTable.MainSeriesID
       AND Newer.theDate = #TempTable.theDate
     INNER JOIN X AS Older
        ON Older.RN + 1 = Newer.RN
       AND Older.MainSeriesID = Newer.MainSeriesID

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • This part of it can vary widely. In most cases it will update 100% of the rows extracted to the temp table. The temp table will hold anywhere between 500K rows (that takes a few seconds at most) and 20M rows (that update takes a few minutes to run).

    Also, if they are looking at daily, weekly and monthly series, I have to run three virtually identical versions of that update. Obviously the weekly rows would be roughly 1 in 5 and the monthly roughly 1 in 20. If you had just weekly and monthly it would be different.

    Now, to allow me to go back and see, for example, that the delta for daily_type1 and weekly_type1 both exceeded 5, I need to backfill the weekly value for all days.

  • Thanks, Bob...

    First, updating 500K or even a million rows in a single update probably isn't a problem... 20 million is a problem. If you do some testing, I think you'll find a million-row update only takes about 2.8 minutes (at roughly 6,000 rows per second, 1,000,000 ÷ 6,000 ≈ 167 seconds). But if you try 20 million rows, instead of taking only 20 times longer, it can take 200 or even 2,000 times longer! Why? Hell, I dunno... all I know is that's what happens in the tests I've done. Some of it is because the LDF may need to grow. Some of it may be because TempDB needs to grow. Some of it might be because the swap file comes into play for updates that big. I don't know for sure, though.

    Anyway, you need to split the update into more manageable chunks of a quarter million or so rows.

    The other thing is, updates with a join can be a bugger... if you do it wrong, one row gets updated and then the join has to be 100% recalculated, over and over and over. Guess how much time that little gem will take? And, I'm thinking that's what's happened in your code... you're updating columns that appear in the CTE... I just can't help thinking that's a bad idea.

    Getting ready to go to a movie with my sweetie... if I think of it, I'll try to write an example of how you might be able to do this so it runs in seconds instead of hours...
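
    For illustration, here's a minimal sketch (not from the thread) of one way to sidestep that re-evaluated join: materialize the ranked rows once into an indexed temp table and join on keys only. Table and column names are assumed from the code posted earlier.

    -- Materialize the ranking once so the two join references in the UPDATE
    -- don't re-evaluate the windowed query (a sketch; names assumed).
    SELECT ROW_NUMBER() OVER (ORDER BY MainSeriesID, theDate) AS RN,
           MainSeriesID, theDate, Weekly_Type1
      INTO #Ranked
      FROM #ComplexFilter
     WHERE WEEKLY = 1

    CREATE CLUSTERED INDEX IX_Ranked ON #Ranked (MainSeriesID, RN)

    UPDATE t
       SET Weekly_Type1_Change = CASE WHEN n.Weekly_Type1 > o.Weekly_Type1 THEN 1
                                      WHEN n.Weekly_Type1 < o.Weekly_Type1 THEN -1
                                      ELSE 0
                                 END
      FROM #TempTable AS t
     INNER JOIN #Ranked AS n
        ON n.MainSeriesID = t.MainSeriesID
       AND n.theDate = t.theDate
     INNER JOIN #Ranked AS o
        ON o.MainSeriesID = n.MainSeriesID
       AND o.RN = n.RN - 1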

    --Jeff Moden



  • I've had great luck with batching updates into smaller chunks. One long update might take an hour, but going with quarters (1/4) each time, I might get all 4 to run in 10 minutes.
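
    A minimal sketch of that chunking pattern (not from the thread; it assumes the target column starts out NULL so finished rows drop out of the filter, and it uses UPDATE TOP, which needs SQL Server 2005 or later):

    WHILE 1 = 1
    BEGIN
        -- Update an arbitrary 250K not-yet-processed rows per pass.
        UPDATE TOP (250000) t
           SET t.Weekly_Type1_Change = 0    -- the real delta logic goes here
          FROM #TempTable AS t
         WHERE t.Weekly_Type1_Change IS NULL

        IF @@ROWCOUNT = 0 BREAK             -- nothing left to update
    END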

  • I should be able to do some tests. I'll start with something like the following.

    DECLARE @Ctr INT

    SELECT @Ctr = 1

    WHILE (@Ctr > 0)
    BEGIN
        ;WITH X (RN, MainSeriesID, theDate, [Weekly_Type1]) AS
        (
            -- TOP without an ORDER BY picks an arbitrary 250K qualifying rows;
            -- the loop only ends once updated rows stop qualifying here.
            SELECT TOP (250000)
                   ROW_NUMBER() OVER (ORDER BY MainSeriesID, theDate) AS RN,
                   MainSeriesID, theDate, [Weekly_Type1]
              FROM #ComplexFilter
             WHERE WEEKLY = 1
               AND Weekly_Type1_Change IS NULL
        )
        UPDATE #TempTable
           SET Weekly_Type1_Change =
               CASE
                   WHEN Newer.Weekly_Type1 > Older.Weekly_Type1 THEN 1
                   WHEN Newer.Weekly_Type1 < Older.Weekly_Type1 THEN -1
                   ELSE 0
               END
          FROM #TempTable
         INNER JOIN X AS Newer
            ON Newer.MainSeriesID = #TempTable.MainSeriesID
           AND Newer.theDate = #TempTable.theDate
         INNER JOIN X AS Older
            ON Older.RN + 1 = Newer.RN
           AND Older.MainSeriesID = Newer.MainSeriesID

        SELECT @Ctr = @@ROWCOUNT
    END

  • Without indexes this has no hope of performing better. In my other tests the indexes were of no help because they slowed the insert down significantly, costing more than they gained, which is why I removed them. This, however, might still work. I'll update as soon as I can get a better test.
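
    One possible middle ground, sketched below (not from the thread; the key columns are an assumption): load the temp table with no indexes so the insert stays fast, then build a single index just before the update so the joins can use it.

    -- Build the index only after the bulk insert has finished.
    CREATE CLUSTERED INDEX IX_TempTable_Series_Date
        ON #TempTable (MainSeriesID, theDate)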

  • Formatting in this new site is less than adequate.

    I tried to post the table, but that didn't work so well.

    Short answer: adding a PK to the process for a 3.3M row result set slowed the insert to about 103% of the no-index time, while the chunked update improved to about 97% of the time taken to update all rows at once...

    However, after several runs I found there is enough background load on my dev system to cause performance swings of as much as an additional 2%.

    End result: on average, the multiple updates still performed worse overall than the single batch update without indexes.

    FYI, the BIG update WITH AN INDEX took 111% of the time of the small (chunked) updates. So there does appear to be a penalty for big updates when indexes are present, but in this case going without worked better.

    So Jeff, you were right, but in this case I have the option to NOT use an index, and it did perform better.

    Oh, by the way, the performance:

    Insert of 3.3M rows:
      Best run (no index):     270 seconds
      Worst run (with index):  283 seconds

    Update:
      200K chunks:       166 seconds
      All (no index):    173 seconds
      All (with index):  185 seconds

  • Jeff Gray (9/26/2007)


    Another thing you might try is to use the READ UNCOMMITTED transaction isolation level. That will reduce overhead to some degree.

    I have done about 10 runs and it looks like this does help. The average improvement appears to be somewhere around 6%.
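
    For reference, applying Jeff Gray's suggestion is one statement per session; the trade-off is that dirty reads become possible while it is in effect:

    -- Reads in this session stop taking shared locks (dirty reads possible).
    SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED

    -- ... run the chunked update loop here ...

    -- Put the default back when finished.
    SET TRANSACTION ISOLATION LEVEL READ COMMITTED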

  • I've had to deal with very long running updates and processor-intensive stored procedures that have been able to bring my dual quad-core with 8 GB to its knees.

    If off-hours processing is not available, I've been able to run an intensive process on a separate box that has a copy of the production database as a hot spare.

    I have also written a specially designed process that throws a bunch of jobs on the job queue and divides the work up amongst all of the jobs. I have one such job that spawns 8 separate jobs. I can get 24 hours of work done in 2. The premise here is to break the workload up into smaller, more efficient chunks.
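
    A hedged sketch of that job-spawning idea (the job names here are hypothetical, and each job must already exist in SQL Server Agent, pointed at its own slice of the key range):

    -- Kick off the worker jobs; sp_start_job returns immediately, so the
    -- jobs run in parallel, each covering its own key range.
    EXEC msdb.dbo.sp_start_job @job_name = N'Backfill_Worker_1'
    EXEC msdb.dbo.sp_start_job @job_name = N'Backfill_Worker_2'
    EXEC msdb.dbo.sp_start_job @job_name = N'Backfill_Worker_3'
    -- ... and so on up to the eighth worker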

    So there are ways to divide & conquer to get it done.

    Kurt

    DBA

    RHWI, Inc

    Poughkeepsie, NY

    Kurt W. Zimmerman
    SR DBA
    Lefrak Organization
    New York, NY

    http://www.linkedin.com/in/kurtwzimmerman

  • Kurt W. Zimmerman (10/1/2007)


    I've had to deal with very long running updates and processor-intensive stored procedures that have been able to bring my dual quad-core with 8 GB to its knees.

    My word... Don't you think someone needs to fix that :blink:? Maybe a rewrite? 😉

    --Jeff Moden



  • Bob Fazio (10/1/2007)


    Formatting in this new site is less than adequate.

    I absolutely agree!

    Insert of 3.3M rows:
      Best run (no index):     270 seconds
      Worst run (with index):  283 seconds

    Update:
      200K chunks:       166 seconds
      All (no index):    173 seconds
      All (with index):  185 seconds

    Not sure what's going on... I've got a single-CPU 1.8 GHz box with IDE drives and 2 GB of RAM running Developer's Edition of SQL Server 2000 SP4...

    I inserted 3.3 million rows into a new table with a clustered PK in only 42 seconds.

    I inserted 3.3 million "interleaved" rows (according to PK) in 156 seconds.

    I inserted 3.3 million non-interleaved rows in 72 seconds.

    Something else is (maybe dreadfully) wrong but I just don't have SQL Server 2k5 to figure out what. It may be that the datatypes for the columns in the various tables/CTE aren't the same as they are in the target tables... perhaps there are some triggers on the tables (if they're not temp tables, of course)... perhaps you have missing or damaged stats... perhaps the indexes are fragged... perhaps, etc... dunno for sure.

    Also, I updated 3.3 million rows (single batch, NOT the primary key) in only 26 seconds. Yeah, it was only a single column, but still... something else is going on and I'm just not sure what it is...

    Here's the code I tested with (in the absence of your actual data)...

    --===== Create and populate a 3,333,333 row test table.
         -- Column "RowNum" has unique values from 0 to 33,333,320 in steps of 10
         -- Column "SomeInt" has a range of 1 to 50,000 non-unique numbers
         -- Column "SomeString" has a range of "AA" to "ZZ" non-unique 2 character strings
         -- Column "SomeNumber" has a range of 0.0000 to 99.9999 non-unique numbers
         -- Column "SomeDate" has a range of >=01/01/2000 and <01/01/2010 non-unique date/times
         -- Takes a bit over 3 minutes to build this initial table
     SELECT TOP 3333333
            RowNum     = IDENTITY(INT,0,10),
            SomeInt    = CAST(RAND(CAST(NEWID() AS VARBINARY))*50000+1 AS INT),
            SomeString = CHAR(STR(RAND(CAST(NEWID() AS VARBINARY))*25+65))
                       + CHAR(STR(RAND(CAST(NEWID() AS VARBINARY))*25+65)),
            SomeCSV    = CAST('Part01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part10' AS VARCHAR(80)),
            SomeNumber = CAST(RAND(CAST(NEWID() AS VARBINARY))*100 AS MONEY),
            SomeDate   = CAST(RAND(CAST(NEWID() AS VARBINARY))*3653.0+36524.0 AS DATETIME)
       INTO dbo.JBMTest
       FROM Master.dbo.SysColumns sc1,
            Master.dbo.SysColumns sc2 --Lack of join criteria makes this a CROSS-JOIN

    --===== A table is not properly formed unless a Primary Key has been assigned
     ALTER TABLE dbo.JBMTest
           ADD PRIMARY KEY CLUSTERED (RowNum)
    GO

    --===== Create a "target" table with a Clustered PK same as the source table
     CREATE TABLE [dbo].[JBMTest1] (
            [RowNum]     [int] NOT NULL,
            [SomeInt]    [int] NULL,
            [SomeString] [char] (2) NULL,
            [SomeCSV]    [varchar] (80) NULL,
            [SomeNumber] [money] NULL,
            [SomeDate]   [datetime] NULL,
            PRIMARY KEY CLUSTERED ([RowNum])
            )
    GO

    --===== Straight insert, no data in table
     INSERT INTO jbmTest1 --(50 seconds)
            (RowNum, SomeInt, SomeString, SomeCSV, SomeNumber, SomeDate)
     SELECT RowNum, SomeInt, SomeString, SomeCSV, SomeNumber, SomeDate
       FROM jbmTest

    --===== Insert data 100% interleaved in the clustered index
     INSERT INTO jbmTest1 --(156 seconds)
            (RowNum, SomeInt, SomeString, SomeCSV, SomeNumber, SomeDate)
     SELECT RowNum+5, SomeInt, SomeString, SomeCSV, SomeNumber, SomeDate
       FROM jbmTest

    --===== Insert data into the "end" of the table (non-interleaved)
     INSERT INTO jbmTest1 --(72 seconds)
            (RowNum, SomeInt, SomeString, SomeCSV, SomeNumber, SomeDate)
     SELECT RowNum+33333326, SomeInt, SomeString, SomeCSV, SomeNumber, SomeDate
       FROM jbmTest

    --===== Table size is almost 10 million rows now. Update 3.3 million of them
         -- (26 seconds)
     UPDATE jbmTest1
        SET SomeDate = 0
      WHERE RowNum <= 33333326/2

    Like I said previously, I dunno for sure what the problem is and can't check because I don't have 2k5... but now you have some code you can check your box with.
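
    For what it's worth, two of the possibilities listed above (stale statistics and fragmented indexes) can be checked quickly; a sketch against the test table above, assuming SQL Server 2005 or later:

    -- Rebuild statistics on the target table with a full scan.
    UPDATE STATISTICS dbo.JBMTest1 WITH FULLSCAN

    -- Check fragmentation of the clustered index (SQL Server 2005 and later).
    SELECT index_id, avg_fragmentation_in_percent, page_count
      FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID(N'dbo.JBMTest1'), NULL, NULL, 'LIMITED')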

    --Jeff Moden



  • My results on Express SP2

    --===== Straight insert, no data in table
     INSERT INTO jbmTest1 --(120 seconds)
            (RowNum, SomeInt, SomeString, SomeCSV, SomeNumber, SomeDate)
     SELECT RowNum, SomeInt, SomeString, SomeCSV, SomeNumber, SomeDate
       FROM jbmTest

    --===== Insert data 100% interleaved in the clustered index
     INSERT INTO jbmTest1 --(79 seconds)
            (RowNum, SomeInt, SomeString, SomeCSV, SomeNumber, SomeDate)
     SELECT RowNum+5, SomeInt, SomeString, SomeCSV, SomeNumber, SomeDate
       FROM jbmTest

    --===== Insert data into the "end" of the table (non-interleaved)
     INSERT INTO jbmTest1 --(129 seconds)
            (RowNum, SomeInt, SomeString, SomeCSV, SomeNumber, SomeDate)
     SELECT RowNum+33333326, SomeInt, SomeString, SomeCSV, SomeNumber, SomeDate
       FROM jbmTest

    --===== Table size is almost 10 million rows now. Update 3.3 million of them
         -- (90 seconds)
     UPDATE jbmTest1
        SET SomeDate = 0
      WHERE RowNum <= 33333326/2

  • When I get some time tomorrow, I'll try your tests. But this goes back to the initial point of this post.

    The insert is the pivot of the 1 BILLION row table (actually a union of two 250-million-row tables and a view over a 500-million-row table)... so ignore the insert's total time; most of it comes from the query, not the insert. The index does impact the insert performance, though.

    As for the update: the example I gave, and the one you tested, was for one column. The test run I reported was actually updating 4 columns (Type1, 2, 3 and 4) in one pass, using CASE statements.

  • Jeff,

    The update I am doing calculates the delta from row N-1 to row N and stores it as an INT (1 for rising, 0 for same, -1 for falling). That is why this is hard/painful.
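
    As an aside for anyone reading this later: the thread predates it, but on SQL Server 2012 and up the row N-1 to row N delta can be written without the self-join by using LAG(). A sketch, with the column names assumed from the earlier posts:

    ;WITH X AS
    (
        SELECT MainSeriesID, theDate, Weekly_Type1,
               LAG(Weekly_Type1) OVER (PARTITION BY MainSeriesID
                                           ORDER BY theDate) AS PrevWeekly_Type1
          FROM #ComplexFilter
         WHERE WEEKLY = 1
    )
    UPDATE t
       SET Weekly_Type1_Change = SIGN(x.Weekly_Type1 - x.PrevWeekly_Type1)  -- 1, 0 or -1
      FROM #TempTable AS t
     INNER JOIN X AS x
        ON x.MainSeriesID = t.MainSeriesID
       AND x.theDate = t.theDate
     WHERE x.PrevWeekly_Type1 IS NOT NULL   -- first row of each series has no prior value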
