Multiple updates on the same column

  • I need to replace some values in a column (varchar(max)) and I want to avoid cursors.

    I have a table which has an identifier and a text, and then another one which has the ID so it can be linked to the first table and a pair of old values and new values.

    I need to replace in the text column of the first table all the old values with the new values.

    This is easy using a cursor, but I need to avoid it and I'm trying to do it in a single update.

    IF OBJECT_ID('TEMPDB..#FINAL') IS NOT NULL

    DROP TABLE #FINAL

    IF OBJECT_ID('TEMPDB..#VAL') IS NOT NULL

    DROP TABLE #VAL

    GO

    CREATE TABLE #FINAL (

    ID INT NOT NULL,

    VAL VARCHAR(MAX) NULL)

    CREATE TABLE #VAL (

    ID INT NOT NULL,

    OLD_VAL VARCHAR(30) NOT NULL,

    NEW_VAL VARCHAR(30) NULL

    )

    CREATE UNIQUE CLUSTERED INDEX #IX_FINAL ON #FINAL(ID)

    CREATE UNIQUE CLUSTERED INDEX #IX_VAL ON #VAL(ID, OLD_VAL)

    INSERT INTO #FINAL VALUES(1, 'Some text AAA Another text BBB some more text')

    INSERT INTO #FINAL VALUES(2, 'Some text CCC Another text DDD some more text')

    INSERT INTO #VAL VALUES(1, 'AAA', 'NewAAA')

    INSERT INTO #VAL VALUES(1, 'BBB', 'NewBBB')

    INSERT INTO #VAL VALUES(2, 'CCC', 'NewCCC')

    INSERT INTO #VAL VALUES(2, 'DDD', 'NewDDD')

    DROP TABLE #FINAL

    DROP TABLE #VAL

    The expected result would be

    ID VAL

    1 Some text NewAAA Some Text NewBBB

    2 Some text NewCCC Some Text NewDDD

    Of course, my first attempt was a failure πŸ™‚

    UPDATE F SET F.VAL=REPLACE(F.VAL, V.OLD_VAL, V.NEW_VAL)

    FROM #VAL V WITH (INDEX(#IX_VAL), TABLOCKX)

    , #FINAL F WITH (INDEX(#IX_FINAL), TABLOCKX)

    WHERE F.ID=V.ID

    produced (as expected)

    ID VAL

    1 Some text NewAAA Some Text BBB

    2 Some text NewCCC Some Text DDD

    Then I tried

    DECLARE @val VARCHAR(MAX), @I INT

    SET @I=0

    SET @val=''

    UPDATE F SET @val=F.VAL=REPLACE(CASE WHEN V.ID=@I THEN @val ELSE F.VAL END, V.OLD_VAL, V.NEW_VAL)

    , @I=V.ID

    FROM #VAL V

    , #FINAL F

    WHERE F.ID=V.ID

    But the result was the same.

    Looking at the Execution plan I see

    1. a Clustered Index Scan on #VAL which returns 4 rows - so far so good πŸ™‚

    2. a Stream Aggregate (Aggregate) operation which groups the 4 returned rows by ID, and I end up with only 2 rows.

    Any suggestions on how can I avoid that Stream Aggregate, or any other grouping operation?

    Thank you,

    Daniel

  • Hi Daniel

    In your sample data, you have two rows in table #VAL for each row in your update target table. Is this always the case? What's the maximum number of rows in #VAL sharing the same ID?

    β€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Yes, for the same ID there can be multiple pairs of old value/new value.

    As for the maximum number of pairs, ... usually there are 3 or 4 pairs.

  • Here's why I asked - a simple and fast way to get your results:

    ;WITH Sourcedata AS (

    SELECT SubID = ROW_NUMBER() OVER (PARTITION BY ID ORDER BY OLD_VAL),

    ID, OLD_VAL, NEW_VAL

    FROM #VAL

    )

    SELECT t.*, s1.*, s2.*,

    AmendedTarget = REPLACE(REPLACE(T.VAL, s1.OLD_VAL, s1.NEW_VAL), s2.OLD_VAL, s2.NEW_VAL)

    FROM #FINAL t

    INNER JOIN Sourcedata s1 ON s1.ID = t.ID AND s1.SubID = 1

    LEFT JOIN Sourcedata s2 ON s2.ID = t.ID AND s2.SubID = 2

    -- account for more rows per ID

    This will work fine with only three or four rows of changes per target.

    β€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Chris, I see what you mean.

    I'm not sure it works because the key word in my last post was usually. πŸ™‚

    There can always be more than 4 pairs for the same ID and the cod will become really ugly.

    But, this points me to another direction. Not quite one update, but at least it does not have cursors. And yes, I know it's a RBAR, but this is what I'm looking for

    -- Get them all into one table

    SELECT V.ID, V.OLD_VAL, V.NEW_VAL, F.VAL

    INTO #TEMP

    FROM #VAL V

    JOIN #FINAL F ON F.ID=V.ID

    ORDER BY V.ID, V.OLD_VAL -- order is important

    DECLARE @val VARCHAR(MAX), @ID INT, @OLD_VAL VARCHAR(30)

    SET @ID=0

    SET @val=''

    --CREATE CLUSTERED INDEX #IX_TEMP ON TEMP(ID, OLD_VAL) -- not sure if this is really needed

    /*the idea is to update column VAL on each row with the new value and carry the

    updated text to the next row, something like a running total.

    Thanks to Jeff Moden http://qa.sqlservercentral.com/articles/T-SQL/68467/

    */

    UPDATE T SET

    @val=T.VAL = REPLACE(CASE WHEN @ID<>ID THEN T.VAL ELSE @val END, T.OLD_VAL, T.NEW_VAL)

    , @ID=ID --, @OLD_VAL= OLD_VAL

    FROM #TEMP T WITH (INDEX(0), TABLOCKX) --index(0) forces a table scan

    -- now Temp table has the desired text in the row with OLD_VAL = max(OLD_VAL)

    UPDATE F SET F.VAL=T.VAL

    FROM #FINAL F

    JOIN (SELECT ID, MAX(OLD_VAL) AS OLD_VAL FROM #TEMP GROUP BY ID) X ON F.ID=X.ID

    JOIN #TEMP T ON F.ID=T.ID AND X.OLD_VAL=T.OLD_VAL

    And now, if anybody can improve this, I'll be happy to try it.

  • I would go at this from another direction:

    Dynamically generate an UPDATE statement that does all the REPLACEs (or one UPDATE for each set of rows if there are variations for different blocks of rows).

    For example:

    DECLARE @sql1 varchar(max)

    DECLARE @sql2 varchar(max)

    DECLARE @newValCount int

    SELECT @newValCount = COUNT(*)

    FROM #VAL

    SET @sql1 = 'UPDATE #FINAL SET VAL = ' +

    REPLICATE('REPLACE(', @newValCount) + 'val'

    SET @sql2 = ''

    --you can replace this with a cursor if you're worried about errors w/ this method

    SELECT @sql2 = @sql2 +

    ',''' + old_val + ''', ''' + new_val + ''')'

    FROM #VAL

    ORDER BY ID

    PRINT @sql1 + @sql2

    EXEC(@sql1 + @sql2)

    SELECT *

    FROM #FINAL

    Scott Pletcher, SQL Server MVP 2008-2010

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply