Unexpected consequence of update from multiple rows

  • I was surprised recently when a query that updates a record from a recordset with multiple rows succeeded without giving an error and seemed to arbitrarily select the row used in the update.

    If you attempt to update a record from a subquery expression that returns multiple rows, you get an error, as you would expect. When the multiple rows are from a joined subquery however (using the FROM clause of the update statement), the update succeeds and one of the rows is used for the update.

    My expectation was that both updates would fail. I'm pretty certain that some other database engines would fail in both situations (e.g. Ingres?). Does anyone have an explanation for why the update succeeds and which row gets used for the update?

    Here is some code to demonstrate:

    -- Create tables.

    if exists (select name from [dbo].sysobjects

    where name = 'temp_account' and type = 'U')

    drop table temp_account

    if exists (select name from [dbo].sysobjects

    where name = 'temp_transaction' and type = 'U')

    drop table temp_transaction

    create table temp_account

    (account_key varchar(10) not null,

    account_name varchar(40) null,

    balance01 money null,

    balance02 money null)

    create table temp_transaction

    (tid integer identity (1,1) not null,

    account_key varchar(10) not null,

    trans_type varchar(1) not null,

    trans_amount money not null)

    -- Test data.

    insert into temp_account

    (account_key,

    account_name,

    balance01,

    balance02)

    values ('TEST001','Test Account 1',0,0)

    insert into temp_account

    (account_key,

    account_name,

    balance01,

    balance02)

    values ('TEST002','Test Account 2',0,0)

    insert into temp_transaction

    (account_key,

    trans_type,

    trans_amount

    )

    values ('TEST001','P',120)

    insert into temp_transaction

    (account_key,

    trans_type,

    trans_amount

    )

    values ('TEST001','P',222)

    insert into temp_transaction

    (account_key,

    trans_type,

    trans_amount

    )

    values ('TEST001','R',-40)

    -- Update single record from multiple rows - succeeds and arbitrarily selects row to use in update.

    -- Balance is set to 342 which is sum for P transaction type. R transactions are ignored.

    -- Or is balance first updated by R transactions, then P (row-by-row)?

    update temp_account

    set balance01 = ta.balance01 + rs1.trans_amount

    from temp_account as ta inner join

    (select tt.account_key,

    tt.trans_type,

    sum(tt.trans_amount) as trans_amount

    from temp_transaction as tt

    where tt.trans_type in ('P','R')

    group by tt.account_key, tt.trans_type

    ) as rs1

    on ta.account_key = rs1.account_key

    select *

    from temp_account

    -- Update from multiple rows - fails with error message, as expected.

    -- Msg 512, Level 16, State 1, Line 1

    -- Subquery returned more than 1 value. This is not permitted when the subquery follows =, !=, <, <= , >, >= or when the subquery is used as an expression.

    update temp_account

    set balance02 = (select sum(tt.trans_amount) as trans_amount

    from temp_transaction as tt

    where tt.trans_type in ('P','R')

    and tt.account_key = ta.account_key

    group by tt.trans_type)

    from temp_account as ta

    select *

    from temp_account

  • The problem is in your code. Please note the changes I have made in both versions.

    update temp_account set

    balance01 = ta.balance01 + rs1.trans_amount

    from

    temp_account as ta inner join

    (select

    tt.account_key,

    -- tt.trans_type,

    sum(tt.trans_amount) as trans_amount

    from

    temp_transaction as tt

    where

    tt.trans_type in ('P','R')

    group by

    tt.account_key

    ) as rs1

    on ta.account_key = rs1.account_key

    select *

    from temp_account

    update temp_account set

    balance02 = isnull(

    ( select

    sum(tt.trans_amount) as trans_amount

    from

    temp_transaction as tt

    where

    tt.trans_type in ('P','R')

    and tt.account_key = ta.account_key

    group by

    tt.account_key), 0)

    from

    temp_account as ta

    select *

    from temp_account

  • If you attempt to update a record from a subquery expression that returns multiple rows, you get an error, as you would expect.

    The situation you are describing here is one in which the subquery is being used as an expression. That is to say it is being used in a context where only a single value is expected to be returned, as if from another column, a function, a formula, a CASE expression, etc.

    UPDATE OrderLog

    SET OrderTotal = (select sum(amt) from OrderDetail where OrderID = 1075)

    -- not -- SET OrderTotal = (select sum(amt), count(*) from OrderDetail where OrderID = 1075)

    -- not -- SET OrderTotal = (select amt from OrderDetail where OrderID = 1075)

    -- (in situations where any orders would have multiple amts)

    WHERE OrderID = 1075

    In this context, receiving multiple values from an expression is unacceptable, and the error results. An error would result if it were multiple columns being returned as well as multiple rows.

    However, in the second case there are multiple rows in the input set. But the target column is only being set to a single value. Looking at the example below, it appears that SQL is arbitrarily deciding which row to get that single value from. I can't cite authority for this, but I would bet money that it is simply using the first valid source value it finds. This makes sense if you think about it, because waiting for the rest of the valid rows in the set would be just as arbitrary and would waste resources.

    declare @Target table (rowID int identity(1,1), data varchar(100))

    insert into @target

    select 'Apples'

    declare @Source table (data varchar(100))

    insert into @Source

    select 'Oranges' union all

    select 'Bananas' union all

    select 'Watermelons'

    select 'Before' as [@source],* from @source

    select 'Before' as [@target],* from @target

    update @target

    set data = src.data

    output deleted.data as [from], inserted.data as [to]

    from @target t

    cross join @source src

    where t.rowID =1

    select 'After' as [@target],* from @target

    It is the developer's responsibility to ensure that the correct value is used to update, and I can sympathize with your desire for an error message to keep unexpected multiple rows out. But this is the way it works.

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • Lynn Pettis (10/28/2009)


    The problem is in your code. Please note the changes I have made in both versions.

    Hi Lynn, and thanks. Perhaps I wasn't clear about my purpose. I know the code is wrong and what the problems are. I was just surprised that any of the update statements in my example code work. I thought they would both cause an error and perhaps SQL Server is mis-behaving. The results of the update that works seem arbitrary - something that is unusual, in my experience, with SQL programming and set theory.

  • It is the developer's responsibility to ensure that the correct value is used to update, and I can sympathize with your desire for an error message to keep unexpected multiple rows out. But this is the way it works.

    Yes, I think you are right, it is taking the first value it comes across, which is surprising to me. Thanks for your response.

  • You're welcome. Best of luck to you, Glen.

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • Glen Harland (10/28/2009)


    It is the developer's responsibility to ensure that the correct value is used to update, and I can sympathize with your desire for an error message to keep unexpected multiple rows out. But this is the way it works.

    Yes, I think you are right, it is taking the first value it comes across, which is surprising to me. Thanks for your response.

    As with many other such subtle points, it IS unfortunately documented, but rather poorly. If you read through BOL's description of the UPDATE statement, some where buried in the middle is a statement to the effect that "if I row is going to be updated as a result of a one to many join, the update occurs based on the first row it encounters, and that this row is then locked and cannot be updated any further" by that same update statement.

    I know that doesn't help your current conundrum.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Thanks for everyone's comments.

    Just to illustrate that the SQL Server behaviour is not necessarily obvious ...

    With the test data I used above, the following SQL Server query results in 120 (first row found):

    UPDATE temp_account

    SET temp_account.balance01 = temp_account.balance01+temp_transaction.trans_amount

    from temp_account INNER JOIN temp_transaction

    ON temp_account.account_key=temp_transaction.account_key

    The same query in Sybase ASA results in 302 - the sum of the three rows.

    In Access 2003, the following equivalent query also results in 302 (sum of the three rows):

    UPDATE temp_account INNER JOIN temp_transaction

    ON temp_account.account_key=temp_transaction.account_key

    SET temp_account.balance01 = temp_account.balance01+temp_transaction.trans_amount;

    So SQL Server takes the first row, Sybase ASA and Access 2003 add the rows up, and from memory, Ingres reports an error (Ingres circa 1998 - I can't verify this since I don't have access to the software).

    Is SQL Server the only database that arbitrarily takes the first row? Is there a standard? Does it matter? 😉

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply