select statement with inner join returns more rows than equivalent update?

  • I have one table in an accounting system that I need to update based on a spreadsheet. The spreadsheet has an identifier for the records that need to be touched. When I look at these two queries I consider them to be identical in scope but clearly SQL server does not. "Table1" is the target table in the accounting database to RECEIVE the update. "Table2" is the spreadsheet that was brought into SQL 2005 using the linked server method but then was selected into a table also located in the accounting database to keep the syntax easy.

    select ACS.Contract_Start_Date, zzp.start

    FROM table1 as ACS INNER JOIN

    table2 zzp ON ACS.Alarm_Account = zzp.SG#

    WHERE (ACS.Inactive = 'N')

    This returns 560 rows.

    However when I run

    update ACS set ACS.Contract_Start_Date = zzp.start

    FROM table1 as ACS INNER JOIN

    table2 zzp ON ACS.Alarm_Account = zzp.SG#

    WHERE (ACS.Inactive = 'N')

    it touches only 346 rows and I don't understand why.

    Table1's design (leaving only the relevant fields the query touches) is

    SET ANSI_NULLS ON

    GO

    SET QUOTED_IDENTIFIER ON

    GO

    SET ANSI_PADDING ON

    GO

    CREATE TABLE [dbo].[table1](

    [Customer_System_Id] [int] IDENTITY(1,1) NOT NULL,

    [Customer_Id] [int] NOT NULL,

    [Alarm_Account] [nvarchar](25) NULL,

    [Contract_Start_Date] [datetime] NULL

    ) ON [PRIMARY]

    CREATE TABLE [dbo].[table2](

    [PAS_AR] [float] NULL,

    [PAS#] [nvarchar](255) NULL,

    [SG#] [nvarchar](255) NULL,

    [Customer Name] [nvarchar](255) NULL,

    [Site Name] [nvarchar](255) NULL,

    [Site Address] [nvarchar](255) NULL,

    [Start] [datetime] NULL,

    [Term] [float] NULL,

    [Renewal] [float] NULL

    ) ON [PRIMARY]

    I don't think it's due to a difference in data types since I'm setting a datetime with another datetime and the join would seem to be OK since the number of rows (560) referenced by the select is the number of rows in the spreadsheet. It's got to be some syntax I'm not understanding in the update.

    Any ideas?

  • It sounds to me that table1 and table2 have a 1 to many relation. In other words - the "left" side of the join is providing 346 rows, and the "right" side sometimes has multiple "children" to the matching rows on the left.

    The update is going to counted only the rows actually updated (i.e. just the left side). This brings up another question for you to handle: assuming a row in Table1 relates to more than one row in table2, which one of those rows in table2 should provide the data? As of right now - you likely are getting unpredictable results isnce it will use the first child value it finds (which isn't predictable and could change between execution runs.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • THANKS for looking at it and... I mis-spoke. My spreadsheet has 569 rows... saw the 0 as a 9. Yes, there's a 1-to-many relation between table1 and table2. I consider that table1's information is "more correct" than in "table2." There could be duplicates of each on both sides which is why I'm testing for invalid although i know the join doesn't take the where clause into account. I don't know how to include the formula in the join although I've seen it done. That still won't ensure lack of duplicates but it'll be close.

    My problem is... I need to be able to identify "which" records got updated and which did not so that the user who created the spreadsheet can go back and manually do the ones they typo'd.

  • Use the OUTPUT clause on the UPDATE statement to return back the rows being updated.

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • Thanks... is there a join type that will support a one-to-many update, if only to mark the occurrences in the original file as rejects? So far google searches haven't yielded a clear answer 🙂

  • Sorry to say - but no. It's baked into the UPDATE (it's actually described, albeit badly in the BOL entry about UPDATE).

    If you have any ability to add a column, I would consider adding the primary k from table2 in table1 ("where did I get the START column"). Otherwise a subsequent join back to table1 back on account_alarm AND the start shoul be able to tell you which row did the update.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Where would an output clause fit into this query? BOL is, as usual, 99% incomprehensible. I want to output ACS.Alarm_Account into table3 ?

    (table3 does not yet exist, I'm thinking like select into) This isn't contained in a procedure so I don't know that I can use a @temp table

    update ACS set ACS.Contract_Start_Date = zzp.start

    FROM table1 as ACS INNER JOIN

    table2 zzp ON ACS.Alarm_Account = zzp.SG#

    WHERE (ACS.Inactive = 'N')

    I'm putting the output clause before the from:

    update ACS set ACS.Contract_Start_Date = zzp.start output ACS.Alarm_Account

    FROM table1 as ACS INNER JOIN

    table2 zzp ON ACS.Alarm_Account = zzp.SG#

    WHERE (ACS.Inactive = 'N')

    Without the output the right number of rows is returned.

    With output I get "The multi-part identifier ACS.Alarm_Account could not be bound"

    Thanks for your help I'm soooo close to having this done.

  • I figured out where OUTPUT goes but it doesn't help on the production database.

    update ACS set ACS.Contract_Start_Date = zzp.start output inserted.Alarm_Account

    FROM table1 as ACS INNER JOIN

    table2 zzp ON ACS.Alarm_Account = zzp.SG#

    WHERE (ACS.Inactive = 'N')

    returns "The target table table1 of the DML statement cannot have any enabled triggers if the statement contains an OUTPUT clause without INTO clause."

  • Create a table variable or temp table, and output into it.

    declare table @outputtbl(alarm_account int);

    update ACS set ACS.Contract_Start_Date = zzp.start

    output inserted.Alarm_Account

    into @outputtbl(alarm_account)

    FROM table1 as ACS INNER JOIN

    table2 zzp ON ACS.Alarm_Account = zzp.SG#

    WHERE (ACS.Inactive = 'N');

    Select * from @outputtbl;

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply