How to compare data between two tables and update changes only

  • hi All,

    I have a task to complete and i am hoping to get some direction on how to get statred with that.

    I have received a large amount of data from our overseas office primarily with company and people contact details and about 80% of the data in the excel files is already present in our database and another 20% are new contacts to be added.

    But there may be some updates for the 80% in the excel file for address, job title, city etc. I have to update any changes from the excel to reflect in our database.

    Intially i have thought to import all the data from excel as a table in sql server 2005 and then compare it with what data we have in our two tables tblcompany, tblpeople and then update the changes in production database.

    Is that a correct approach? is there any other way of doing it. And how do i compare both the tables to update only the changes.

    Please help, any suggestions will be highly appreicated.

    Thanks, Shilpa.

  • Hi,

    The approach you have mentioned (to import data into SQL Server and then compare) is a right approach.

    regarding your question about how to compare both the tables and then update only changes is possible with using JOIN.

    A much better description of the solution would be possible if you can give the create table scripts for the 2 tables and also the condition to look for.


    Bru Medishetty

    Blog -- LearnSQLWithBru

    Join on Facebook Page Facebook.comLearnSQLWithBru

    Twitter -- BruMedishetty

  • Hi,

    Yes, this is one of the options. You can load everything to a staging table and then update rows that do exist in the target table and insert the rest. There is no point in filtering out rows that did not change from the update statement in my opinion, as updating all rows will not change rows that are equal anyway.

    Do you have a column or columns that uniquely identify particular rows? The sql would look something like this:

    --update existing rows with imported values

    update dest

    set cola = a.cola, colb = a.colb, colc = a.colc

    from staging a inner join dest b on a.keycol = b.keycol

    --insert missing rows

    insert dest (cola, colb, colc)

    select cola, colb, colc

    from staging

    where not exists

    (select 1 from dest where keycol = staging.keycol)

    The other option probably would be to use Fuzzy Lookup Transformation in SSIS if you have manually entered data with typos and spelling mistakes. Probably it wouldn't solve all your problems though.

    Regards

    Piotr

    ...and your only reply is slàinte mhath

  • Piotr.Rodak (11/10/2009)

    ...There is no point in filtering out rows that did not change from the update statement in my opinion, as updating all rows will not change rows that are equal anyway....

    That is not true. This simple test script demonstrates that.

    if object_id('tempdb..#t','U') is not null drop table #t

    go

    select

    *

    into

    #t

    from

    -- Function F_TABLE_NUMBER_RANGE available on this link

    -- http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=47685

    F_TABLE_NUMBER_RANGE(1,5000000)

    order by

    number

    go

    declare @st datetime

    declare @x1 int

    declare @x2 int

    set @st = getdate()

    set @x1 = 44

    set @x2 = @x1

    update #t

    set number = number+@x1-@x2

    where

    number <> number+@x1-@x2

    select Elapsed_MS_1 = datediff(ms,0,getdate()-@st)

    go

    declare @st datetime

    declare @x1 int

    declare @x2 int

    set @st = getdate()

    set @x1 = 44

    set @x2 = @x1

    update #t

    set number = number+@x1-@x2

    select Elapsed_MS_2 = datediff(ms,0,getdate()-@st)

    Results:

    (5000000 row(s) affected)

    (0 row(s) affected)

    Elapsed_MS_1

    ------------

    843

    (1 row(s) affected)

    (5000000 row(s) affected)

    Elapsed_MS_2

    ------------

    7686

    (1 row(s) affected)

  • hi Rodak,

    Thanks for your suggestion. I have listed the structure of data in Excel file but the same data is stored in two different tables in the database tlbcompany, tblperson.

    I can match excel data with table data using COID, PersonID

    Do i need to split the Excel file into company and people ?

    And i want to update only the rows which are different from the database tables.

    Can i use INTERSECT/EXCEPT to find the rows with changes and update only those? how would i go about that?

    Columns in excel file

    CoID

    PersonID

    Company

    Address

    Street

    City

    State

    PostCode

    Country

    Dx

    Phone

    Fax

    MrMs

    FistName

    LastName

    Title

    Specialisation

    Direct Phone

    Direct Fax

  • Thanks Michael, I realize this is a risky statement. Thanks for your sample. On the other hand I don't think there will be 5 million rows of data sent in Excel files of the OP. That's why I decided to say that it will not make too big difference and will be easier to write (there may be many columns to compare, not only the key).

    sharonrao123, I don't know what are your business requirements - when you say that you have same data in tblcompany and tblperson, does it mean that both tables have the same structure? Are COId and PersonId always populated for both tables? I guess that you will have to run the statements I sent earlier separately for companies and for persons, but this is only a guess, you have to decide what result you want to achieve. You can use EXCEPT to capture different rows, but NOT EXISTS will be OK as well.

    Regards

    Piotr

    ...and your only reply is slàinte mhath

  • hi Rodak,

    Sorry i was not clear in my response. We store company and person details seperately in company (columns : name, address, street, city, state, postcode, country, dx, phone, fax) and people (columns: mrms, firstname, lastname, title, email, directphone, directfax) table and the people table has CoIDas foreing key.

    But in Excel file they are combined that should not be a problem ignore that.

    Total records in excel file should be around 30,000. And i decided to split the excel sheet data into company and people.

    Excel sheet data has aslo got the primary keys for people(personid) and company(coid) but my question really was is there a way to find out which columns have different data when compared between excel and database. And please note that for each record not all the fields have updates for some records it might be address field and for some title field and so on.

    I am not really able to decide weather to update all the fields based or just update the fields which have changes. As you said that is something which i have to figure out.

    Thanks for your input i really appreciate your time.

    Shilpa.

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply