INSERT if doesn't exist, UPDATE if changed

  • Hello, all:

    Please forgive me if this is an idiotic question--I'm still learning!

    I'm trying to create a process that pulls data from an outside (non-SQL Server) database, then combines that data with data from one of my tables and inserts the results into another table.

    Demographic data, including CustomerType, Location, Date, AgeBracket, Zip Code, Email, and DBXID (a unique identifier assigned to each record in the outside database), is pulled from the outside database into a table called DBX. Using an insert statement, I combine this data with my ZipCodeAssoc table (which lists the City, County, State, and Country associated with each zip code) and insert the results into my Demographics table. Currently I'm using the following query:

    INSERT INTO Demographics
    SELECT DBX.CustomerType, DBX.Location, DBX.[Date], DBX.AgeBracket,
           ZipCodeAssoc.City, ZipCodeAssoc.County, ZipCodeAssoc.State, DBX.[Zip Code],
           ZipCodeAssoc.Country, DBX.Email, DBX.DBXID
    FROM DBX
    LEFT JOIN ZipCodeAssoc ON DBX.[Zip Code] = ZipCodeAssoc.[Zip Code]
    WHERE DBX.DBXID NOT IN (SELECT Demographics.DBXID FROM Demographics)

    This works fine to insert new records into Demographics, but what I'm struggling with is how to update existing records in Demographics if they're changed in the outside database. (Part of this process is that DBX is truncated and rewritten to every night, so any changes or additions will be current in the DBX table.) Is there a way to identify which records in Demographics have been changed, and to overwrite those with the new data in DBX? The only solution I can come up with is to truncate and rewrite the Demographics table every time this process runs...is there a better alternative?

    Thanks in advance!

  • lk (11/26/2008)


    ...Is there a way to identify which records in Demographics have been changed, and to overwrite those with the new data in DBX? The only solution I can come up with is to truncate and rewrite the Demographics table every time this process runs.

    The only way to tell whether a record in Demographics has changed is either to have some sort of [last_updated] datetime column supplied by the outside source and check that value, or to do the painful operation of comparing the Demographics table values against the new values in your DBX table. If you have no indication of when a Demographics record was last updated, then, depending on the total number of records expected in the Demographics table, you could either run an UPDATE statement with a join in it before the INSERT statement you mentioned (which would update all the records that have a match), or do the TRUNCATE and complete reload as you suggested.
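    A minimal sketch of that UPDATE-with-a-join approach, assuming the Demographics columns carry the same names as in the INSERT statement above (the column list is a guess from the original query; adjust it to your actual schema):

    ```sql
    -- Update rows in Demographics that have a matching DBXID in tonight's DBX load
    UPDATE d
    SET d.CustomerType = x.CustomerType,
        d.Location     = x.Location,
        d.[Date]       = x.[Date],
        d.AgeBracket   = x.AgeBracket,
        d.[Zip Code]   = x.[Zip Code],
        d.Email        = x.Email
    FROM Demographics AS d
    INNER JOIN DBX AS x ON d.DBXID = x.DBXID;
    ```

    Run this before your existing INSERT: the UPDATE refreshes every row that already exists, and the INSERT then picks up only the DBXIDs that are new.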

  • Might I ask why you want to create a table structure that is not normalized? With your tables DBX and ZipCodeAssoc, for example, you could extract everything you appear to want with some simple T-SQL:

    SELECT DBX.CustomerType, DBX.Location, DBX.DBXID,
           ZipCodeAssoc.City, ZipCodeAssoc.County, ZipCodeAssoc.State, DBX.[Zip Code]
    FROM DBX
    INNER JOIN ZipCodeAssoc ON DBX.[Zip Code] = ZipCodeAssoc.[Zip Code]
    WHERE ZipCodeAssoc.State = 'input state name or abbreviation'

    If everything seems to be going well, you have obviously overlooked something.

    Ron


  • Thanks to you both for your responses! It turns out that the records with which we'll be working will not change once they're created, so it renders my question moot. I'm sure this issue will come up again, though, so thanks for your help! And bitbucket, the reason that I want to pull all this data into a new table is that we have some sort of weird, pre-existing reporting software that needs to use the table as a direct data source. Hopefully we'll soon be moving away from that and toward SQL Reporting Services, but for now, we just have to make do... :rolleyes:

    Thanks again!

  • Hi

    I was facing the same problem last week: I had to update a database table from a flat file that changes every day, with some new records and some updated records.

    Truncating the table and re-inserting all the data is one of the easiest methods in such a case, but during that process the data is not accessible, and if the import fails after the truncation, you can lose data.

    To avoid this I created two identical tables. In your case you could create DBX and DBX_Intermediate with the same structure and do the following steps:

    Step 1) Truncate the DBX_Intermediate table.

    Step 2) Import all data into DBX_Intermediate (including DBXID) from the external source.

    Step 3) Update the DBX fields with an inner join between DBX and DBX_Intermediate on DBXID.

    Step 4) Insert into DBX the records from DBX_Intermediate whose DBXID is not yet in the DBX table.

    So the first time all records will be inserted into DBX; thereafter, existing records will be updated with the new values and new ones will be inserted.
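    A rough T-SQL sketch of steps 3 and 4, assuming DBX and DBX_Intermediate share two placeholder columns Col1 and Col2 (hypothetical names; substitute your real column list):

    ```sql
    -- Step 3: refresh existing rows in DBX from the freshly loaded staging table
    UPDATE d
    SET d.Col1 = i.Col1,
        d.Col2 = i.Col2
    FROM DBX AS d
    INNER JOIN DBX_Intermediate AS i ON d.DBXID = i.DBXID;

    -- Step 4: insert rows that appear only in the staging table
    INSERT INTO DBX (DBXID, Col1, Col2)
    SELECT i.DBXID, i.Col1, i.Col2
    FROM DBX_Intermediate AS i
    WHERE NOT EXISTS (SELECT 1 FROM DBX AS d WHERE d.DBXID = i.DBXID);
    ```

    NOT EXISTS is used here rather than NOT IN so that the check still behaves correctly if DBXID can ever be NULL.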

    Hope this helps.

    Thanks

    Vaseem
