SSIS MERGE for incremental loads

  • Hello,

    I inherited an SSIS package that is rather simple. It grabs data from a SQL Query and then loads it into a SQL table. The first step of this process TRUNCATES the destination table and then reloads for the current year. This table has over a million rows and the DB SOURCE that we are pulling from is not in our domain, so one can imagine how long this takes.

    This process is working fine, (given the 45 minutes it takes to repopulate data in the DESTINATION table), but what I really need is a way to load only the rows that are NEW and UPDATED. I would also need functionality to DELETE the rows that have been removed (sounds like a MERGE, right?).

    I tried using MERGE and MERGE JOIN transformations but these transformations seem to be different from the T-SQL MERGE statement. MERGE seems like a slow UNION and MERGE JOIN only seems to work with SELECTS.

    Can anyone offer guidance on this issue?

    SQL-TG

    Everyone has a plan until they get punched in the mouth. --Mike Tyson

  • SQLTougherGuy (9/28/2015)


    Hello,

    I inherited an SSIS package that is rather simple. It grabs data from a SQL Query and then loads it into a SQL table. The first step of this process TRUNCATES the destination table and then reloads for the current year. This table has over a million rows and the DB SOURCE that we are pulling from is not in our domain, so one can imagine how long this takes.

    This process is working fine, (given the 45 minutes it takes to repopulate data in the DESTINATION table), but what I really need is a way to load only the rows that are NEW and UPDATED. I would also need functionality to DELETE the rows that have been removed (sounds like a MERGE, right?).

    I tried using MERGE and MERGE JOIN transformations but these transformations seem to be different from the T-SQL MERGE statement. MERGE seems like a slow UNION and MERGE JOIN only seems to work with SELECTS.

    Can anyone offer guidance on this issue?

    SQL-TG

    Everyone has a plan until they get punched in the mouth. --Mike Tyson

  • Hi I'm currently in the process of modifying our Data Warehouse from a full load process to an incremental load.

    I use a the lookup transform with a cache file and use row version data type to detect changes at source. I have this running over 26 tables of various sizes 2 million to 14 million rows. The load runs every 30 mins and takes between 14-26 seconds to run. Note a full load takes approx. 10 mins to run

    It's a pretty robust solution I'm in the process of documenting it.

  • If it must be dealt with in SSIS you could use slowly changing dimension for upserts (you would need a separate task for your DELETEs). However in my experience this can have poor performance with large datasets and is a pain if you need to change the data source configuration.

    Is there any reason not to use T-SQL MERGE in a stored procedure, using staging tables if necessary?

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply