updating a dimension table

  • Hi all,

    I have one question...

I have a huge dimension table (7 million rows) in which I have to apply updates whenever any column changes. I am currently using a Lookup and a Conditional Split.

It takes a lot of time to complete this process...

Is there a better way?

    Thanks for your help.

    ---------------------------------------------------

    Thanks

Updates can be more efficient if done as a T-SQL script command, and with the correct indexes they can run quickly. Can you post some more details about what exactly you are updating?
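A set-based T-SQL update of the kind suggested here joins the incoming data to the dimension on the business keys and touches only the rows that actually differ, in a single statement. A minimal sketch; the table and column names (`DimCustomer`, `StageCustomer`, `BK1`-`BK3`, `Col1`, `Col2`) are placeholders, not taken from the thread:

```sql
-- One set-based UPDATE instead of one statement per row.
-- Only rows whose tracked columns differ from the staged data are written.
UPDATE d
SET    d.Col1 = s.Col1,
       d.Col2 = s.Col2
FROM   dbo.DimCustomer  AS d
JOIN   dbo.StageCustomer AS s
       ON  s.BK1 = d.BK1
       AND s.BK2 = d.BK2
       AND s.BK3 = d.BK3
WHERE  d.Col1 <> s.Col1
   OR  d.Col2 <> s.Col2;   -- one pair per tracked column; note <> treats NULLs as "not different"
```

With indexes on the business keys of both tables, this runs as one joined pass rather than millions of single-row commands.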

  • Hi Steve,

    Thanks for your reply...

I have to run this process weekly to check whether anything new has arrived or anything has changed in the source for this 6 million row dimension table, so I am designing a package to do it.

I have a dimension table with 3 business keys that I use in a Lookup to distinguish new rows from existing rows...

    For New Rows I just INSERT them.

    For Existing Rows:

(The Lookup also returns all the target columns that may change, so I can compare them with the source columns.)

    I then route the Lookup's match output (the green arrow) to a Conditional Split transformation and test the 10 columns that may have changed in the source (SourceColumn != TargetColumn). If any of them has changed, I use an OLE DB Command to update the table.

But it is not working; it just gets stuck on the progress bar and nothing happens.

I do have an index on the business keys that I use in the Lookup.

    Any help is appreciated.

    Thanks in advance


That sounds like a fairly common scenario.

    The problem with using an OLE DB Command update is that the statement has to run once for each row that requires updating, so if you have a lot of rows this can take a very long time.

    What you might find quicker is to identify the rows that have changed, delete those rows from the dimension table entirely, and then insert all of the changed rows, rather than running an update statement.

    You may also find it quicker to load the source data into a staging table, index that table, and write a join query against the dimension table to identify the rows that have changed. With the right indexing this can run faster than the SSIS Lookup, and you can use the query as the data source in the update data flow. Writing it as a query also gives you a chance to experiment with execution plans and indexes.
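The "identify the rows that have changed" query described above can be written with the `EXCEPT` idiom, which compares all tracked columns at once and, unlike explicit `<>` comparisons, treats two NULLs as equal. A sketch with placeholder names (`StageCustomer`, `DimCustomer`, `BK1`-`BK3`, `Col1`-`Col3`), not taken from the original thread:

```sql
-- Staged rows whose tracked columns differ from the matching dimension row.
-- EXCEPT between two single-row SELECTs yields a row only when the column
-- lists are not identical, and it handles NULLs correctly.
SELECT s.*
FROM   dbo.StageCustomer AS s
JOIN   dbo.DimCustomer   AS d
       ON  s.BK1 = d.BK1
       AND s.BK2 = d.BK2
       AND s.BK3 = d.BK3
WHERE  EXISTS (SELECT s.Col1, s.Col2, s.Col3
               EXCEPT
               SELECT d.Col1, d.Col2, d.Col3);
```

This result set can feed the update data flow directly; brand-new rows (no key match in the dimension) can be found separately with a `LEFT JOIN ... WHERE d.BK1 IS NULL`.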

  • Thanks a lot Steve,

I think the last option is the best. I have already staged the source data, so I will now join it in the OLE DB source query instead of using a Lookup, and I will build separate data flows for the update and the insert.

    I have indexes on the target, but I will now have to create an index on the staging table.

    Hope this will be fast.

I cannot delete the records because they are referenced by the fact table.
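Since deleting is not an option, the insert and update passes described above can also be combined in a single statement on SQL Server 2008 and later using `MERGE`. A hedged sketch with the same placeholder names (`DimCustomer`, `StageCustomer`, `BK1`-`BK3`, `Col1`, `Col2`), not from the original thread:

```sql
-- One pass over the staged data: insert rows with new business keys,
-- update existing rows only when a tracked column actually changed.
MERGE dbo.DimCustomer AS d
USING dbo.StageCustomer AS s
      ON  s.BK1 = d.BK1
      AND s.BK2 = d.BK2
      AND s.BK3 = d.BK3
WHEN MATCHED AND (d.Col1 <> s.Col1 OR d.Col2 <> s.Col2)
    THEN UPDATE SET d.Col1 = s.Col1,
                    d.Col2 = s.Col2
WHEN NOT MATCHED BY TARGET
    THEN INSERT (BK1, BK2, BK3, Col1, Col2)
         VALUES (s.BK1, s.BK2, s.BK3, s.Col1, s.Col2);
```

This keeps existing dimension rows (and their fact-table references) in place while still avoiding row-by-row commands.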

    Thanks Again.


  • SQL Learner-684602 (10/14/2009)



    Have you checked:

Kimball Method SCD

    Batch Destination

    This should give you huge performance improvement.

    ---
    SSIS Tasks Components Scripts Services | http://www.cozyroc.com/
