PLEASE SUGGEST: 715 Million Rows, Data Load Taking a Long Time

  • Hello –

    One of our ETL packages, which refreshes the DB, takes about 8 hours to complete. I was asked to tune it; below are some pointers:

    • We are loading about 12 tables through this package, and all are TRUNCATE – LOAD. Most of the tables are quite large; all together we are loading about 715 million rows.

    • I suspect this is one of the reasons the package takes so long, so I am considering an Incremental Load rather than a FULL Load (TRUNCATE-LOAD). However, as far as I understand, an Incremental Load (using either MERGE or SCD Type 1/2) needs two tables: a staging table and the warehouse table. Correct me if I am wrong. If that is the case, I don’t think we can afford two databases and duplicate tables at this stage.

    • I am also thinking of TABLE PARTITIONING.
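    For context, this is the kind of partitioning I have in mind; a minimal sketch where the table, column, and boundary values are all illustrative assumptions:

    ```sql
    -- Illustrative only: partition a large fact table by a date column.
    CREATE PARTITION FUNCTION pf_LoadDate (date)
    AS RANGE RIGHT FOR VALUES ('2023-01-01', '2023-04-01', '2023-07-01');

    -- Map every partition to the same filegroup for simplicity.
    CREATE PARTITION SCHEME ps_LoadDate
    AS PARTITION pf_LoadDate ALL TO ([PRIMARY]);

    -- Placing the clustered index on the scheme partitions the table.
    CREATE CLUSTERED INDEX cix_FactSales_LoadDate
    ON dbo.FactSales (LoadDate)
    ON ps_LoadDate (LoadDate);
    ```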

    Please help if anyone has a better idea.

    Thanks,

    Srini

  • You don't need two databases for MERGE. You could create a linked server and run a MERGE between production and the warehouse via the linked server, which saves loading the table and then merging.
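    A rough sketch of what that could look like; the linked server, table, and column names are all illustrative assumptions (note that the MERGE target must be local, only the source can sit behind the linked server):

    ```sql
    -- Illustrative: local warehouse table as target, production table
    -- reached through a linked server named PRODSRV as the source.
    MERGE dbo.DimCustomer AS tgt
    USING PRODSRV.ProdDb.dbo.Customer AS src
        ON tgt.CustomerID = src.CustomerID
    WHEN MATCHED AND (tgt.Name <> src.Name OR tgt.City <> src.City) THEN
        UPDATE SET tgt.Name = src.Name, tgt.City = src.City
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (CustomerID, Name, City)
        VALUES (src.CustomerID, src.Name, src.City)
    WHEN NOT MATCHED BY SOURCE THEN
        DELETE;
    ```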

    If you have the right edition of SQL Server, you could use something like Change Data Capture, or create your own custom DML triggers to log what has changed and then ship only those changes to the warehouse.
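    Enabling CDC looks roughly like this; the database and table names are illustrative (CDC also needs SQL Server Agent running, and before SQL Server 2016 SP1 it required Enterprise edition):

    ```sql
    -- Illustrative: enable Change Data Capture on a source table.
    USE ProdDb;
    EXEC sys.sp_cdc_enable_db;

    EXEC sys.sp_cdc_enable_table
        @source_schema = N'dbo',
        @source_name   = N'Customer',
        @role_name     = NULL;  -- NULL: no gating role required to read changes

    -- Changes are then read with the generated function, e.g.:
    -- SELECT * FROM cdc.fn_cdc_get_all_changes_dbo_Customer(@from_lsn, @to_lsn, N'all');
    ```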

    Some form of replication could do it as well, depending on your requirements.

  • One thing you may want to try is dropping the table indexes before the load and adding them back when it completes. If the tables have multiple indexes, each inserted row can trigger an index update; re-creating the indexes after the load can noticeably reduce load time.
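    One way to do this without scripting out the full index definitions is to disable the nonclustered indexes and rebuild them afterwards; the index and table names here are illustrative (do not disable the clustered index, as that makes the table inaccessible):

    ```sql
    -- Illustrative: disable a nonclustered index before the bulk load...
    ALTER INDEX ix_FactSales_CustomerID ON dbo.FactSales DISABLE;

    TRUNCATE TABLE dbo.FactSales;
    -- ... bulk load runs here ...

    -- ...then rebuild it in one pass after the load.
    ALTER INDEX ix_FactSales_CustomerID ON dbo.FactSales REBUILD;
    -- Or rebuild every index on the table at once:
    -- ALTER INDEX ALL ON dbo.FactSales REBUILD;
    ```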

  • Thanks a ton, I'll try to implement the same in my scenario.

    Thanks !!
