Which is faster Truncate and Load or Slowly Changing

  • Hi I have this problem what among those two ETL choices will i create?

    I know that Truncate and Load will Erase all records from Table and Load All records from source to it.

    And Slowly Changing insert All records if it is new and Update records if it is changed.

    But i don't know exactly which is faster?

    Thanks in Regards

  • TRUNCATE TABLE removes the data by deallocating the data pages used to store the table's data

    it is preety fast. I am not sure what do you mean by Load.

    Regards,

    Iulian

  • Actually I am going to Create an Integration Services or ETL...and i don't know which is better to use and who will finish faster.

    Specifically My Source is a Table from ERP Server(Which contains millions of Records) and my destination is the replicate of that table on Datawarehouse Server.

  • how many tables, how many rows and how many rows have been changed since last load?

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day

  • In the slow method you don't mention deleting records that are no longer present in the source; presumably you have to do that - or is nothing ever deleted?

    Whether truncate followed by load from source is quicker that insert new, update changed, and delete missing from source will depend on how many records are the same, how many are changed, how many are deleted, and how many are new, and what indexes exist on the target table, whether it's a heap or clustered and what sort of page splits (if any) will be caused by deletes and updates.

    Unless you give some sort of estimates for the numbers the only answer anyone can give you is £it depends", and even with the numbers it may not be obvious without details of the schema. In fact your best bet is probably to run some tests - try doing truncate and load and see how long it takes; then try doing it with a merge statement and see how long it takes - but don't belive the results of a single comparison, do enough tests to give you an idea of speed over the range of mixes of cretae/update/delete/leave alone that you are likely to encounter in real life.

    Tom

  • From a design point of view they really have different purposes.

    But to answer your question the SSIS SCD transform is a dog and not generally recommended. There is a much better one on codeplex by the Kimball Group. PragmaticWorks has also taken the Kimbal SCD and tweaked it a bit to turn it into a commercial product.

    So back to basics. Do you need or want to maintain any historical data in the table? I suspect the answer is no because you are considering truncate. If you need to maintain historical data that will exclude a truncate and force you into some sort of SCD transformation. Whether you use the existing one because you don't really have to worry much about scale in your environment, or get a 3rd party one, or even build your own it doesn't matter. If you need to maintain historical data you need to have something that will do that.

    So that brings us to the third possibility, you don't need to maintain historical data and you are asking if a truncate and insert will be faster than an incremental load. I would generally say that an incremental load would be the way to go because it will absolutely scale better. An incremental load however would require some lookups or merge joins and some conditional splits as well as 2 destinations(an update destination, and an insert destination) so it is a bit more complicated than an Execute SQL Task with a truncate command followed by a data source and a data destination.

    Without trying to be insulting it sounds like you need to do a bit of reading on how to use the different SSIS transforms so you can figure out the best approach because comparing a truncate to an SCD is comparing apples and oranges because their fundamental purpose is different.

  • Thanks A LOT!

    I would go to Slowly Changing Dimension some of our application needs look up to that Table from Datawarehouse. We get some issues about it where we cant get data when Job is Running.

  • DELETE is a logged operation on a per row basis. This means

    that the deletion of each row gets logged and physically deleted.

    You can DELETE any row that will not violate a constraint, while leaving the foreign key or any other contraint in place.

    TRUNCATE is also a logged operation, but in a different way.

    TRUNCATE logs the deallocation of the data pages in which the data

    exists. The deallocation of data pages means that your data

    rows still actually exist in the data pages, but the

    extents have been marked as empty for reuse. This is what

    makes TRUNCATE a faster operation to perform over DELETE.

    You cannot TRUNCATE a table that has any foreign key

    constraints. You will have to remove the contraints, TRUNCATE the

    table, and reapply the contraints.

    TRUNCATE will reset any identity columns to the default seed

    value. This means if you have a table with an identity column and

    you have 264 rows with a seed value of 1, your last record will have

    the value 264 (assuming you started with value 1) in its identity

    columns. After TRUNCATEing your table, when you insert a new

    record into the empty table, the identity column will have a value of

    1. DELETE will not do this. In the same scenario, if you

    DELETEd your rows, when inserting a new row into the empty table, the

    identity column will have a value of 265

    Reagrds

    Raju

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply