Migrate 1 Billion row, 1 TB table from SQL2005 to SQL2008

  • I am trying to figure out the best way to migrate a 1 billion row, 1 terabyte table from SQL2005 to SQL2008. The source table is not partitioned and is not in its own filegroup. The target table will be partitioned into 50-million-row partitions, with one filegroup per partition and one file per filegroup.

    The approach I am currently testing is as follows:

    1. Set up the target table with the desired partition scheme.

    2. In each file group, create a staging table corresponding to that partition. For example, the first partition will contain rows where the rowid is less than 50 million. Data in the source table with a rowid less than 50 million will be migrated into the staging table for the first partition. The migration method right now is INSERT INTO target SELECT FROM source.

    3. After the staging table is loaded, the clustered index and secondary indexes will be built on it (the same index structure as on the partitioned target table, so the switch will succeed).

    4. Then, the staging table will be switched into the corresponding partition with ALTER TABLE ... SWITCH. A rough sketch of the whole flow is shown below.
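
    Here is that sketch. The table, filegroup, and column names are placeholders rather than my actual schema, and the filegroups/files are assumed to already exist:

        -- Partition function/scheme with 50-million-row boundaries (RANGE RIGHT),
        -- one filegroup per partition (fg_p1, fg_p2, ... already created).
        CREATE PARTITION FUNCTION pf_rowid (BIGINT)
            AS RANGE RIGHT FOR VALUES (50000000, 100000000 /* , ... */);
        CREATE PARTITION SCHEME ps_rowid
            AS PARTITION pf_rowid TO (fg_p1, fg_p2, fg_p3 /* , ... */);

        -- Step 1: target table created on the partition scheme (columns are placeholders).
        CREATE TABLE dbo.Target
        (
            rowid   BIGINT       NOT NULL,
            payload VARCHAR(100) NULL
        ) ON ps_rowid (rowid);
        CREATE UNIQUE CLUSTERED INDEX cix_Target ON dbo.Target (rowid) ON ps_rowid (rowid);

        -- Step 2: staging table with the same structure on the first partition's filegroup,
        -- loaded with the slice of the source that belongs to that partition.
        CREATE TABLE dbo.Staging_P1
        (
            rowid   BIGINT       NOT NULL,
            payload VARCHAR(100) NULL
        ) ON fg_p1;

        INSERT INTO dbo.Staging_P1
        SELECT rowid, payload
        FROM dbo.Source
        WHERE rowid < 50000000;

        -- Step 3: build the same clustered and secondary indexes the target has.
        CREATE UNIQUE CLUSTERED INDEX cix_Staging_P1 ON dbo.Staging_P1 (rowid) ON fg_p1;

        -- SWITCH requires a check constraint proving every row fits the partition range.
        ALTER TABLE dbo.Staging_P1
            ADD CONSTRAINT ck_Staging_P1_range CHECK (rowid < 50000000);

        -- Step 4: metadata-only move of the staging table into partition 1.
        ALTER TABLE dbo.Staging_P1 SWITCH TO dbo.Target PARTITION 1;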

    Anyway, I am not pleased with the processing times and I am looking for suggestions on faster methods.

    What about bulk data options?

    Does it make sense to load each staging table with the indexes already in place, or is it better to create the indexes after loading the table?

  • Hi Tom,

    Generally I think the process you outlined is good, but you will want to leverage bulk operations, and you have a few options for it.

    I suggest you have a look at the following article: "The Data Loading Performance Guide".

    It is a little long but very good.

    Specifically, look at the table describing minimal logging conditions.
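
    For example (database and table names below are placeholders), one combination that satisfies those conditions in SQL Server 2008 is bulk-logged recovery plus a TABLOCK hint on the target:

        -- Switch to a minimally logged recovery model for the duration of the load.
        ALTER DATABASE StageDB SET RECOVERY BULK_LOGGED;

        -- INSERT ... SELECT WITH (TABLOCK) into a heap can be minimally logged in 2008;
        -- loading into a table that already has a clustered index generally also
        -- needs trace flag 610 (see the guide's minimal logging table).
        INSERT INTO dbo.Staging_P1 WITH (TABLOCK)
        SELECT rowid, payload
        FROM dbo.Source
        WHERE rowid < 50000000;

        -- Put the recovery model back (and take a backup) once the load is done.
        ALTER DATABASE StageDB SET RECOVERY FULL;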

  • I recommend creating the clustered index before the load and the non-clustered indexes after the load. This avoids having to build the clustered index afterwards, which effectively rewrites the whole table and can take a long time.

    I recommend using SSIS to load the data in clustered-index order, with Rows per batch set to a reasonable value such as 100,000 rows.

    You can use SSIS to load the data in parallel streams into the different tables to shorten the overall runtime.
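
    If you end up doing this in T-SQL instead of SSIS, the same idea looks roughly like this (table names and the batch size are just examples):

        -- Load one partition's range in clustered-key order, about 100,000 rows per
        -- batch, so each batch is committed and logged separately.
        DECLARE @lo BIGINT = 0, @hi BIGINT = 50000000, @batch BIGINT = 100000;

        WHILE @lo < @hi
        BEGIN
            INSERT INTO dbo.Staging_P1
            SELECT rowid, payload
            FROM dbo.Source
            WHERE rowid >= @lo AND rowid < @lo + @batch
            ORDER BY rowid;   -- feed rows in clustered-index order

            SET @lo = @lo + @batch;
        END;

        -- Run one of these loops per staging table on separate connections to get
        -- the parallel streams into the different partition ranges.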
