Fastest way to add indexes to 9+ million records in a table

Question

Fastest way to add indexes to 9+ million records in a table

jjbear11

Default port

Points: 1497
More actions
March 16, 2009 at 12:22 pm

#384139

I have an SSIS plan written to extract data from a source database and input the data into a destination database. The destination table has 1 unique clustered index, 1 unique NONclustered index, and 5 non-clustered indexes. There isn't a whole lot that I can do about the number/types of indexes, since they are dictated to us by another department. Is there any way that I can quickly add these indexes to this table? Would a disable/enable be quicker than dropping/recreating (that's what I'm currently doing -- dropping/recreating)? All I did was script the index creation to the clipboard, and then copy that SQL into an Execute SQL task in SSIS. The index creation is taking roughly 1 3/4 hours to complete. Is that about right considering the number of records in the table?
Thank you for any help that you can provide!

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply

Lynn Pettis SSC Guru Points: 442447 More actions · Answer 1

jjames (3/16/2009)
I have an SSIS plan written to extract data from a source database and input the data into a destination database. The destination table has 1 unique clustered index, 1 unique NONclustered index, and 5 non-clustered indexes. There isn't a whole lot that I can do about the number/types of indexes, since they are dictated to us by another department. Is there any way that I can quickly add these indexes to this table? Would a disable/enable be quicker than dropping/recreating (that's what I'm currently doing -- dropping/recreating)? All I did was script the index creation to the clipboard, and then copy that SQL into an Execute SQL task in SSIS. The index creation is taking roughly 1 3/4 hours to complete. Is that about right considering the number of records in the table?
Thank you for any help that you can provide!

I'd say that is the best way to go, drop and recreate. The key is doing it in the correct order. When dropping the indexes, drop all the noncolustered indexes first, then the clustered index. When recreating the indexes, create the clustered index first then the nonclustered indexes. This is important because if you drop the clustered index first, all the nonclustered indexes are rebuilt before you can drop them. The same is true when creating the indexes. If you recreate the clustered index last, all the nonclustered indexes are rebuilt by SQL Server.

Jeffrey Williams SSC Guru Points: 89773 More actions · Answer 2

I'd say that is the best way to go, drop and recreate. The key is doing it in the correct order. When dropping the indexes, drop all the noncolustered indexes first, then the clustered index. When recreating the indexes, create the clustered index first then the nonclustered indexes. This is important because if you drop the clustered index first, all the nonclustered indexes are rebuilt before you can drop them. The same is true when creating the indexes. If you recreate the clustered index last, all the nonclustered indexes are rebuilt by SQL Server.

Lynn, in SQL Server 2005 it is no longer true that rebuilding/recreating the clustered index rebuilds all non-clustered indexes. Now, if you create the non-clustered indexes first (with no clustered index) - then add a clustered index - then all bets are off.

I would recommend disabling the non-clustered indexes, loading the data and then performing a full rebuild on all indexes. You can't disable the clustered index or the table will not be available and you wouldn't be able to load.

You can easily rebuild all indexes using:

ALTER INDEX ALL ON table REBUILD;

Jeffrey Williams
Problems are opportunities brilliantly disguised as insurmountable obstacles.

How to post questions to get better answers faster
Managing Transaction Logs

Michael Valentine Jones SSC Guru Points: 64818 More actions · Answer 3

Are you completely reloading the table each time, or just adding new rows to existing data?

Jeffrey Williams SSC Guru Points: 89773 More actions · Answer 4

Michael Valentine Jones (3/16/2009)
Are you completely reloading the table each time, or just adding new rows to existing data?

Dangit - should have thought about that question myself. 🙂

However, I have found that dropping/disabling the non-clustered indexes is generally faster either way. The only time I have found that it is not any faster is when the number of rows being inserted/added/updated is a very small fraction of the total rows (less than 1 or 2 percent).

Jeffrey Williams
Problems are opportunities brilliantly disguised as insurmountable obstacles.

How to post questions to get better answers faster
Managing Transaction Logs

Lynn Pettis SSC Guru Points: 442447 More actions · Answer 5

Jeffrey Williams (3/16/2009)
I'd say that is the best way to go, drop and recreate. The key is doing it in the correct order. When dropping the indexes, drop all the noncolustered indexes first, then the clustered index. When recreating the indexes, create the clustered index first then the nonclustered indexes. This is important because if you drop the clustered index first, all the nonclustered indexes are rebuilt before you can drop them. The same is true when creating the indexes. If you recreate the clustered index last, all the nonclustered indexes are rebuilt by SQL Server.
Lynn, in SQL Server 2005 it is no longer true that rebuilding/recreating the clustered index rebuilds all non-clustered indexes. Now, if you create the non-clustered indexes first (with no clustered index) - then add a clustered index - then all bets are off.
I would recommend disabling the non-clustered indexes, loading the data and then performing a full rebuild on all indexes. You can't disable the clustered index or the table will not be available and you wouldn't be able to load.
You can easily rebuild all indexes using:
ALTER INDEX ALL ON table REBUILD;

I'd have to test the dropping and recreating of indexes to be sure. I do know if I dropped the clustered index first on a table the SQL Server 2000 then rebuilt all the clustered indexes. It was something I learned the hard way, and will continue to do in SQL Server 2005 and forward. When dropping all indexes on a table, drop the nonclustered indexes first then the clustered index. When creating the indexes, create the clustered index first than all nonclustered indexes. Once I started doing things that way, they seem to go very quickly.

Michael Valentine Jones SSC Guru Points: 64818 More actions · Answer 6

I can't see why you would drop the clustered index and then recreate it, since that forces SQL Server to rebuild the table as a heap and then rebuild it again as a clustered index.

If the table is being completely reloaded, then I would do it this way:

1. Truncate the table.

2. Drop the non-clustered indexes.

3. Load the table in order by the clustered index. This will avoid having to reindex the clustered index after the load.

4. Create the non-clustered indexes.

If the table is not being reloaded completely, then it depends. It might be faster to drop the non-clustered indexes and then build them again afterwards, or it might not, depending on the number of rows being added. As for the clustered index, if your load is just extending the table at the end of the clustered index, then there may be no need to reindex it. If it is inserting a lot of rows scattered over the range of the clustered index, you might get severe fragmentation of the clustered index and need to reindex.

Lynn Pettis SSC Guru Points: 442447 More actions · Answer 7

Michael Valentine Jones (3/16/2009)
I can't see why you would drop the clustered index and then recreate it, since that forces SQL Server to rebuild the table as a heap and then rebuild it again as a clustered index.
If the table is being completely reloaded, then I would do it this way:
1. Truncate the table.
2. Drop the non-clustered indexes.
3. Load the table in order by the clustered index. This will avoid having to reindex the clustered index after the load.
4. Create the non-clustered indexes.
If the table is not being reloaded completely, then it depends. It might be faster to drop the non-clustered indexes and then build them again afterwards, or it might not, depending on the number of rows being added. As for the clustered index, if your load is just extending the table at the end of the clustered index, then there may be no need to reindex it. If it is inserting a lot of rows scattered over the range of the clustered index, you might get severe fragmentation of the clustered index and need to reindex.

At the time I was doing this it was the requirement stated by the Project Lead on the DW project at my previous employer. The data was a full truncate and load, and I even had to make sure foreign key references were properly dropped and recreated. Some of these were also clustered indexes. One "saving grace" was that the data was loaded in the same sequence as the clustered index when it was recreated. That seemed to make the process seem quicker at least.

I was still pretty much a novice at the time with regard to quite a bit about SQL Server, so I did what I was instructed and tried to make it work the best I could. We didn't have any MS SQL Server gurus in the company as most of the other databases were either ORACLE, IDMS, or ISAM files depending on the systems.

Since then I really haven't had the need to drop and recreate all the indexes and things seem to work just fine. I just think it is important that if you drop all indexes that you should be sure to drop them in the correct sequence in order to save time.

jjbear11 Default port Points: 1497 More actions · Answer 8

Michael Valentine Jones (3/16/2009)
I can't see why you would drop the clustered index and then recreate it, since that forces SQL Server to rebuild the table as a heap and then rebuild it again as a clustered index.
If the table is being completely reloaded, then I would do it this way:
1. Truncate the table.
2. Drop the non-clustered indexes.
3. Load the table in order by the clustered index. This will avoid having to reindex the clustered index after the load.
4. Create the non-clustered indexes.
If the table is not being reloaded completely, then it depends. It might be faster to drop the non-clustered indexes and then build them again afterwards, or it might not, depending on the number of rows being added. As for the clustered index, if your load is just extending the table at the end of the clustered index, then there may be no need to reindex it. If it is inserting a lot of rows scattered over the range of the clustered index, you might get severe fragmentation of the clustered index and need to reindex.

The reason that I drop all the indexes on each run is because of problems that I've run into regarding index fragmentation. It seems that when I get up to about 1 million records, my index fragmentation is already at 99%, which kills me even more. I think I'll try what you were saying regarding dropping only the non-clustered indexes and loading the table in clustered-index order.

Taking Lynn's earlier suggestion about index creation order has shaved 20 minutes off of my load time. Trust me -- I'm trying to save time everywhere that I can with this customer. Thank you for the help!

noeld SSC Guru Points: 96590 More actions · Answer 9

From BOL:

The following table lists the minimal amount of new data that should be in a table before you drop indexes.

Indexes..............................................Relative amount of new data

Clustered index only...............................30 percent

Clustered and one nonclustered index........25 percent

Clustered and two nonclustered indexes.....25 percent

Single nonclustered index only.................100 percent

Two nonclustered indexes........................60 percent

* Noel