Indexes during nightly dataloads - tips?

  • Hi all,

    I have a nightly dataload process that inserts large amounts of rows throughout the process. I assume having any indexes on the table receiving the inserts would contribute to a performance degredation, so I believe dropping them first (rebuild after) is the correct approach. However, there are more than inserts to the dataload process - many selects are required on the same table that the inserts are done to. The indexes would be beneficial for the selects, but harmful for inserts and deletes. Given the mix, does anyone have general suggestions on how to get the best of both worlds when it comes to dataload batch processing?

    By the way, there is no user traffic during the overnight, so I am able to drop any indexes without interfering with user or other activity during this time.

    Thanks in advance,

    Sharon

  • Don't drop your clustering index !!

    (because it actually contains the data pages, so a drop will cause all your data to be moved to a heap and all non-clustering indexes to be rebuilt !)

    Sort your input dataset according to your clustered index definition !

    Mass loads ... try to design your process so you can use "bulk insert" !

    Also watch out for triggers, DRI (order of load), ...

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help[/url]

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

  • I don’t think that anyone here will be able to give you the solution. We don’t know how big the tables are, how many inserts you make, how many selects you make, how selective the select statements are, etc. This input is very important because there is no one solution for every data load. I can tell you that had one customer that didn’t have any indexes and after building indexes the job ran for 8 hours instead of 14 hours without the indexes. On the other hand I once worked in a company that dropping the index before the data load and building it later saved us many hours. The only real advice I can think about is the test few scenarios (dropping different combination of indexes vs. not dropping any index).

    Adi

    --------------------------------------------------------------
    To know how to ask questions and increase the chances of getting asnwers:
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

    For better answers on performance questions, click on the following...
    http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • Very good - thank you both for the advice. I would have dropped the clustered index, btw 🙂 now I know not to...

    Will be a lot of trial an error on this one!

    Thanks again,

    Sharon

  • I have to admit that I don’t agree with ALZDBA’s opinion. It is important to know and realize what clustered index does and how dropping it and\or creating it influences on the nonclustered indexes, but without knowing anything about your process, I don’t think that anyone can say for sure if you should or shouldn’t drop the clustered index.

    Adi

    --------------------------------------------------------------
    To know how to ask questions and increase the chances of getting asnwers:
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

    For better answers on performance questions, click on the following...
    http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • Adi Cohn-120898 (9/28/2010)


    I have to admit that I don’t agree with ALZDBA’s opinion. It is important to know and realize what clustered index does and how dropping it and\or creating it influences on the nonclustered indexes, but without knowing anything about your process, I don’t think that anyone can say for sure if you should or shouldn’t drop the clustered index.

    Adi

    And correct you are !

    I should have frazed my remark this way : "Don't just drop your clustering index !! "

    Indeed, it must be evaluated according to relative load volume and expected or measured fragmentation level of that clustering index.

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help[/url]

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

  • Classic "it depends" question. If you are loading 1M rows into a BILLION ROW table, then yes, 1M might be a lot of rows but you probably REALLY don't want to drop/recreate all your indexes on that 1B row table! If you are loading 1M rows into a 1000 row table (or one that you flush before each load) then you absolutely want to drop/recreate indexes.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Time it (and use SQL Server Profiler SQL:BatchCompleted statistics on Reads, Writes, CPU, Duration if you can) several ways:

    1) The way it is now

    2) Dropping/recreating indexes (watch out for ones used by FK's)

    3) rebuilding indexes at various points (after inserts, before updates/deletes, perhaps)

    3a) with differing fillfactors

    Then you'll know what happens in your particular environment. The only way to find out how fast something is... is to find out how fast it is. Theory doesn't always predict practice, it just gives good ideas on what to try.

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply