Update Indexes/Statistics on "Temp" (staging) table (not TempDB)

  • We have a number of "temp" or "Staging" tables in our databases that are used to temporarily store data. These are actual database tables, not TEMPDB tables.

    Our processes will insert data into the table, then run a stored procedure to process the data into the actual "reporting" tables. [In case someone asks, YES, this is a much more efficient method for us for processing the data versus trying to put the data directly into the reporting tables... with a 40% to 80% improvement in processing speed.]

    Obviously, the indexes and statistics on these tables are horrible, since they have data inserted into them (daily or weekly), then have the data either deleted or truncated out to wipe the table clean and prepare for the next process cycle.

    FYI, the process to process the data out of these "stage" tables can either take minutes or hours, depending on the volume of data being inserted. And NO, I have never monitored these before to see what the impact might be on the query execution plan... this is just a general question that came to me a few moments ago.

    The questions are as follows:

    1) Is there a good strategy on these type of table(s) to either do a re-index or a statistics update when new data has been inserted but not processed yet?

    2) If the indexes and statistics are NOT updated when the table receives new data, would doing a sp_recompile do any good before running the process to process the data into the normalized "reporting" tables?

    I realize that, without test cases, actual data, execution plan, etc. this is might be a hard question to answer, but this is more of an academic question at this point (kind of like a "best practices" question).

    Thanks! 🙂

    Argue for your limitations, and sure enough they're yours (Richard Bach, Illusions)

  • Ok... guess this is what I get for being in a hurry and not doing my research/due diligence.

    Here are some posts that answer the questions (about indexes... no comments on statistics):

    http://qa.sqlservercentral.com/Forums/Topic888806-145-3.aspx

    http://qa.sqlservercentral.com/Forums/Topic866837-360-1.aspx

    http://qa.sqlservercentral.com/Forums/Topic619360-360-1.aspx

    Looks like the index based stats are updated with the new data inserts, but some of the system generated _WA_Sys_.... stats can become "old".

    Argue for your limitations, and sure enough they're yours (Richard Bach, Illusions)

  • By the way: no one will (should) judge you for using staging tables. Not only are they faster in your case, they also allow you to rebuild the data on failure without having to start over from square one. Staging tables are a good thing.

    However, are you really sure you need indexes on your staging tables. It takes extra time to update the indexes for every row you insert into that table. You may not be able to realize a return on investment for that time spent if you're only querying the data only once. If you take remove the indexes, your query time may go up, but your insert time will go down. Further, those indexes are taking up storage space as well.

    You definitely want to index your reporting table, but I'd make sure you're actually getting something out of the indexes before using them on a staging table.

  • If you do determine that you need the indexes on the staging table, you may want to consider dropping them when you clear the table, and recreating them after the data is loaded. In some cases (definitely not all), this can be faster than inserting into the table with the indexes intact. You may want to consider trying it both ways and seeing which is faster. This may also apply to your reporting table.

  • Staging tables are frequently a great solution for data import.

    Yes, you might get benefits from explicitly updating the stats, especially with a FULL SCAN, or defragging/rebuilding the indexes. It just depends on how you're using your staging tables, are they referenced once during a load or are they referenced multiple times, used in joins, etc. If the former, it might not be worth the trouble to maintain the stats & indexes, the latter, it probably is.

    Just remember that you'll be adding time & load to your ETL process.

    ----------------------------------------------------The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood... Theodore RooseveltThe Scary DBAAuthor of: SQL Server 2017 Query Performance Tuning, 5th Edition and SQL Server Execution Plans, 3rd EditionProduct Evangelist for Red Gate Software

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply