Archiving in SQL 2005

  • Hi All,

    I need your opinion in this matter. I have a table with 200Millions of records. We dont need to read the data in production. We only run reports for the last 90 days.

    I was thinking in do Partitions, but since we dont need the old data and my Server has only 2 drives. I think it wont be a performance if i try to do some queries in that particular table.

    I created an archiving process to Insert/Delete per trimester, so I can run the reports that I need. I created a new database called ArchiveDB. This ArchiveDB has tables per trimester and I have a Job that runs every 3 months to insert/Delete the data.

    Questions:

    1) Do you think this is a good way to Archive data?

    2) Is there any faster process to Insert/Delete data? My process take 1 hour approx to do this. I know it runs every 3 months at midnight, but I dont want the process to take that long. The process is simple, I identify the month to be processed and it will storage everything in the right table. Then the process will delete the data from the original table.

    3) what is the best way to query the archive data. The union statement??

    I appreciate your advices

    Regards,

  • non experience advice?

  • 1) This process is fine from the 10,000 ft level, not knowing the nitty gritty details.

    2) Where is your Archive database? Same Server, different server? If different server, how is that setup? The location of the ArchiveDB, the connection type and parameters, and the number of records moved/deleted during this process will affect the duration of your process. Not knowing your actual setup - it is difficult to say if there is a faster process.

    3) Best method to query the archive data depends on what you are trying to do here. Are you intending on querying all of the archive data? Just a specific Quarter or month? Yes you could use the Union quite nicely to query archive data that is spread out amongst different sets.

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • Without knowing more about your reporting requirements, advice will be very high level.

    One solution could be to produce monthly "snapshot" reports, after which, your data can be removed. From this, you could generate quarterly and yearly reports without the need to archive any data.

    Hope this helps.

  • Hi Jason and Mark,

    Thank you for your responds. My archive database is located in the same server. I have tables per trimester around 200,000 records each table (each trimester, the archive tables grow). I just need them to run a report once a month and its read data for the last 90days.

    About the snapshot database. It’s a good option as well. But, what’s better if I need my table in production with not many records because I have a process that insert around 2million of records from another database to that particular table.

    I’m trying to find the best performance for my insertion in my table in production and for the future queries of the archiving or snapshot database.

  • There are some points to be noted incase you go with moving data to another db:

    1. Make the transaction smaller,ie, batches of 10000 where the transaction will be smaller

    2. Drop indexes and re-create the same.

  • Thank you for the advices

  • Interesting ... I am going to assume that you will not keep data older that 90 days so here we go with some general tips on performance.

    If you are have your storage on the same box as SQL server, consider moving this to another server or SAN/NAS device and use a RAID array to store your data. This should save you some IO, even with SCSI.

    Consider partitioning your tables. Generally, it is better to work with small data sizes which partitioning will help with. Remember to include an extra partition for your out or date data. You also get partitioned index into the bargain so updates should be faster. You may need to experiment with the size and/or number of partitions.

    Consider running your inserts more frequently and/or in batches. Gives your indexes an easier time when performing updates.

    Cascade data between partitions frequently. Small, frequent updates give better performance that large ones.

    Review the recovery model. If using full, does Bulk Logged or even simple meet your needs?

    Delete or truncate your out or date data frequently. This can be achieved using the partition directly.

    I Hope this has given you some food for thought. Let me know how you get on.

  • There are some good suggestions here so far. If performance is your biggest concern, I'd suggest looking closely at SQL Server 2005 Table Partitioning. This will keep you from having to archive old data from your table. If you set up your partitions correctly, you're queries against the active data (< 90 days out) will only touch the partitions that house that data. This will give you the query performance you need and prevent you from having to move the rows off to another DB.

    As far as your inserts go, look into using Bulk Insert to get your data into your table. Also, depending on the frequency of your inserts, you could look into using a sliding window concept where you insert new data into an empty partition and then 'slide' that partition into your table using the partitioning functions. I've designed a system to do this before and it worked pretty slick. Bulk inserting into the empty partition keeps locks off of the partitioned table and then moving the newly populated 'empty' partition into the base table becomes a simple metadata switch.

    John Rowan

    ======================================================
    ======================================================
    Forum Etiquette: How to post data/code on a forum to get the best help[/url] - by Jeff Moden

  • Definitely I need to create Partitions in my tables; my process took 52minutes to move/delete around 44million records (3 months of data). Plus I have a view that union all the Archive tables to run my report. This process took more than 10minutes to run when it used to be just 1 minute. Even know, all those tables have their indexes. . I can’t use indexed views because my archive tables are in another database and I need to union the table in production that it is in another server. I thought taking much data possible out of my production table will be better and faster. But it’s not working better.

    Is another better way to query my tables instead the view?

    You think adding partition and let the archive process will performance better. Or I should have all the records in the production table with partitions?

    I want to be ready for future ‘because my insertion of data grows from 300,000 records to 1.5million records. Next year I will insert 3millions of records. I have the bulk insert and empty table, so I’m not having problems in performance in my insertion.

    I just have 2 drives in my Server to do partitions. Do you think this is an issue?

    Waiting for your experienced advice.

  • Before you start looking deep into partitioned tables I have one question for you, are you running SQL Server 2005 Enterprise Edition or SQL Server 2005 Standard Edition? This is important as partitioned tables are only supported in the Enterprise and Developer editions of SQL Server 2005/2008.

  • Good point Lynn. Partitioning is only available in certain versions.

    Is another better way to query my tables instead the view?

    You think adding partition and let the archive process will performance better. Or I should have all the records in the production table with partitions?

    If you are going with SQL Server 2000 style partitioning (where you manually build each partition table and join them together via a view) then you are stuck with the performance of the view. One thing to note is where you put your clustered index. Since your tables are broken down by data (calendar month, correct?) I would recommend clustering on your date column.

    As far as the archiving goes, I would add the partitions to your production table and partition by date. You'll want to do some planning on your partition sizes based on your future projections for growth. Since you only care about a quarter at a time, I would make the partitions either quarterly or monthly. Monthly partitions would allow you to always reference a sliding quarter's worth of data.

    Your historical data would remain in your production table, but those partitions will never be looked at when you are querying current data as long as you use the data column (partition key) in your WHERE clause in your queries. SQL Server is smart enough to ingore the data/indexes in the partitions that are not needed for the query. You don't have to worry about joining historical months together via a view as all data can be referenced by the production table name.

    I just have 2 drives in my Server to do partitions. Do you think this is an issue?

    If that is what you have to work with, that'll do. Especially if you are only really using 3 months of data at a time. Set your partitioning up to 'round-robin' between your 2 disks to spread the I/O out for reads, this should give you better performance than having all of your data in one filegroup on one disk.

    When you truely want to archive data off, or purge it, it can be done by a metadata switch with the partitioning functions.

    Here's a great read on partitioning from Kimberly Tripp.

    http://www.sqlskills.com/resources/Whitepapers/Partitioning%20in%20SQL%20Server%202005%20Beta%20II.htm

    John Rowan

    ======================================================
    ======================================================
    Forum Etiquette: How to post data/code on a forum to get the best help[/url] - by Jeff Moden

  • I have the SQL Server 2005 Developer Edition.

    This is my scenario on the same server right now:

    ArchiveDatabase:

    Table2008_1

    Table2008_2

    Table2008_3

    Table2008_4

    Table2009_1

    Table2009_2

    Table2009_3

    Where each table contains 3 months of data, the _# means the trimester of the year. So each table has around 15Millions records and they are growing (the last table has 40million of records). Those tables are not partitioned yet.

    ProductionDatabase

    Table

    Where table has 45million of records (September to today). Not partitioned yet. Now, I have to worry about the kind of partition I should create because I have 3 kind of process running for this Table:

    Dialy Insertion of records (1.5million and growing)

    Query for reporting (reports can be all the kind but the condition WHERE is the date)

    Archiving old data

    Note: all my tables has clustered index Date and ID

    Knowing this, I would like to know if monthly or weekly partition will help for my 3 process.

    Either way, Should I? :

    1.- Re-structured the way to archive my data? I mean, I can switch the actual archiving tables into specific partition?

    2.- If I have weekly partitions, my weeks of 2008 and 2009 will mix? Or I should have partition per year per week? (Total of 104 partitions?) SQL only support around 200 partitions right?

    3.- how I will quey my old data if they are in different database/table? View with union (like I have it right now)

    Am I complicated the situation when it’s easier than I explained?? lol

  • OK, I'll walk through what I'd do and I'll try to answer your questions as I go.

    1. Create partitioning scheme/function for your production table. Weekly partitions may be a bit much for the amount of data you're talking about. Monthly would do nicely. Create enough partitions to house your old data and current data. Cluster the table on the date column and use that same column as the partition key.

    2. Move your old data from your archive DB back into your Production table.

    3. Get rid of your archive DB.

    4. Query all data, old and new, in the Production table. Using the date column (partition key) in your queries will instruct SQL Server on which partitions to query. A query against the current data will only touch the last 3 month partitions. A historical query will touch only the partitions that it needs to satisfiy the query.

    So to answer your questions:

    1. No need to archive data out of your Production table.

    2. Partition Monthly. The months will not mix over years. The partitioning scheme/function will ensure that each month/year has it's own partition.

    3. You'll query your old data by querying your production table and using a date filter in your WHERE clause that includes the date range you want to look at. No views, no different table names, let the partitioning and the query optimizer handle it all for you.

    If you have not yet read the article that I linked, I suggest you read it. It explains the differences between old-style SQL Server 2000 partitioned views with the new table partitioning available in SQL Server 2005 and up, again, only in certain versions.

    John Rowan

    ======================================================
    ======================================================
    Forum Etiquette: How to post data/code on a forum to get the best help[/url] - by Jeff Moden

  • Thank you John.

    Im not done readind the document yet. Thank you for all the advices.

    Regards,

Viewing 15 posts - 1 through 15 (of 21 total)

You must be logged in to reply to this topic. Login to reply