Archiving in SQL 2005

  • MTY-1082557 (11/4/2009)


    I have the SQL Server 2005 Developer Edition.

    This is my scenario on the same server right now:

    ArchiveDatabase:

    Table2008_1

    Table2008_2

    Table2008_3

    Table2008_4

    Table2009_1

    Table2009_2

    Table2009_3

    Where each table contains 3 months of data, the _# means the trimester of the year. So each table has around 15Millions records and they are growing (the last table has 40million of records). Those tables are not partitioned yet.

    ProductionDatabase

    Table

    Where table has 45million of records (September to today). Not partitioned yet. Now, I have to worry about the kind of partition I should create because I have 3 kind of process running for this Table:

    Dialy Insertion of records (1.5million and growing)

    Query for reporting (reports can be all the kind but the condition WHERE is the date)

    Archiving old data

    Note: all my tables has clustered index Date and ID

    Knowing this, I would like to know if monthly or weekly partition will help for my 3 process.

    Either way, Should I? :

    1.- Re-structured the way to archive my data? I mean, I can switch the actual archiving tables into specific partition?

    2.- If I have weekly partitions, my weeks of 2008 and 2009 will mix? Or I should have partition per year per week? (Total of 104 partitions?) SQL only support around 200 partitions right?

    3.- how I will quey my old data if they are in different database/table? View with union (like I have it right now)

    Am I complicated the situation when it’s easier than I explained?? lol

    Okay, the fact that you are using the Developers Edition is good, if it is only for development, but what version of SQL Server 2005 are you running in production?

  • morning.

    this is the version:

    Microsoft SQL Server 2005 - 9.00.3200.00 (X64) Oct 2 2007 12:23:02 Copyright (c) 1988-2005 Microsoft Corporation Enterprise Edition (64-bit) on Windows NT 5.2 (Build 3790: Service Pack 2)

    Reading the document was very interesting. My Table has to join with at least 4 more tables. Some of them not make sense to put them in a file group. My Table has a TableParent, where the relationship is one to many (many is the table I need to do the partition). How is the query performance in that kind of scenario if my TableParent wont have the date Key partition cause this table doesn't have that column.

    We didn't mention if my monthly partition will be good for my daily insertion as well?

    Thank you

  • I have everything set. Im attaching 2 images, one of the Partition and other of the Partition Files so you can take at look.

    What will happen with January2010? How those partitions and files will change? SQL will create everything or I need to specify some new filegroups/files?

    Regards,

  • I created all the partition for my existing data and for the next 6 months. Including filegroups and its files. I moved the indexes that are involve in the key partition criteria and everything works fine.

    I just wanted to share it with you. Even one of my reports runs in 7 seconds when it just to run in 20 seconds.

    Regards,

  • First off, sorry for the delayed response. I've had a real busy couple of days. It looks like you've got it all set up though! Good job. The first time I set up table partitioning in SQL Server 2005, it was a bit of a learning experience. Between understaning all of the relationships between the partitioning functions, partitioning schemes, how they map to filegroups, and the partition switching concepts, it proved to be a lot to take in at once.

    With that said, your screen shots did not come through very good, but I get the picture. You've set up partitions for each month and created a filegroup and file for each partition. That's a good practice, but with only having 2 disks to divide you data accross, you could have just as easily got away with 2 filegroups/files (one on each disk) and round-robin the partitions between them. No big deal though, if you ever move to a more robust disk subsystem in the future, you'll be set up to stripe the files accross as many disks as you want.

    As you've found, there are some maintenance tasks to consider when working with partitions. Specifically, you need to create your partitions ahead of time before you populate the data. One thing I usually do is to set up a check constraint on the partition key column to ensure that data cannot be inserted outiside of the range of the partitions. This is just another precaution to make sure that you keep the empty partition available.

    So it looks like you are already seeing some performance improvements with your queries. You mentioned that your partitioned table would need to be joined to several other tables that would not be partitioned and that the join would not be on the date partition key. Have you done any testing with query performance when joining those tables to your partitioned table? If so, can you share with us?

    I don't want to get too long winded, but just something to note. Run some sample queries in SSMS and watch the execution plans. Now that you've partitioned your table, you'll notice that the execution plans show you which partitions your query is touching. Queries that include the partitioning key (date) should show you that the query execution was limited to only those partitions for that date range. Neat stuff.

    John Rowan

    ======================================================
    ======================================================
    Forum Etiquette: How to post data/code on a forum to get the best help[/url] - by Jeff Moden

  • Im attaching a file where you can see the partitionId of the Join. It uses the Clustered Index. I actually tried using another index that is partitioned as well with the key partition criteria. The execution plan shows same result with either index.

    In the case where my partitioned table join with more than 4 non partitioned tables. I dont see difference in performance. The execution plan shows the partition ID for the condition WHERE for only the partitioned table, but the timing is the same. I dont know what to do in this case.

    note: Are you sure is not a big deal to have file for filegroup as I showed it? I can change it.. Im having fun learning how to orginize the data better. I actually restructured everything becasue I changed my range and i partitioned 3 indexes of that table πŸ™‚

  • MTY-1082557 (11/6/2009)


    Im attaching a file where you can see the partitionId of the Join. It uses the Clustered Index. I actually tried using another index that is partitioned as well with the key partition criteria. The execution plan shows same result with either index.

    In the case where my partitioned table join with more than 4 non partitioned tables. I dont see difference in performance. The execution plan shows the partition ID for the condition WHERE for only the partitioned table, but the timing is the same. I dont know what to do in this case.

    I would try to run a few queries where you need to join your partitioned table to the non-partitioned tables accross a few different date ranges to see how it affects the execution plan and the query times just to ensure there is not a negative impact on those types of queries from incorporating the partitioning.

    note: Are you sure is not a big deal to have file for filegroup as I showed it? I can change it.. Im having fun learning how to orginize the data better. I actually restructured everything becasue I changed my range and i partitioned 3 indexes of that table πŸ™‚

    I don't think it's reallly a big deal. It really more of a maintenance preference. With multiple file groups, you'll have some additional benefits with your backup/restore operations. I'd consider this before reverting back to a fewer # of files/filegroups. I can't remember if this is available in 2005 or only 2008 (I think only 2008), but you can have filegroups that are offline and keep your production DB up. So if you lost a disk that only had historical data on it, your current data would still be available. I designed a system in 2008 that worked with partitioning and I created a file and filegroup for each month's data. I made sure to name the filegroups and files intuitively so that we could look at the file and know which month's data lived in it. The downside to this is that after 2 years, you'll have 24 files to manage.

    Again, its really a personal preference.

    John Rowan

    ======================================================
    ======================================================
    Forum Etiquette: How to post data/code on a forum to get the best help[/url] - by Jeff Moden

Viewing 7 posts - 16 through 21 (of 21 total)

You must be logged in to reply to this topic. Login to reply