RE: many small tables vs. one large table??

  • Hello,

    I am dealing with 12 months of sales data 2-3M rows per month. I initially set them up as 12 seperate tables and have dealt with my queries through stored procedures (rarely change them). However, recently I have been running many ad hoc queries that require all 12 months of data....this makes for lenghty SQL.

    I am wondering if 12 2-3 million row tables perform better than one 30 million row table (indexes being equal)? Should I combine these into a single table for convenience sake?

    Please advise.

    Thank you,

    Lonnie

  • In this situation I would be inclined to put all of the data into one table and then partition the table with one partition for each month. Thereafter, when you query the table SQL Server will take care of which partitions it needs to read from and give you optimum performance, with the added benefit that you can query as many or as few months as you'd like to.

    Having the data split across 12 tables isn't very scalable and will provide more management overhead than using partitions - although you'll still need to manage the partitions and possibly use a sliding window mechanism with each new month.

    Hope that helps.

  • I'll second that.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • SQLZ (7/23/2009)


    In this situation I would be inclined to put all of the data into one table and then partition the table with one partition for each month. Thereafter, when you query the table SQL Server will take care of which partitions it needs to read from and give you optimum performance, with the added benefit that you can query as many or as few months as you'd like to.

    Having the data split across 12 tables isn't very scalable and will provide more management overhead than using partitions - although you'll still need to manage the partitions and possibly use a sliding window mechanism with each new month.

    Hope that helps.

    Partitioning requires Enterprise Edition.

    Partitioning performance can be disappointing, especially in 2005.

    Splitting the data into 12 tables and using a partitioned view (available in all editions) will almost certainly perform better.

    Table partitioning in SQL Server is internally implemented as separate table structures anyway.

    Paul

    edit: also, a database containing a partitioned table (or even a partition scheme or function) prevents that database being attached, or restored, to an edition of SQL Server which is not Enterprise or equivalent. This can be extremely tedious if you use copies of the database for reporting purposes (for instance).

  • Good points, Paul.

    I like the idea of using a view to join the tables together if you need that.

    Are you querying by month that often? I'm just wondering if you need a current and archived table instead, letting SQL separate out things by table based on a date or a couple of columns (yr/mn)

  • Paul White (7/25/2009)


    Splitting the data into 12 tables and using a partitioned view (available in all editions) will almost certainly perform better.

    Steve Jones - Editor (7/25/2009)


    I'm just wondering if you need a current and archived table instead...

    I absolutely agree with both of those suggestions... we had a similar problem and I split the data at the 120 day mark (Telephony and Call Detail Records (CDR's))... we would call the table directly for the short term stuff and a partitioned view for the long term stuff. It helped even really good code never mind the junk code.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Hello All,

    Thanks for the suggestions. I have created three views of the data (based on the time frames I generally query). This will suffice for now but I believe I will eventually create one large table with partitions. I am curious about the archiving idea of Steve Jones....but not sure how to implement?

    Thanks again,

    Lonnie

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply