Very large facttable

  • Hi,

    I'm working on a BI project to store surveys over the last 10 years. We've got a very large fact table. This fact table contains all question/answer combination per survey, per respondent. It's about 4,5 10^9 rows and the estimated size is about 150GB (without indexes). Has anyone any idea how we should 'organize' this? I guess we have to use partitioning, but are there other possibilities.

    As for an example: 1 survey contains about 250 questions and over 250.000 people has responded. And there are dozens of surveys :ermm:

    Thanks in advance

    Jos

  • I'd use partitioning on a huge table, where there is a time or other criteria on which a big part of data is rarely used/modified as it speeds up a bit queries on new data, but slows queries on full data and backups are smaller.

    Big tables are problem if you do a full backup, if you want delete significant part of it,...

  • Robert,

    Thanks for your answer. I'm going to try to use partitioning with the help of a dba.

  • You might want to post the CREATE statement for your fact table and maybe 10 or so rows of data (see the URL in my signature line for the right way to do that so we can help you better). Although I agree that partionining may be the answer, it might not depending on what your queries will look like. Might wanna post one of those, as well.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Jeff,

    I've posted a CREATE script for the facttable with a few records. I can't post any queries because there an no queries yet. In the future the customer wants to analyse the data by asking a question like: "Show me for a daterange (time not included) for a certain survey the answers.

    I'm wondering if I should use partitioning by date or by survey?

    Another question: Should I use a primary key?

    Thanks in advance

    Jos

  • jos.haemers (4/6/2008)


    Jeff,

    I've posted a CREATE script for the facttable with a few records. I can't post any queries because there an no queries yet. In the future the customer wants to analyse the data by asking a question like: "Show me for a daterange (time not included) for a certain survey the answers.

    I'm wondering if I should use partitioning by date or by survey?

    Another question: Should I use a primary key?

    Thanks in advance

    Jos

    Thanks for the file. Hmmm... if that the case, here's a couple of things that I might start with...

    First, yes, you not only should use a Primary Key, I believe it will be necessary for you to successfully do partitioning. Identities are ok for this but when you split the table, you'll need to add a range constraint to the PK column in order to get the partioning to work. Spend some time looking at the partioning methods in Books Online. They'll guide you as to how to pull of partioning with no guess work.

    Second, I think you have too many indexes for such a large collection of data... wait until you have a couple of the more commonly used queries designed and ready to go and the figure out what would be the most optimal composite indexes to support them. You might even be able to get away with just 1 or 2 depending, of course, on the queries.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • - seems like only fk-in the fact ??

    - I've read this nice art regarding a technique often used in questionairs...

    "Introduction to Bitmasking in SQL Server 2005"

    http://qa.sqlservercentral.com/articles/Miscellaneous/2748/

    Maybe that can open some chanlenges ...

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help[/url]

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply