To normalize or not to normalize?

  • Hi, I'm pretty new to database normalizations, but it seems like an attractive way to cut down on database size. So my question is, are there any disadvantages to normalizations? Normalization requires duplicated data to be stored in their own separate tables, so I'm wondering if there's a significant speed cost involved when you join potentially many tables back together again in a query to get one single table view.

  • The real question should be : "Should I denormalize or not?", not the other way around. On the long run, the wasted ressources caused by not normalizing the schema will far outweight the small gains you have for it in the selects.

  • The answer is usually "it depends".

    What sort of application is the database for ? An OLTP app with many ins/upd/del operations, or a data warehouse/datamart type app used for reporting & analysis ?

  • The data are collected in interview survey programs and populated to SQL Server acting as a warehouse. Some or all the data will need to be exported to SAS eventually for analysis. Because we are storing cancer research data from our local population, the size of the tables can get quite big, but the table relationships are simple.

    I am designing the tables as follows, and from what I understand, should be at least 3NF:

     

    Interview Table: contains listing of all the interviews, each row contains an interview ID and the name of the patient

    - intID (primary key)

    - intName (text)

     

    Question Table: contains description of all the multiple choice questions in the interview

    - questionID (primary key)

    - question (text)

     

    Answer Table: contains all the answers to the multiple choice questions, can be joined with the Question table to get the description of the question and the list of multiple choice answers for that question

    - answerID (primary key)

    - answer (text)

    - questionID (foreign key referencing Question Table)

     

    Response Table: contains all the recorded answers from an interview. THIS table can grow quite big because each recorded answer is stored in a row containing the interview ID and the answer ID. Can be joined with the other tables to get a more detailed DB view.

    - intID (foreign key referencing Interview Table)

    - answerID (foreign key referencing Answer Table)

     

  • I can't take the credit for the following but it does the job for me....

    "Normalize 'till it hurts... Denormalize 'till it works."

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Sounds like a data warehouse.

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply