Normalizing vs Denormalizing

  • I am sorry if this is a question that has been answered many times before, but I am starting to stress out about what to do.

    I have taken on a project to migrate a company's database from Visual FoxPro to SQL Server. When I started I knew the basics of database design, i.e. normalization, etc., so I designed their tables using those principles until I had lots of tables with few columns.

    For example, I have a contacts table where I split out the data that may be null, which resulted in 6 tables.

    Table 1
    customerid
    forename
    surname

    Table 2 (may not exist for every customer)
    customerid
    DOB
    Age
    Gender
    UKTaxPayer

    Table 3 (a customer can have more than one address)
    customerid
    Address1
    Address2
    Address3
    Address4
    Address5
    Country
    Postcode

    Table 4 (a customer can have more than one email)
    customerid
    Email
    EmailType (e.g. home, work)
    DefaultEmail (Y/N)
    Notes

    Table 5 (a customer can have more than one phone number)
    customerid
    Phone
    PhoneType (e.g. home, work, mobile)
    DefaultPhone (Y/N)
    Notes

    Table 6
    customerid
    Organisation
    Position

    All these tables link by customerid; some are 1:1 with Table 1 and some are 1:many. However, if I wanted to get all the data about a customer out, that would involve a six-way join, something like the query sketched below. This information needs to be accessed frequently, so would it be best to recombine the tables and live with the fact that a lot of the fields will be null? The contacts table also has to link to information about subscriptions and financial history. Have I just overcomplicated it by making too many joins necessary? Is there a recommended maximum number of table joins? I want to get the design right from the start, and I don't want to denormalize if I don't have to, but I also don't want an inefficient database with too many joins.
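
    To show what I mean by a six-way join, this is roughly what pulling one customer's full record looks like with the layout above (a simplified sketch only; the real table and column names differ, and I've used LEFT JOINs because some of the tables won't have a row for every customer):

        SELECT t1.customerid, t1.forename, t1.surname,
               t2.DOB, t2.Gender, t2.UKTaxPayer,
               t3.Address1, t3.Postcode,
               t4.Email,
               t5.Phone,
               t6.Organisation, t6.Position
        FROM Table1 t1
        LEFT JOIN Table2 t2 ON t2.customerid = t1.customerid  -- optional personal details
        LEFT JOIN Table3 t3 ON t3.customerid = t1.customerid  -- addresses (1:many)
        LEFT JOIN Table4 t4 ON t4.customerid = t1.customerid  -- emails (1:many)
        LEFT JOIN Table5 t5 ON t5.customerid = t1.customerid  -- phones (1:many)
        LEFT JOIN Table6 t6 ON t6.customerid = t1.customerid  -- organisation/position
        WHERE t1.customerid = 42  -- some customer id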

  • First, I would suggest you went too far with Table 2; I would just make it part of Table 1, since that data does not duplicate, it is simply nullable. This gets you down to 5 joins.

    Next, I was not sure I understood Table 6. If the point is that multiple customers can have the same organisation, then I would make that table hold the organisations, with a unique id for each organisation, and put that id in Table 1 as a FK column (roughly as sketched at the end of this post). Make sure you have properly normalized based on your data.

    Also, consider that machines are much faster than in the days when 4 was the suggested limit on joins. I personally do the full normalization and see how well performance holds up. If I saw significant and unacceptable processing delays, I would then consider denormalizing for the sake of performance. If you do denormalize, start with the 1:1 relationships first.
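
    Roughly like this (just a sketch; the names, sizes and data types are placeholders you would adjust to your own data):

        CREATE TABLE Organisation (
            OrganisationID int IDENTITY(1,1) PRIMARY KEY,
            Name           varchar(100) NOT NULL
        )

        CREATE TABLE Customer (
            customerid     int PRIMARY KEY,
            forename       varchar(50) NOT NULL,
            surname        varchar(50) NOT NULL,
            -- former Table 2 columns, simply allowed to be null here
            DOB            datetime NULL,
            Gender         char(1) NULL,
            UKTaxPayer     bit NULL,
            -- former Table 6, replaced by a FK to the organisation table
            OrganisationID int NULL REFERENCES Organisation(OrganisationID),
            Position       varchar(50) NULL
        )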

  • Thanks for the prompt reply! Is it better, then, to have data that may be null for many customers in Table 1 than to split it out into Table 2? Table 6 is for both current and historic data, i.e. if a customer changes company we want to retain their old company data.

  • Not really, but your concern was the number of joins. You do get benefits from keeping the nullable data in the other table, in that you don't waste extra space on customers who don't need those columns; I just wasn't thinking of that.

    Not sure of your plans with that table, but I would drop the Age column from storage and calculate it in a view (something like the sketch below).
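
    A rough sketch of that kind of view, assuming DOB is stored as a datetime in whichever table it ends up in (Table2 here):

        CREATE VIEW CustomerAge
        AS
        SELECT customerid,
               DOB,
               -- years between DOB and today, minus one if the birthday
               -- has not yet happened this year
               DATEDIFF(yy, DOB, GETDATE())
                 - CASE WHEN DATEADD(yy, DATEDIFF(yy, DOB, GETDATE()), DOB) > GETDATE()
                        THEN 1 ELSE 0 END AS Age
        FROM Table2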

  • I agree with Antares - normalize fully, then see how it works. More often than not, it works well!

    Andy

    http://qa.sqlservercentral.com/columnists/awarren/
