Normalizing vs Denormalizing

  • I am sorry if this is a question that has been answered many times before, but I am starting to stress out about what to do.

    I have taken on a project to migrate a company's database from Visual FoxPro to SQL Server. When I started I knew the basics of database design, i.e. normalization, etc., so I designed their tables using those principles until I had lots of tables with few columns.

    For example, I have a contacts table where I split out the data that may be null, which resulted in 6 tables.

    Table 1
    customerid
    forename
    surname

    Table 2 (may not exist for every customer)
    customerid
    DOB
    Age
    Gender
    UKTaxPayer

    Table 3 (a customer can have more than one address)
    customerid
    Address1
    Address2
    Address3
    Address4
    Address5
    Country
    Postcode

    Table 4 (a customer can have more than one email)
    customerid
    Email
    EmailType (e.g. home, work)
    DefaultEmail (Y/N)
    Notes

    Table 5 (a customer can have more than one phone number)
    customerid
    Phone
    PhoneType (e.g. home, work, mobile)
    DefaultPhone (Y/N)
    Notes

    Table 6
    customerid
    Organisation
    Position

    All these tables link by customerid; some are 1:1 with Table 1 and some are 1:many. However, if I wanted to get all the data about a customer out, that would involve a six-way join, something like the query sketched below. This information needs to be accessed frequently, so would it be best to recombine the tables and live with the fact that a lot of the fields will be null? The contacts table also has to link to information about subscriptions and financial history. Have I just overcomplicated it by making too many joins necessary? Is there a recommended maximum number of table joins? I want to get the design right from the start, and I don't want to denormalize if I don't have to, but I also don't want an inefficient database with too many joins.
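
    To show what I mean by a six-way join, this is roughly what pulling one customer's full record looks like with the layout above (a simplified sketch only; the real table and column names differ, and I've used LEFT JOINs because some of the tables won't have a row for every customer):

        SELECT t1.customerid, t1.forename, t1.surname,
               t2.DOB, t2.Gender, t2.UKTaxPayer,
               t3.Address1, t3.Postcode,
               t4.Email,
               t5.Phone,
               t6.Organisation, t6.Position
        FROM Table1 t1
        LEFT JOIN Table2 t2 ON t2.customerid = t1.customerid  -- optional personal details
        LEFT JOIN Table3 t3 ON t3.customerid = t1.customerid  -- addresses (1:many)
        LEFT JOIN Table4 t4 ON t4.customerid = t1.customerid  -- emails (1:many)
        LEFT JOIN Table5 t5 ON t5.customerid = t1.customerid  -- phones (1:many)
        LEFT JOIN Table6 t6 ON t6.customerid = t1.customerid  -- organisation/position
        WHERE t1.customerid = 42  -- some customer id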

  • First, I would suggest you went too far with Table 2; I would just make it part of Table 1, since that data does not duplicate, it is simply nullable. This gets you down to 5 joins.

    Next, I was not sure I understood Table 6. If the point is that multiple customers can have the same organisation, then I would make that table hold the organisations, with a unique id for each organisation, and put that id in Table 1 as a FK column (roughly as sketched at the end of this post). Make sure you have properly normalized based on your data.

    Also, consider that machines are much faster than in the days when 4 was the suggested limit on joins. I personally do the full normalization and see how well performance holds up. If I saw significant and unacceptable processing delays, I would then consider denormalizing for the sake of performance. If you do denormalize, start with the 1:1 relationships first.
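
    Roughly like this (just a sketch; the names, sizes and data types are placeholders you would adjust to your own data):

        CREATE TABLE Organisation (
            OrganisationID int IDENTITY(1,1) PRIMARY KEY,
            Name           varchar(100) NOT NULL
        )

        CREATE TABLE Customer (
            customerid     int PRIMARY KEY,
            forename       varchar(50) NOT NULL,
            surname        varchar(50) NOT NULL,
            -- former Table 2 columns, simply allowed to be null here
            DOB            datetime NULL,
            Gender         char(1) NULL,
            UKTaxPayer     bit NULL,
            -- former Table 6, replaced by a FK to the organisation table
            OrganisationID int NULL REFERENCES Organisation(OrganisationID),
            Position       varchar(50) NULL
        )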

  • Thanks for the prompt reply! Is it better, then, to have data that may be null for many customers in Table 1 than to split it out into Table 2? Table 6 is for both current and historic data, i.e. if a customer changes company we want to retain their old company data.

  • Not really, but your concern was the number of joins. You do get benefits from keeping the nullable data in the other table, in that you don't waste extra space on customers who don't need those columns; I just wasn't thinking of that.

    Not sure of your plans with that table, but I would drop the Age column from storage and calculate it in a view (something like the sketch below).
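
    A rough sketch of that kind of view, assuming DOB is stored as a datetime in whichever table it ends up in (Table2 here):

        CREATE VIEW CustomerAge
        AS
        SELECT customerid,
               DOB,
               -- years between DOB and today, minus one if the birthday
               -- has not yet happened this year
               DATEDIFF(yy, DOB, GETDATE())
                 - CASE WHEN DATEADD(yy, DATEDIFF(yy, DOB, GETDATE()), DOB) > GETDATE()
                        THEN 1 ELSE 0 END AS Age
        FROM Table2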

  • I agree with Antares - normalize fully, then see how it works. More often than not, it works well!

    Andy

    http://qa.sqlservercentral.com/columnists/awarren/
