Nullable Column Usage

  • When do you consider having a seperate table for nullable columns which will be sparsely populated.

    As an example:

    A Table contains Five(5) Nullable Columns of dataType datetime. They are only populated for 25 rows in the table with 89622 rows.

    The columns are only used in 0.028 % of rows

    I calculate 5 columns * 25 rows * 8 bytes = 1000 bytes used to store data

    Approximate bytes to track nulls in these columns = 89622 rows * 5 columns = 450000 bits / 8 =+-56000 bytes

    Simplistically we are using 55 times more space to track nulls than actual data.

    Back to the question:

    Do you consider moving columns from a table if the are only going to be populated for

    , 40%, 20%, 10%, 5%

    or the total rows in the table?

    Any ideas?

  • 56,000 bytes. (call it ~100kB, depending on your fill factor)

    If this hypothetical table were real, then you've already spent more of your company's/customer's money thinking about this than the cost of disk space to leave it alone.

    Implementing the additional table includes additional design ($), programming ($$$), testing ($), and regression testing ($).

    Leaving performance optimization aside - for which I'll move stuff all over the place for any number of reasons - I'll move a column out of a table if it doesn't make sense to have it there, depending on its use.  With such a tiny percentage of usage on those columns, they may have a clearer purpose and relationship to the main entity if placed in different related tables, or moved to a new table.  Then again, if these dates are status dates that affect the usage of a row's data across the application (such as a qurantine/spoil/do-not-use-this-stuff date), then it may very well belong right in there, even if rarely used.  There's less than 100,000 rows in that table, so the price is small.  If you're looking at adding a couple million rows, you're really hurting for space, and you want those columns indexed, then the benefits may start to outweight the costs.

    Now, if the table were not yet implemented, and you were still in the design phase, then you've got room to put things where they belong.  But since this is in place, I'd suspect the cost/benefit ratio is rather top-heavy.

    -Eddie

    Eddie Wuerch
    MCM: SQL

  • Thanks for the response.

    That was the example. My problem is I have front end developers who are making changes to the database ( small company, they've been here longer and was responsible for the System before I got here, but no DB knowledge ). Some of the tables affected range between 2 and 12 million records. I need some "industry standard" / best practice to convince management that decisions about these kinds of changes can not not be left to the front-end Developers....

  • I forgat to add, I have implemented a solutiopn for these ad-hoc columns or data, but the developers believe it is too difficult to use.... They have to execute a stored sproc to get the results.....see my problem?

  • I'm in a similar situation... I'm the first DBA for a small organization and, while much of the database design has been done well, there are areas that they have de-normalized to the point that indexes can't be used efficiently. 

    Example, we have a lot of tables with mutually exclusive foriegn keys... (i.e., if CenterID has a value, then RegistrationID is NULL, etc)

    The approach I'm using to educate the developers on database design is to have them review/submit query plans.  They are now seeing that table design affects performance (duh!)

    I've definitely seen an improvement on the new designs and, as time permits, when we are modifying functionality on an existing design, we are implementing some of the improvements needed to assist with performance.

    HTH,

    Leda

  • I cannot fathom that moving "mostly null" columns to a separate table would be better on any level than leaving the column in the table.  The space savings would be virtually nonexistant and probably even more having additional table definition overhead.  The queries would be more complex, take longer, and more resources to always join to that table to see if there were rows there, instead of just having null columns.  This appears to be a classic case of "penny wise and pound foolish".

    Remember the purpose of the database is to serve application systems.  Data storage is not the primary goal; data retrieval is.  Don't get too caught up on making the data more important than the functionality it provides through application systems.



    Mark

  • What you are saying is part of the problem. In the example used I would only have stored 25 rows as apposed to 100000, so the space does become significant especially since this is one of the smaller tables where I have this occuring. This data is also only retrieved once a day and never by the users using the application, they only see the results of the processes requiring it and never run the process. So more complexed( an Extra Join ) querie is not really an issue, this is required for a system procedure running after hours.

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply