Database Design- IS NOT NULL always better choice?

  • Matt Miller (#4) (1/31/2010)When I say X is NULL, it means I know nothing about it: I don't even know if it's something that can HAVE a value.

    Does that mean you would not use null to represent the fact that you don't know someone's age? You know that every person has an age, even if you don't know what it is. Then again, in the case where you don't even know that X can have a value then why would you bother putting a null in the database for it? Using null in that case is a form of "bundling" two different types of fact together into a single table. As a general rule I would want to represent that information with a tuple which doesn't contain X.

    It would be fun and perhaps instructive to make a list of all the different things that people say null means to them. It would be an extremely long list and I think all of the different definitions would be incompatible with SQL's own logic.

    I recall reading that after Codd claimed there were exactly two possible null markers someone came up with a list of several hundred more that did not fit within his definitions.

  • David Portas (1/31/2010)


    Matt Miller (#4) (1/31/2010)When I say X is NULL, it means I know nothing about it: I don't even know if it's something that can HAVE a value.

    Does that mean you would not use null to represent the fact that you don't know someone's age? You know that every person has an age, even if you don't know what it is. Then again, in the case where you don't even know that X can have a value then why would you bother putting a null in the database for it? Using null in that case is a form of "bundling" two different types of fact together into a single table. As a general rule I would want to represent that information with a tuple which doesn't contain X.

    It would be fun and perhaps instructive to make a list of all the different things that people say null means to them. It would be an extremely long list and I think all of the different definitions would be incompatible with SQL's own logic.

    I recall reading that after Codd claimed there were exactly two possible null markers someone came up with a list of several hundred more that did not fit within his definitions.

    Yes, you have an age. However, if you don't provide it (or your birthdate) how can I compare it to other ages, or use it in a mathematical calculation such as AVG? If I DON'T have it, it is UNKNOWN, and should not be stored in my database as anything other than NULL; not zero, or any other possible value.

  • Lynn Pettis (1/31/2010)


    David Portas (1/31/2010)


    Matt Miller (#4) (1/31/2010)When I say X is NULL, it means I know nothing about it: I don't even know if it's something that can HAVE a value.

    Does that mean you would not use null to represent the fact that you don't know someone's age? You know that every person has an age, even if you don't know what it is. Then again, in the case where you don't even know that X can have a value then why would you bother putting a null in the database for it? Using null in that case is a form of "bundling" two different types of fact together into a single table. As a general rule I would want to represent that information with a tuple which doesn't contain X.

    It would be fun and perhaps instructive to make a list of all the different things that people say null means to them. It would be an extremely long list and I think all of the different definitions would be incompatible with SQL's own logic.

    I recall reading that after Codd claimed there were exactly two possible null markers someone came up with a list of several hundred more that did not fit within his definitions.

    Yes, you have an age. However, if you don't provide it (or your birthdate) how can I compare it to other ages, or use it in a mathematical calculation such as AVG? If I DON'T have it, it is UNKNOWN, and should not be stored in my database as anything other than NULL; not zero, or any other possible value.

    We're now degenerating to the academic, but another way to look at it is - if the age is unknown, how do you actually know that the person HAS an age? Depending on the definition used, Age is something that is only a valid measurement during the person's lifespan, so someone not yet born or someone who has passed technically would not have an age.

    Bottom line still comes back to - it's an implementation choice. You choose to represent all "invalid values" or unknowns with a tuple with a myriad of distinctions: I tend to choose that implementation only when it's important to dig into WHY there isn't a valid value, and rely on NULL otherwise.

    And yes - there are multiple definitions of NULL, since everyone seems to put their own spin on things. This is why I find it useful to have a "rules of the road" discussion about how it is to be used during certain projects, so that we don't end up with conflicting results. But then again - there are lots of concepts with somewhat fluid definitions, and we still use them even if they happen to be approximation of the "real" thing. From my perspective, NULL represents enough of a concept that it can be used, effectively, in a production setting. Some of the other options (6NF, D3, 3rd manifesto and all of its variants) might have some theoretica promise, but just don't scale worth a damn, and just are not currently workable and will never be until someone devises efficient persistence and retrieval mechanisms.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • First, I just took an example that someone else started.

    Second, very well said, Matt, and I agree.

  • Lynn Pettis (1/31/2010)


    First, I just took an example that someone else started.

    Second, very well said, Matt, and I agree.

    (I wasn't disagreeing with you, just using your point as a starting point so answer David).

    On the implementation side, your immediate point is right on: if it's not a valid value I like to make sure I cannot have it play into my calculations, which NULL tends to do quite elegantly.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • David Portas (1/31/2010)


    Jeff Moden (1/30/2010)


    Heh... no, I'm not missing the point. I said virtually the same thing in the first paragraph of my last post.

    The thing that made me doubt it was this statement:

    "If you define "X" as being unknown and compare the same two unknowns, they are equally unknown"

    It's not a matter of Xs being "equally unknown". If X is a value (known or not) then it is equal to itself because all values are equal to themselves. If I say to you "X is my age. Is X equal to X?" then you can confidently reply Yes, whether or not you know how old I am. This is the behaviour which is not accurately modelled if you use null to indicate an unknown value. I've no clear idea of what you mean by "equally unknown".

    That's what I said. Well, unless the two unknowns are in parallel universes. 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

Viewing 6 posts - 16 through 20 (of 20 total)

You must be logged in to reply to this topic. Login to reply