Database Design Follies: NULL vs. NOT NULL

  • Sounds like the requirements are missing. Null requirements? However, if the data is really missing, each column might require custom treatment because of business requirements. In this case, using a non-null token for missing would likely result in using something like a CASE statement instead.

    I might be tempted to use a sparse column and translate a very popular value to null to save space, but only as a last resort. (Storing trillions of counters where most values are 0?) This would be confusing to somebody not familiar with the design.

    Another good one is for a programmer to use enums in the application and only store the integer value in the database. Real ugly when it's time to change the enum or use it outside the program in a report.

    Perhaps a view should always be required on tables allowing nulls - or tokens - to insure that the proper business logic to translate these is present someplace? The debate might be over nulls, but the pain might actually be missing or unknown business requirements.

    Randy
    Helpdesk: Perhaps Im not the only one that does not know what you are doing. 😉

  • [[The debate might be over nulls, but the pain might actually be missing or unknown business requirements.]]

    One could argue that allowing nulls facilitates poor definition of requirements. Not to mention poor thinking about the information involved...

    ken

  • kenambrose (3/9/2016)


    [[The debate might be over nulls, but the pain might actually be missing or unknown business requirements.]]

    One could argue that allowing nulls facilitates poor definition of requirements. Not to mention poor thinking about the information involved...

    ken

    Having 100% of the data available the instant a record is created is an unrealistic requirement. For example what about optional data?

  • Correct, and that is NOT a requirement for a schema that disallows nulls. why would you possibly think it was?

  • kenambrose (3/9/2016)


    Correct, and that is NOT a requirement for a schema that disallows nulls. why would you possibly think it was?

    At the end of the day a null represents that a piece of data is unknown, it doesn't matter whether that is stored as a null value, some random place holder value or an outer join. A null value is just the most straight forward way to show that.

  • At the end of the day a null represents that a piece of data is unknown, it doesn't matter whether that is stored as a null value, some random place holder value or an outer join. A null value is just the most straight forward way to show that.

    Somewhere at the begginig it was said that rows in a table (tuples in a realtion) represent only TRUE propositions (declarations). If 'a piece of data is unknown' it just does not fit the edfinition of a tupe/relation, table/row. That is why in many books you can find 'tables are not relations, unles constructed in a specific way to behave just as relations behave'. There is no legal requirement to adhere to relational theory, and since we can construct physical tables in any way we want, often we do not adhere to the relational theory. Does it matter? If you believe that ignoring Pithagors's theorem or that division by zero is OK, there is no legal requirement to adhere to laws of mathematics, then of course it does not matter, anyes, you can use NULLs or tokens to tell that a piece of data is unknown'. If you accept that 'unknown' is permitted in logic, than anything goes, anything can be true as long as you do'nt know the actual truth.

    Examples, for thos ewho still remeber a high school algebra. Left side of the expression reads exactly as the right side, so this true for any number X, integer, decimal, real, even comlex.

    X2 - X2 = X2 - X2

    We can factor both sides, in a different way (high school algebra):

    X (X-X) = (X-X)(X+X) which is absolutely correct. Now, we simplify teh expresion by dividin both sides by the common factor (x-x) which gicves us:

    X = X + X

    From here we may conclude that

    1X = 2X, or 1=2.

    But we know that 1 is not equal two. Otherwise, we would gladly accept 50% pay cut, after all $1 is the same as $2, we just proved it. Well, there is aflaw in our proof. In step 2, we divided both sides by (x-x) which is always 0 (zero) for any kind of numbers. If we ignore mathematical (logical) law 'thou shallt not divide by zero' we can come to outrageous conlusions.

    Same with pure logic. If something is unknown, or a piece of data is unknown, we cannot prove it is TRUE. And since database is a collection of TRUE statements, facts, that miisng piece does not belong there. How to deal with data pieces we may miss, that is a question for the database designer, and people have provided answers in this discussion.

    But, since there is no legal obligation to follow laws of logic, do as you please, at your peril.

    🙂

  • unfortunately, our profession does not have any licensing standards.

    anyone can claim to be a programmer. anyone can claim to be a database "developer" or designer, or database "engineer" (LOL)

    If there _was_licensing in our industry, a LOT of the nonsense we hear in discussions like this be a thing of the past. Heck,in my state, a construction worker can't do a job for more than $300 without having a license.

    How the heck do they let people claim to be database professionals without the correct training??

  • Zidar (3/10/2016)

    Somewhere at the begginig it was said that rows in a table (tuples in a realtion) represent only TRUE propositions (declarations). If 'a piece of data is unknown' it just does not fit the edfinition of a tupe/relation, table/row.

    Lots of things get said, doesn't mean they are true. In the real world, there are cases where the truth is not known but that fact still needs documented. An employee/contractor/whatever has a birthday whether I currently know what it is or not.

    SQL DBA,SQL Server MVP(07, 08, 09) "Money can't buy you happiness." Maybe so, but it can make your unhappiness a LOT more comfortable!

  • removed

  • One final comment on this subject, and the reason I originally mentioned TBD, UNK, and N/A in the first place, is that null tells you ONE thing (data is missing). But it doesn't tell you WHY.

    Flag values can tell you three whys (at least). For our database environment we have a need to know why a value would otherwise be Null. Is the data missing because it's not been put in yet? Or because there's no way to find out? Or because it doesn't apply in this case?

    The other advantage to flag values/tokens is that they can be included in bi-value logic, they don't REQUIRE tri-value logic like null does. Flag values equal themselves, unlike null. Flag values for RIDs (foreign keys that are identifiers) can be made part of the range or excluded with a simple change of parameter values. (I use -2, -1, and 0 with "real" keys being positive, but -2, -1, and 0 are ALSO records in the foreign table).

    For these reasons (especially avoiding tri-value logic) I find flag values extremely useful.

    Are they perfect? Of course not. They can't be used in small and/or dense domains, they require just a little more logic (depending on the exact situation and domain) and depending on the exact domain they can provoke ambiguity.

    But nulls (shudder), nulls are *nasty*.

    The only place I ever use nulls is if a record has an optional n/varchar(max) field for optional notes. Other than that--

  • funny, they don't have all the problems you mention for licensing of plumbers, electricians, doctors, lawyers, accountants, etc. etc. etc.

    wonder why you are so convinced it would cause the world to end for our industry??

  • In the logical design, the business logic might require handling missing data and reasons for missing data. Using tokens -2, -1 , 0, etc. is one way of handling nLvl logic via denormalization. (The value column and the reason columns have been combined into one "value" column because either the value or the reason is defined, not both, in any given record. If either a phone number or SSN is defined, but not both, then use one column for both? Okay in some cases, but not others? Using a null is worse?) The physical infrastructure provides one option of tri-level logic via nulls. You can choose to use the null for missing and add logic for the remainder of the nLvL logic, or handle it all without nulls. Adding perplexity to complexity can simplify a single query. (If the business requirements require this, then suck it up.)

    The argument that nulls are too complicated does not fly with me. The null logic is well defined and deterministic. A new developer is likely to make all kinds of mistakes. We don't disable other features (e.g., cursors, temp/local tables, scalar UDFs, NOLOCK, triggers, etc.) because of this.

    BTW, with the non-null design, either the "value" column table or the "reason" column table will have a record, not both. If you want either records with values or with reasons, but not both, then there are no nulls. If you deal with a report and want all columns via left joins, then there will be nulls. You can choose how to apply the business logic to the value or reason - even translating the value and reason into one column without nulls. If you use tokens instead of nulls, then you have hard coded business logic into the data. Applying the business logic will require knowing how it was already applied in those cases where the logic already applied no longer exactly applies.

    Randy
    Helpdesk: Perhaps Im not the only one that does not know what you are doing. 😉

  • why do the same flawed arguments keep popping up?

    [[If you deal with a report and want all columns via left joins, then there will be nulls. ]]

    We don't know the difference yet between a report and a data relation/table?

  • kenambrose (3/10/2016)


    why do the same flawed arguments keep popping up?

    [[If you deal with a report and want all columns via left joins, then there will be nulls. ]]

    We don't know the difference yet between a report and a data relation/table?

    This is beginning to head into something unpleasant.

  • kenambrose (3/10/2016)


    why do the same flawed arguments keep popping up?

    [[If you deal with a report and want all columns via left joins, then there will be nulls. ]]

    We don't know the difference yet between a report and a data relation/table?

    The idea that one should never allow NULLs is one of those alleged cures that's far, far worse than the purported disease. A NULL value is not a corruption of data. NULLs do indeed complicate logic, but, as noted by others, that's not a sufficient reason to make the situation even worse by trying to go around them completely.

    Otoh, physically storing "magic" values like 'TBD'/'NA'/'UNK' in a column is data corruption. If NULL is not sufficient, use a separate column instead. Gotta think a one-byte tinyint would be plenty (thus no neg values).

    SQL DBA,SQL Server MVP(07, 08, 09) "Money can't buy you happiness." Maybe so, but it can make your unhappiness a LOT more comfortable!

Viewing 15 posts - 106 through 120 (of 137 total)

You must be logged in to reply to this topic. Login to reply