Foreign Keys

  • Tom.Thomson (4/7/2011)


    I regard anyone who allows NULL in unique columns as crazy (presumably David you agree with me, as you think anyone who allows NULL anywhere is crazy). I wonder if other people have seen Hugo's connect proposal to change SQL Server for the worse so that it conforms to the SQL standard on this - there's a surprisingly (and worryingly) small number of votes against it.

    I don't think people who use nulls are crazy. I just think that a database without nulls is a more accurate and usually more practical way to represent information.

    I've never seen the sense of nullable unique constraints either but if we must have them then perhaps it's better that DBMSs follow the SQL standard rather than invent new null weirdness of their own. At least if we have one standard then more people might be more likely to understand it and avoid its pitfalls. Deprecating nullable unique constraints altogether is probably a safer option though.

  • I don't think people who use nulls are crazy. I just think that a database without nulls is a more accurate and usually more practical way to represent information.

    but there are lots of things that are "unknown" in this world, and therefore require the value NULL in the database sense.

    for example, take a list of people where their birth date is stored in one column and the date of their death stored in another. That's fine for people who are already dead πŸ™‚

    But what about people who are living, what do you suggest storing in the "date of death column" - there's only one answer - it's unknown (and therefore NULL)

    unless of course you know something that we all don't πŸ™‚

    looking forward to your response.

  • Tom.Thomson (4/7/2011)


    I wonder if other people have seen Hugo's connect proposal to change SQL Server for the worse so that it conforms to the SQL standard on this - there's a surprisingly (and worryingly) small number of votes against it.

    Tom,

    I posted a new suggestion on Connect, which you might prefer to vote for:

    http://connect.microsoft.com/SQLServer/feedback/details/658638/dont-allow-nullable-columns-as-part-of-unique-constraints

  • martin catherall (4/11/2011)


    for example, take a list of people where their birth date is stored in one column and the date of their death stored in another. That's fine for people who are already dead πŸ™‚

    But what about people who are living, what do you suggest storing in the "date of death column" - there's only one answer - it's unknown (and therefore NULL)

    Not at all. If a person is alive then we know they don't have a date of death at the present time. Indicating that the date of death is unknown would suggest that they were dead but that we didn't know when they died, which is not accurate. Why on earth would you want to store the date of death of a living person?

  • Not at all. If a person is alive then we know they don't have a date of death at the present time.

    Exactly, they don't have a date of death - so it is unknown. so what value do you suggest putting in the "date of death column" (I would have though that an unknown in this column means they are still alive πŸ™‚ )

    Indicating that the date of death is unknown would suggest that they were dead but that we didn't know when they died, which is not accurate.

    ok, I'll agree that you might not know the date of death in all cases - but most lightly you do!

    I would personally look at it the other way and say not knowing a "date of death" means a person is still alive (potentially we are deriving different meaning from the null value)

    Why on earth would you want to store the date of death of a living person?

    I agree, you would not! but you would want you update the column when the person died.

    sorry, it's not my intention to be picky, but NULL means unknown and there are a lot of things that are unknown.

    for example, how much money do you have in you pocket right now.

    There's only one correct answer, if you don't know - then that's a null value.

    what date did you product ship? if you don't know then it's NULL, which would suggest the product has not shipped to me.

  • martin catherall (4/11/2011)


    sorry, it's not my intention to be picky, but NULL means unknown and there are a lot of things that are unknown.

    Null does not mean unknown according to the semantics defined by the ISO SQL standard and used by SQL Server - a fact which has been extensively discussed in SSC and elsewhere many times. In any case, nulls are no excuse for silliness like having a date of death attribute for living people. That would just be lousy data modelling and has nothing to do with the purpose of nulls.

  • David Portas (4/11/2011)


    Tom.Thomson (4/7/2011)


    I wonder if other people have seen Hugo's connect proposal to change SQL Server for the worse so that it conforms to the SQL standard on this - there's a surprisingly (and worryingly) small number of votes against it.

    Tom,

    I posted a new suggestion on Connect, which you might prefer to vote for:

    http://connect.microsoft.com/SQLServer/feedback/details/658638/dont-allow-nullable-columns-as-part-of-unique-constraints

    Yes, that is roughly what I would like to see. I've voted for it and added a comment.

    On a somewhat related topic, I occassionally wonder how best to put together a proposal to allow Foreign Keys to reference multiple columns (when I'm not wondering that it's because I've forgotten - again - that MS doesn't support this) - the current T-SQL syntax defined here doesn't allow multiple columns and this forces use of a surrogate key any time a natural key with more than one column is the natural target of a reference constraint. I suspect there are a lot of people who would vote against such a proposal because they have absolute faith that surrogate keys are always the right approach, although there are certainly cases where using a surrogate key will increase overall storage requirements and reduce overall performance (just about any case where the ratio of references to entries in the target table is low enough will exhibit this behaviour).

    Tom

  • Tom.Thomson (4/11/2011)


    On a somewhat related topic, I occassionally wonder how best to put together a proposal to allow Foreign Keys to reference multiple columns (when I'm not wondering that it's because I've forgotten - again - that MS doesn't support this)

    Microsoft SQL Server already does allow this and has done since at least 2000 I'm pretty sure.

    The following works for me (on 2005 and 2008):

    CREATE TABLE tbl1 (x INT NOT NULL, z INT NOT NULL, PRIMARY KEY (x, z));

    CREATE TABLE tbl2 (a INT NOT NULL PRIMARY KEY, x INT NOT NULL, z INT NOT NULL, FOREIGN KEY (x,z) REFERENCES tbl1 (x,z));

  • David Portas (4/11/2011)


    martin catherall (4/11/2011)


    for example, take a list of people where their birth date is stored in one column and the date of their death stored in another. That's fine for people who are already dead πŸ™‚

    But what about people who are living, what do you suggest storing in the "date of death column" - there's only one answer - it's unknown (and therefore NULL)

    Not at all. If a person is alive then we know they don't have a date of death at the present time. Indicating that the date of death is unknown would suggest that they were dead but that we didn't know when they died, which is not accurate. Why on earth would you want to store the date of death of a living person?

    martin catherall (4/11/2011)


    Not at all. If a person is alive then we know they don't have a date of death at the present time.

    Exactly, they don't have a date of death - so it is unknown. so what value do you suggest putting in the "date of death column" (I would have though that an unknown in this column means they are still alive πŸ™‚ )

    Indicating that the date of death is unknown would suggest that they were dead but that we didn't know when they died, which is not accurate.

    ok, I'll agree that you might not know the date of death in all cases - but most lightly you do!

    I would personally look at it the other way and say not knowing a "date of death" means a person is still alive (potentially we are deriving different meaning from the null value)

    So you think François Villon is alive, because we certainly don't know his date of death? Well, I suspect you are wrong because if he's alive he'll be celebrating his 580th birthday sometime this year. If you've ever had to deal with a database of French poets you will have come across this one; similar things happen with a database of composers. And something similar happens with a database of plays: should we say that if we don't have the date of first performance we can assume the play was never performed? Anyway, there are issues with assigning particluar meanings to the single NULL of SQL, and these cause a lot of maintenance headaches down the line. The theory behind NULLs is pretty complex too, and there is no real agreement on how absent data should be handled in relational systems (we even have some people rejecting Multiple Value Logics (where Multiple means more that 2) which of course would preclude the use of NULL altogether).

    Why on earth would you want to store the date of death of a living person?

    I agree, you would not! but you would want you update the column when the person died.

    sorry, it's not my intention to be picky, but NULL means unknown and there are a lot of things that are unknown.

    for example, how much money do you have in you pocket right now.

    There's only one correct answer, if you don't know - then that's a null value.

    what date did you product ship? if you don't know then it's NULL, which would suggest the product has not shipped to me.

    Things like date of death are why Ted Codd wanted several distinct NULLs, instead of the single one that got into SQL. RM2 had a NULL-i and NULL-a, which of course doesn't quite work: Codd's Null-i meant "we know there is no value because it's inapplicable; NULL-a meant "we know this is applicable but we don't know the value"; but there was no way of directly expressing "we don't even know whether this is applicable or not", which is actually quite a common case.

    Codd's two NULL system does allow us to handle this third case sometimes, by an indirect method. Put the date of death into a separate table, with a foreign key constraint pointing to the original table. If there's no entry in the date of death table, we don't know whether the person is alive or dead; if there's an entry and the date of death is specified or is NULL-a, we know the person is dead; if there's an entry and the date of death is NULL-i, the person must be alive (how else can the date of death be inapplicable?).

    With single NULL of SQL (as in Codd's first proposal for introducing NULL into relational calculus) You have to invent an extra column to tell you whether or not you know something; and if you want to cover all possibilities, it can be quite messy. It's sometimes best to express inapplicability by splitting to a separate table, and have the absence of an entry in that table mean that the attribute(s) concerned is(are) inapplicable for this key value. Then SQL's single NULL could mean "it's applicable but we don't know the value" and you don't need a separate sort of NULL for inapplicable; but you can't express "we don't even know whether it's applicable or not" without introducing an extra column. On the other hand, you might decide that the absence on an entry in the scond table means "we don't know whether it's applicable or not" and a NULL in that table means "either we don't know or it's inapplicable", and then you need an extra column in one of the two tables to tell you whether it's applicable or not. That demonstrates two radically different meanings for the single NULL, and of course schema designers generally do assign such meanings to NULL and in cases where they don't need to distinguish all the possibilities (or, more often, incorrectly believe that they don't) and don't add the extra column it can be quite difficult some time later to unravle what the data, with its NULLs, actually means. Because people have all sorts of different intentions when they use this single sort of NULL, it's best to treat it as meaning no more or less than "we don't have a value for this attribute in this database" and not allowing it to mean "inapplicable" or "unknown" or whatever else some designer dreampt up as a novel and clever use for NULLs.

    The conclusion that David and I both draw from this sort of thing is that it is best to avoid NULLs wherever possible; where we differ is that I think it's often not possible while he thinks it's almost always possible. But both of us would put date of death into a separate table so as not to need to hold it for a person known to be living, because it reduces both the number of NULLs that will occur and the ambiguity of what those NULLs mean (provided proper documentation is done and incorporated into the SQL scripts where it won't get lost).

    Tom

  • martin catherall (4/11/2011)


    But what about people who are living, what do you suggest storing in the "date of death column" - there's only one answer - it's unknown (and therefore NULL)

    I agree with David that NULL does not mean "unknwon". NULL is a marker to represent missing data - so in the case of a "Date of Death" column, it neither means "unknown", nor "not applicable (still living)", but "no date of death registered for this person".

    Of course, it may mean more in the context of a specific application. Just as the value "31" may mean more than just the numeric value in some applications (for instance, in telecommunications, it represents the Netherlands).

    What I think David means with his replies is that the data model should not have combined these atrtbiutes into a single table. You'd either have a table with both birthday and date of death for dead people, and a table with only date of birth for living people; or a table with only date of birth for all people, and a table with only date of death for dead people.

    That is a design pattern that can in fact completely eliminate all nulls from all tables. It can not eliminate nulls from the language though, since they'll return as soon as you perform an outer join on those tables to get a combined report. Completely eliminating nulls requires a completely different language to replace SQL. There are languages that (claim to) achieve this goal, but none of them is quite as successfull as SQL. My guess is that this is because the representation with NULLs is much closer to how we humans see tabular data in our heads - with empty cells where data is missing, and not with seperate tables for all columns that might not be populated for all rows.

    I have voted against David's sugestion on Connect. There are use cases where you want a value in a column to be either missing, or unique.


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Tom.Thomson (4/11/2011)


    On a somewhat related topic, I occassionally wonder how best to put together a proposal to allow Foreign Keys to reference multiple columns (when I'm not wondering that it's because I've forgotten - again - that MS doesn't support this) - the current T-SQL syntax defined here doesn't allow multiple columns and this forces use of a surrogate key any time a natural key with more than one column is the natural target of a reference constraint.

    You should read the entire syntax descriptiopn.

    The first constraint syntax doesn't support creating a multi-column foreign key coonstraint, as that is the syntax for creating column-constrains. And no column-constraint can be multi-column. That syntax does not allow you to create a two-column primary key or unique constraint either.

    But if you scroll down, you'll find that the syntax for creating table-constraints allows you to create a multi-column foreign key constraint just fine. As David's example demonstrates.

    Tom.Thomson (4/11/2011)


    Things like date of death are why Ted Codd wanted several distinct NULLs, instead of the single one that got into SQL. RM2 had a NULL-i and NULL-a, which of course doesn't quite work: Codd's Null-i meant "we know there is no value because it's inapplicable; NULL-a meant "we know this is applicable but we don't know the value"; but there was no way of directly expressing "we don't even know whether this is applicable or not", which is actually quite a common case.

    (snip)

    I've seen this argument many times, and I'm constantly amazed that people fall for it. Why would you want to combined the optional attribute "date of death" and the optional attribute "reason date of death is missing" in a single column? That would violate first normal form!

    Or do you also try to squeeze "report mark" and "reason for given report mark" into a single column?


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Hugo Kornelis (4/12/2011)


    What I think David means with his replies is that the data model should not have combined these atrtbiutes into a single table. You'd either have a table with both birthday and date of death for dead people, and a table with only date of birth for living people; or a table with only date of birth for all people, and a table with only date of death for dead people.

    That is a design pattern that can in fact completely eliminate all nulls from all tables.

    Hugo,

    How does that design pattern represent dead people with an unknown date of death without using NULL?

  • UMG Developer (4/12/2011)


    Hugo Kornelis (4/12/2011)


    What I think David means with his replies is that the data model should not have combined these atrtbiutes into a single table. You'd either have a table with both birthday and date of death for dead people, and a table with only date of birth for living people; or a table with only date of birth for all people, and a table with only date of death for dead people.

    That is a design pattern that can in fact completely eliminate all nulls from all tables.

    Hugo,

    How does that design pattern represent dead people with an unknown date of death without using NULL?

    Assuming that the distinction between "dead, date unknown" and "alive" is not relevant (i.e., we work with the dates, and if they're not there, too bad, move on) - exactly the same (with probably idfferent table names).

    If the distinction is important, then there either is yet another table, or an extra attribute to explicitly distinguish the dead from the living.

    Note that I never said I endorse this design pattern. I think NULLs are the best (or, if you don't like that word in this context, the least bad) way to handle real-world scenarios, where data can be missing for a variety of reasons.


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Hugo Kornelis (4/12/2011)


    UMG Developer (4/12/2011)


    How does that design pattern represent dead people with an unknown date of death without using NULL?

    Assuming that the distinction between "dead, date unknown" and "alive" is not relevant (i.e., we work with the dates, and if they're not there, too bad, move on) - exactly the same (with probably idfferent table names).

    If the distinction is important, then there either is yet another table, or an extra attribute to explicitly distinguish the dead from the living.

    Note that I never said I endorse this design pattern. I think NULLs are the best (or, if you don't like that word in this context, the least bad) way to handle real-world scenarios, where data can be missing for a variety of reasons.

    OK, I just wanted to make sure I wasn't missing something. (I've been known to do that.)

    I think using NULLs is fine as long as the usage is known by the people using the DB. Like using a NULL in the death date table to indicate dead with an unknown date.

  • Hugo Kornelis (4/12/2011)


    Tom.Thomson (4/11/2011)


    Things like date of death are why Ted Codd wanted several distinct NULLs, instead of the single one that got into SQL. RM2 had a NULL-i and NULL-a, which of course doesn't quite work: Codd's Null-i meant "we know there is no value because it's inapplicable; NULL-a meant "we know this is applicable but we don't know the value"; but there was no way of directly expressing "we don't even know whether this is applicable or not", which is actually quite a common case.

    (snip)

    I've seen this argument many times, and I'm constantly amazed that people fall for it. Why would you want to combined the optional attribute "date of death" and the optional attribute "reason date of death is missing" in a single column? That would violate first normal form!

    Or do you also try to squeeze "report mark" and "reason for given report mark" into a single column?

    I think you've misunderstood horribly what I said. I said Codd's RM2 scheme prevented us from expressing something that we often need to express. Nothing about making attributes nonatomic (Codd had already done that to a very limited extent by introducing his "inapplicable" NULL; but his real mistake was to get rid of his real NULL and substitute an "applicable" NULL).

    I'm not suggesting squeezing anything into a column: what I am saying is that by forcing a column to mean one of "this attribute is know to be applicable and this is its value", "the attribute is known to be applicable and its value is not present", or "this attribute is known to be inapplicable and its value is not present" - what Codd suggested in RM2 - you arrive at a situation where another column can't (without introducing a clear and direct inconsistency) permit the meaning "we don't know whether that attribute over there is applicable or not" because whatever value (or version of NULL) you put over there implies one of "we know it's applicable" and "we know it is not applicable", so the two columns would contradict each other.

    There are three ways out: one is to have a single NULL whose sole meaning is "the value of this attribute is not present in your view of the database"; another column can then indicate reason (if any) for absence of the value; the other is use a different two null system (one null meaning inapplicable, the other meaning only "not present in your view"); and the third way is to use Codd's two NULLs and add a third which means only "not present in your view".

    The interesting point is that every workable system has a NULL which means exactly and only "the value of this attribute is not present in your view of the database". You can either add some additional versions of NULL or not add them (adding extra sorts of NULL doesn't stop the system working) but if adding them leads to attributes no longer being in some sense atomic it will be a bad thing.

    Codd originally favoured (workable version of) the single NULL approach. I favour either the workable version of the two null system or the workable single NULL system (not SQL's NULL, which is an abomination to any logician or mathematician, including while he was still with us Codd), depending on what I had for breakfast or something equally unpredictable and unreasonable. Dace decided that adding a third value was what was needed to fix the issue, but then concluded (I don't know why, but his published arguments on this were incredibly weak arm-waving) that this leads to an infinite regress so you need an infinite collection of NULLs. I seem to recall that Codd ended up favouring the workable two NULL system but I can't find documentation for that.

    Tom

Viewing 15 posts - 31 through 45 (of 49 total)

You must be logged in to reply to this topic. Login to reply