Do You Really Need Surrogate Keys? Part 1 – Concepts and Methods

  • ebaya (11/4/2010)


    If your key changes, you update it.

    Yeah, along with all the data in all the tables that rely on it. Nice one.

    Random Technical Stuff[/url]

  • if you have to account for dead people then you'll probably need a multi-column PK. on the SSN column and on a name column or maybe add a bit column for alive/dead and use that

  • ta.bu.shi.da.yu (11/4/2010)


    ebaya (11/4/2010)


    If your key changes, you update it.

    Yeah, along with all the data in all the tables that rely on it. Nice one.

    Given you were the poster who thought the rules of normalization only affect one table at a time, I guess I shouldn't be surprised by this comment. If your PK changes, you change the PK only. Not the rest of the data in the table. If that PK is a FK in 2 other tables, you change one column only in each table. Not "all the data".

    Seriously, this isn't rocket science here.

  • ta.bu.shi.da.yu (11/4/2010)


    ebaya (11/4/2010)


    If your key changes, you update it.

    Yeah, along with all the data in all the tables that rely on it. Nice one.

    Sure, so if you have a key where you expect that to be a problem then you wouldn't reference it in other tables would you? You would use another key in the same table for the foreign key references (maybe a surrogate, maybe another natural key). So what you say is no argument at all against natural keys - it's just an argument for prudent design and choosing sensible foreign keys, which of course I agree with.

  • ebaya (11/4/2010)


    Robert Mahon-475361 (11/4/2010)


    That's what I don't get, an SSN doesn't uniquely identify a person.

    Yes it does. You're still thinking at the database level. But a database is simply a model of some real world set of requirements. The real world is what counts here. In the real world, no two people have the same SSN.

    A duplicate SSN is a data error. Changing your key doesn't "solve the problem". It simply pushes it up a level. You still have the data error -- but now you've let that error into your table, rather than keeping it out. To repeat, this is why that even if you choose to use a synthetic key for other reasons, you'd be wise to put a unique constraint on SSN anyway.

    ...

    Similarly, claiming "the natural key may change so I need a surrogate" is the hallmark of a lackwit. ....

    Two problems keep surfacing from this thread: confusion between a primary key and unique key constraints, and more fundamentally why the two are required in the first place.

    A primary key is required to identify a data item - it's purely physical and has nothing to do with the real world but everything to do with the data storage. This is where the physical model is insulated from the real world. This key should never change (the example Ebaya used about migrating data from one physical to another entails resetting the primary key and has no bearing on the above statement).

    A unique key, or multiple unique keys, are required to identify a valid real world entity. This has everything to do with the real world and nothing to do with the data storage. This is governed by business logic. Where *all* data is required to be stored for forensic analysis, no unique key is required but a primary key is. Using SSN as the example, one may want to store all SSNs along with other info and then filter out the duplicates. Putting a unique key here creates a new problem: where do you store the duplicates?

    Only in rare situations where these two keys can be combined into one, such as a calendar. But this does not lend any legitimacy for mixing the two keys together.

  • Michael Ebaya (11/4/2010)


    ta.bu.shi.da.yu (11/4/2010)


    ebaya (11/4/2010)


    If your key changes, you update it.

    Yeah, along with all the data in all the tables that rely on it. Nice one.

    Given you were the poster who thought the rules of normalization only affect one table at a time, I guess I shouldn't be surprised by this comment. If your PK changes, you change the PK only. Not the rest of the data in the table. If that PK is a FK in 2 other tables, you change one column only in each table. Not "all the data".

    Seriously, this isn't rocket science here.

    I wrote that wrongly - indeed you need to change only the foreign key that refernce it. I of course know that, I managed to write my last comment fairly badly.

    You're getting a bit abusive here dude. Perhaps you should chill?

    Random Technical Stuff[/url]

  • ta.bu.shi.da.yu (11/4/2010)


    You're getting a bit abusive here dude. Perhaps you should chill?

    Thanks, but I'm not a "dude", and if you don't wish to be abused, perhaps you shouldn't be condescending ... especially when you're the one who is wrong.

  • David Portas (11/4/2010)


    ta.bu.shi.da.yu (11/4/2010)


    ebaya (11/4/2010)


    If your key changes, you update it.

    Yeah, along with all the data in all the tables that rely on it. Nice one.

    Sure, so if you have a key where you expect that to be a problem then you wouldn't reference it in other tables would you? You would use another key in the same table for the foreign key references (maybe a surrogate, maybe another natural key). So what you say is no argument at all against natural keys - it's just an argument for prudent design and choosing sensible foreign keys, which of course I agree with.

    I'm glad you knew what I was talking about (obviously you don't change all the data in the table - can't believe I wrote that!) - 🙂

    But I don't see the point of the key if it is expected to change.

    I am really curious to know why it's a good idea to not enforce the immutability of the key! I've not seen a really good argument yet.

    Random Technical Stuff[/url]

  • Michael Ebaya (11/4/2010)


    ta.bu.shi.da.yu (11/4/2010)


    You're getting a bit abusive here dude. Perhaps you should chill?

    Thanks, but I'm not a "dude", and if you don't wish to be abused, perhaps you shouldn't be condescending ... especially when you're the one who is wrong.

    No, you are an anonymous poster who claims that he knows a great deal about databases, and not only that make claims about normal forms that aren't true. You've been showing a lot of condescending behaviour yourself, so I'd suggest that those who live in glasshouses shouldn't throw stones. Like for instance, you just wrote "If that PK is a FK in 2 other tables, you change one column only in each table" - I'm going to assume that you didn't actually mean that but you mean that when you change a PK value and there are two FKs that reference it, then you need to change the FK values in the other tables.

    Of course, if you have an immutable PK, then you'll never ever have to change the FKs in any tables. That's just good design - if you have a million row table and let's say 80% of your rows have values that reference your PK that you need to change, this could be a bit of a nightmare. And many folks reasonably add an index to the FK to improve performance, so every update to the base table will update the index - again, this is not a great way to go about doing things!

    And yet another consideration is the amount of effort you need to go towards actually ensuring that you update all the correct tables. Miss one and you could, again, be in trouble!

    Random Technical Stuff[/url]

  • Michael Wang (11/4/2010)


    Two problems keep surfacing from this thread: confusion between a primary key and unique key constraints, and more fundamentally why the two are required in the first place.

    A primary key is required to identify a data item - it's purely physical and has nothing to do with the real world but everything to do with the data storage. This is where the physical model is insulated from the real world.

    No, this is not a helpful distinction in any way. A primary key is just any one candidate key. A "primary" key is not fundamentally different to any other key (unless you want it to be) and all keys serve the same purpose of enforcing uniqueness for some attributes in a table. Keys are therefore logical by definition. Indexes are physical structures but keys are logical constraints and keys need not have anything to do with storage because the DBMS may use some other identifier for data in storage (as indeed SQL Server uses RIDs).

    To say a primary key doesn't have meaning makes no sense as a general rule. If I create a table with only one key and it has meaning then how would that not qualify as a primary key? What difference would it make whether I called it "primary" or not? In fact it makes no difference at all.

    The phrase "unique key" is obviously a tautology and doesn't explain anything useful. Key = Candidate Key = Primary Key. There is no difference.

  • ta.bu.shi.da.yu (11/4/2010)


    I am really curious to know why it's a good idea to not enforce the immutability of the key!

    This has already been covered in excruciating detail. First of all, a surrogate reduces the likelihood of an update, but it doesn't eliminate it entirely. More importantly, immutability is a desirable quality of key, but its not a requirement of a key. There are other desirable qualities that natural keys have (in some cases), and surrogates do not (again, in some cases). Neither is best in any and all cases.

    No, you are an anonymous poster who claims that he knows a great deal about databases, and not only that make claims about normal forms that aren't true.

    Such as what? Your little myth that "no normal form affects more than one table" has already been roundly debunked.

  • David Portas (11/4/2010)


    Michael Wang (11/4/2010)


    Two problems keep surfacing from this thread: confusion between a primary key and unique key constraints, and more fundamentally why the two are required in the first place.

    A primary key is required to identify a data item - it's purely physical and has nothing to do with the real world but everything to do with the data storage. This is where the physical model is insulated from the real world.

    No, this is not a helpful distinction in any way.

    ....

    Either you see it or you don't: the distinction is to separate your model world from the real world. It's where the real line is drawn.

  • David Portas (11/4/2010)


    The phrase "unique key" is obviously a tautology and doesn't explain anything useful. Key = Candidate Key = Primary Key. There is no difference.

    Hi David, I'm a bit confused by what you mean here! Are you referring to a key that is a candidate key that is a primary key?

    Random Technical Stuff[/url]

  • Michael Ebaya (11/4/2010)


    ta.bu.shi.da.yu (11/4/2010)


    I am really curious to know why it's a good idea to not enforce the immutability of the key!

    This has already been covered in excruciating detail. First of all, a surrogate reduces the likelihood of an update, but it doesn't eliminate it entirely. More importantly, immutability is a desirable quality of key, but its not a requirement of a key. There are other desirable qualities that natural keys have (in some cases), and surrogates do not (again, in some cases). Neither is best in any and all cases.

    No, you are an anonymous poster who claims that he knows a great deal about databases, and not only that make claims about normal forms that aren't true.

    Such as what? Your little myth that "no normal form affects more than one table" has already been roundly debunked.

    Actually, it hasn't. As has been pointed out to you a number of times, the formal definition of the normal forms relates to the tables themselves. You seem to be confused with how the normal forms work.

    Note that I'm not saying that what you say is a bad idea - if you have two tables that do the same thing, then that is quite silly and you should of course rationalize the table design.

    Random Technical Stuff[/url]

  • Michael Wang (11/4/2010)


    David Portas (11/4/2010)


    Michael Wang (11/4/2010)


    Two problems keep surfacing from this thread: confusion between a primary key and unique key constraints, and more fundamentally why the two are required in the first place.

    A primary key is required to identify a data item - it's purely physical and has nothing to do with the real world but everything to do with the data storage. This is where the physical model is insulated from the real world.

    No, this is not a helpful distinction in any way.

    ....

    Either you see it or you don't: the distinction is to separate your model world from the real world. It's where the real line is drawn.

    Sure but in the words of Chris Date it's a "purely psychological" distinction, calling a key "primary" being just an aide-mémoire or a convenient label. It's only as important as you want it to be. What I objected to was your implication that this was somehow a definition of a primary key - it obviously isn't because the name does not define any special type of key.

    ta.bu.shi.da.yu (11/4/2010)


    Hi David, I'm a bit confused by what you mean here! Are you referring to a key that is a candidate key that is a primary key?

    I'm referring to the fact that a primary key is just any one candidate key. All candidate keys are equal in the relational model and in SQL so the "primary" key doesn't imply anything fundamentally different from any other key.

Viewing 15 posts - 151 through 165 (of 178 total)

You must be logged in to reply to this topic. Login to reply