Do We Need a PK?

  • How is a unique index defined and used as a PK handled in code? I'm slightly confused as to what they are doing in code? Does Oracle not define PKs or handle them differently? I wouldn't think that the developer's code would be impacted by PK/FK relationships.

    It's also dicey in that if you expect there not to be orphaned children somewhere, and the developers make a mistake, this can creep in and without checks specifically to look for this, you'd miss it.

    Of course, I have a different philosophy. I think you should be platform dependent. If you can make sales on 2 platforms, hire a guy that makes changes specifically for the 2nd platform. He should pay for himself out of sales.

  • A table with a single row needs a primary key to insure that there can only be one row. The normal way to accomplish this is to define a primary key, and add a check constraint on the primary key column to limit it to a single value.

    At a theoretical level, any table that allows duplicate values is not in first normal form, so any table without a primary key is not in first normal form. It is a pretty radical departure from standard database design practice to create a table that is not in first normal form, so I think it is up to the designer to justify it and prove that you are better off without it.

    I have yet to see a design where having a primary key caused a problem, but I have seen plenty where the lack of a primary key caused a problem.

  • Here is an example where a PK might be detrimental to performance: a compound PK consisting of two GUIDs with the next three columns being ints and the last one a bit. This table is hideously slow, and it is the table with the most reads, updates and inserts. We're considering not defining those columns as PKs for a test. Being a small company we can't really afford to throw hardware at it...

  • Just my humble observation - and one that I haven't tested in SQL 2005 - is that a lack of a PK causes this issue in Enterprise Manager, assuming there is no other mechanism to prevent duplicate rows in the table:

    1. You have a table without a PK.

    2. The table allows you to enter a row that is an exact duplicate of an existing row.

    3. You now have two identical rows and you try to delete one of them.

    4. An error is thrown:

    "Key column information is insufficient or incorrect. Too many rows were affected by update."

    The workaround is to use a delete statement in Query Analyzer. But I'm not sure how many client applications (such as MS Access) would have this problem and result in users not being able to delete records from an application if they needed to, and suddenly it is a bug or ticket that lands on your plate.

    Suppose that problem happens to interfere with some other work that needs to get done, and it happens going into a weekend or a holiday and no one is around to fix the problem for hours or days.

    Maybe this is not as big a problem as it seems. But it seems to me that unless there is a defensible reason why a PK should not have been on that table (such as an approved way to mitigate performance issues), I imagine a DBA would be in big trouble having to explain to their boss that they could easily have added a PK but didn't think they needed to.

    webrunner

    -------------------
    A SQL query walks into a bar and sees two tables. He walks up to them and asks, "Can I join you?"
    Ref.: http://tkyte.blogspot.com/2009/02/sql-joke.html

  • Classic confusion of keys vs. indices. The purpose of a PK is to provide referential integrity; access is provided by an index. One simple case where a PK is not wanted or needed is a many-to-many resolution table. Since it contains only FK references to other tables, the only possible PK is the combination of all the FK fields, essentially duplicating the entire table. Unless you plan to use that as an FK somewhere else (unlikely but possible), the PK has no purpose. If you want to ensure there are no duplicate rows, use a unique constraint on the combined FK columns. Your penalty is again, duplicating the entire table in an index.

  • GDI Lord (12/11/2007)


    Here is an example where a PK might be detrimental to performance: a compound PK consisting of two GUIDs with the next three columns being ints and the last one a bit. This table is hideously slow, and it is the table with the most reads, updates and inserts. We're considering not defining those columns as PKs for a test. Being a small company we can't really afford to throw hardware at it...

    I doubt that those columns being a primary key is the cause of your performance problem. It is more likely that those columns being a clustered index is causing the problem. A primary key may or may not be a clustered index.

    Having randomly generated guid columns in the clustered index will cause new rows to be inserted in random order. This will cause page splits, tend to slow inserts, and make table scans very slow.

    You might consider ordering the table by an clustered index identity column, so that new rows are inserted in sequential order.

  • I did work on a database where we had a table with jut one row, to hold the company's billing address, phone number, primary contact name, contact email, etc. The table was used to display information on invoices, etc., and we had a small application so that administrative people could update the information, but we didn't allow inserts into the table. So, no, we didn't define a primary key on this table.

    This is the only kind of table I can think of where I wouldn't need or want a primary key. Any table with more than one row, or the ability to insert more rows, should have a primary key.

  • Barry McConnell (12/11/2007)


    Classic confusion of keys vs. indices. The purpose of a PK is to provide referential integrity; access is provided by an index. One simple case where a PK is not wanted or needed is a many-to-many resolution table. Since it contains only FK references to other tables, the only possible PK is the combination of all the FK fields, essentially duplicating the entire table. Unless you plan to use that as an FK somewhere else (unlikely but possible), the PK has no purpose. If you want to ensure there are no duplicate rows, use a unique constraint on the combined FK columns. Your penalty is again, duplicating the entire table in an index.

    The primary key is needed to ensure that you do not have duplicates, so it is an absolute requirement. Implementing is as a unique constraint is a possibility, but it is still really just a primary key, so you might as well make it a primary key.

    If you make the primary key clustered, the data is not duplicated because the table becomes the leaf-level of the index. It is usually good to also create another unique constraint (or unique index) with the columns reversed in order to give fast access in each direction from child to parent and parent to child.

  • Barry McConnell (12/11/2007)


    Classic confusion of keys vs. indices. The purpose of a PK is to provide referential integrity; access is provided by an index. One simple case where a PK is not wanted or needed is a many-to-many resolution table. Since it contains only FK references to other tables, the only possible PK is the combination of all the FK fields, essentially duplicating the entire table. Unless you plan to use that as an FK somewhere else (unlikely but possible), the PK has no purpose. If you want to ensure there are no duplicate rows, use a unique constraint on the combined FK columns. Your penalty is again, duplicating the entire table in an index.

    Wait, I just want to clarify - by confusion do you mean my post about the duplicate rows error in Enterprise Manager?

    I thought the purpose of a PK was to uniquely identify each row in a table (entity integrity), whether or not that key appears as an FK in any other tables. Is this wrong?

    Thanks,

    webrunner

    -------------------
    A SQL query walks into a bar and sees two tables. He walks up to them and asks, "Can I join you?"
    Ref.: http://tkyte.blogspot.com/2009/02/sql-joke.html

  • In Barry McConnell's case, I think I would still define a primary key, using an identity column, assuming that rows are actively inserted into this table. -- well, now that I've said this... changing my mind on using an identity column. Presumeably, the foreign keys would be unique, so a composite primary key would prevent duplicate rows. Having a hard time picturing a table that would have more than 3 such foreign keys, so this seems workable.

  • Carla Wilson (12/11/2007)


    I did work on a database where we had a table with jut one row, to hold the company's billing address, phone number, primary contact name, contact email, etc. The table was used to display information on invoices, etc., and we had a small application so that administrative people could update the information, but we didn't allow inserts into the table. So, no, we didn't define a primary key on this table.

    This is the only kind of table I can think of where I wouldn't need or want a primary key. Any table with more than one row, or the ability to insert more rows, should have a primary key.

    The problem with that design is that nothing prevents inserting another row in the table, and causing major problems in the application.

    As I mentioned in a previous post, the normal way to accomplish this is to define a primary key, and add a check constraint on the primary key column to limit it to a single value, thus limiting the table to a single row.

    As an additional safeguard, I would also add a before delete trigger on the table to prevent the row from being deleted.

  • Michael Valentine Jones (12/11/2007)


    Carla Wilson (12/11/2007)


    I did work on a database where we had a table with jut one row, to hold the company's billing address, phone number, primary contact name, contact email, etc. The table was used to display information on invoices, etc., and we had a small application so that administrative people could update the information, but we didn't allow inserts into the table. So, no, we didn't define a primary key on this table. ...

    The problem with that design is that nothing prevents inserting another row in the table, and causing major problems in the application.

    As I mentioned in a previous post, the normal way to accomplish this is to define a primary key, and add a check constraint on the primary key column to limit it to a single value, thus limiting the table to a single row.

    As an additional safeguard, I would also add a before delete trigger on the table to prevent the row from being deleted.

    I've done single row tables frequently. My preferred method is to use an identity column defined as a PK, add the one record, then add a constraint that the identity column must = 1. No more records can be inserted, and it also cancels deletes IIRC.

    I don't know that an index, clustered, unique, or otherwise, is going to speed up or optimize any better on a single row table. Oh no! It's going to RBAR a single row!

    We have to remember the difference between a PK and a unique constraint, and that there can be multiple unique constraints on a table. For example, an employee table might have an identity column as employee number, and also contain a social security number for payroll purposes. Both should have unique constraints, the employee# should probably be the PK as you don't want SSN appearing in multiple tables as a rule.

    Another example would be a table that stores latitude/longitude information. If your requirements define that there will only be one entry per lat/long pair, then you put a unique index on the pair. At the same time, if that table participates in other relationships, you'll probably want an identity or other field as the table PK as your lat/long pair is going to be kind of long for PK/FK use.

    But as far as never having a PK, I would agree with the line of thought that it might be appropriate for a reporting heap that is not going to be related to anything else. Yes, it might violate 1NF, but if a table is not transactionally active, how critical is that? I've seen lots of tables used for report generation that fell into this category. At the same time, I would probably have several non-unique indexes on that heap to give the reporting tool all the help it can get.

    I believe there are times that it is justifiable to violate normalization, and reporting is one of them. Our ERP system does it, but it should be very limited in implementation for all of the reasons that we do normalize.

    -----
    [font="Arial"]Knowledge is of two kinds. We know a subject ourselves or we know where we can find information upon it. --Samuel Johnson[/font]

  • Michael Valentine Jones (12/11/2007)

    The problem with that design is that nothing prevents inserting another row in the table, and causing major problems in the application.

    As I mentioned in a previous post, the normal way to accomplish this is to define a primary key, and add a check constraint on the primary key column to limit it to a single value, thus limiting the table to a single row.

    As an additional safeguard, I would also add a before delete trigger on the table to prevent the row from being deleted.

    I have a table just like that, and I handled it in the way you describe. However, you don't have to do it this way - a unique index with constraint would work just as well, or even a trigger with no index that checks the row count.

    In my eyes, it's not about whether you CAN have a table without a PK, it's about why you wouldn't want one being as there is almost no overhead to it.

  • webrunner (12/11/2007)


    Wait, I just want to clarify - by confusion do you mean my post about the duplicate rows error in Enterprise Manager?

    I thought the purpose of a PK was to uniquely identify each row in a table (entity integrity), whether or not that key appears as an FK in any other tables. Is this wrong?

    Thanks,

    webrunner

    The reason the PK needs to be unique is so that referential integrity can be maintained, ie the FK must point to exactly one row.

    By confusion, I was referring to the original scenario. PK/FKs are solely for maintaining referential integrity. Unfortunately so many DBMS vendors and users have confused the roles of PK/FK, indices, and constraints that it is a jumbled mess. When you declare a PK, you also get automatically an index and a constraint, but that doesn't make them equivalent. Let's take a simple example and look at what we actually want and need.

    Two entities: employees and projects, business rule = an employee can be assigned to multiple projects and vice versa

    Need: three tables - employees, projects, and assignments. employees and projects need PKs because they will be participating in referential integrity. Assignments will consist of two FKs and needs 3 indices (one on each FK and a combined one), and a unique constraint on the combined index. Assignments does not need a PK since it will never be a FK in another table and thus no reason to uniquely identify each row. The single column indices provide fast access for the typical query/join performed on this table, i.e. given one FK for a column find all the corresponding FKs of the other column. The uniqueness constraint solves the duplicate row problem which is not the same as uniquely identifying a row.

  • It seems the question is slightly ambiguous due to a confusion over whether a PK is a theoretical construct or a database element.

    If we look at the PK as a theoretical construct, the question is: Should we ever have a table that could contain the exact same information in more than one row? I would say that the answer here is no, but there's a huge caveat. Identifying people. If you are storing information on a large enough set of people you will run into the huge problem that there is no way to truly distinguish individuals (With 6.5B people it is fairly likely that there is at least one set of people with the same name, born on the same day, in the same city). So what do you do? In that case there is no natural primary key across your data.

    If we look at the PK as a database element then we are asking if there is any situation that is not helped by having a PK. Single row tables might not need a key (does a key on a single row table help the optimizer?) Any small automated table that is used in its entirety would likewise not necessarily need a PK to function properly. Tables containing only freeform text fields (e.g. note taking tables) might not need a pk, and indeed might not have a natural pk.

    --

    JimFive

Viewing 15 posts - 16 through 30 (of 57 total)

You must be logged in to reply to this topic. Login to reply