pro and con about foreign key constraint

  • Hi, all

    what's the pros and cons about enforcing foreign key constraint? For my case, the table will be insert or update only through store procedures using lookup tables (no bad value will be insert into the tables). So, what's the pros and cons for me to keep the foreign key constraint here?

    Thanks

  • The foreign key enforces the data at a level below the proc.

    This means that, if someone tries to delete data from the primary table, without a foreign key constraint, it will delete, and you'll have orphans in the foreign key table. With a foreign key, you'll either get an error, or a cascading delete that cleans up the foreign key table.

    I've had to go in and clean up databases where, "it's all in procs from the lookup tables, so we don't need foreign keys or check constraints", and they can be a serious mess. Even worse is, "we control it all through the front-end code, and the business rules layer, so we don't have any rules or procs at all in the database because we don't need them". That statement is conceptually identical to, "our database is a nightmare and probably about as stable and performant and reliable as an LSD trip". At least, every time I've heard that, that's what's turned out to be the case. (Even that wasn't as bad as the place where the DBA told me he wasn't even sure how many database servers were in the building, or what they all were for.)

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • iceheartjade (2/5/2008)


    Hi, all

    ...no bad value will be insert into the tables...

    I guess your organization, unlike where I work, has developers that never make mistakes.

  • Michael Valentine Jones (2/5/2008)


    iceheartjade (2/5/2008)


    Hi, all

    ...no bad value will be insert into the tables...

    I guess your organization, unlike where I work, has developers that never make mistakes.

    Even if they're perfect on the inserts, who says they're perfect on the deletes? what about key updates?

    There are several very large threads on here (most of which turned into "religious wars") revolving around this. The biggest hammer I come up with is actually two-fold: it goes against the core definition of a DBMS (see Codd's rules as to Integrity Independence: as quoted from wikipedia's version of the rules, "Integrity constraints must be specified separately from application programs and stored in the catalog. It must be possible to change such constraints as and when appropriate without unnecessarily affecting existing applications. "), and - you're one bad day away from screwing up your DRI. And then ol' Humpty Dumpty is there, because you probably won't be able to put it all back together (without restoring the DB).

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • I see all pros here. Any cons? Like performance wise. Will it be faster to load the table without constraints? I don't believe without foreign keys will screw up your system for sure. If you have a well designed ETL system no bad data will be loaded into your tables.

  • iceheartjade (2/5/2008)


    I see all pros here. Any cons? Like performance wise. Will it be faster to load the table without constraints? I don't believe without foreign keys will screw up your system for sure. If you have a well designed ETL system no bad data will be loaded into your tables.

    It's a perfection standard I don't want to be held to. I don't build systems without them, or at least no front-ended, gui-griven, heavy on the OLTP systems. My job is to safeguard the data first and foremost, and that includes the relationships, so DRI isn't an option in my book. Yes - there can be a cost in performance which during bulk uploads processes like true ETL's might take a hit, but in my mind - it's worth it.

    If absolutely necessary, during those ETL's only, I might disable the constraints during the final load phase, but only after I've preloaded the data into a temporary table and ascertained that the constraint would in fact work against the data in play (meaning - I load the temp table, create the constraint WITH CHECK, and when it establishes correctly, THEN I push the data into the real table.)

    It's a different ball game when it's reporting or data-warehousing data, but then again - that data has by far and large previously lived in a related OLTP system, so it should STILL be able to pass the foreign constraint test.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • I pretty much agree with Matt... yeah, I'm all about speed and performance... but only when I've done job #1 which is to protect the data at any and all costs. Once I've established DRI for tables, I do things pretty much by the book... I don't even disable it for known good inserts because I also have to practice what I preach... I won't let anyone else do it so I can't even when I'm shore I'dmake nun mistrakes 😛

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • The other thing I had read about recently was about SQL Server's treatment of constraints and whether or not it trusts them.

    You may be aware that apart from referential integrity constraints there are also check constraints and the like. Every constraint in the database can be disabled and later re-enabled. If you've disabled a constraint and modified any of the tables upon which that constraint constrains (:D) then SQL Server can no longer trust that constraint. The query optimiser uses trusted constraints to sometimes make things more efficient.

    If you don't have the constraints in the first place then the optimiser never has anything to work with and may perform less than optimal selects.

    See these two articles...

    http://sqlblog.com/blogs/tibor_karaszi/archive/2008/01/12/non-trusted-constraints-and-performance.aspx

    http://sqlblog.com/blogs/tibor_karaszi/archive/2008/01/12/non-trusted-constraints.aspx

    Of course, a constraint can slow down inserts, etc as the query plan involves a check to ensure the constraint is enforced. If performance really is a problem, make sure you have appropriate indices created on your tables so that the constraint can be checked as efficiently as possible. Additionally, if you do disable the constraints, somewhere in one of those articles it talks about how you can get MS SQL to ensure the constraint is still valid so that it can again be trusted.

    Finally, I personally define all constraints 🙂 I too have worked with other systems that don't have constraints mainly because they want to do a simple DTS transfer tables task and not worry about the order in which the tables are transferred. That's both lazy and silly because, as just described, they could have disabled or dropped the constraints and then re-enabled (and re-checked) or recreated the constraints after the transfer. That system often has ugly bits of data left over that shouldn't exist 🙁

    Another reason to have your constraints (after my "finally" above) is that many client tools, such as report writers, etc, can use them to help you join the tables. Whilst you may know how they join, an end-user making a simple report has a much easier time if the joins are already defined for them. My 10c 🙂

  • My take on this is that the database is responsible for the integrity of the data within it, regardless of the procedures/business layer/front end that's used.

    It's often the case that a database lasts longer than the application initially written for it, so even if the app currently in use is 100% perfect when it comes to datya integrity (yeah, right), that may not be the case in the future.

    Re performance. Inserts, updates and deletes may be slightly slower because the constraints have to be checked. If the foreign key columns are indexed (as per recomendations) then it's not that large a hit.

    Also, if there are constraints on the data (ones that are trusted), the optimiser can sometimes use those constraints to generate better exec plans because it knows something about the data.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • "Whilst you may know how they join, an end-user making a simple report has a much easier time if the joins are already defined for them."

    Tell me more about that -- what products try to use foreign key constraints to provide (suggest?) joins.

    (Just for those keeping score, I don't define foreign keys in my database that gets refreshed biweekly from mainframe flat files; the overhead of validating keys would buy me nothing other than a failed load: I've got to use what comes in as it comes in, and detect the errors later.)

  • Jim Russell (2/7/2008)


    (Just for those keeping score, I don't define foreign keys in my database that gets refreshed biweekly from mainframe flat files; the overhead of validating keys would buy me nothing other than a failed load: I've got to use what comes in as it comes in, and detect the errors later.)

    I'm curious as to why you can't check them as they're inserted. I used to manage an SQL2K warehouse that took nightly loads of production data from a cobol db we had to crack. Due to the size of the db, it had to be indexed for reporting needs. We line item failed packages through active x scripts in dts with a fail cap of 200 (I think) per table. When I left that position that db was at about 750GB and the nightly inserts were costing in the neighborhood of 2 1/2 hours on a very good stand alone server that had separate partitions for temp db/log/data.

    On the failed items, we would write the failure to a log and investigate it (which was usually a data type issue because the cobol system did a terrible job of data type checking) and we'd pick up the change the following evening.

    I'm green with SSIS, but I'm sure there's a similar method to what we used in our DTS packages so many moons ago. IMHO, there is no reason to not have your parent/child relationships specified in the db. If you're not using the db for what it's designed to do, you might as well write your own flat file db and save yourself the license cost.

  • Jim Russell (2/7/2008)


    "Whilst you may know how they join, an end-user making a simple report has a much easier time if the joins are already defined for them."

    Tell me more about that -- what products try to use foreign key constraints to provide (suggest?) joins.

    (Just for those keeping score, I don't define foreign keys in my database that gets refreshed biweekly from mainframe flat files; the overhead of validating keys would buy me nothing other than a failed load: I've got to use what comes in as it comes in, and detect the errors later.)

    Actually - I'm not so sure it's the constraints themselves or the diagrams - but there are lots of products smart enough to read one or the other to draw out the relationships. Crystal, BusinessObjects, Access/Excel just to name a few.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Matt Miller (2/7/2008)


    Jim Russell (2/7/2008)


    "Whilst you may know how they join, an end-user making a simple report has a much easier time if the joins are already defined for them."

    Tell me more about that -- what products try to use foreign key constraints to provide (suggest?) joins.

    (Just for those keeping score, I don't define foreign keys in my database that gets refreshed biweekly from mainframe flat files; the overhead of validating keys would buy me nothing other than a failed load: I've got to use what comes in as it comes in, and detect the errors later.)

    Actually - I'm not so sure it's the constraints themselves or the diagrams - but there are lots of products smart enough to read one or the other to draw out the relationships. Crystal, BusinessObjects, Access/Excel just to name a few.

    It's the constraints. I've put Access on top of databases (as a reporting tool), and if the constraints are defined, it will auto-detect the relationships and make it much easier for end-users to build ad hoc reports. It will also assume relationships on tables with identical column names, even if they aren't related, in some cases. But the FKs give it exactly what it's looking for.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • I'm pro DRI, and support the arguments presented above. One con I haven't seen mentioned is that sometimes you need to index the FK in the table with the FK, which might be somewhat expensive.

    Obviously you must have a PK or unique key on the referenced table. I'm talking about the other table, the one with the FK.

    For instance say your 100M row InvoiceHeaders table as a StateCode column and has an FK to the States lookup table. There is no business requirement for a StateCode index on the InvoiceHeaders table, so there isn't one. The business gets its 1st American Samoan customer, and a clerk mistakenly adds a new State with StateCode (the PK of States) = 'SM' instead of 'AS'. OK so far. But when they realize the mistake and go to either update or delete & replace this row in the States table, the entire InvoiceHeaders table has to be scanned to verify no rows reference 'SM'. The same goes for all other tables that reference States with an FK. This seemingly innocuous single row States update could well block most activity on the server for a few minutes while all rows in all referencing tables are scanned.

    Obvious enough stuff, but certainly can be a shock the first time it catches you off guard on a busy production system. I guess you can avoid it by outlawing deletions or PK updates in lookup tables like this. Or you can add an otherwise unnecessary index to all the referencing tables. Or you can say "this is pretty rare" and live with the occasional blocking and timeouts. So far I've done a little of all three.

  • Matt Miller (2/7/2008)


    Jim Russell (2/7/2008)


    "Whilst you may know how they join, an end-user making a simple report has a much easier time if the joins are already defined for them."

    Tell me more about that -- what products try to use foreign key constraints to provide (suggest?) joins.

    (Just for those keeping score, I don't define foreign keys in my database that gets refreshed biweekly from mainframe flat files; the overhead of validating keys would buy me nothing other than a failed load: I've got to use what comes in as it comes in, and detect the errors later.)

    Actually - I'm not so sure it's the constraints themselves or the diagrams - but there are lots of products smart enough to read one or the other to draw out the relationships. Crystal, BusinessObjects, Access/Excel just to name a few.

    Sure the constraints help a system build the diagram, but that is not the same as automating joins. (Nor could it be.)

Viewing 15 posts - 1 through 15 (of 18 total)

You must be logged in to reply to this topic. Login to reply