911: developers insist PK/FK constraints impede performance

  • The biggest problem I've experienced with PK/FK constraints is in exporting/importing data via DTS (for example, moving a development database to a staging server for testing). That said, I agree: in over 15 years of working with relational databases (DB2, Oracle, and SQL Server 7, 2000, and now 2005), I've never seen UPDATE, DELETE, or INSERT statements in code (PL/SQL, SQL stored procedures) impeded by an FK constraint. In fact, the only way I've seen referential integrity consistently maintained is when PK/FK constraints are enforced. Higher-level code can never trap every potential UPDATE/DELETE/INSERT error that might cause a statement to fail.

    That being said, I've also seen overzealous DBAs impose unique constraints within tables (UQ(A,B,C), UQ(A,D,C,B)) where they just weren't required. Too many of those *might* slow down performance as the database grows, especially if they aren't properly thought out.

    So I tend to agree: it sounds like the developers are being lazy, don't understand the real function of PK/FK constraints (e.g., cascading deletes), or are carrying over the attitude of a previous, misinformed DBA.

    If it makes the developers happier, you can point out that FK constraints can always be temporarily disabled (or dropped) and then re-applied.
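
    A minimal T-SQL sketch of that, with made-up table and constraint names (the WITH CHECK on the re-enable re-validates the rows so the constraint is trusted again):

      -- Hypothetical names; switch the FK off for the bulk copy...
      ALTER TABLE dbo.OrderDetail NOCHECK CONSTRAINT FK_OrderDetail_Order;

      -- ...run the DTS transfer / bulk load here...

      -- ...then re-enable it and re-check the rows that were loaded.
      ALTER TABLE dbo.OrderDetail WITH CHECK CHECK CONSTRAINT FK_OrderDetail_Order;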


  • I've seen people fired for hand bombing RI for decades too...

    One issue tends to be that the developers act out of fear. In the bad old days DBMSs could be a real bottleneck, so people tended to avoid anything 'extra'. Regardless of the fact that all significant RDBMS platforms have made huge strides in performance, people often act out of those lingering fears without actually testing what the performance hit really is. These fears can be passed on through generations of developers and can evolve into standard practice. Interestingly, the arguments against RI are an awful lot like the arguments in favour of denormalizing - most of the time they are bunk.

    The other issue is that the costs of not using RI tend to be externalized. Developers tend not to bear the cost of screwed-up data, so it's easy for them to justify cutting the extra development effort RI would require. Force them to clean up the junk left behind by some legacy app every once in a while and see if they change their tune! (Be sure to include some of their own apps in the mix.)

  • Maybe it's because I've been a consultant for so many years that I've gotten the opportunity to see so many different permutations of designs and problems...

    One case where DRI bogged things down massively (a system that was not my design):

    It was voting/commenting/rating/forum software that regularly moved, copied, and deleted many, many rows in multi-row insert, update, and delete statements. This bulk movement/reassignment of data was a regular business process. The original developer had coded many, many code/description tables, and the end result was tables with 10-20 foreign key columns per row.

    So they asked me in to help them figure out why the nightly run took so many hours to process. The first thing I saw was all the DRI, so one of the first simple tests was simply to drop all the DRI and re-execute the regular run. Processing time improved by one to three orders of magnitude.

    Extra indexes were not always a good solution either, because of the extra effort required to maintain them (see the sketch at the end of this post).

    As for DRI as a catch-all for the lazy developer/DBA... well, if you need that, you probably shouldn't be a developer or DBA.
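
    P.S. By "extra indexes" I mean something like this (all names hypothetical). SQL Server doesn't index FK columns automatically, so these are what keep parent-side deletes and re-keys from scanning the child table, and they're also that much more to maintain during those bulk operations:

      -- Hypothetical child table with several code/description-table FKs.
      CREATE INDEX IX_Posting_StatusCode   ON dbo.Posting (StatusCode);
      CREATE INDEX IX_Posting_CategoryCode ON dbo.Posting (CategoryCode);
      CREATE INDEX IX_Posting_RegionCode   ON dbo.Posting (RegionCode);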

  • Any chance that that same developer had made that combination of FK's into a Primary Key/Clustered Index? I've seen that before and performance is hideous. That's where you change the PK to be non-clustered and try again. Still have to do index updates, but you don't re-org your table for each operation. (Darn ER tools and not checking the code they generate..... *grumble*) 🙂
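
    In case it helps, a rough sketch of that change with hypothetical names (any FKs referencing the old PK would have to be dropped and re-created around it):

      -- Drop the wide clustered PK and put it back as NONCLUSTERED, so each
      -- insert/update/delete maintains an index instead of reorganizing the table.
      ALTER TABLE dbo.Rating DROP CONSTRAINT PK_Rating;

      ALTER TABLE dbo.Rating ADD CONSTRAINT PK_Rating
          PRIMARY KEY NONCLUSTERED (UserID, PostID, CategoryCode);

      -- Optionally cluster on a narrow surrogate instead of leaving a heap
      -- (create it before the nonclustered PK to avoid a rebuild):
      -- CREATE CLUSTERED INDEX CIX_Rating_RatingID ON dbo.Rating (RatingID);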

  • If someone put 20 indexes on a heavily-updated table it would kill performance, but that doesn't mean all indexes should be avoided.  If every update kicked off a cascade of 20 triggers it would kill performance, but that doesn't make all triggers a bad idea.  Likewise an example of the overuse of foreign keys is not an argument for never using them.

    Developers saying they never use DRI because it kills performance would have a hard time convincing me they were speaking with any degree of experience and understanding rather than from ignorance, stupidity, or some other nefarious motive.  Maybe they can't stand the idea of even a little piece of the application being in the DBA's domain instead of theirs.  Maybe they just think a database is a glorified flat file system and have trouble with the concept of it exhibiting complex behavior.  Maybe they are too enamored of the elegance and infallibility of their own code to allow some poor little function to suffer being implemented by drawing a line in the database diagram.  Maybe it is a job-protection scheme to make sure there is a demand for developers to maintain all the middle-tier RI code.

    There are some excuses for not using DRI that could be justified, such as having to use tools, legacy processes, or third party applications that can't work with it.  These excuses are very case-specific.  Making a general statement that you should NEVER use DRI because it degrades performance is not supportable.

  • One thing no one is talking about is how important it is for third party application firms to maintain the proprietary nature of their applications. I've often thought that one big reason why RI is NOT used at the DB level with some applications is that it is too 'open'. RI at the DB level would require that third party application development give up some 'control' over the DB, and I believe this would be antithetical to the proprietary instincts of third party application development firms.

    Another reason might be that the third party application may never be stable enough in its DB schema to allow for a decent RI structure at the DB level. This comes from applications whose selling point is a high level of 'customizability'. If you can't ever count on a DB schema to be the same, how can you set up DB-based RI?

    Any thoughts?

    wolf

  • use stealth_mode

    go

    Maybe you could try putting DRI into the database without telling anyone, making sure the proper indexes are there, etc.

    Then see if the first complaint you get is about performance or, more likely, "What's this constraint violation error?" (There's a sketch of adding such a constraint at the end of this post.)

    RJ above has some good observations about why we don't see much DRI in commercial products. In my opinion they are still crappy reasons, but it sounds like your project is in-house anyway. There should be few actual performance problems with properly done DRI, and those that do arise should be dealt with on a case-by-case basis, after establishing with certainty that the DRI is at fault.

    It is heartening to hear of actual consequences (firings) for poor-quality work in these posts. In our industry, quality often goes unappreciated, and the more usual outcome seems to be layoffs of architects and DBAs whose contributions to quality software and smooth-running operations are either never recognized or missed only in retrospect.

    It seems to me that there is an alarmingly popular trend in software development nowadays to regard relational databases as old-fashioned, and an impediment to rapid progress.

    This is not to knock all developers (at least your colleagues have thought about RI!), but many in the object-oriented world seem to think of a database as nothing more than some pigeon-hole where they can "persist" their objects, and the level of database ignorance among developers has remained astonishingly high over the years. Perhaps this is partly the fault of those of us in the "priesthood", for not working more closely with, and educating, our colleagues, managers and customers.

    Good luck with your new job.
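
    P.S. Here's roughly what I had in mind above, with hypothetical table and constraint names. Adding the constraint WITH CHECK validates the existing rows up front, so any junk data announces itself right away instead of lurking:

      ALTER TABLE dbo.OrderDetail WITH CHECK
          ADD CONSTRAINT FK_OrderDetail_Order
          FOREIGN KEY (OrderID) REFERENCES dbo.Orders (OrderID);

      -- If that fails, hunt down the orphans first:
      SELECT d.*
      FROM dbo.OrderDetail AS d
      LEFT JOIN dbo.Orders AS o ON o.OrderID = d.OrderID
      WHERE o.OrderID IS NULL;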

  • This is a pretty amazing thread. What I haven't heard anyone mention or emphasize is the business requirements that the data is modeling. Surely the database design has to meet the business needs, and part of those needs could be that only valid countries are used in the address table, or that when a country's name changes the addresses within that country can still be found in the database (there's a small sketch of this later in this post). These simple business rules are what DRI does best and where code, quite simply, will likely fail, allowing data to be deleted or modified, creating orphans or incorrect data. Surely the company would like the data to be clean?

    I'm currently working on a project that has a requirement for a database that maintains history of all data (inserts only, no updates or deletes). As such, some data in child tables may be for different versions of the data than the parent tables. So, we're not maintaining DRI on the tables. However, we've got insert/delete triggers in place to ensure data integrity (not the OLTP part of the system, don't panic). The key word here is "insure" as in insurance. The referential integrity checks within the DB are insurance that the data is protected for the business just in case the developers do an oopsie, not that they ever would since their code is without flaw.

    To those who NEVER have the database insure RI, whether through DRI, triggers, or functions: I sure wish you could communicate your methods for perfection to the rest of us who, on occasion, make a mistake and forget about a foreign key, a table, or a constraint until the insurance we put in place protects our data for us.
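
    For the doubters, a minimal sketch of the country/address rule mentioned above (hypothetical names): addresses reference a CountryID, so a country can be renamed without touching a single address row, and no address can ever point at a country that doesn't exist.

      CREATE TABLE dbo.Country (
          CountryID   int           NOT NULL CONSTRAINT PK_Country PRIMARY KEY,
          CountryName nvarchar(100) NOT NULL CONSTRAINT UQ_Country_Name UNIQUE
      );

      CREATE TABLE dbo.Address (
          AddressID int           NOT NULL CONSTRAINT PK_Address PRIMARY KEY,
          Street    nvarchar(200) NOT NULL,
          CountryID int           NOT NULL
              CONSTRAINT FK_Address_Country REFERENCES dbo.Country (CountryID)
      );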

    ----------------------------------------------------
    The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood... - Theodore Roosevelt
    The Scary DBA
    Author of: SQL Server 2017 Query Performance Tuning, 5th Edition and SQL Server Execution Plans, 3rd Edition
    Product Evangelist for Red Gate Software

  • This doesn't stop someone with sysadmin rights from mucking with the data, though.

    Sure, nothing will stop someone with sysadmin rights from mucking with the data if they want to. They could always find a way around DRI if they really wanted to.

    If DRI is causing performance problems, then you probably aren't using it right (linking every column in a table to another table, for example).

    While using Stored Procs is a great way of maintaining control, I sure hope they are well documented and no one ever comes along and decides to code the input a different way. With DRI, even if they did, the data would still maintain its integrity.

    -- J.T.

    "I may not always know what I'm talking about, and you may not either."

  • Gotta love it when developers take an RDBMS and make it a DBMS, eh?

    If this is the way the company thinks and you think you will have a hard time convincing anyone who matters that DRI should be used, you might want to update that ol' resume and find someplace else to go.

    Then again, you may wind up with a bunch of developers who use DRI but return ALL the rows of a million-plus-row table in a web query and let the .NET code do the filtering on the client side.

    -- J.T.

    "I may not always know what I'm talking about, and you may not either."

  • Let's add Remedy by BMC to that list. They have tables that hold nothing but the next ID value for another table because they don't even use identity columns (a quick sketch of the identity-column alternative is below). Sure, this is to make it cross-platform with Oracle, but I've heard from Oracle people that there is an identity column "work around".

    Remedy databases can get REALLY hairy if the front-end is developed badly. All RI is maintained by code.
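
    Just for contrast, the identity-column alternative looks something like this (hypothetical names); SQL Server hands out the surrogate keys itself, no next-ID table required:

      CREATE TABLE dbo.Ticket (
          TicketID int IDENTITY(1,1) NOT NULL CONSTRAINT PK_Ticket PRIMARY KEY,
          Summary  nvarchar(200)     NOT NULL
      );
      -- (The Oracle equivalent would be a sequence feeding the key column.)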

    -- J.T.

    "I may not always know what I'm talking about, and you may not either."

  • You do not have to allow cascading deletes. I cannot think of an instance where I would use cascading deletes based on DRI, because of the issues you raise. There's also the case where someone with direct DB rights runs DELETE FROM TableA and forgets the WHERE clause, or puts WHERE CreationDate < '1/1/06' instead of '1/1/05'; DRI without cascading deletes protects against these types of errors.
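
    To make that concrete, a small sketch with hypothetical names; assume dbo.Orders has FK_Orders_Customer referencing dbo.Customer, declared without ON DELETE CASCADE:

      -- Fat-fingered delete against the parent table:
      DELETE FROM dbo.Customer;   -- forgot the WHERE clause

      -- With plain DRI this fails with a reference-constraint violation and
      -- nothing is lost. With ON DELETE CASCADE it would quietly take every
      -- related order along with it.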

  • Jack,

    I think you've touched on the sensitivity that most developers (me too) have regarding DRI, but I think some would argue that DRI is not complete without cascading deletes.

    A business requirement to automatically delete data is rare in my experience, but the consensus is that if a parent is deleted then child records should be deleted too in order to maintain integrity. That seems to be the crux of the argument.

    --Peter

  • Cascading deletes are a different requirement from "if a parent is deleted, the children must be deleted too." The question is one of automated actions vs. intentional actions. With DRI in place, you have to delete all the children in order to delete the parent, but nothing says you have to do that automatically.

    Again, the principal point of DRI is the 'I': Integrity. In order to maintain data integrity, you put PKs, FKs, and unique constraints in place. Depending on the implementation there may be a performance hit or a benefit, but the goal, in this instance, is to protect the data for the business. While small systems in small shops may be able to single-thread access so that everything going against database X goes through application Y, large systems and large shops have more than one application under development, or even multiple development teams on a single application, plus production hot fixes, DTS (soon SSIS) packages, warehouses, data marts, third-party feeds, etc., all of which require you to place as much protection around the data as possible. DRI is as much a part of that protection as the security system (or do people program everything through 'sa' too?).

    ----------------------------------------------------
    The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood... - Theodore Roosevelt
    The Scary DBA
    Author of: SQL Server 2017 Query Performance Tuning, 5th Edition and SQL Server Execution Plans, 3rd Edition
    Product Evangelist for Red Gate Software

  • Cascading deletes are an absolute nightmare!

    Let us suppose you accidentally attempt to delete a parent object (shouldn't happen but we are talking about the real world here). DRI will give a constraint violation and you are saved any embarrassment.

    Specify cascade deletes and in the above scenario you have just created a major brown trouser incident.

