Referential Integrity Checking Burden during Updates

  • This is a second post on the same project as yesterday's post in this forum:

    I am reviewing a critical DB application with extensive application logic in SPs, functions, and views. 

    Three very large tables in the DB (the largest--tens of thousands of records) have clustered indices for PKs that don't follow the natural order in which the records are laid down.  E.g., one table has a PK of (SubOrganizationID, RecordType, RecordSubType, Date).  Every year about 2,500 records are added to this table (and similar for the other two large tables) covering every combination of the RecordType, etc., for each SubOrganization for every date in the coming year. 

    The three largest tables are also weak entities with cascading primary keys.  If I have five parent tables, A-E, and my three large tables are designated L1, L2, and L3, the PKs are as follows:

    L1: (PKA, PKB, PKC)

    L2: (PKA, PKB, PKC, PKD)

    L3: (PKA, PKB, PKC, PKD, PKE)

    On a daily basis thousands of records in L1, L2, and L3 are updated.  PKs never change but attribute data is updated--PKs are used in the WHERE clause of the update statements.

    The application gives us a pretty high level of confidence that once the records are created the first time in the bulk add there will be no changes in the PK.  So I am thinking that I may want to drop the relationships from L1, L2, and L3 (my weak child tables) if each update requires a referential integrity check to confirm the validity of the PK.  But I really don't know what burden the referential integrity checks impose during an update operation.  Any ideas?

    By the way, tables A-E are not horribly large nor are their PKs complex.  Also, this application is for a decision support/management information system--almost a data warehouse-type activity.  Transaction processing takes place elsewhere so any mistake in PKs that might creep in would not be mission critical and could easily be caught with a maintenance utility. 

  • Hi Larry,

    You mentioned that you have a high level of confidence that no changes will occur to the PK so therefore each update will not require a referential integrity check because the PK won't be updated.  A referential integrity check only takes place if you update the column(s) on which a relationship exists.

    Even if you did update the PK I would not recommend that you drop the relationships as they are there for a reason and the burden is not as high as performing the checks yourself.

    Tables with tens of thousands of records should give you no cause for concern.  SQL Server can easily handle hundreds of millions or billions of records so numbers in the thousands are nothing.

    I just tried updating a record within a table containing 25 million records (referenced by another table containing 12 million records) and the referential integrity check came back instantly telling me it couldn't do it so I really wouldn't worry about the performance impact.

    Hope that helps,

  • I'm pretty new to this level of analysis (hence the question).  But the question comes from my work with triggers where an Update gives you access to the deleted and inserted version of the row--which I assumed indicated that a delete occurred followed by an update--in that order.

    So my fundamental question now seems to be, when updating a row without affecting any PK columns:

    a) is the original row deleted followed by the insertion of the new row, or

    b) is the location of the original row located and the new version laid down in the same space?

    Thanks for the reply. 

  • Hi Larry,

    That's a good question, which made me think

    You're right in saying that when performing an update, behind the scenes SQL Server performs a delete and then an insert.

    However, when working with triggers you'll have notice that you can ask SQL Server whether a particular column was updated or not.  So SQL Server must have knowledge of whether a column (that requires referential integrity) is being updated. 

    So my guess is that SQL Server ignores the need to perform a referential integrity check when it knows that the column being updated doesn't need that check.

    What I'm getting at is, that I think the answer to your question is a) but it probably isn't a delete followed by an insertion as you or I know it.  It is something that SQL Server is handling behind the scenes in it's own way so it isn't a classic SQL delete and SQL insert.

    This is just a guess though and I can't claim to know the answer.  I'd be interested to hear if anyone has a definitive answer to this.

    In either case though, as I pointed out, the referential integrity check happens very quickly anyway with large tables so it shouldn't concern you too much.

  • Updates are potentially expensive operations because the default IS to actually perform a delete followed by an insert. In certain circumstances the server can perform an "update in place", but the list of requirements is long, and includes the stipulations that the new data must be <= in length than the old data, that the column(s) must not participate in any indexes, along with a dozen or so additional requirements as well. A particular cause of update slowdowns is when the new row version takes up enough additional space that it will no longer fit on the same page along with its neighbors, thus requiring a "page split" operation. In an extreme example suppose several varchar(512) columns are are updated from NULL to full-length values - many fewer rows will be able to fit on the same 8K page. In situations where updates of large tables are anticipated, it sometimes makes sense to separate columns subject to update into a separate table from those likely to remain static.

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply