Very basic question - Insert or Update?

  • Assuming that I do not know if a record exists (and about a 50% chance that it does), what is the most efficient method of either inserting or updating? 

    Check first for existence.

    Try the update.  If nothing updated, insert.

    Anything else?

    TIA

    David

     

  • G'day,

    Strictly as a matter of personal preference, I prefer to check first then either update or insert as appropriate.  Why?  Because tyring the update and handling a failure is essentially building an error handler for a "normal" condition.  My preference, as a matter of style, is to use error handlers for errors.  But I freely admit, this is largely a matter of personal preference and style.

    Wayne

  • Wayne,

    For some reason I thought that a failed update did not set up an error.  I thought that it "succeeded", but the row count returned 0. 

    David

     

  • If Exists()

       Update

    Else

       Insert

  • Why not check the rowcount after initiating the update statement.  If nothing is updated, then issue an insert statement.

    UPDATE ...

    IF @@ROWCOUNT=0  -- No records updated, so it doesn't exist and needs to be inserted

    INSERT ...

    Brian

     

  • Brian,

    That is what I was thinking - try to update and if the rowcount is 0 do the insert.

    I guess that my question boils down to which is more efficient:

    select count(*)... and then update or insert (whichever is appropriate)

    or just try to update, inserting if no records are updated

    If the update attempt is not much more costly than count(*), then it is probably better because the update will be successful a number of times. 

    David

  • It is probably more efficient to use IF EXISTS() instead of Count(). Depends on the size of your table. Exists checks for the values and stops when it finds the first one. Count will count everything whether it finds something or not.

    Just a thought.

    Quand on parle du loup, on en voit la queue

  • EXISTS() will almost certainly be faster than COUNT(). EXISTS can stop once it has detected one single row that matches the condition and returns TRUE, while COUNT needs to run through all rows. Even after it has found a matching row.

    But, what I would like to know. It this a regular load process? Just asking, because it might be a chance that you can truncate the table first and than insert all rows rather than trying costly to determine if there are already existing rows or not.

    --
    Frank Kalis
    Microsoft SQL Server MVP
    Webmaster: http://www.insidesql.org/blogs
    My blog: http://www.insidesql.org/blogs/frankkalis/[/url]

  • Hi

    I agree with Wayne and Edwin. It might be a matter of taste, but it just looks neater (and better planned) to go the route of

    if exists()

    update

    else

    delete

    Schalk

  • Thanks Everyone,

    Frank, this is an Internet application where the insert/update happens one record a time, so I cannot truncate the table.  

    Looks like if exists() gets the nod.  It does seem cleaner to me too. 

    David

     

  • On the other hand... it depends on how critical performance is, but actually performing an exists check before doing either an update or an insert is just a waste of time and resources

    Why then, you may wonder?

    Well, you are 100% certain that you want either an update (50% of the time) or an insert (50% of the time). Since it's a tie, it doesn't matter much if you attempt an update before an insert or the other way around, what you do know, is that you already have the PK for the table known... If you hadn't, you couldn't check for existance in the first place, right?

    Anyway, instead of always doing two statements 100% of the time (ie always exists followed by either update or insert) you could instead do two statements 50% of the time (ie when there was nothing to update, an insert was required) and just one statement the other 50% of the time (ie it was an update, and then nothing more)

    For that purpose, it would be the most efficient route to just attempt the update right away, without wasting time on other stuff. When the update is done, check @@rowcount as suggested earlier. 50% of the time, the proc ends here, the other 50% it moves on to do the insert. If you always check for existance first - you will always do 100% work - not 50/50 + 100/50

    /Kenneth

  • And Adding to Kenneth the

    if Exists()

     Update

    else

     Insert

    is a BAD assumption. Supposed you are on a heavily transactional environment (Web App is a good candidate) and the exists Fails for two users at the same time, that will meant that one of them will be able to insert and the other one won't unless you use higher isolation level lockings which will indeed reduce performance if you follow that route.

    Please, do as Kenneth explained above  (DIRECT UPDATE with Key or DIRECT INSERT TESTING FOR KEY NON EXISTENCE) and you will be OK

    HTH

     


    * Noel

  • Just when I thought that it was all settled... 

    This is not a heavily transactional environment and the only person likely to update or insert a record is the person who belongs to the ID affected, but this issue will come up again. 

    I had thought that the deciding factor would be how efficient/inefficient an attempted update is.  i.e., if it is efficient, why not try the update (you'll succeed a percentage of the time).  If it is not efficient, then it makes sense to test for existence first. 

    Noel brought up a good point about two exists() failing and both users then attempt to insert.  But what happens if two users try to update a record that does not exist,  at the same time?  Wouldn't that be the same as both testing for existence?  Wouldn't they both attempt to insert at the same time if their updates failed?  How does record locking work here?  Would it lock for the update, but not the exists()?  Couldn't there still be a problem? 

    David

     

  • David, Let me simplify the scenario so that you understand what I mean:

    Update TblName Set fld1 = x, Fld2 =y ... where pk =@pkey

    if @@rowcount = 0

    begin

     insert into TblName (fld1, fld2)

     select x, y  -- no from clause, just a single record

     where not exist (select * from TblName where pk = @pkey)

    end

    I am leaving @@error checking out of the equation for simplicity but have a look at it. If both updates fail one of the inserts is going to succeed before the other and then you still get atomicity in the process. Some designers opt for the Table lock mechanism to simulate pessimistic locking and others will simply perform 2 round trips with a single action but without a doubt this is the fastest (although not the most flexible) approach

    Do you see it?

    HTH

     


    * Noel

  • hmmm.  It seems we have made the original question a little broader by introducing the related issues of reliability and atomicity.  Dare I suggest we might consider a transaction wrapping the actual insert/update statements?  If we introduce transactions then I suspect the overhead of begin/commit/rollback may well make the original question moot.

    Next question: how is "update followed by a row check" better or worse than an "if exists followed by an update"?  In either case we will subsequently do an insert 50% of the time.

    One possibility might be the actual server activity associated with implementing the "exists" check versus the server activity implementing the update to @@rowcount and the subsequent test of @@rowcount.  I do not personally know the internals well enough to assess which is faster under the hood.

    Thoughts?

    Have a great day!

    Wayne

Viewing 15 posts - 1 through 15 (of 18 total)

You must be logged in to reply to this topic. Login to reply