UNIQUE constraint vs checking before INSERT

  • I have a SQL server table RealEstate with columns - Id, Property, Property_Value. This table has about 5-10 million rows and can increase even more in the future. I want to insert a row only if a combination of Id, Property, Property_Value does not exist in this table.

    Example Table -

    1,Rooms,5

    1,Bath,2

    1,Address,New York

    2,Rooms,2

    2,Bath,1

    2,Address,Miami

    Inserting 2,Address,Miami should NOT be allowed. But, 2,Price,2billion is okay. I am curious to know which is the "best" way to do this and why. The why part is most important to me.

    Check if a row exists before you insert it.

    Set unique constraints on all 3 columns and let the database do the checking for you.

    Is there any scenario where one would be better than the other ?

    Thanks.

  • I would let the database do the work for me. Just to be clear, a single constraint that involves all 3 columns should be fine instead of 3 unique constraints.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Really it depends on how fine you want your error controls, and if you're doing bulk or singleton inserts.

    If your inserts come down one at a time, that's fine. If it's bulk, you'll reject the batch on a single failure (the entire insert is one transaction). You have to determine if that's valid.

    If you're going to test yourself, you have to test twice. First you have to test the batch coming in to make sure it doesn't contain duplicates within itself, then check all the rows in the batch against the target.

    Regarding error controls, either you just work with the error code that SQL barfs up at you (which can change between versions), or you check yourself for a custom check and finer controls as to what happens on a failure.

    Also... EAV table. Ewwwwwww.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • blasto_max (2/19/2014)


    I have a SQL server table RealEstate with columns - Id, Property, Property_Value. This table has about 5-10 million rows and can increase even more in the future. I want to insert a row only if a combination of Id, Property, Property_Value does not exist in this table.

    Example Table -

    1,Rooms,5

    1,Bath,2

    1,Address,New York

    2,Rooms,2

    2,Bath,1

    2,Address,Miami

    Inserting 2,Address,Miami should NOT be allowed. But, 2,Price,2billion is okay. I am curious to know which is the "best" way to do this and why. The why part is most important to me.

    Check if a row exists before you insert it.

    Set unique constraints on all 3 columns and let the database do the checking for you.

    Is there any scenario where one would be better than the other ?

    Thanks.

    I normally check rows before I insert them even if the proper constraints are in place. Rollbacks are expensive and I'd rather reserve things like TRY/CATCH for real errors instead of "see if it sticks to the wall" types of code.

    You could also create a UNIQUE index on the columns with the "IGNORE DUPES" option. The problem with that is it returns an informational message about dupes being ignored that GUIs can misinterpret as errors. It also doesn't allow me to identify the duplicates for troubleshooting or data repair or just for keeping track of "bad rows" sent by a 3rd party.

    That's the big reason why I prequalify the data before doing an insert. Of course, something could change between the time you do such validation and the actual insert. And, of course, I'll do a rollback in such an instance so that I can get the correct reporting when I tray again.

    The other thing that you might want to consider (and I haven't because I'm still stuck on 2005 at work) is the MERGE command which has the best of all of those thing especially when combined with the OUTPUT operator.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply