Is it a good practice to mix surrogate and natural keys?

  • When designing a database, some tables do not have natural keys, but many of them  have natural candidate keys. So, I want to know which is considered a good practice, if I am sure going to use some surrogate keys, it it better to keep natural keys for valid tables or just created surrogate keys for the whole table even if some have natural ones? Is it a good practice to mix some surrogate and natural keys, or take only natural or only surrogate keys?

    • This topic was modified 5 years, 5 months ago by  elea.grig.
  • So, I'm not a design purist. If I were, I'd argue that ONLY the natural keys should ever be used. However, there are very valid performance and design reasons why you might want to use a surrogate key, such as an identity (ID) column or a globally unique identifier (GUID) column. When you do this, as most people will, not only is it OK to mix the surrogate and the natural key, I'd argue that it's vital to ensure that the table meets functional requirements (ID or GUID to manage behavior & speed) and business requirements (the natural key that makes the row unique) both. Now, the real debate starts on which of these two unique constraints should be made into the clustered index. That's where things get fun. Personally, I've found the easiest way to answer that question is to use the most common path to the data. Depending on the situation, this could be either the surrogate or the natural key (although, it can be other column or columns that are not even unique, but I'd shy from those as much as possible). The behavior of the application and the queries drives this decision.

    Hope that helps.

    ----------------------------------------------------The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood... Theodore RooseveltThe Scary DBAAuthor of: SQL Server 2017 Query Performance Tuning, 5th Edition and SQL Server Execution Plans, 3rd EditionProduct Evangelist for Red Gate Software

  • I tend to agree with Grant's views. I usually use surrogate keys because what I find is that many business people swear there's a natural key, and there is, until we decide there isn't and we're changing the meaning of our data slightly.

    I like surrogate keys.

  • What I want to know is why people give supposed "design purists" so much credit.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • elea.grig wrote:

    Is it a good practice to mix some surrogate and natural keys...?

    Heh... look at sys.objects for an answer to that question.  Both the object_id and the name must be unique.  In my own tables, I'll frequently have both a PK and an AK.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • I use natural keys when available, and sometimes they are available.  Assigning numbers to states when OH will always mean Ohio makes perfect sense.  And in this case it often avoids a join as in many cases OH is sufficient.  A join would only be necessary if the full name were needed.  If OH is instead identified by "34", a join would always be necessary.

    Often there is no good natural key.  Names are an obvious example.  In that case, I use a surrogate key.

    These rules apply to OLTP designs.  OLAP designs always require a surrogate key.  Even with a good natural key available, the need to possibly create a 2SCD dimension trumps this.

  • You need to enforce meaningful uniqueness in each row somehow. But that doesn't necessarily have to be done via the clustered index, it could be done via a non-clustered index on your natural key, while also having a clustered index on your surrogate key. There seems to be some good argument for use of surrogate keys for your clustered index since surrogate keys can be "unique, narrow, and static" and "ever-increasing":

    https://www.sqlskills.com/blogs/kimberly/ever-increasing-clustering-key-the-clustered-index-debate-again/

    It seems to me kind of a personal choice you might make on a case-by-case basis.


    "If I had been drinking out of that toilet, I might have been killed." -Ace Ventura

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply