The Identity Debate

  • I have been a DBA for 15 years. Whenever I have the control, I create synthetic keys using identities. I have taken over many projects where the developers used multi-column natural primary keys which I simply find awkward to work with. In a recent case, a series of batch error processing tables has 6 columns to uniquely identify a batch, which means these 6 columns appear in each of the 4 related tables:w00t:

    As for using GUID's vs. Integers, in most cases I prefer integers, for performance reasons. http://cookingwithsql.com/index.php?option=com_content&task=view&id=52&Itemid=58

  • I am pleasantly surprised to see that a majority is open to the smart way of doing things: identities as surrogate keys. Celko is clueless on this.

  • Wow, no anti-identity post yet. I like identity for all the reasons mentioned thus far. I also agree that there are times when an identity column is not needed. As far as the sex/gender table there is also an ISO standard for that that has 4 entries I believe 0 - Unknown, 1 - Male, 2 - Female, 9 - Not Applicable.

    I use identity and many times leave it as the clustered index, but do try to identify the natural key and assign a unique index to that. I do it because as many have already mentioned I have seen unique indexes suddenly not become unique or have the id field that will never change change.

    Someone mentioned not using identity on a States_provinces table, well I do because the natural key is a composite of country and state/province and I am lazy and like to join on 1 column.

  • I use GUIDs as my PK for pretty much every table that has "meaningful" data, relegating any natural keys to a secondary role. Our reasoning behind GUIDs versus identity IDs is from a security perspective (we're a US government organization). If a user can see record IDs of 1, 2, 4 and 5 it doesn't take a rocket scientist to figure out that there is probably a record 3 too. The user may not have permission or visibility to record 3 though and that just begs a hack attempt. GUIDs pretty much eliminate this issue for us.

  • identity columns are like the fast food of the database world.

    ..."Super Size Me" ring a bell.

    Basically a lack of understanding on the upfront work involved in modeling. Does anyone still know what logical modeling, forward engineering, or even relational is??

    Way to much focus on the technology and not enough focus on DESIGN.

    I guess you can't blame it all on the IT world. Part of the problem is that businesses want a jack of all trades to be the DBA/network admin/developer rather than have a true

    DBA.

  • Apparently there is an important difference between Canadian and American systems law. Up here, we are not permitted to use Social Insurance Numbers for anything except reporting to Revenue Canada. Organizations are entitled to request them, but individuals are not required to provide them except for purposes relating to Revenue Canada.

    In addition, there have been several cases of duplicates being issued.

    Arthur

    Arthur Fuller
    cell: 647-710-1314

    Only two businesses refer to their clients as users: drug-dealing and software development.
    -- Arthur Fuller

  • I agree with the comments that we need both a natural key and a surrogate key.

    We need the correct natural key or the table is not normalized, but we also need a monotonically increasing key (identity, sequential GUID, etc.) to get the best SQL Server performance.

    It is good to see some consensus on what has been, for many many years, a very contentious and confusing issue. The still confusing item is that we only have one “Primary Key”. Maybe we need a “Logical Primary Key” and a “Physical Primary Key”?

    I would like to add a reference to “Pro SQL Server 2005 Database Design and Optimization” by Louis Davidson who has one of the best and clearest discussions of natural and surrogate keys that I have ever seen.

  • I tend to use identities for every table, and do not tend to expose them to the users (if they said product ids are unique and they later end up with two products with the same id and can't tell which one is which on the screen, it is their problem not mine - I don't have to do an emergency rewrite)

    I hadn't heard about the standard of 0,1,2, & 9 for sex, but I limit the meaning of Null to "[font="Times New Roman"]the user did not input it[/font]" so I never had a problem when writing health care systems (2 cases for unknown for you to consider: a trauma admission when you can't tell the sex of the person + it doesn't matter, and an insufficiently developed fetus who must be assigned a medical record number because of a procedure done on it or its mother)

    I'm about to create an id field when I already have a GUID supplied from an external source, anyone have an opinion on that? The reason is historical high turnover and a desire to be able to debug by hand. I know it might not be necessary, but does it rise to the level of "[font="Comic Sans MS"]don't do it[/font]"? It is such a minimal waste of space 🙂

  • I find that I'm with the you need both an artificial key and a natural key camp. Usually, but not always, the artificial key is an identity. However, I also find that we've done more & more designs that sort of combine the two. For example we have three or four objects that are identified by an identity field, as well as some kind of natural key, within the database. Then we begin relating these tables to each other to describe business processes & storage and there, we use the combined identity columns as the key, usually combined with lookup tables to identify extra attributes about the relationships and behaviors. In this case, the "natural key" is a combination of the artificial keys from the table above. It would have been even if the tables above had used only natural keys. Clustering on these compound keys using the most likely access pattern has made for huge performance benefits since we don't have book mark lookups.

    ----------------------------------------------------The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood... Theodore RooseveltThe Scary DBAAuthor of: SQL Server 2017 Query Performance Tuning, 5th Edition and SQL Server Execution Plans, 3rd EditionProduct Evangelist for Red Gate Software

  • One of the best things about identity is that it is SEQUENTIAL! So using it as your clustered (usually also your primary) index, will not cause any page splits on your actual data, meaning inserts would always occur as fast as possible. Then also, the benefit of using identity int over guid is that an integer is much smaller. This means that linking by integer would take less time than by the guid. I would say always use identity (for fast inserts), then optionally additionally add a guid column if you need to merge data. The middle tier can always call a stored procedure to insert any data and that same procedure could immediately return the SCOPE_IDENTITY()/NEW_ID() ensuring that the middle tier has the correct key to work with anyways.

  • I am amazed at the lack of responses from the religious left this time around.

    We always design for replication, we use it fairly often. And records generally get created at a distance in the app and get inserted via ADO recordsets or Stored Procs. So the primary key always turns out to be a GUID. Easy to write joins (almost always 1-1), and less trips to the server. I'll never argue that inserting GUID based records is faster from a server-only perspective, but we've always found that saving trips on the network is far more performance enhancing that tweaking out a few CPU cycles on the server.


    Student of SQL and Golf, Master of Neither

  • fuller.artful (2/11/2008)


    ... The Sex column is obvious, but I think so are columns such as StateOrProvince, in which there are a small number of uniquely identified entries. Countries is another example: there are unique 2- and 3-character codes mandated by the ISO, so why bother with sids?

    It's funny you should mention that. I recently had to put together a data warehousing application that had to map country codes from 5 different organizational systems (the parent company had just bought 2 more companies and needed to integrate all of their data into a warehouse). None of the 5 systems used the same codes for all countries, and only one of the systems was actually ISO compliant. I ended up having to manually map the country codes for all of the different systems to the standard ISO codes, but even then I ended up with some codes that were not ISO-compliant, which I still had to account for.

  • I use the identity fields a lot but I do not go crazy with them. I would never use them on lookup tables like states, ZIP codes, gender or such. Practicality is my major motive in all of my database designs and I probably brake a lot of 'pure relational' database design rules because it is more practical to have them broken then to go 'by the book'.

    One of the posts on this thread claimed that the identity field is being used because we (who design the databases) do not know the design process well enough or are too lazy (or dumb) to follow it.

    I would like to provide one simple example where identity makes a lot of sense and no natural keys exist.

    The following information needs to be stored in the database: employer, employee, insurance plans offered by the employer and employee elected insurance plans.

    I would like to challenge any of the database purists to design the database in this case using the natural keys. Good luck.

    As far as the reason Andy provided for one of Joe Celko’s reasons for not using identity:

    "what did missing row #32 contain, was it a failed insert or someone stealing something and trying to hide evidence".

    Excuse me but from Joe I would expect a more sophisticated reason. First of all, at least the missing row will indicate to me that there was something going on so at least I can investigate what is going on. And if you are using a natural key and someone wipes half of your records there is no way you will even notice.

    ---------------------------------------------
    [font="Verdana"]Nothing is impossible.
    It is just a matter of time and money.[/font]

  • jaholbrook (2/11/2008)


    identity columns are like the fast food of the database world.

    ..."Super Size Me" ring a bell.

    Basically a lack of understanding on the upfront work involved in modeling. Does anyone still know what logical modeling, forward engineering, or even relational is??

    Way to much focus on the technology and not enough focus on DESIGN.

    I guess you can't blame it all on the IT world. Part of the problem is that businesses want a jack of all trades to be the DBA/network admin/developer rather than have a true

    DBA.

    <sarcasm>

    I have to say - I'm a BIG fan of the "everyone is clueless" argument. It makes for a REALLY constructive conversation. I vaguely remember that from my "verbal abuse for technical folks" class in college.

    Oh masterful Sensei - next time you DO come down from that mountain of wisdom, perhaps you could attempt to enlighten us troglodytes on why we are so misguided. I for one always like to hear about my failings.

    </sarcasm>

    Anyway - it really can be pretty remarkable what "unique and will never change" keys have a nasty tendency to change on a dime. Especially in healthcare it seems. Throw one more hat into the "private surrogate keys aren't evil" camp...

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • When Newfoundland (NF) changed it's official name to Newfoundland and Labrador (NL), the use of a natural key meant a 25 million row update, and weekend work to do it.

    I'd much rather have updated a single row in 1/10 of an instant.

    There must be a candidate key on each table, however, otherwise it's hard to know what the data means? What does it mean when there are identical rows?

    P

Viewing 15 posts - 16 through 30 (of 129 total)

You must be logged in to reply to this topic. Login to reply