Using Large Integer Or GUID Vs Using Small Integers For Primary Key

  • Jeff Moden (12/28/2016)


    ScottPletcher (12/28/2016)


    Then you encode the actual BankAccount value into a related int, as I stated above. That value could initially come from an IDENTITY column in another table or from a SEQUENCE, but it wouldn't change once assigned, even if the BankAccount value did. It would always represent that one account.
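    As a rough sketch of that mapping (table and column names are hypothetical, not from the thread), the IDENTITY value becomes the permanent surrogate and everything else references it:

        -- Hypothetical mapping table: the IDENTITY value is the permanent
        -- surrogate; BankAccountNumber can change without touching the key.
        CREATE TABLE dbo.BankAccountMap
        (
            BankAccountID     int IDENTITY(1,1) NOT NULL
                CONSTRAINT PK_BankAccountMap PRIMARY KEY CLUSTERED,
            BankAccountNumber varchar(34) NOT NULL
                CONSTRAINT UQ_BankAccountMap_Number UNIQUE  -- current natural value
        );

        -- Other tables reference the immutable surrogate, never the natural value:
        -- CONSTRAINT FK_Txn_Account FOREIGN KEY (BankAccountID)
        --     REFERENCES dbo.BankAccountMap (BankAccountID)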

    Man, do I ever agree with THAT! Like I said, I have to go through hell and back because, although they did such a mapping where I work, they didn't bloody well use it.

    They also built (can't say "designed" because they put no thought into it whatsoever) most of the big tables in a highly denormalized fashion, and the CI doesn't actually perform well because the big tables are between 100 and 140 columns wide. As a result, both the CI (the mapped IDENTITY column) and the AK (LoanNumber, ClientID) are fairly well useless by themselves for anything having to do with performance, including SELECTs, and so they've necessarily got a fair number of indexes and FKs (sometimes in the dozens) on many of the big tables.

    I'm talking about overall performance, not just the INSERT. After all, there are typically thousands, hundreds of thousands, or even millions of SELECTs per INSERT. If a given approach speeds up every SELECT -- even if it mildly degraded the INSERT in a particular case, which often it doesn't much anyway -- it could still easily be worth it. On very large tables, such an approach often allows merge joins rather than hash or, yikes, loop joins.
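    For illustration (hypothetical tables, not from the thread): if two large tables are both clustered on the join key, both join inputs arrive pre-sorted and the optimizer can choose a merge join with no extra sort step:

        -- Both tables assumed clustered on AccountID, so both join inputs are
        -- already ordered on the join key and no sort is needed.
        SELECT a.AccountID, a.OpenDate, t.TxnDate, t.Amount
        FROM dbo.Account AS a
        JOIN dbo.AccountTxn AS t
            ON t.AccountID = a.AccountID;  -- typically a MERGE JOIN in the plan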

    Yep, absolutely understood and agree. I'm talking about both, though. For the Expedia site, INSERTs were as high as SELECTs and sometimes higher because they logged everything (similar to what DoubleClick.Net does with their bloody "spotlight" pixels on graphics). For the company I currently work for, there's a constant flux of new data from large batch jobs and GUI inputs alike. The performance of INSERTs, UPDATEs, and DELETEs is just as critical as that of the SELECTs. Heh... and it's damned painful at times, especially with the logging we have to do to be compliant (banking, loans, and the lender-placed insurance industry). It used to take up to a minute to do a single-row delete from the LOAN table because they didn't understand how to make the dozens of FK checks quick (because the table IS highly denormalized and has way too many other tables that depend on it).
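    That slow single-row DELETE is the classic symptom of unindexed FK columns: every referencing table has to be scanned to prove no child rows exist. A sketch of the fix (table and column names hypothetical):

        -- Indexing each referencing column turns the FK validation scans
        -- into seeks; repeat for every table that references dbo.Loan.
        CREATE NONCLUSTERED INDEX IX_LoanPayment_LoanID
            ON dbo.LoanPayment (LoanID);

        DELETE FROM dbo.Loan
        WHERE LoanID = 12345;  -- the dozens of FK checks can now seek, not scan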

    Even when the PK/CI column is the leading column in a query, it's dog slow for multi-row returns simply because of the low row count (usually just 1 or 2) per page, and so it's become necessary to "duplicate" some of the data using an NCI with the same leading column as the CI, plus some INCLUDEs. When it comes to the heavy batch processing (which a lot of people forget to consider; they usually just think about GUI performance), it's a vicious circle. The indexes are necessary to make the SELECTs for the batches fast but have the exact opposite effect on the INSERTs/DELETEs, because those affect ALL the indexes and FKs.
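    A sketch of that "duplicate" NCI (index name and INCLUDE list hypothetical): same leading column as the CI, but only a handful of columns, so a multi-row range read touches far fewer pages than the 100+ column clustered index:

        -- Narrow covering index: same key as the CI, but many more rows per
        -- page, so range reads on LoanID stop dragging the whole wide row.
        CREATE NONCLUSTERED INDEX IX_Loan_LoanID_Covering
            ON dbo.Loan (LoanID)
            INCLUDE (ClientID, LoanStatus, CurrentBalance);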

    It's all a huge waste of disk space and, in many cases, memory.

    Sorry... that sounds like a rant towards you and I don't mean it that way. I'm just trying to explain that things aren't as simple as "selecting the right CI".

    That sounds great, and makes perfect sense to me. If it would help, you could try vertical partitioning to reduce the row size. Presumably some sets of those columns are used for some purposes and not others, so the columns could be split functionally and the row size reduced somewhat. On the other hand, the big row size probably actually cuts down on the number of rows locked by an insert/update, since fewer rows fit on the locked page(s).
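    A sketch of that functional split (the column groups are hypothetical): both tables keep the same PK, so they rejoin 1:1 whenever a query needs both sets, while the hot rows stay narrow:

        -- Core columns that most queries touch:
        CREATE TABLE dbo.LoanCore
        (
            LoanID         int NOT NULL
                CONSTRAINT PK_LoanCore PRIMARY KEY CLUSTERED,
            ClientID       int NOT NULL,
            CurrentBalance decimal(19,4) NOT NULL
        );

        -- Rarely-read columns moved out of the hot rows (1:1 with LoanCore):
        CREATE TABLE dbo.LoanExtended
        (
            LoanID           int NOT NULL
                CONSTRAINT PK_LoanExtended PRIMARY KEY CLUSTERED
                CONSTRAINT FK_LoanExtended_LoanCore
                    REFERENCES dbo.LoanCore (LoanID),
            OriginationNotes varchar(4000) NULL,
            LegacySystemRef  varchar(50)  NULL
        );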

    SQL DBA, SQL Server MVP (07, 08, 09). "Money can't buy you happiness." Maybe so, but it can make your unhappiness a LOT more comfortable!
