Are There That Many GUIDs?

  • I've been thinking about this issue for a few days. Like everyone, I accept that using a single integer field is going to be the best performance, but I don't believe that sequential GUID's are better performing than random-like non-sequential GUID's. In fact, I think they're possibly worse.

    Consider how you would compare two GUID's. A GUID is 16 bytes which can be broken up into four 32-bit integers. Each of which can be handled as a single unit on the processor. When comparing the two GUID's, you would compare the four integers. But the first rule of comparison is that you STOP when you hit the first mismatch.

    Non-Sequential GUID's

    When comparing a random-like non-sequential GUID, the first integer has a 1 / 2^32 probability of matching. That's a one in four billion chance that you'll ever have to compare that second integer! 99% of the time, your comparison time will equal that of a normal integer column. Only the exact record you are searching for will normally need to you compare all four integers. That's a 1 / 2^96 probability which is so statistically small, it's negligible. Even if they were bad enough to compare bytes instead of integers, you would still have a smaller chance of getting to the second byte, then to the third, etc.

    Sequential GUID's

    Compare that to sequential GUID's. Until you have more than four billion rows in your table, your first three integers of the GUID are ALWAYS MATCHING 0's. Thus, you always have to compare all four integers and have four times worse performance. The obvious solution would be to compare from right to left, but I don't think that's the way it is done because of indexes.

    Indexes

    All indexes in SQL servers are built as balanced search trees ("Inside Microsoft SQL Server 2005 T-SQL Querying" by Itzik Ben-Gan). And for efficient search trees, you require them to be sorted. Unless Microsoft is doing something very clever, sorting GUID's needs them to be sorted from left to right. Using sequential GUID's would mean that every operation against the index has to compare all four integers of the GUID each time against the index record. Random GUID's would typically require only checking one integer.

    Now, people say that sequential GUID's are better for indexes because they reduce fragmentation, especially in clustered indexes. But as I mentioned, ordered indexes are all BALANCED search trees (not binary search trees). The size of a balanced search tree is a function of how many items in the tree, NOT the key values being stored. Even if you were using sequential GUID's when inserting to a clustered index, the tree would still be re-paginated every time the far right leaf reached its upper item size.

    Also, SQL server doesn't know that you're inserting sequential GUID's despite your column default so it searches the tree every time to insert that new one. You're forcing it to traverse down ALL the right nodes of the tree to find the spot to insert at. If the leaf node maximum is reached, the node will be split in half and some items will move to a parent node, others to a new left node. That's how balanced search trees work. With a random GUID, it may have found a place to insert at higher up in the tree and would have been more efficient again.

    So, even if Microsoft did compare GUID's right to left, it would simply make them as efficient as non-sequential GUID's, and practically and statistically as good as plain integer columns. Given that only one integer of the four would need to be checked 99% of the time.

  • Grant Fritchey (10/12/2010)


    They should be used where appropriate...

    I apologize for puting you on the spot but what do you consider an appropriate use for them is especially since even MS does not guarantee their uniqueness?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Eric Russell 13013 (10/12/2010)


    As for global uniqueness, I don't see why the column combination (center_id + cashregister_id + transaction_date) or database id + transaction identity id would not be as reliable, or more reliable, than a GUID. A GUID is nothing but a timestamp + (network card id or random number).

    Actually, it's just a random number now (Type 4 GUIDs are used). It was a security risk having the network card id in the GUID.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Although having two systems come up with the same GUID is a bit like having two comets appear in the same mile across 14 quadrillion Milky Way galaxies, it can and reportedly has happened.

    If you're going to use it, at least put a unique constraint on it. Although they say it's highly unlikely to ever find a dupe, even MS doesn't guarantee it's uniqueness.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Jeff Moden (10/17/2010)


    Grant Fritchey (10/12/2010)


    They should be used where appropriate...

    I apologize for puting you on the spot but what do you consider an appropriate use for them is especially since even MS does not guarantee their uniqueness?

    First & easiest, you have to have them if you're doing merge replication. But that's cheating...

    I have worked with apps that are largely offline, that build out data structures and collect data on a remote machine and then sync up to the mother ship. In this case, it really is easier to have a unique key that is generated on the remote machine, a data structure built around it, and then all you have to do is move it. No recreating the structure, no rebuilding the data. It's worked really well.

    As to the duplicates... yeah, it can happen, but it's an exceedingly rare event. For the type of work I've supported, I'm much more worried about deadlocks and crappy code than a duplicate key. Our nHibernate app is starting to surface after 2+ years of secret development. That's going to provide about 30-50 blog posts and hours and hours of explanation for why their code is breaking the database, not that it'll do any good.

    ----------------------------------------------------The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood... Theodore RooseveltThe Scary DBAAuthor of: SQL Server 2017 Query Performance Tuning, 5th Edition and SQL Server Execution Plans, 3rd EditionProduct Evangelist for Red Gate Software

  • tdennis-674196 (10/17/2010)


    I've been thinking about this issue for a few days. Like everyone, I accept that using a single integer field is going to be the best performance, but I don't believe that sequential GUID's are better performing than random-like non-sequential GUID's. In fact, I think they're possibly worse.

    I think you're partially right here. There might not be a ton of different in the comparisons, but the fragmentation can be a big deal. It can substantially slow things down because the tree isn't contiguous on disk.

  • I have used GUIDs in similar situations to what Grant has. When you have offline work, or remote clients that need to generate a "key" of some sort for a transaction. You can use identities, but the management of ranges is a pain, and it's a huge problem if not done well.

  • Steve Jones - SSC Editor (10/18/2010)


    tdennis-674196 (10/17/2010)


    I've been thinking about this issue for a few days. Like everyone, I accept that using a single integer field is going to be the best performance, but I don't believe that sequential GUID's are better performing than random-like non-sequential GUID's. In fact, I think they're possibly worse.

    I think you're partially right here. There might not be a ton of different in the comparisons, but the fragmentation can be a big deal. It can substantially slow things down because the tree isn't contiguous on disk.

    There is also excessive page splitting which is one of the root causes of fragmentation. This is really where the value of the newsequentialid function comes in. It dramatically reduces page splitting.

    "Beliefs" get in the way of learning.

  • Hope you guys don't mind me chiming in... We're working on a new system now, and my goal is to have a globally unique ID (whether INT or UNIQUEIDENTIFIER) for each record in our entity tables. This way I could create entity dependent tables for comments, addresses, etc which can be used by any other table joined only by the Global UID. In testing thus far it's working great, but I'm finding the pitfalls of using UNIQUEIDENTIFIER might not be worth it down the road.

    I've tried a few workarounds to create a Numeric global unique ID, but short of writing a custom CLR function (which I'm about to work on) there's no simple way of getting around it.

    Some notes about the DB I'm writing, each record in the entity tables have both an ID (Numeric Identity) and GUID (UNIQUEIDENTIFIER) where the ID is used for reports or within the application, and the GUID is used only for internal linking. We have times where we need to change the publicly seen ID of a record, so keeping this separate from the ID used for linking within the database is ideal.

    It'd be nice if Microsoft would create a numeric data type that works as streamlined as a GUID but being more akin to an Identity field.

    Sam

  • Just out of curiosity, is there any particular reason why the id has to be "globally" unique? Is there some reason why it can't just be unique within your enterprise?

    "Beliefs" get in the way of learning.

  • samalex (10/18/2010)


    Hope you guys don't mind me chiming in... We're working on a new system now, and my goal is to have a globally unique ID (whether INT or UNIQUEIDENTIFIER) for each record in our entity tables. ...

    Actually, could you post this in one of the design forums. You can include a link here, but I'd rather that go into its own thread.

  • samalex (10/18/2010)


    Hope you guys don't mind me chiming in... We're working on a new system now, and my goal is to have a globally unique ID (whether INT or UNIQUEIDENTIFIER) for each record in our entity tables. This way I could create entity dependent tables for comments, addresses, etc which can be used by any other table joined only by the Global UID. In testing thus far it's working great, but I'm finding the pitfalls of using UNIQUEIDENTIFIER might not be worth it down the road.

    I've tried a few workarounds to create a Numeric global unique ID, but short of writing a custom CLR function (which I'm about to work on) there's no simple way of getting around it.

    Some notes about the DB I'm writing, each record in the entity tables have both an ID (Numeric Identity) and GUID (UNIQUEIDENTIFIER) where the ID is used for reports or within the application, and the GUID is used only for internal linking. We have times where we need to change the publicly seen ID of a record, so keeping this separate from the ID used for linking within the database is ideal.

    It'd be nice if Microsoft would create a numeric data type that works as streamlined as a GUID but being more akin to an Identity field.

    Sam

    I see don't how a CLR would come into play when implementing a globally unique integer id. You can create an incrementing integer id, one that spans multiple tables while retaining it's uniqueness, using an identy seed and a check constraint.

    For example:

    create table Table1

    (

    ID bigint not null identity(10000000000,1)

    check (ID between 10000000000 and 19999999999)

    primary key

    );

    create table Table2

    (

    ID bigint not null identity(20000000000,1)

    check (ID between 20000000000 and 29999999999)

    primary key

    );

    create table Table3

    (

    ID bigint not null identity(30000000000,1)

    check (ID between 30000000000 and 39999999999)

    primary key

    );

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • samalex (10/18/2010)


    ...

    It'd be nice if Microsoft would create a numeric data type that works as streamlined as a GUID but being more akin to an Identity field.

    Sam

    In Oracle there is an object called a SEQUENCE, which is used in situations where we would typically use an identity in SQL Server. The difference is that it can't be used in a default constraint, so it's not tied to a specific table, and the next available value is fetched explicitly using PL/SQL function calls within a stored procedure or trigger.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Robert Frasca (10/18/2010)


    Just out of curiosity, is there any particular reason why the id has to be "globally" unique? Is there some reason why it can't just be unique within your enterprise?

    Robert, this is actually all we need, but there's no simple way I've seen within SQL to get it. Identity will give you unique within a table, GUID unique across the board, but I need the best of both -- a numeric field that's unique across the database. I don't care if someone else out there has the same GUID because I don't plan on running concurrent systems where data needs to be merged.

    My first attempt at doing this was to create a UniqueID table with an identity field, then somehow write a User defined function or CLR Function to insert a new record into UniqueID then grab the identity field, but functions can't run Inserts which makes it pretty hard to do. For stored procedures that modify the database one record at a time it's not difficult to add this logic, but for bulk inserts a function is ideal -- but alas functions won't work (given the scenario I'm looking at anyway).

    I also looked at doing it via a Trigger, but that comes with its own pitfalls... though it might be the simplest solution.

    Sam

  • samalex (10/18/2010)


    Robert Frasca (10/18/2010)


    Just out of curiosity, is there any particular reason why the id has to be "globally" unique? Is there some reason why it can't just be unique within your enterprise?

    Robert, this is actually all we need, but there's no simple way I've seen within SQL to get it. Identity will give you unique within a table, GUID unique across the board, but I need the best of both -- a numeric field that's unique across the database. I don't care if someone else out there has the same GUID because I don't plan on running concurrent systems where data needs to be merged.

    My first attempt at doing this was to create a UniqueID table with an identity field, then somehow write a User defined function or CLR Function to insert a new record into UniqueID then grab the identity field, but functions can't run Inserts which makes it pretty hard to do. For stored procedures that modify the database one record at a time it's not difficult to add this logic, but for bulk inserts a function is ideal -- but alas functions won't work (given the scenario I'm looking at anyway).

    I also looked at doing it via a Trigger, but that comes with its own pitfalls... though it might be the simplest solution.

    Sam

    You lost me. You make it sound like unique within a table isn't good enough. If it's unique within a table it's unique across the database. There can only be one instance of tableName.IDField within your database. The identity attribute will assign unique values whether you bulk load or insert one row at a time into the table. Using the newsequentialid function as the default value for a field will accomplish the same thing for a guid. I guess I don't understand your problem.

    "Beliefs" get in the way of learning.

Viewing 15 posts - 61 through 75 (of 169 total)

You must be logged in to reply to this topic. Login to reply