Are There That Many GUIDs?

  • I've started putting into place what Eric suggested so each Entity has it's own range of unique ID's, and given I don't anticipate any Entity having more than 1 billion records I'm using Numeric(9,0) with the first two digits identifying which Entity table it comes from (Customer = 10, Product = 20, etc). This should work great...

    In my situation the reason I want to have database wide unique primary keys in each table is so we can have entity independent tables to save addresses, comments, etc. This way these entity independent tables can have a LinkUID of numeric(9,0) which can be populated by the PK of any entity table. Granted I could create entity-level comment and address tables, but since the tables would be identical across all entities I'd rather not go that route.

    Any caveats to this being an alternative to a GUID? Numeric(9,0) is only 5 bytes in the database as opposed to 16 bytes, plus being the numbers would be sequential the indexes and clustered indexes should work like a champ.

    Sam

  • SanDroid (10/12/2010)


    David Walker-278941 (10/12/2010)


    I originally didn't like GUIDs because I thought they used too much "space". They really don't. Disk space is cheap, and GUIDs are only 16 bytes long.

    As for the "are there that many" question, it's interesting to use an identifier that is essentially infinite. There are enough GUIDs to enumerate every atom in the known universe (at least THIS universe).

    David,

    Spoken like a true developer. My hat goes off to you for being so brave and bold in the conviction of your statements.

    Can I get the contact information for business owners or companies of your last ten projects that you used GUID's for table column row id's? I have a daughter in college learning knowledge and I need the extra cash.

    David Walker-278941 (10/12/2010)


    There are enough GUIDs to enumerate every atom in the known universe (at least THIS universe).

    I am afraid that most of the physicists in this known universe might disagree with your statment. If you would have kept it to just planets and stars maybe you would be closer to facts.

    The approximate number of GUIDs is 3.4X10^38. The approximate number of atoms in the observable universe is minimally 10^80, per microwave background measures. The actual quote on the number of GUIDs, if I remember correctly, is we have enough to assign one to every atom in Earth. (A typical star has about 10^57 hydrogen atoms in it, so it doesn't even begin to hit that limit.) As an interesting coincidence, IPv6 has the same number of possible values as GUIDs do.

    Not that it really matters, but I thought I'd set the record straight just for the sake of trivia.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • e3h4 (10/12/2010)


    People often assume you need to use GUIDs for unique values across databases and for replication. However, there is an easy way to get around that using an int or other suitable IDENTITY data type: vary the seed and the increment. If you have two databases, one database can start at 1 with an increment of 2, the other can start at 2 and increment by 2. No collisions. Or have one start with a seed of 1 increment 1 and the other at 1 billion - or some other large number that realistically your database will never hit. Or if you have created the true killer app then have one use negative (1/2 of the available values in signed variable) values, e.g.,

    CREATE TABLE [dbo].[Minus1](

    [IdMinus] [int] IDENTITY(-1,-1) NOT NULL,

    [Test] [nchar](10) NOT NULL,

    CONSTRAINT [PK_Minus1] PRIMARY KEY CLUSTERED

    (

    [IdMinus] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    ) ON [PRIMARY]

    Different variations on this theme can be used to support more than 2 databases.

    Or use a tinyint or smallint column to combine with the identity column, with the added column being unique per instance of the database.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • GSquared (10/18/2010)


    SanDroid (10/12/2010)


    David Walker-278941 (10/12/2010)


    I originally didn't like GUIDs because I thought they used too much "space". They really don't. Disk space is cheap, and GUIDs are only 16 bytes long.

    As for the "are there that many" question, it's interesting to use an identifier that is essentially infinite. There are enough GUIDs to enumerate every atom in the known universe (at least THIS universe).

    David,

    Spoken like a true developer. My hat goes off to you for being so brave and bold in the conviction of your statements.

    Can I get the contact information for business owners or companies of your last ten projects that you used GUID's for table column row id's? I have a daughter in college learning knowledge and I need the extra cash.

    David Walker-278941 (10/12/2010)


    There are enough GUIDs to enumerate every atom in the known universe (at least THIS universe).

    I am afraid that most of the physicists in this known universe might disagree with your statment. If you would have kept it to just planets and stars maybe you would be closer to facts.

    The approximate number of GUIDs is 3.4X10^38. The approximate number of atoms in the observable universe is minimally 10^80, per microwave background measures. The actual quote on the number of GUIDs, if I remember correctly, is we have enough to assign one to every atom in Earth. (A typical star has about 10^57 hydrogen atoms in it, so it doesn't even begin to hit that limit.) As an interesting coincidence, IPv6 has the same number of possible values as GUIDs do.

    Not that it really matters, but I thought I'd set the record straight just for the sake of trivia.

    Theoretically speaking, there may be enough GUIDs to cover every atom exisitng on Earth at any given moment in time. However, even the most efficient futuristic storage medium would require several hundred atoms to encode a single GUID, so you practically can't build a database to assign one GUID to each atom on Earth. The database itself would have to reside on another planet or spacial dimension. Just querying the database would destroy existing atoms and new ones.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Eric Russell 13013 (10/18/2010)


    GSquared (10/18/2010)


    SanDroid (10/12/2010)


    David Walker-278941 (10/12/2010)


    I originally didn't like GUIDs because I thought they used too much "space". They really don't. Disk space is cheap, and GUIDs are only 16 bytes long.

    As for the "are there that many" question, it's interesting to use an identifier that is essentially infinite. There are enough GUIDs to enumerate every atom in the known universe (at least THIS universe).

    David,

    Spoken like a true developer. My hat goes off to you for being so brave and bold in the conviction of your statements.

    Can I get the contact information for business owners or companies of your last ten projects that you used GUID's for table column row id's? I have a daughter in college learning knowledge and I need the extra cash.

    David Walker-278941 (10/12/2010)


    There are enough GUIDs to enumerate every atom in the known universe (at least THIS universe).

    I am afraid that most of the physicists in this known universe might disagree with your statment. If you would have kept it to just planets and stars maybe you would be closer to facts.

    The approximate number of GUIDs is 3.4X10^38. The approximate number of atoms in the observable universe is minimally 10^80, per microwave background measures. The actual quote on the number of GUIDs, if I remember correctly, is we have enough to assign one to every atom in Earth. (A typical star has about 10^57 hydrogen atoms in it, so it doesn't even begin to hit that limit.) As an interesting coincidence, IPv6 has the same number of possible values as GUIDs do.

    Not that it really matters, but I thought I'd set the record straight just for the sake of trivia.

    Theoretically speaking, there may be enough GUIDs to cover every atom exisitng on Earth at any given moment in time. However, even the most efficient futuristic storage medium would require several hundred atoms to encode a single GUID, so you practically can't build a database to assign one GUID to each atom on Earth. The database itself would have to reside on another planet or spacial dimension. Just querying the database would destroy existing atoms and new ones.

    Also, atoms don't answer when you call them (must be more cat than dog), so assigning GUIDs to them wouldn't be all that useful anyway. 🙂

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Anyone that has tried to merge two databases that use identity values would have paid their year's salary to have had the original designers use guids for PKs. A COMB guid will perform nearly as well as an integer and not have the nasty problems that identity values have.

    Frankly, it is a GOOD thing that developers cannot remember them. Surrogate keys should never be displayed to users. Using integers simply invites some knucklehead developer to show the PK to the user.

    Robert Frasca (10/12/2010)There's also something to be said for the ability to assign ranges of values. I have used this capability many times when integrating data from diverse data sources into one table, i.e. zero to one billion represents one source, one to two billion another source etc

    So what do you do on the day you discover that developers have reused your ranges? What if you have 10K sources instead of 10? Frankly, that's a solution that is fraught with peril.

    The values are long and obscure. This makes them difficult for users to type correctly, and more difficult for users to remember.

    That's a bonus. There is no reason a developer should be regularly manually typing guids and copy/paste is your friend.

    The values are random and cannot accept any patterns that may make them more meaningful to users.

    Again, a bonus. We are talking about a *surrogate* key. It should never be displayed to the user.

    There is no way to determine the sequence in which uniqueidentifier values were generated.

    Another bonus since it does not delude you into thinking that the order represents the order of events such as with an identity value. You shouldn't rely on identity values to imply a sequence of events. That's for what datetime columns were designed.

  • 1) DISK SPACE isn't what is important, it is THROUGHPUT, and that is dramatically affected by the quadruple-int-sized GUIDs. MEMORY is a precious commodity (on most systems) for the buffer pool - also hobbled by GUIDs.

    I disagree. Yes it is 4x as big but we are talking about 12 *bytes* in an era of many gigabytes of memory and unbelievable disk performance. The difference in most small and medium sized databases is negligible.

  • Robert Frasca (10/12/2010)


    Other than for merge and updatable transactional replication, I've yet to hear an argument as to why an identifier has to be globallyunique. Sure, it sounds grand but could someone explain to me how ProductID '6F9619FF-8B86-D011-B42D-00C04FC964FF' is somehow "more" unique or better than ProductID 27?

    Imagine your database driven product system is going to be used by 1000 customers. Now imagine one of your customers buys another one of your customers. Imagine trying to merge two databases built on identity values. From personal experience, it is the 10th level of hell. Btw, it is a misnomer to think that proponents of guids want them to literally be "globally" unique. The just need to be more unique than a sequential value.

    Another business case is deep object hierarchies. To implement a save with identities requires a round trip for each parent value where it does not with a guid. Further, in a OO middle-tier, you can have confidence that if two objects have the say ID value they are the same instance of the same row in the db. Can't do that with integers.

    My social security number is an integer. In fact, my SSN is so unique my identity can be stolen!

    Your SSN is also not unique nor is it a surrogate key. It is in fact an "intelligent" key where the numbers are reused!

  • Eric Russell 13013 (10/18/2010)


    In Oracle there is an object called a SEQUENCE, which is used in situations where we would typically use an identity in SQL Server. The difference is that it can't be used in a default constraint, so it's not tied to a specific table, and the next available value is fetched explicitly using PL/SQL function calls within a stored procedure or trigger.

    In appropriate cases, SQL Server users can use a Sequence Table to approximate Oracle Sequences:

    http://sqlblog.com/blogs/paul_white/archive/2010/10/19/sequence-tables.aspx

  • Thomas-282729 (10/24/2010)


    1) DISK SPACE isn't what is important, it is THROUGHPUT, and that is dramatically affected by the quadruple-int-sized GUIDs. MEMORY is a precious commodity (on most systems) for the buffer pool - also hobbled by GUIDs.

    I disagree. Yes it is 4x as big but we are talking about 12 *bytes* in an era of many gigabytes of memory and unbelievable disk performance. The difference in most small and medium sized databases is negligible.

    A GUID is actually 16 bytes, but perhaps you only meant that a GUID is 12 bytes more than a 4 byte Int. In addition to table storage, we must also consider that a unique clustered key is used by SQL Server as a table's row id, so a GUID used as a clustered index gets included in all indexes as well.

    Regarding the database application situation where where you occasionally need to merge databases, for example client A merges with client B, sequential GUIDs created using NEWSEQUENTIALID() can potentially be duplicated across multiple database instances. Now, in the event of a merger, the client will typically want to segment the data by profit center or division for accounting purposes, so that client_id or center_id column can be combined with a sequential integer identifier to make a key that is reliably unique across all database instances.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Thomas-282729 (10/24/2010)


    Robert Frasca (10/12/2010)


    Other than for merge and updatable transactional replication, I've yet to hear an argument as to why an identifier has to be globallyunique. Sure, it sounds grand but could someone explain to me how ProductID '6F9619FF-8B86-D011-B42D-00C04FC964FF' is somehow "more" unique or better than ProductID 27?

    Imagine your database driven product system is going to be used by 1000 customers. Now imagine one of your customers buys another one of your customers. Imagine trying to merge two databases built on identity values. From personal experience, it is the 10th level of hell. Btw, it is a misnomer to think that proponents of guids want them to literally be "globally" unique. The just need to be more unique than a sequential value.

    That may be a consideration for commercially produced software, but for applications written in house and only used by the company who wrote the software (and this sort of thing is all over the place), I don't think you can write into the development plan "what if we get acquired?" You plan for what you have, and if you have to write your own app and support it for years, I can testify as a DBA in that environment that GUIDs are useless for troubleshooting and make things take so much longer.

  • Thomas-282729 (10/24/2010)


    1) DISK SPACE isn't what is important, it is THROUGHPUT, and that is dramatically affected by the quadruple-int-sized GUIDs. MEMORY is a precious commodity (on most systems) for the buffer pool - also hobbled by GUIDs.

    I disagree. Yes it is 4x as big but we are talking about 12 *bytes* in an era of many gigabytes of memory and unbelievable disk performance. The difference in most small and medium sized databases is negligible.

    It amazes me that many folks forget about such things as the load on backups and maintenance plans and that just because you have memory and blazing disk speeds that you can use it any way you want. It also amazes me that people still plan on small and medium sized databases without ever thinking they may provide huge utility and grow to a size where such small differences are no longer negligible.

    For those that think a GUID is truly unique throughout the world, I suggest you make sure the column has a unique constraint on it... UPS found out the hard way that, although it's incredibly rare to get a duplicate, it is possible. Even Microsoft acknowledges that the type 4 GUIDs it uses are NOT guaranteed to be unique.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • jeff.mason (10/25/2010)


    Thomas-282729 (10/24/2010)


    Robert Frasca (10/12/2010)


    Other than for merge and updatable transactional replication, I've yet to hear an argument as to why an identifier has to be globallyunique. Sure, it sounds grand but could someone explain to me how ProductID '6F9619FF-8B86-D011-B42D-00C04FC964FF' is somehow "more" unique or better than ProductID 27?

    Imagine your database driven product system is going to be used by 1000 customers. Now imagine one of your customers buys another one of your customers. Imagine trying to merge two databases built on identity values. From personal experience, it is the 10th level of hell. Btw, it is a misnomer to think that proponents of guids want them to literally be "globally" unique. The just need to be more unique than a sequential value.

    That may be a consideration for commercially produced software, but for applications written in house and only used by the company who wrote the software (and this sort of thing is all over the place), I don't think you can write into the development plan "what if we get acquired?" You plan for what you have, and if you have to write your own app and support it for years, I can testify as a DBA in that environment that GUIDs are useless for troubleshooting and make things take so much longer.

    In the event of a merger, new primary keys conforming to the parent company's database schema are assigned as transactional records are migrated over. For example, a new identity key, GUID, alpha-numeric code (or whatever) is generated and assigned to each acquired purchase order during the bulk load process. The orginal purchase order id used by the legacy system is typically retained in a seperate column for accounting or customer support reference only. The same goes for situations where a company migrates from a legacy in-house system to a 3rd party database. Going forward, nobody cares about the old legacy system or it's making use of it ids.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Eric Russell 13013 (10/25/2010)[hrIn the event of a merger, new primary keys conforming to the parent company's database schema are assigned as transactional records are migrated over. For example, a new identity key, GUID, alpha-numeric code (or whatever) is generated and assigned to each acquired purchase order during the bulk load process. The orginal purchase order id used by the legacy system is typically retained in a seperate column for accounting or customer support reference only. The same goes for situations where a company migrates from a legacy in-house system to a 3rd party database. Going forward, nobody cares about the old legacy system or it's making use of it ids.

    Completely agree with this and that is how we dealt with companies we acquired -- putting their old legacy key in a special separate field for reference.

  • It amazes me that many folks forget about such things as the load on backups and maintenance plans and that just because you have memory and blazing disk speeds that you can use it any way you want.

    Again, in small to medium sized database (say under 1 TB), you can comfortably perform a full backup. You should only apply restrictions when the hardware forces you to. Imagine the "wasteful" developers that used two bytes to store the year instead of an eight byte datetime. Oh, the horror.

    It also amazes me that people still plan on small and medium sized databases without ever thinking they may provide huge utility and grow to a size where such small differences are no longer negligible.

    It amazes me that top notch developers still try to prematurely optimize. Medium sized databases top out at about 1-2 TB. After they you can start considering that they are getting into the large database realm. The vast majority of the databases will never hit that level even using clustered indexes 4x as large as an integer.

    Furthermore, if you never, ever show the surrogate key to the user, you can change the surrogate structure behind the scenes should you ever hit 100 TB.

    For those that think a GUID is truly unique throughout the world, I suggest you make sure the column has a unique constraint on it... UPS found out the hard way that, although it's incredibly rare to get a duplicate, it is possible.

    Which is why you have unique constraints to ensure they really are unique. It should be noted that it is also possible (more probable even) to get a collisions with identity values.

Viewing 15 posts - 76 through 90 (of 169 total)

You must be logged in to reply to this topic. Login to reply