Lots of Key Lookups vs. UniqueIdentifier Clustered Index

  • opc.three:

    Currently we have an integer identity, so I imagine I could draw a line in the sand and say all records with an identity of < X go to partition 1, otherwise partition 2.

    Right now we do generate some of the UUIDs in the application layer. I imagine I'll have the application reach out to the dB for the UUID... not ideal, but I can't think of a better way to do this.

    For generating sequential GUIDs I'm using a sequence and newID() and mashing them together. I do this instead of using newSequentialID() because I need some randomness in my UUID.

    This is what I've come up with for getting the randomized, sequential UUID:

    CREATE FUNCTION makeGUID

    (

    @GUID AS UNIQUEIDENTIFIER,

    @number AS BIGINT

    )

    RETURNS UNIQUEIDENTIFIER

    AS

    BEGIN

    RETURN CAST(LEFT(@guid, 24) + RIGHT(CONVERT(VARCHAR(16), CAST(@number AS VARBINARY(50)), 2), 12) AS UNIQUEIDENTIFIER)

    END

    GO

    A couple of things I don't like about it. One, I'd rather use bitwise operators to mash these values up rather than string functions but the data is too wide (128 bits).

    Also, I have to pass newID() and NEXT VALUE FOR in as parameters because I can't seem to find a tricky way of using them in a UDF.

  • Why do you need randomness in your UUID?

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • Our web-based app exposes these IDs in the URL. Although their is additional security checking when executing queries based on the passed in URL UUID, in our shop it is generally viewed as additional security to have these randomized to some extent to prevent the end user from making simple changes and getting the "next" record.

    After reading further on Books Online (thanks @jeff Moden), it appears that NEWSEQUENTIALID() doesn't provide sequential UUIDs between restarts, so I have to build some sort of function for sequential UUIDs, whether or not I require them to be random-sequential or not.

    I did clean up my function a bit, removing variable length types, implicit casting, tightening up the lengths and removing some of the string manipulation:

    RETURN CAST(LEFT(CAST(@guid AS CHAR(36)), 24) + CONVERT(CHAR(16), CAST(@Number AS BINARY(6)), 2) AS UNIQUEIDENTIFIER)

  • dave-L (11/21/2012)


    Our web-based app exposes these IDs in the URL. Although their is additional security checking when executing queries based on the passed in URL UUID, in our shop it is generally viewed as additional security to have these randomized to some extent to prevent the end user from making simple changes and getting the "next" record.

    After reading further on Books Online (thanks @jeff Moden), it appears that NEWSEQUENTIALID() doesn't provide sequential UUIDs between restarts, so I have to build some sort of function for sequential UUIDs, whether or not I require them to be random-sequential or not.

    I did clean up my function a bit, removing variable length types, implicit casting, tightening up the lengths and removing some of the string manipulation:

    RETURN CAST(LEFT(CAST(@guid AS CHAR(36)), 24) + CONVERT(CHAR(16), CAST(@Number AS BINARY(6)), 2) AS UNIQUEIDENTIFIER)

    Uggh, you have been dealt a tough hand. You are spot on with regard to the "prediction" security issue, and on the "restart issue" with NEWSEQUENTIALID() which was the motivation behind my question about randomness and which partitioning key you would choose. Even if you employed partitioning there is a chance that after a restart NEWSEQUENTIALID() could theoretically spill into your "frozen partition." I think you are on the right track developing your own sequential (yet random :-)) UUID generator.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • Thanks for your help opc.three.

    One last thing: I hate to leave less than optimal code out there for posterity so here is my last, optimized function with all string manipulation removed 🙂

    CREATE FUNCTION makeGUID

    (

    @GUID AS UNIQUEIDENTIFIER,

    @number AS BIGINT

    )

    RETURNS UNIQUEIDENTIFIER

    AS

    BEGIN

    RETURN CAST(CAST(@guid AS BINARY(10)) + CAST(@Number AS BINARY(6)) AS UNIQUEIDENTIFIER)

    END

    GO

  • dave-L (11/21/2012)


    Thanks for your help opc.three.

    One last thing: I hate to leave less than optimal code out there for posterity so here is my last, optimized function with all string manipulation removed 🙂

    CREATE FUNCTION makeGUID

    (

    @GUID AS UNIQUEIDENTIFIER,

    @number AS BIGINT

    )

    RETURNS UNIQUEIDENTIFIER

    AS

    BEGIN

    RETURN CAST(CAST(@guid AS BINARY(10)) + CAST(@Number AS BINARY(6)) AS UNIQUEIDENTIFIER)

    END

    GO

    One query on that: is @GUID providing the randomness while number is just a counter? If so, I don't understand why you have them in that order in the returned value.

    Tom

  • Now wait a minute here.... I thought you had to use the UUID! If you can get away creating your own number, then why are we even worried about this problem? Preserve the external GUID in a column and do everything internally with an IDENTITY value.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Hi Tom.

    It's counter intuitive, but that is how SQL orders UniqueIdentifiers.

  • Jeff, there are zillions of lines of app code that have queries joining on and referencing the UUID.

  • dave-L (11/21/2012)


    Jeff, there are zillions of lines of app code that have queries joining on and referencing the UUID.

    So why are you talking about building a function? The Key Lookups just aren't going to be that bad. Certainly no worse than you maing a function.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • dave-L (11/21/2012)


    Hi Tom.

    It's counter intuitive, but that is how SQL orders UniqueIdentifiers.

    :blush: One of my thicker moments! :blush:

    Tom

  • Hi Jeff. That's my whole question really. Is there a good way to fix this and is it worth trying to fix.

    True using a UDF to generate the UUIDs will have more over head than native NEWID(), but my thought was I would save so much time on the scores of thousands of key lookups that occur because my CI is on a useless identity that this would be worth it.

    Dave

  • CELKO,

    I'm learning a lot though this process, thinking much more carefully about how I should organize my CIs.

    Dave

  • CELKO (11/22/2012)


    Most often non-RDBMS people use their generated physical locator as the cluster because this is how a magnetic tape or disk access modle of data would have done it.

    That may be true for non-RDBMS people but others realize the value of it in preventing page splits of the data involved in the CI. Page splits are horribly expensive things both CPU and I/O wise on heavily inserted OLTP tables. Page splits are a frequent cause of massive GUI timeouts.

    If a table is mostly static, then I agree... the clustered index should be used for something else. If the table suffers a lot of inserts, then a CI on a narrow, ever increasing, and unique column such as an IDENTITY column or a DATETIME column and an IDENTITY column as a uniquefier is generally the way to go. And, yes... it's very much like mag tape requirements. It worked for mag tape and it works here.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • dave-L (11/22/2012)


    Hi Jeff. That's my whole question really. Is there a good way to fix this and is it worth trying to fix.

    True using a UDF to generate the UUIDs will have more over head than native NEWID(), but my thought was I would save so much time on the scores of thousands of key lookups that occur because my CI is on a useless identity that this would be worth it.

    Dave

    I'm still confused about it all. I thought you said that the GUID from the GUI was absolutely required and has been used in thousands of places throughout the code. The only way that you'd get rid of the Key Lookups is to do one of two things. Either put the CI on the GUID with a decent FILL FACTOR to prevent rampant page splits during inserts or build a covering index with the GUID as the leading column. All of this other work will do nothing to get rid of the Key Lookups.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

Viewing 15 posts - 16 through 30 (of 41 total)

You must be logged in to reply to this topic. Login to reply