Database design to handle Millions of records

  • TheSQLGuru - Monday, October 23, 2017 12:32 PM

    I've consulted on SQL Server for over 2 decades for scores of clients ranging from Mom-and-Pops to Fortune 100 and guess what, they ALL produce crappy designs and crappy code (language edited for younger readers). It doesn't matter whether they do logical modeling first, just physical modeling, let the code write the data structures or just some whiteboard sessions - if that. Some of the most egregious stuff I have seen actually comes from the big boys with teams of staff to cover all phases of the application life cycle.

    In my experience, what really happens is that they do not do a proper logical model.  And that inevitably yields crappy designs.  Instead, they just slap a physical table together -- "design" is typically a vast overstatement. [And almost inevitably every table starts with the unkillable myth of the "identity as clustering key on 'every' table".  While at times a unique identifying number can exist in a logical design (at least for me, Celko may differ on that), although only when absolutely necessary, such as a customer number, an "identity" per se of course cannot.]

    One good hint: If you don't/can't have business people in the initial design meeting because they won't understand it, then you're not starting with a logical-only design.

    SQL DBA,SQL Server MVP(07, 08, 09) "Money can't buy you happiness." Maybe so, but it can make your unhappiness a LOT more comfortable!

  • ScottPletcher - Monday, October 23, 2017 4:19 PM

    TheSQLGuru - Monday, October 23, 2017 12:32 PM

    I've consulted on SQL Server for over 2 decades for scores of clients ranging from Mom-and-Pops to Fortune 100 and guess what, they ALL produce crappy designs and crappy code (language edited for younger readers). It doesn't matter whether they do logical modeling first, just physical modeling, let the code write the data structures or just some whiteboard sessions - if that. Some of the most egregious stuff I have seen actually comes from the big boys with teams of staff to cover all phases of the application life cycle.

    In my experience, what really happens is that they do not do a proper logical model.  And that inevitably yields crappy designs.  Instead, they just slap a physical table together -- "design" is typically a vast overstatement. [And almost inevitably every table starts with the unkillable myth of the "identity as clustering key on 'every' table".  While at times a unique identifying number can exist in a logical design (at least for me, Celko may differ on that), although only when absolutely necessary, such as a customer number, an "identity" per se of course cannot.]

    One good hint: If you don't/can't have business people in the initial design meeting because they won't understand it, then you're not starting with a logical-only design.

    In a logical model, yes.  The problem is that some people can't get out of that mindset and end up creating a 265-byte clustering key that isn't ever-increasing, nowhere close to being narrow and can change over time.  While that may make for a fine key in the logical world, it makes for a miserable clustering key in the physical world and I've seen the suffering that it can cause.

  • ScottPletcher - Monday, October 23, 2017 4:19 PM

    TheSQLGuru - Monday, October 23, 2017 12:32 PM

    I've consulted on SQL Server for over 2 decades for scores of clients ranging from Mom-and-Pops to Fortune 100 and guess what, they ALL produce crappy designs and crappy code (language edited for younger readers). It doesn't matter whether they do logical modeling first, just physical modeling, let the code write the data structures or just some whiteboard sessions - if that. Some of the most egregious stuff I have seen actually comes from the big boys with teams of staff to cover all phases of the application life cycle.

    In my experience, what really happens is that they do not do a proper logical model.  And that inevitably yields crappy designs.  Instead, they just slap a physical table together -- "design" is typically a vast overstatement. [And almost inevitably every table starts with the unkillable myth of the "identity as clustering key on 'every' table".  While at times a unique identifying number can exist in a logical design (at least for me, Celko may differ on that), although only when absolutely necessary, such as a customer number, an "identity" per se of course cannot.]

    One good hint: If you don't/can't have business people in the initial design meeting because they won't understand it, then you're not starting with a logical-only design.

    Heh.... I always end up with the myth that a clustered index always has to be based on some natural key.  As a result, we have to go through absolute hell to find everywhere that the natural key of Client and LoanNumber have been used in a couple of hundred places because of a merger and either or both the Client or the LoanNumber has changed.

    Also, and this is where the physical design comes in to play, a clustered index on an identity column isn't always present for the sake of being present although I'll almost always have such a thing.  I use it to control frequent multi-row inserts while still keeping the rest of the table accessible.

    Since this is physical design rather than logical design, I'm not sure why you brought that up in this particular conversation. 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Ed Wagner - Monday, October 23, 2017 5:44 PM

    ScottPletcher - Monday, October 23, 2017 4:19 PM

    TheSQLGuru - Monday, October 23, 2017 12:32 PM

    I've consulted on SQL Server for over 2 decades for scores of clients ranging from Mom-and-Pops to Fortune 100 and guess what, they ALL produce crappy designs and crappy code (language edited for younger readers). It doesn't matter whether they do logical modeling first, just physical modeling, let the code write the data structures or just some whiteboard sessions - if that. Some of the most egregious stuff I have seen actually comes from the big boys with teams of staff to cover all phases of the application life cycle.

    In my experience, what really happens is that they do not do a proper logical model.  And that inevitably yields crappy designs.  Instead, they just slap a physical table together -- "design" is typically a vast overstatement. [And almost inevitably every table starts with the unkillable myth of the "identity as clustering key on 'every' table".  While at times a unique identifying number can exist in a logical design (at least for me, Celko may differ on that), although only when absolutely necessary, such as a customer number, an "identity" per se of course cannot.]

    One good hint: If you don't/can't have business people in the initial design meeting because they won't understand it, then you're not starting with a logical-only design.

    In a logical model, yes.  The problem is that some people can't get out of that mindset and end up creating a 265-byte clustering key that isn't ever-increasing, nowhere close to being narrow and can change over time.  While that may make for a fine key in the logical world, it makes for a miserable clustering key in the physical world and I've seen the suffering that it can cause.

    I'm sure that does happen as well.  But just because someone fails at converting a logical model into a proper physical model is no reason to skip creating genuine logical models in the future.  Instead, redo that conversion and learn from that mistake.

    Besides, there are occasions when a () long text key is appropriate.  Remember, the "narrow, ever-increasing, etc." mantra is a general rule, not the absolute commandment some take it as.  I prefer that the majority of access to be via the clustered index, not via a covering index, when possible.  Because then I/O is fully shared across all users.  Yes, separate nonclustered covering indexes help each given query(ies) they're designed for, but having to maintain all those mini-tables can really hurt overall performance. 
    Edit: This does increase the importance of proper table design, including avoiding too wide a row.  If a very wide row is inevitable, sometimes that forces using identity or a similar "faux" key.  Often, simply creating a separate 1-1 to off-load rarely accessed but wide columns is the best approach.  "But that's a separate table."  Well a nonclustered index is effectively that also, but in a separate table data only has to be stored once, not both in the main table and in covering index(es).

    One possible hint on clus key issue(s): if you have your "standard" clustered identity column, but some other column is the lead key on multiple (3+) non-clus covering indexes, and/or such lookups are much more heavily used than the identity column, it's a good candidate, at least, for the clustering key.

    SQL DBA,SQL Server MVP(07, 08, 09) "Money can't buy you happiness." Maybe so, but it can make your unhappiness a LOT more comfortable!

Viewing 4 posts - 16 through 18 (of 18 total)

You must be logged in to reply to this topic. Login to reply