Database design to handle millions of records

  • Dear Techies,
    Please suggest how I should design a database to handle millions of records. Also, please let me know SQL Server's strong points for handling that many records compared with other databases (Oracle, DB2, PostgreSQL, ...).
    Some points I have considered so far; please guide me:
    1. Setting the fill factor
    2. Filegroups (is it possible to create month-wise filegroups, e.g. filegroup 1 for January, filegroup 2 for February, ...?)
    3. Indexing

    Regards,
    Vinayak
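
    The month-wise filegroup idea in point 2 is possible in SQL Server via partitioning: a partition function defines the monthly boundaries and a partition scheme maps each partition to its own filegroup. A minimal, hypothetical sketch (the database name, file paths, and table are invented for illustration, not taken from the thread):

    ```sql
    -- Hypothetical: one filegroup (and file) per month.
    ALTER DATABASE SalesDb ADD FILEGROUP FG_2017_01;
    ALTER DATABASE SalesDb ADD FILEGROUP FG_2017_02;

    ALTER DATABASE SalesDb ADD FILE
        (NAME = SalesDb_2017_01, FILENAME = 'D:\Data\SalesDb_2017_01.ndf')
        TO FILEGROUP FG_2017_01;
    ALTER DATABASE SalesDb ADD FILE
        (NAME = SalesDb_2017_02, FILENAME = 'D:\Data\SalesDb_2017_02.ndf')
        TO FILEGROUP FG_2017_02;

    -- One boundary value yields two partitions (January, February onward).
    CREATE PARTITION FUNCTION pfMonthly (date)
        AS RANGE RIGHT FOR VALUES ('2017-02-01');

    -- Map each partition to its own filegroup.
    CREATE PARTITION SCHEME psMonthly
        AS PARTITION pfMonthly TO (FG_2017_01, FG_2017_02);

    CREATE TABLE dbo.Sales
    (
        SaleId   bigint IDENTITY NOT NULL,
        SaleDate date            NOT NULL,
        Amount   decimal(18, 2)  NOT NULL,
        CONSTRAINT PK_Sales PRIMARY KEY CLUSTERED (SaleDate, SaleId)
    ) ON psMonthly (SaleDate);
    ```

    Note that the partitioning column (SaleDate here) must be part of the clustered key, and the function and scheme need one extra boundary value and filegroup per additional month.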

  • Vinayak15580 - Tuesday, October 3, 2017 4:01 AM


    I'm sorry, but that's nothing at all useful to go on. A raw record count tells us almost nothing; I can process millions of records in Excel and Access, even in a flat file.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Read up on third normal form, primary keys, and covering indexes for joins.
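
    The covering-index suggestion can be sketched with a hypothetical example (table and column names invented for illustration):

    ```sql
    CREATE TABLE dbo.Orders
    (
        OrderId    int IDENTITY PRIMARY KEY,
        CustomerId int            NOT NULL,
        OrderDate  date           NOT NULL,
        Total      decimal(18, 2) NOT NULL
    );

    -- Key the index on the join/filter column; INCLUDE carries the
    -- selected columns so the base table never needs to be touched.
    CREATE NONCLUSTERED INDEX IX_Orders_CustomerId
        ON dbo.Orders (CustomerId)
        INCLUDE (OrderDate, Total);

    -- Answered entirely from IX_Orders_CustomerId (no key lookups):
    SELECT CustomerId, OrderDate, Total
    FROM dbo.Orders
    WHERE CustomerId = 42;
    ```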

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Vinayak15580 - Tuesday, October 3, 2017 4:01 AM


    Yeah, I have to agree with Kevin. You're asking for advice on database design as if it were something relatively simple that doesn't need much detail to be understood or made use of. That would be a colossal mistake. It takes years of experience to become good at that task, so don't expect a single YouTube video to magically educate you. There are far too many considerations, and we have zero detail on your objectives beyond a large record count. You'll have to be considerably more specific, and if you don't already have your own ideas about what should be done, you're unlikely to get much assistance. People rarely get excited about helping someone who has clearly demonstrated a significant lack of knowledge in the area, can't provide any details about what they want to accomplish, and still wants significant help doing something they've never actually done before.

  • The single most important thing is to do a logical data model before doing a physical database model.

    Hints:
    1) Google "logical data modeling".  
    2) Fill factor, indexes, and filegroups are not part of a logical model. Get the logical data structure right first; you can convert it to a physical model, and worry about performance, later.

    SQL DBA,SQL Server MVP(07, 08, 09) "Money can't buy you happiness." Maybe so, but it can make your unhappiness a LOT more comfortable!

  • It always comes to that: "and worry about performance later". While I wholeheartedly agree that the logical model must come first, I find that the only time many people worry about performance is when performance has gone sour. My recommendation is to consider the physical world as you're designing the logical world. Contrary to popular belief, I've found that the two are not mutually exclusive, and that doing only logical modeling frequently causes design blocks in the physical world labeled "Insert Miracle Here".

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Jeff Moden - Sunday, October 22, 2017 9:48 PM


    That's fundamentally opposed to the essential nature of a logical model, which is supposed to assume "infinite capacity" to run.  In every case I've seen, letting physical considerations corrupt the logical design has turned out to cause problems, not solve them.

    If the person/people converting the logical model to a physical one can't handle the performance aspects of it, they need additional assistance for that task.  But that doesn't mean you should corrupt the logical modeling process.

    SQL DBA,SQL Server MVP(07, 08, 09) "Money can't buy you happiness." Maybe so, but it can make your unhappiness a LOT more comfortable!

  • ScottPletcher - Monday, October 23, 2017 7:43 AM


    Then I have to believe you've led a sheltered life... I agree with Jeff. Logical modeling should be undertaken with full knowledge and understanding of what the physical model is likely going to need to be, not by ignoring it and making that ludicrous assumption of infinite capacity. I can't tell you how many times I've seen the crap results of doing exactly that. Ever heard of EAV (Entity, Attribute, Value)? Talk about a prime example of putting the cart before the horse...
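
    For anyone who hasn't run into EAV, a hypothetical sketch of the pattern alongside a conventional design (all names invented for illustration):

    ```sql
    -- EAV: one generic table holds every attribute as a row.
    -- Easy to extend, but data types, constraints, and joins all suffer,
    -- and every query becomes a self-join pivot.
    CREATE TABLE dbo.EntityAttributeValue
    (
        EntityId  int          NOT NULL,
        Attribute varchar(100) NOT NULL,
        Value     varchar(max) NULL,   -- everything degenerates to text
        PRIMARY KEY (EntityId, Attribute)
    );

    -- Conventional alternative: real columns with real types and constraints.
    CREATE TABLE dbo.Product
    (
        ProductId int           PRIMARY KEY,
        Name      varchar(100)  NOT NULL,
        Price     decimal(18,2) NOT NULL CHECK (Price >= 0)
    );
    ```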

  • sgmunson - Monday, October 23, 2017 7:56 AM


    I've designed multiple systems for multiple-billion-dollar companies.  Any needed adjustments to the logical model can, and should, be made on the physical side, not the logical side.  Too many people are in too big a hurry to get to code, when data design is far more critical.

    A physical model from 30 years ago is virtually worthless. A logical model should still be basically sound, needing only adjustments to account for newer entities and attributes.

    SQL DBA,SQL Server MVP(07, 08, 09) "Money can't buy you happiness." Maybe so, but it can make your unhappiness a LOT more comfortable!

  • ScottPletcher - Monday, October 23, 2017 8:02 AM


    I hear you, Scott, but I rarely work for the billion-dollar juggernauts, who can at least afford the talent they need; they don't always hire it, but it's at least more likely. The rest of the universe seems to think that fixing things later is somehow less expensive than doing them right the first time, and their actions clearly indicate that doing things right the first time just isn't going to happen, because they mostly cheap out on their IT budgets. For those of us who so often live in that world, we're lucky to see anything that resembles common sense, much less a good idea, at least where data is concerned. I can also tell you that a physical model I created roughly 35 years ago (and before relational databases existed) would function quite nicely today, despite the changes in technology, because it took into account both the physical and logical aspects of the data.

  • sgmunson - Monday, October 23, 2017 8:14 AM


    I get most of what you are saying, although I can't imagine how a physical design for a nonrelational file system would hold up for a modern relational database. That just doesn't make sense.

    International Paper unfortunately also gave developers too much control over SQL Server (but not Oracle) databases, at least initially. However, after one database whose "design" produced truly disastrous performance, they "allowed" the SQL DBA to design the SQL databases as well.

    Btw, Oracle existed 35 years ago (and of course the granddaddy of them all, System R, before that).

    SQL DBA,SQL Server MVP(07, 08, 09) "Money can't buy you happiness." Maybe so, but it can make your unhappiness a LOT more comfortable!

  • ScottPletcher - Monday, October 23, 2017 8:25 AM


    Yes, the company existed, but its first commercially available product that could run on a PC wasn't released until 1983, a year AFTER my model was created and implemented, on a machine much less capable than an IBM PC. See the history here: http://www.dba-oracle.com/t_history_oracle.htm


  • I've consulted on SQL Server for over two decades for scores of clients ranging from Mom-and-Pops to the Fortune 100, and guess what: they ALL produce crappy designs and crappy code (language edited for younger readers). It doesn't matter whether they do logical modeling first, do only physical modeling, let the code write the data structures, or hold a few whiteboard sessions, if that. Some of the most egregious stuff I have seen actually comes from the big boys, with teams of staff to cover all phases of the application life cycle.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • ScottPletcher - Monday, October 23, 2017 8:02 AM


    There is no question in my mind how valuable the logical model is; it is, as you imply, about the only thing about databases that can withstand the test of time. I have no argument at all there. The problem is exactly what you say, though: too many people are in too big a hurry to get to the code, and they usually do so without a physical model design. That's why, for new projects, I suggest that the physical model (which doesn't actually require any coding to be realized) at least be considered during the logical design. That way, people aren't tempted to write the code before a physical design has been laid out.

    --Jeff Moden


