Worst Practices - Not Using Primary Keys and Clustered Indexes

Viewing 15 posts - 61 through 75 (of 184 total)

harvinder Newbie Points: 2 · Answer 1

1) The article is good, but it needs more information on how integrity is at risk without a primary key/clustered index in a controlled application.

2) It is more relevant to the primary key than to the clustered index.

asifrashid Right there with Babe Points: 762 · Answer 2

Not having a primary key/clustered index is bad practice, but not having FKs/relationships is a disaster. They are the main ingredient in maintaining data integrity, and data integrity is a vital part of any database.

I totally agree with Andy; there is no way we can ignore PKs, but an intelligent primary key can play a better role in maintaining data integrity during a data migration process. It can impose a data duplication check and serves as the reference for building relationships. A PK on an identity column is a dumb key: it only generates IDs as records are inserted, so one has to add a unique key as well to avoid data duplication.

I have gone through implementing PKs/FKs throughout a big application that has more than 200 tables and had neither relationships nor proper indexes. I can attest to how stable this application is now, after implementing PKs/FKs with a clustered index on each PK and an index on each FK.

Asif

mcdonep SSC Journeyman Points: 87 · Answer 3

Hi Andy,

I will agree that from a logical perspective, all entities must have a primary key. I will also agree that in the transition to the physical realm, if these entities remain intact then the key should also follow. Depending on how the application (not the user) intends on accessing information a surrogate key may be the best selection in the physical environment.

We may, however, agree to disagree, on the generalized implementation of primary keys and clustered indexes. This is especially true in heavy transactional environments where relational integrity is secondary to transactional integrity and capacity/performance is critical. In correctly constructed transactional systems the business object and its internal data structure become the guardian of integrity. RI management overhead is as much a performance problem as it is redundant. This is especially true in n-tier transaction environments.

OLAP environments are different beasts altogether. While the atomic warehouse is populating data marts, indexes are invaluable. And if you are providing drill-down capability into the atomic warehouse, you must have a unique identifier. However, when you update it, a clustered index on large tables can be cost-prohibitive. I have found that it is necessary to drop all indexes to make large batch inserts to some of our largest tables. You can't do this with a primary key.
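
The drop, load, rebuild pattern described above can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in for SQL Server, and the table name fact_sales and index name ix_fact_sales_region are hypothetical. Whether the pattern pays off on a real system depends on table size and hardware.

```python
import sqlite3

# SQLite stands in for SQL Server here; all object names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact_sales (sale_id INTEGER, region TEXT, amount REAL)")
conn.execute("CREATE INDEX ix_fact_sales_region ON fact_sales (region)")

batch = [(i, f"region-{i % 10}", i * 1.5) for i in range(10_000)]

# 1. Drop the secondary index so the load does not maintain it row by row.
conn.execute("DROP INDEX ix_fact_sales_region")

# 2. Load the whole batch inside a single transaction.
with conn:
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)", batch)

# 3. Rebuild the index once, after the data is in place.
conn.execute("CREATE INDEX ix_fact_sales_region ON fact_sales (region)")

row_count = conn.execute("SELECT COUNT(*) FROM fact_sales").fetchone()[0]
index_names = [row[1] for row in conn.execute("PRAGMA index_list('fact_sales')")]
print(row_count, index_names)
```

The sketch uses a plain secondary index because an index that backs a primary key constraint cannot be dropped on its own, which is the limitation the post points out.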

The bottom line here is that the role of the DBA is more than just to execute rote and generalized rules. It is to make considered and pragmatic decisions about the physical deployment of logical models based upon the performance characteristics of the physical platform and the targeted application environment. This is what makes them so valuable and why experience is so critical.

Andy Warren SSC Guru Points: 119747 · Answer 4

McDoneP,

Can't really argue with any of that. I'm here to evangelize that you should stick to the rules until you have the experience to know when it's OK to break them, and even then... sometimes things change and you should still reconsider before breaking them! By change, I mean that things that weren't possible/doable in one version may become possible in another version. Or what you can't do on last year's hardware becomes totally plausible on this year's.

I like your comment about the business objects even if I don't agree 100%. I like to put as many constraints in place at the db level as I can, even if I mirror those in the app somewhere. It really depends on how you use the db. If you exclusively use the app, using the BOs makes sense. Where I work we do a LOT of data manipulation external to the app, so it's nice to have a few safeguards in place. Another one of those experienced decision points, maybe.
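
As a rough illustration of the "safeguards at the db level" point, here is a minimal sketch in which a foreign key declared in the database rejects an orphan row written by something other than the application. It assumes Python's built-in sqlite3 module as a stand-in for SQL Server; the customer and invoice tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.execute(
    "CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)
conn.execute(
    """
    CREATE TABLE invoice (
        invoice_id  INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer (customer_id)
    )
    """
)

conn.execute("INSERT INTO customer VALUES (1, 'Acme')")
conn.execute("INSERT INTO invoice VALUES (100, 1)")  # parent row exists

# An ad hoc script that bypasses the application's business objects
# tries to insert an invoice for a customer that does not exist.
try:
    conn.execute("INSERT INTO invoice VALUES (101, 999)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

invoice_count = conn.execute("SELECT COUNT(*) FROM invoice").fetchone()[0]
print(orphan_rejected, invoice_count)
```

The constraint does its job regardless of which program issues the insert, which is the case for mirroring app-level rules in the database when data is also changed outside the app.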

Thanks for your comments!

Andy

http://qa.sqlservercentral.com/columnists/awarren/

Andy
SQLAndy - My Blog!
Connect with me on LinkedIn
Follow me on Twitter

Chris Hedgate One Orange Chip Points: 25041 · Answer 5

quote:
I thought I had a reference that shows where a clustered index is better than a heap in terms of performance, but I can't find it. If I do I'll post it.

Steve, could it be the issue of forward-pointers (they are used in heaps, but not with clustered indexes) that you are referring to?

As for the article, in my opinion it is great. I do agree with most of the comments made here in the discussion about identity and such, but as Andy says, the recommendations in the article are better than nothing at all.

Chris Hedgate @ Apptus Technologies (http://www.apptus.se)

--
Chris Hedgate http://www.hedgate.net/
Contributor to the Best of SQL Server Central volumes
Articles: http://www.sqlservercentral.com/columnists/chedgate/

Steve Rosenbach SSCrazy Points: 2181 · Answer 6

This is an excellent thread -- I learned a number of new things! Thanks, Andy.

As to the use of Identity columns as Primary Keys versus using "natural" keys -- I've designed tables both ways and I think Identity columns are a very good way to go. For example, when you have several layers of 1:many relationships, you invariably end up with natural keys that are 3, 4, 5 or more columns long. This means that you will often have to join on 3, 4, 5 or more columns.

I tend to use an int[eger] Identity column as a PK and then, if there is a natural key in the table (even one composed of several columns), I'll make it a Unique Index.

This seems to give the best of both worlds -- I can do my joins with my efficient "int" PKs and I have the unique "natural" keys for searching, filtering, etc., as well as the unique constraint that prevents inadvertent duplication of the natural key.
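
A minimal sketch of that design, assuming Python's built-in sqlite3 module as a stand-in for SQL Server (AUTOINCREMENT plays the role of an Identity column here). The order_line table and its three-column natural key are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE order_line (
        order_line_id INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key for joins
        region_code   TEXT    NOT NULL,
        order_no      INTEGER NOT NULL,
        line_no       INTEGER NOT NULL,
        UNIQUE (region_code, order_no, line_no)           -- natural key stays unique
    )
    """
)

insert_sql = "INSERT INTO order_line (region_code, order_no, line_no) VALUES (?, ?, ?)"
conn.execute(insert_sql, ("EU", 42, 1))
conn.execute(insert_sql, ("EU", 42, 2))

# The surrogate key alone would happily accept this repeat of ('EU', 42, 1);
# the unique constraint on the natural key is what blocks it.
try:
    conn.execute(insert_sql, ("EU", 42, 1))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

ids = [row[0] for row in conn.execute(
    "SELECT order_line_id FROM order_line ORDER BY order_line_id"
)]
print(duplicate_rejected, ids)
```

Child tables would then reference order_line_id alone, so joins use one integer column instead of three.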

The one situation that might give me pause in using Identity PKs is when the database might get replicated. Then I'll have to evaluate whether I'll want to create the tables with the NOT FOR REPLICATION option and use different Identity seeding on the subscribers or use some other scheme, such as GUIDs for PKs (which may slow down joins.)

Best regards,

SteveR

Stephen Rosenbach

Arnold, MD

Andy Warren SSC Guru Points: 119747 · Answer 7

Identity cols and replication are a trial. Merge is set up to work with them to a degree, letting you specify ranges for subscribers. Transactional doesn't have that flexibility. It only really matters when you'll be doing inserts on the subscriber. If you need inserts, I think GUIDs have no peer.
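
The underlying key-collision problem can be illustrated without any database at all. The following is only a toy model in plain Python, not replication code: itertools.count stands in for an identity column, the range boundary of 1,000,001 is an arbitrary assumption, and uuid.uuid4 stands in for a GUID default.

```python
import itertools
import uuid

def take(counter, n=3):
    """Generate n keys from an identity-style counter."""
    return [next(counter) for _ in range(n)]

# Publisher and subscriber both seed their identity at 1: keys collide.
naive_collisions = set(take(itertools.count(1))) & set(take(itertools.count(1)))

# Each side is given its own disjoint range: no collisions until a range runs out.
range_collisions = set(take(itertools.count(1))) & set(take(itertools.count(1_000_001)))

# GUID-style keys need no coordination between the two sides.
guid_publisher = [uuid.uuid4() for _ in range(3)]
guid_subscriber = [uuid.uuid4() for _ in range(3)]
guid_collisions = set(guid_publisher) & set(guid_subscriber)

print(naive_collisions, range_collisions, guid_collisions)
```

Ranges keep the compact integer key but have to be assigned and managed per subscriber; GUIDs avoid that bookkeeping at the cost of a 16-byte key.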

Andy

Eric Wilson SSChasing Mays Points: 611 · Answer 8

quote:
...where relational integrity is secondary to transactional integrity and capacity/performance is critical. In correctly constructed transactional systems the business object and its internal data structure become the guardian of integrity.
...

Please clarify what you mean by "transactional integrity."

Do you propose to enforce data integrity *outside* the database in lieu of within it? (As was done years ago and eventually abandoned because of the many and disparate problems with such an approach.)

It continues to amuse me how often we in technology reinvent the same flawed wheels year after year, every time someone puts a new coat of paint on them.

jwiner SSCrazy Points: 2241 · Answer 9

One rumor I have heard is that not all DBMSs implement primary keys in the same fashion and that, because of this, if you wanted to create a cross-platform, cross-application database application, you could not use primary keys but would instead need to put a unique constraint on the table in place of the primary key.

Is this correct? Is there any validity to this rumor?

david wootton Old Hand Points: 338 · Answer 10

I think you state the obvious in an "idle" way to work around relational algebra in creating and defining data models. Databases depend on solid data structures and relationships with a minimum normalisation threshold (BCNF) of level 3. What this means is there should be no contention over choosing a "seed" column to act as a primary key on a table; after all, in the long term it means more overhead on INSERT and DELETE, as well as fragmentation. Which leads me nicely on to blowing your clustered index theory out of the water, as performance enhancements are only gained on mainly read-only databases, NOT on heavily used production databases. The moral of the story is data modelling. All data structures and all DML statements are considered and implemented at a model level, not on the whim of a DBA or senior developer. After all, who administers the database administrator?

Simon Sabin SSCrazy Eights Points: 8142 · Answer 11

Well, I joined the party a bit late but will add my 2 pennies anyway.

Rules are great, but should be there as a basis, i.e. to be challenged.

FKs really have to have indexes if you are going to delete records from the parent. Ever tried deleting data when the FK table has millions of rows but no index?
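
To see why, consider the lookup the engine has to run against the child table whenever a parent row is deleted. The sketch below assumes Python's built-in sqlite3 module as a stand-in for SQL Server and uses hypothetical parent and child tables; it compares the query plan for that lookup before and after the FK column is indexed. The exact plan wording differs between SQLite versions, so none is shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parent (parent_id INTEGER PRIMARY KEY)")
conn.execute(
    """
    CREATE TABLE child (
        child_id  INTEGER PRIMARY KEY,
        parent_id INTEGER NOT NULL REFERENCES parent (parent_id)
    )
    """
)

def child_lookup_plan(connection):
    """Return the plan for the check 'does any child row reference this parent?'."""
    rows = connection.execute(
        "EXPLAIN QUERY PLAN SELECT child_id FROM child WHERE parent_id = ?", (1,)
    ).fetchall()
    return " | ".join(row[3] for row in rows)  # column 3 holds the plan text

plan_without_index = child_lookup_plan(conn)  # full scan of child

conn.execute("CREATE INDEX ix_child_parent_id ON child (parent_id)")
plan_with_index = child_lookup_plan(conn)     # seek via the new index

print(plan_without_index)
print(plan_with_index)
```

Without the index every parent delete pays for a scan of the whole child table, which is what hurts once that table has millions of rows.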

Size is key to performance. If you have a clustered PK on a char(10), it's 2.5 times larger than one on an integer. All your other indexes are based on it, so there is a knock-on effect.
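
The 2.5 figure and the knock-on effect can be worked through with some back-of-the-envelope arithmetic. The sketch below is deliberately simplified: it counts only the key bytes (10 for a single-byte char(10), 4 for an int), relies on the fact that each nonclustered index row on a clustered table carries the clustering key, and ignores row and page overhead. The row count and number of nonclustered indexes are made-up inputs.

```python
INT_KEY_BYTES = 4      # SQL Server int
CHAR10_KEY_BYTES = 10  # char(10), single-byte characters

size_ratio = CHAR10_KEY_BYTES / INT_KEY_BYTES

def key_storage_bytes(key_bytes, row_count, nonclustered_indexes):
    """Bytes spent on the clustering key alone: one copy in the clustered
    index plus one copy in every nonclustered index, for each row."""
    copies_per_row = 1 + nonclustered_indexes
    return key_bytes * row_count * copies_per_row

# Hypothetical table: 10 million rows and 4 nonclustered indexes.
rows, ncl_indexes = 10_000_000, 4
int_total = key_storage_bytes(INT_KEY_BYTES, rows, ncl_indexes)
char_total = key_storage_bytes(CHAR10_KEY_BYTES, rows, ncl_indexes)
extra_bytes = char_total - int_total

print(size_ratio, int_total, char_total, extra_bytes)
```

With these inputs the wider key costs 300,000,000 extra bytes of key storage, and the gap grows with every additional nonclustered index.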

Using composite keys results in multi-column joins, increased index size, and worse performance. It does, however, allow you to filter grandchild tables based on a value from the parent without joining to the child and parent (if the filter column is part of the PK).

I am a great believer that SQL Server is too easy and has shot itself in the foot, as developers have designed bad databases, bad sps and thus SQL Server has crap performance. It is interesting that at an MS presentation they were very adamant that one of the key components of Yukon will be documents and guidelines on how CLR code should be used with the server.

Simon Sabin

Co-author of SQL Server 2000 XML Distilled

http://www.amazon.co.uk/exec/obidos/ASIN/1904347088

Simon Sabin
SQL Server MVP

http://sqlblogcasts.com/blogs/simons

ckempste SSCoach Points: 17983 · Answer 12

Hi

First of all, this comment:

>>I am a great believer that SQL Server is too easy and has shot itself in the foot, as developers have designed bad databases, bad sps and thus SQL Server has crap performance.

is complete junk. I have a 20 GB db with over 5800 online users; it performs very well and I am more than happy with the overall performance. It's the developers and their transactions you need to watch, coupled with indexing etc. (the list goes on).

As for the article:

a) PKs are essential, not only in terms of normalisation and the whole first principle of a relational db, but also in terms of the db doing the validation and not your so-called "app".

b) Save your clustered index for columns that really benefit from it.

c) Key size shouldn't be your dominating concern; a clustered bulk-column index can be very beneficial when designed correctly against your app. Remember, we are doing key lookups primarily, so size is not a major issue in many cases.

d) Index defrag commands: when you cluster, you get your data defragged as well (to a degree), unlike heaps!

e) The PK should always be indexed, clustered or not! Small tables may result in a scan, but it's of little concern.

Cheers

Chris K

Chris Kempster
www.chriskempster.com
Author of "SQL Server Backup, Recovery & Troubleshooting"
Author of "SQL Server 2k for the Oracle DBA"

ckempste SSCoach Points: 17983 · Answer 13

BTW, I didn't even vote yet; where did my 3 stars come from?! Sorry, it should be 5 stars! 🙂

Gabor Nyul SSCrazy Eights Points: 9459 · Answer 14

Andy,

If primary keys AND (clustered) indices are so important, why is MS not using them on the system tables?

There is absolutely no PK on the system tables, and some tables have no index at all.

BTW, I do agree with you.

I do implement a PK on every one of my tables, even though I think that on several small tables (which will remain small) the index is wrong.

I am not happy to put the clustered index on the identity column (which I do use quite often).

If I design the database, I can define where to put the clustered indexes. Otherwise I analyse it via the Profiler. For me, the clustered index candidates are the range selects, the "like" queries and the composite indexes.

Bye
Gabor

Andy Warren SSC Guru Points: 119747 · Answer 15

I don't know why they don't do it on system tables. I guess it's because they enforce it internally and don't want to risk having someone 'tune' it and break something because they removed a constraint. You'd think under the hood they'd be using the same mechanism, but who knows. I'll try to look around; perhaps Inside SQL Server 2000 has something. I don't remember seeing it, but it's a good place to start.

Andy
