How to set the primary key for each table

  • Jeff Moden - Friday, August 3, 2018 8:49 AM

    coolchaitu - Thursday, August 2, 2018 6:02 AM

    barry.nielson - Tuesday, July 31, 2018 6:26 PM

    Thanks again Johnathan for your quick response.
    Unfortunately this is not generating a value in my messages window and not applying a PK to any table, even if I switch on the exec command. 
    I have removed the PK from 3 tables to see this working, but it seems not to, and sadly I do not have the know how to figure out why.  
    Thanks for your assistance.

    Try writing a script using cursor

    Why?

    Because it will loop through each table which is what the OP wants

  • coolchaitu - Friday, August 3, 2018 12:31 PM

    Jeff Moden - Friday, August 3, 2018 8:49 AM

    coolchaitu - Thursday, August 2, 2018 6:02 AM

    barry.nielson - Tuesday, July 31, 2018 6:26 PM

    Thanks again Johnathan for your quick response.
    Unfortunately this is not generating a value in my messages window and not applying a PK to any table, even if I switch on the exec command. 
    I have removed the PK from 3 tables to see this working, but it seems not to, and sadly I do not have the know how to figure out why.  
    Thanks for your assistance.

    Try writing a script using cursor

    Why?

    Because it will loop through each table which is what the OP wants

    No, he want's to generate code to add a column as a primary key for each table.  This does not require a loop.

    Michael L John
    If you assassinate a DBA, would you pull a trigger?
    To properly post on a forum:
    http://www.sqlservercentral.com/articles/61537/

  • barry.nielson - Monday, July 30, 2018 10:29 PM

    Don’t worry, kid, we will knock all hope out of you 🙂

    >> So I've imported a pile of tables over from MS Access and discovered the PRIMARY KEYs are not automatically set. <<

    You’re about to find out why SQL people hate MS Access. You really don’t understand what a key is. A key should be a sub–set of one or more columns in the table which is always unique in each row of that table. You can’t set it; you discover it in the nature the data that you are modeling. For example PRIMARY KEY for a table models a motor pool might be the VIN number of each automobile. But the PRIMARY KEY for a table models cities might be (longitude, latitude).

    There is no universal, magical, generic key in RDBMS. Looking for this is like looking for the universal elixir. The people who put an ID on every table in their schema are referred to as “ID-iots†in SQL circles.

    This comes from the original physical files that had physical record numbers the days before we had logical modeling. I like to think it goes back a little further than that to Kabbalah. That system of Jewish mysticism and magic believed that God put a magic number on everything in creation, and you could gain power over these things if you knew the Kabbalah number.

    >> So now I need to loop through each table and set the PRIMARY KEY for each table, typically this would be the ID field [sic]. <<

    might want to learn the difference between a column (it can be materialized or virtual, you can have constraints, defaults, references, etc. in short, columns are pretty smart. Since SQL is declarative language, we don’t use loops or if-then-else control flow. It’s very different way of programming the procedural languages you probably learned.

    In SQL, the term “field†refers to part of the scalar value that has no meaning by itself (i.e. the year part of a date).

    >> I have found the string that allows for modifying at least one table, being:

    ALTER TABLE
    Tbl_MyTable
    ADD
    PRIMARY KEY (ID)<<

    the prefix “tbl_†is design error that is so common and so awful that it has a name. It is called a Tibble. The reason it’s a design error is that it mixes metadata with data in the same table. This goes back to the early days of compilers, which didn’t have good symbol tables and it started with Fortran, and was continued in BASIC.

    >> But I have a ton of tables and need to be able to loop through them all. How do I do this? <<

    Hopefully, you will do it right and look at each individual table and try to find a natural key or an industry-standard (like the VIN for automobiles).

    If you want to take a stab at it, post some DDL and ask for help again. My rule of thumb when I was teaching SQL back in the 1980s and 90s was that you needed a year of full-time employment in a good shop (for people knew what they were doing) to become minimally proficient in SQL. I would change that estimate to 2 to 3 years now because language gotten complicated and so is the data. On the bright sign. This is less time than it takes to learn a foreign language 🙂

    Please post DDL and follow ANSI/ISO standards when asking for help. 

  • jcelko212 32090 - Sunday, August 26, 2018 4:06 PM

    barry.nielson - Monday, July 30, 2018 10:29 PM

    Don’t worry, kid, we will knock all hope out of you 🙂

    >> So I've imported a pile of tables over from MS Access and discovered the PRIMARY KEYs are not automatically set. <<

    You’re about to find out why SQL people hate MS Access. You really don’t understand what a key is. A key should be a sub–set of one or more columns in the table which is always unique in each row of that table. You can’t set it; you discover it in the nature the data that you are modeling. For example PRIMARY KEY for a table models a motor pool might be the VIN number of each automobile. But the PRIMARY KEY for a table models cities might be (longitude, latitude).

    There is no universal, magical, generic key in RDBMS. Looking for this is like looking for the universal elixir. The people who put an ID on every table in their schema are referred to as “ID-iots†in SQL circles.

    This comes from the original physical files that had physical record numbers the days before we had logical modeling. I like to think it goes back a little further than that to Kabbalah. That system of Jewish mysticism and magic believed that God put a magic number on everything in creation, and you could gain power over these things if you knew the Kabbalah number.

    >> So now I need to loop through each table and set the PRIMARY KEY for each table, typically this would be the ID field [sic]. <<

    might want to learn the difference between a column (it can be materialized or virtual, you can have constraints, defaults, references, etc. in short, columns are pretty smart. Since SQL is declarative language, we don’t use loops or if-then-else control flow. It’s very different way of programming the procedural languages you probably learned.

    In SQL, the term “field†refers to part of the scalar value that has no meaning by itself (i.e. the year part of a date).

    >> I have found the string that allows for modifying at least one table, being:

    ALTER TABLE
    Tbl_MyTable
    ADD
    PRIMARY KEY (ID)<<

    the prefix “tbl_†is design error that is so common and so awful that it has a name. It is called a Tibble. The reason it’s a design error is that it mixes metadata with data in the same table. This goes back to the early days of compilers, which didn’t have good symbol tables and it started with Fortran, and was continued in BASIC.

    >> But I have a ton of tables and need to be able to loop through them all. How do I do this? <<

    Hopefully, you will do it right and look at each individual table and try to find a natural key or an industry-standard (like the VIN for automobiles).

    If you want to take a stab at it, post some DDL and ask for help again. My rule of thumb when I was teaching SQL back in the 1980s and 90s was that you needed a year of full-time employment in a good shop (for people knew what they were doing) to become minimally proficient in SQL. I would change that estimate to 2 to 3 years now because language gotten complicated and so is the data. On the bright sign. This is less time than it takes to learn a foreign language 🙂

    Now Joe, that's particularly harsh--and utterly wrong.

    Identities (especially INT identities) have a number of very valuable traits that lend themselves to primary keys. They are small (4 bytes), they are sequential (meaning appended rows don't fragment the table), they are immutable (i.e. there is no need to change them, unlike natural keys which seem to change every time you blink) and they reduce the overheard of FKs quite nicely.

    As for not setting primary keys, that's quite frankly wrong. You set a primary key in Access the same way you do in SQL Server--by choosing which column(s) are part of that primary key. There is also the fact that primary keys are indeed *physical*--in that there is a physical mechanism underlying the logical nature of the database, and that physical mechanism involves physical constraints, such as the amount of movement involved in the drive arm, the amount of spin to allow data to move under the head.

    As such we cannot ignore the physics of data storage, it's just another constraint on performance. There may indeed be times when an identity isn't suited to be a primary key, but by no means is it a bad rule of thumb--in fact I would bet in most cases it's actually a superior solution.

    Certainly the physical characteristics of using an INT identity make it worthy of consideration, especially in regards to physical index size (smaller is faster) and in situations where detail records tend to be created in chronological proximity to their header record (such as invoices).

    To dismiss a very useful technique out of hand is ill advised, in my opinion.

  • >>Joe, that's particularly harsh--and utterly wrong. <<

    Harsh yes; wrong. No. Let's go back to basic relational principles and definitions.

    >> Identities (especially INTEGER identities) have a number of very valuable traits that lend themselves to PRIMARY KEYs. They are small (4 bytes), they are sequential (meaning appended rows don't fragment the table), they are immutable (i.e. there is no need to change them, unlike natural keys which seem to change every time you blink) and they reduce the overhead of FKs quite nicely. <<

    First of all, and IDENTITY is not a column. It is a table property. That means, strictly speaking, it can never be a key, because keys are made up of columns. Essentially, it is like identifying an automobile by the parking space (IDENTITY number) in one building (SQL Server instance on one particular piece of hardware) SQL Server started on UNIX on small machines and the UNIX file system is based on magnetic tape. The IDENTITY value is a count of physical insertion attempts into physical storage, not even successes. This is because that's how UNIX would have located a record on a tape file. The original implementation of cursors in SQL Server could only read forward in a table because the original tape drives on those early minicomputers could only read forward.

    In RDBMS, we want logical locators and not physical counts. While the parking space number would be small, your insurance company and the DMV want to see the VIN on their documents. The VIN actually identifies the vehicle.

    If I need a sequence in an attribute, then I use the CREATE SEQUENCE and have complete control over it. If I use IDENTITY, I can get gaps in my numbering and have no control over resets, starting and ending values, datatypes, etc.

    Unfortunately, since there is no way to verify or validate an IDENTITY, when you move your table to a new machine, you have a hard time keeping the old values. Look back at SQL forums over the years and see how many people tried reloading tables, finding out that the physical ordering of those physical records was different when it came off of storage into the new implementation. But did you notice that a VIN is constant no matter what database it is in? The DMV and my insurance company can put my new car in both their databases, without consulting me.

    The other advantages that real keys have are (1) they are often maintained by an outside organization and can be verified (2) they have check digits and can be validated. Verification and validation are very important properties. If you care about data integrity.

    I found that IDENTITY actually adds to the size of the database. It becomes an extra column that has no value, or meaning in a data model that I have to carry on every row of every table. This is one of the reasons that people who use a generic, magic ID automatically on every table are called "ID_iots" in data modeling slang.

    >>As such we cannot ignore the physics of data storage, it's just another constraint on performance. There may indeed be times when an IDENTITY isn't suited to be a PRIMARY KEY, but by no means is it a bad rule of thumb--in fact I would bet in most cases it's actually a superior solution. <<

    And I think every time I see it, it's a kludge. It is usually born out of laziness, or habits that carried over from UNIX 50 years ago. It's one of the ways you know somebody didn't have a good data model. And as far as going from logical to physical, after 35 years, I found that if you start with a good logical model, then the physical model almost writes itself on whatever platform you're using. This is why I can charge insanely large consulting fees; my schemas support moves to SQL products, whether or not they use indexes, columnar storage, hashed access, bit vectors, etc.

    >> Certainly the physical characteristics of using an INT IDENTITY make it worthy of consideration, especially in regards to physical index size (smaller is faster) and in situations where detail records tend to be created in chronological proximity to their header record (such as invoices). <<

    Have you ever seen an integer invoice number? If I throw an IDENTITY on an invoice header table, then I still need the multicolumn actual invoice number. The actual number will have some kind of encoding, to indicate the kind of invoice, who issued it, whether it's taxable or not, etc.

    The closest thing I've ever seen to that is the law in Italy that requires invoice numbers be "sequential" to make it easier to manually check for gaps in missing invoice numbers.

    Please post DDL and follow ANSI/ISO standards when asking for help. 

  • >> First of all, and IDENTITY is not a column. It is a table property. That means, strictly speaking, it can never be a key, because keys are made up of columns. Essentially, it is like identifying an automobile by the parking space (IDENTITY number) in one building (SQL Server instance on one particular piece of hardware) SQL Server started on UNIX on small machines and the UNIX file system is based on magnetic tape. The IDENTITY value is a count of physical insertion attempts into physical storage, not even successes. This is because that's how UNIX would have located a record on a tape file. The original implementation of cursors in SQL Server could only read forward in a table because the original tape drives on those early minicomputers could only read forward. <<

    Seriously? 🙂 That's being pedantic. In this context an identity is an INT column which is a simple serial number. You can use BIG INT the same way, but for many applications an INT field with a range of 2+ billion is sufficient.

    >> In RDBMS, we want logical locators and not physical counts. While the parking space number would be small, your insurance company and the DMV want to see the VIN on their documents. The VIN actually identifies the vehicle. <<

    An identifier is a logical locator. True, it has little intrinsic meaning, however that is in many ways beneficial, not detrimental. If you need something like a VIN then put a unique constraint on it and make it an alternate key.

    >> If I need a sequence in an attribute, then I use the CREATE SEQUENCE and have complete control over it. If I use IDENTITY, I can get gaps in my numbering and have no control over resets, starting and ending values, datatypes, etc. <<

    It seems you don't use identities much. You can control identities quite easily. And yes, you will get gaps. In many cases those gaps are a feature. One benefit with identities is the "true name" nature of them. Delete a row and no other row will ever have that value--which can be beneficial and actually add audit utility.

    >> Unfortunately, since there is no way to verify or validate an IDENTITY, when you move your table to a new machine, you have a hard time keeping the old values. Look back at SQL forums over the years and see how many people tried reloading tables, finding out that the physical ordering of those physical records was different when it came off of storage into the new implementation. But did you notice that a VIN is constant no matter what database it is in? The DMV and my insurance company can put my new car in both their databases, without consulting me. <<

    Um, what? 🙂 When you have identities in use the value becomes a FK in the new tables. If, for some reason, you need the identities themselves, they are quite easy to transfer intact, including preserving the next identity number to use.

    I fully admit a local identity isn't ideal for distributed database information, but then neither is a natural key. While VINs are indeed unique to a vehicle the huge infrastructure supporting VINs is not universal for all types of keys (issuing authority, for example) Identifying people by name is fraught with ambiguities, phone numbers likewise, etc. Those kinds of natural keys have no central authority to issue them.

    And don't get me started on the boondoggle that is the SSN...

    >> The other advantages that real keys have are (1) they are often maintained by an outside organization and can be verified (2) they have check digits and can be validated. Verification and validation are very important properties. If you care about data integrity. <<

    Only an extremely limited sub-set of natural keys have these properties, therefore only that sub-set can claim self-integrity. There's also the problem that often every company creates their own rules for keys. Part numbers and item serial numbers are the prime example. Not every company is part of an industry that has a "key supplier" like the VIN or ISBN.

    >>I found that IDENTITY actually adds to the size of the database. It becomes an extra column that has no value, or meaning in a data model that I have to carry on every row of every table. This is one of the reasons that people who use a generic, magic ID automatically on every table are called "ID_iots" in data modeling slang. <<

    Except... 🙂

    This "useless" column is the primary key. It is the "true name" of the row in all foreign keys. It is physically small, immutable, and easily used. These are not insignificant advantages, and certainly not useless.

    >> And I think every time I see it, it's a kludge. It is usually born out of laziness, or habits that carried over from UNIX 50 years ago. It's one of the ways you know somebody didn't have a good data model. <<

    It says nothing at all about the quality of the data model. Natural keys are often problematic for any number of reasons. A VIN is little more than a fancy serial number. It does, in fact, violate the principle of one column one unique piece of data. A VIN is a mish-mash of several different data columns. A serial number, check digit, product code, manufacturer...

    Is that good RDBMS design? No, not from a purist POV. Can it be valuable? Of course. Does it require additional processing to extract meaning from? Absolutely, if it's to be anything more than a simple text field.

    An identity, on the other hand, is a simple serial number yes, having only the meaning "this row and no other". But then again, usually that's all you need. A way to tie tables together. The logical model simply needs to identity the keys used to tie the table together for primary/foreign key relationships. Yes, it should have reasonable constraints (like uniqueness for alternate keys, etc.) but the actual logical model quality only requires primary and alternate keys. Requiring those keys be "natural" conveys no benefit.

    >> And as far as going from logical to physical, after 35 years, I found that if you start with a good logical model, then the physical model almost writes itself on whatever platform you're using. <<

    The choice between proxy and natural keys seldom plays any role in the quality of the model. In fact, abstracting the logical model too much can have serious consequences in the physical model. That's why GUIDs have such a horrid reputation. After all, a key that truly is globally unique is the holy grail of logical keys. Too bad the performance and size of GUIDs is so horrible...

    >> Have you ever seen an integer invoice number? If I throw an IDENTITY on an invoice header table, then I still need the multicolumn actual invoice number. The actual number will have some kind of encoding, to indicate the kind of invoice, who issued it, whether it's taxable or not, etc. <<

    I actually have seen many companies that use a simple sequential invoice number. Also? Never actually seen a company try to encode that kind of information into the invoice number. Most place that stuff in the invoice header, as Codd intended. 😀

Viewing 6 posts - 16 through 20 (of 20 total)

You must be logged in to reply to this topic. Login to reply