Worst Practices - Not Using Primary Keys and Clustered Indexes

  • quote:


    I don't know why they don't do it on system tables....


    The impression I've gotten is they don't enforce much of anything in system tables, other than methods of keeping someone from making them more useful.

    We use (i.e., I am stuck with) a vendor software package that shall go nameless, which employs in excess of 40,000 user tables in one database. That's right; that is not a misprint. (The really interesting thing is that, in a database with 40,000 tables, remarkably few of them exist because of normalization.) One thing you discover quickly in such circumstances is that Microsoft does not invest a lot of resources in helping metadata queries perform well. They may have some "under the hood" mechanisms in lieu of explicit keys and indexes, but if they do, their effect is subtle, to say the least.

    I have had to write many metadata queries, and, for the intensive ones, have found it to be in my best interest to create temporary tables that I load with the contents of 'sysobjects', 'syscolumns', 'sysindexes', etc., and then index the heck out of them.
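
    In case it is useful, the shape of that approach is roughly the following (the column lists are trimmed, and the 'PART%' pattern stands in for whatever naming convention you are stuck with; this is a sketch, not the production code):

    -- Rough sketch: copy the system tables into temp tables, then index them.
    SELECT id, name, xtype
    INTO   #objects
    FROM   sysobjects

    SELECT id, colid, name, xtype, length
    INTO   #columns
    FROM   syscolumns

    -- "Index the heck out of them" so the metadata joins stop scanning.
    CREATE CLUSTERED INDEX ix_objects_id ON #objects (id)
    CREATE INDEX ix_objects_name         ON #objects (name)
    CREATE CLUSTERED INDEX ix_columns_id ON #columns (id, colid)
    CREATE INDEX ix_columns_name         ON #columns (name)

    -- The metadata queries then run against the indexed copies, e.g.:
    SELECT o.name AS table_name, c.name AS column_name
    FROM   #objects o
    JOIN   #columns c ON c.id = o.id
    WHERE  o.xtype = 'U'
      AND  o.name LIKE 'PART%'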

  • I'm not exactly sure what you mean by "metadata queries". What is it you are trying to do on the system tables?

  • quote:


    I'm not exactly sure what you mean by "metadata queries". What is it you are trying to do on the system tables?


    I don't know what the terminology is in UK for writing queries against the definitional data of one's databases. Here in the colonies ;-), I've always heard the term 'metadata' used to mean 'data about data.' That means, within this context, the contents of the 'sys%' tables in each database, and 'master' as well.

    I use the metadata in queries all the time, especially for the offending database with 40,000 tables. For example, there isn't 'one' table to contain information about participants in the system; there are eighty-five of them, and that's only for a one-year span (that is, last year's participants have another set of eighty-five tables). It's very clumsy to write SQL for so many tables, so we resort to using dynamic SQL. Fortunately, the table names follow a naming convention. We can declare a CURSOR for the table names that match our specification, and within the CURSOR loop, run our query against each of the eighty-five tables. The alternative is a SELECT with eighty-four UNION operators, and then, one such query for each year of participation.
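
    The skeleton of the cursor-plus-dynamic-SQL approach is nothing special; roughly the following, where the 'PART%' pattern and the row-counting query are just placeholders for the real naming convention and the real work:

    DECLARE @table sysname, @sql nvarchar(4000)

    DECLARE table_cursor CURSOR FOR
        SELECT name
        FROM   sysobjects
        WHERE  xtype = 'U'
          AND  name LIKE 'PART%'     -- the naming convention does the filtering

    OPEN table_cursor
    FETCH NEXT FROM table_cursor INTO @table

    WHILE @@FETCH_STATUS = 0
    BEGIN
        -- Run the same query against each matching table via dynamic SQL.
        SET @sql = N'SELECT ''' + @table + N''' AS source_table, COUNT(*) AS row_count'
                 + N' FROM ' + QUOTENAME(@table)
        EXEC sp_executesql @sql

        FETCH NEXT FROM table_cursor INTO @table
    END

    CLOSE table_cursor
    DEALLOCATE table_cursor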

  • I have exactly the same problem with my SAP databases as Lee Dise described.

    We have 25,000+ tables, all the system tables are terribly fragmented, and I cannot do anything to maintain them. No reindex, no indexdefrag, nothing.

    And if access to the system tables is slow, then access to your data tables (compile time, for example) can be slow as well.
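
    About the only thing one can do is measure the damage; assuming SQL Server 2000, something like the following at least shows how badly fragmented the system tables are, even though it fixes nothing:

    -- Measure (not fix) fragmentation of the big system tables.
    DBCC SHOWCONTIG ('sysobjects') WITH FAST
    DBCC SHOWCONTIG ('syscolumns') WITH FAST
    DBCC SHOWCONTIG ('sysindexes') WITH FAST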



    Bye
    Gabor

  • quote:


    We have 25,000+ tables, all the system tables are terribly fragmented, and I cannot do anything to maintain them. No reindex, no indexdefrag, nothing....


    Ayuh, it sounds like we are inhabiting different torture chambers in the same Purgatory. ;-)))

    Have you ever tried using Enterprise Manager to generate the SQL code for creating the schema? It takes hours to do this on our system. I wound up coding my own procedures for SQL generation, and to make it run with any kind of speed, I essentially had to duplicate the system tables as temp tables and then index them.
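
    For what it's worth, the heart of the home-grown generator is just a metadata query along the following lines. It is heavily simplified (the real thing runs against the indexed temp copies and also has to handle identity columns, defaults, constraints, and indexes), but it gives the idea:

    -- Column-definition fragment of a schema generator (simplified sketch).
    -- In practice the per-column rows get concatenated into one CREATE TABLE
    -- statement per table.
    SELECT o.name AS table_name,
           c.colid,
           '    ' + c.name + ' ' + t.name
           + CASE WHEN t.name IN ('char','varchar','binary','varbinary')
                  THEN '(' + CAST(c.length AS varchar(10)) + ')'
                  WHEN t.name IN ('nchar','nvarchar')
                  THEN '(' + CAST(c.length / 2 AS varchar(10)) + ')'  -- length is in bytes
                  ELSE '' END
           + CASE WHEN c.isnullable = 1 THEN ' NULL' ELSE ' NOT NULL' END AS column_def
    FROM   sysobjects o
    JOIN   syscolumns c ON c.id = o.id
    JOIN   systypes   t ON t.xusertype = c.xusertype
    WHERE  o.xtype = 'U'
    ORDER  BY o.name, c.colid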

  • Pretty interesting. Lee, any chance of you writing that up as an article and showing us the code too? Probably not many of us have that many tables, but for those who do... it could be handy indeed. At least you could vent some!

    Andy

    http://qa.sqlservercentral.com/columnists/awarren/

  • It is just impossible to generate the schema script using EM.

    I suppose that MS didn't think about managing this many tables. SQL Server is able to manage a lot of data within a few hundred tables, but it seems not to be able to handle this many objects. This also shows me why we cannot do maintenance tasks on the system tables.

    I think the way MS has programmed the listbox control of that form is just not right if you want to put several thousand items in it. I had to shut down my EM after some hours.

    I am also writing my own schema generating and maintenance scripts.

    I can tell you, it is just a nightmare when I have to reindex all of my tables. I have to keep track of which tables are already done and which remain, while taking care not to blow up the logs, etc. (I sketch what I ended up with below).

    I have asked the guys on several forums how they do their maintenance of system and data tables on a VLDB in a nearly 24/7 environment, but I didn't get any useful answer.

    So I had to find it out by myself.
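
    In case it is useful to anybody, the bookkeeping boils down to something like the following. It is only a sketch: the database name, the backup path, and the helper table are placeholders, and the real scripts spread the work across several maintenance windows:

    -- Simplified sketch of tracked reindexing, so the job can stop and restart.
    CREATE TABLE dbo.ReindexQueue (
        table_name sysname  NOT NULL PRIMARY KEY,
        done_at    datetime NULL
    )

    INSERT dbo.ReindexQueue (table_name)
    SELECT name FROM sysobjects WHERE xtype = 'U'

    DECLARE @table sysname

    WHILE EXISTS (SELECT 1 FROM dbo.ReindexQueue WHERE done_at IS NULL)
    BEGIN
        SELECT TOP 1 @table = table_name
        FROM   dbo.ReindexQueue
        WHERE  done_at IS NULL

        DBCC DBREINDEX (@table)

        UPDATE dbo.ReindexQueue SET done_at = GETDATE() WHERE table_name = @table

        -- Keep the transaction log from blowing up between tables
        -- (placeholder database name and backup path).
        BACKUP LOG MyDatabase TO DISK = 'E:\Backup\MyDatabase_log.bak'
    END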



    Bye
    Gabor

  • The first thing I do following the creation of a new database (and a backup of master) is to expand the master database and the user database to a size where they are not likely to need to expand for some time.

    I'm probably mistaken, but I believe this helps reduce fragmentation.
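
    For what it's worth, the pre-sizing itself is nothing more than something like this (the logical file names and sizes are placeholders; sysfiles in each database has the real logical names):

    -- Pre-grow the data and log files so routine activity doesn't trigger autogrow.
    ALTER DATABASE MyDatabase
        MODIFY FILE (NAME = MyDatabase_Data, SIZE = 2000MB)

    ALTER DATABASE MyDatabase
        MODIFY FILE (NAME = MyDatabase_Log, SIZE = 500MB)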

    I also note that BOL says that SQL Server decides whether the system tables and indices need defragging when it starts up and reacts accordingly.

    Obviously, in a 24/7 operation you can't keep restarting the SQL box, but isn't there some way you can have some sort of log shipping system so that you can flip/flop between two or more SQL boxes and do a periodic restart of the passive box?

  • quote:


    Pretty interesting. Lee, any chance of you writing that up as an article and showing us the code too?.... At least you could vent some!


    I'd be delighted to do so, but I see two risks: 1) that my article could devolve into a rant; and 2) the vendor company could take umbrage at any unfavorable characterizations of their software.

    I'd have to sanitize the prose, for sure.

  • quote:


    I have asked the guys on several forums how they do their maintenance of system and data tables on a VLDB in a nearly 24/7 environment, but I didn't get any useful answer.

    So I had to find it out by myself.


    Sounds like you are even worse off than I am. Our shop is not 24/7, more like 10/5, and the indexes are fairly sparse. Most of the tables are fairly small, in fact, with only a small proportion going over 10,000 rows.

    In fact, it's a bit of a stretch even to call what we have a "database". It's more of a data "swamp", a swamp being a place where water flows into, but not out of, easily. Replace "water" with "data", and you have the concept. The system is an OLTP system that functions at eighty-five nodes using dBASE IV as the local database at every node. At night, an automagic process pushes the data from each node into a centralized SQL Server swamp -- my "problem child" database -- where it is "consolidated" (a figure of speech) and made available for region-level querying.

    Reindexing is therefore easy because the database is relieved of the pressure of supporting OLTP during the day, and used only for relatively few central queries, where day-old data doesn't harm the data analyses.

    I believe that our system would be much better if the vendor were a) to abolish the autonomous local databases and use SQL Server as the OLTP engine, and b) to reduce the number of tables (which could easily be done by adding a 'year' column). There would be about 250 tables remaining, which is certainly within the realm of manageability.
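
    Just to make that concrete: the consolidation I have in mind is nothing more exotic than a 'year' column folded into one clustered primary key, with each legacy table loaded by a plain INSERT...SELECT. All of the names below are invented for the example:

    -- Hypothetical consolidated table replacing eighty-five per-node tables per year.
    CREATE TABLE dbo.Participant (
        participant_year smallint     NOT NULL,
        node_id          int          NOT NULL,
        participant_id   int          NOT NULL,
        participant_name varchar(100) NOT NULL,
        CONSTRAINT PK_Participant
            PRIMARY KEY CLUSTERED (participant_year, node_id, participant_id)
    )

    -- Each legacy per-node, per-year table folds in with a simple INSERT...SELECT.
    INSERT dbo.Participant (participant_year, node_id, participant_id, participant_name)
    SELECT 2002, 17, participant_id, participant_name
    FROM   dbo.PART_2002_NODE17     -- invented name for one of the eighty-five tables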

    But the ingenuity of marketers is a wonder to behold. In their sales pitch (not to me, I hadn't been hired yet), they actually emphasized the "beauty" of the decentralized data approach, promoting the usefulness of a tool that operates even when the central database is down. They forgot to mention that the trade-off is having about ninety versions of "the truth" to sift through whenever you have to report on something.

    Edited by - Lee Dise on 01/14/2003 07:22:30 AM

  • Sounds like a very old legacy system if it is using dBaseIV.

    If the supplier was to upgrade to MSDE then at least there would be the possibility of using replication in one form or another.

    Have you the option of adding a middle tier anywhere that amalgamates up your data before it gets into your "data swamp"?

    I love your definition of data swamp!

    I have something I call a "Brain Drizzle"; it's like a "Brain Storm," but a more accurate description of what goes on in a committee!

  • quote:


    Sounds like a very old legacy system if it is using dBaseIV.


    I think it is. Actually, you can tell just by looking at the design of some of the tables, if you're something of a software archaeologist. On some of them, for example, absences are recorded as fixed-length date strings and encoded explanations that are embedded within large VARCHAR globules; so if you want a list of all absences, you have to parse through the globules (I'll sketch an example below). This is something you used to see all the time in old COBOL applications, but it is something of a rarity inside a modern DBMS.

    In other words, it looks like at the point when this system was ported from an IBM mainframe, rather than normalize the design, they just took a pair of pinking shears, cut their flat file template out of the old application, and sewed it into the new.
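
    To give a flavor of what parsing the globules involves: every query that wants absence data ends up doing positional string surgery along these lines (the offsets, the yyyymmdd assumption, and all the names are invented, but the technique is depressingly real):

    -- Invented example of the positional parsing the embedded layout forces:
    -- an 8-character yyyymmdd date and a 2-character reason code inside a varchar blob.
    SELECT emp_id,
           CONVERT(datetime, SUBSTRING(absence_blob, 1, 8)) AS absence_date,
           SUBSTRING(absence_blob, 9, 2)                    AS reason_code
    FROM   dbo.AbsenceHistory
    WHERE  LEN(absence_blob) >= 10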

    quote:


    If the supplier was to upgrade to MSDE then at least there would be the possibility of using replication in one form or another.


    I question whether replication would even be needed, but you're right, if independent nodes were necessary, certainly SQL Server has the tools for it.

    quote:


    Have you the option of adding a middle tier anywhere that amalgamates up your data before it gets into your "data swamp"?


    No, it gets into the swamp first, but then there is a "consolidation" process that comes with the product, which distills the data for the eighty-five nodes into a couple of tables per entity. What we do here is consolidate the data even further, by copying data into a different database where we have only one table per business entity. This makes querying a lot easier for the programmers, but of course it adds another potential breaking point for data consistency.

    quote:


    I love your definition of data swamp!...


    Thanks... it's an analogy that just seems naturally to occur to someone who lives in eastern Virginia, in what we locals call the "Tidewater" region. I live about twenty-five miles from the Great Dismal Swamp. We have the James River, the Elizabeth River, the Albemarle Sound, the Chesapeake Bay, and the Atlantic Ocean to drain off excess water, but somehow it still manages to pile up faster than we can get rid of it.

  • As I read you, Lee, I have to recognise that my life as a DBA is not as terrible as yours.

    Therefore I am looking to the sun and smiling forever.

    I understand your problem. In my former company I had to deal with local databases (It was Access 1.1) and the data exchange was just a nightmare. And at that time there was no MSDE, no replication, nothing.

    Anyhow, just to give you some feedback about our tables:

    The biggest is a table with over 21 million rows and a size of 15 GB.

    And we have 12,000+ empty tables.

    David, the problem is not the master database but the system tables (sysobjects, syscolumns...) WITHIN the user database, because there are so many objects.



    Bye
    Gabor

  • Lee, go for it anyway. You've got some good info to distribute! There are lots of places where the tools don't work so well. In my case it's 250 databases; things like log file placement and filegroups become ridiculous, and replicating all 250 is probably possible but difficult. It takes EM a while to paint too, though I think a lot of that is its reliance on DMO. I like DMO, but the objects tend to be heavy.

    Andy

    http://qa.sqlservercentral.com/columnists/awarren/

  • quote:


    As I read you, Lee, I have to recognise that my life as a DBA is not as terrible as yours.

    Therefore I am looking to the sun and smiling forever.


    I don't consider my life bad at all. Every DBA has to cope, occasionally, with people or vendors who use tools in ways they weren't designed to be used. People can be perversely ingenious at times. Occasionally, databases are bad, but life is good.

    quote:


    I understand your problem. In my former company I had to deal with local databases (It was Access 1.1) and the data exchange was just a nightmare. And at that time there was no MSDE, no replication, nothing.


    It's very difficult to disabuse people of the notion that if one residence for a data item is good, then twenty residences for the same data must be better. One of the main ideas behind normalization is: each data item has just one place of residence. The "Truth" lives in only one place. When it lives in two or more places, then which version is the "Truth"? When programmers and DBAs start having to maintain the "Truth" in several places at once, the spinning plates start to fall.
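
    A trivial, invented illustration of those spinning plates: the same fact stored in two tables, so a correction applied to one of them quietly leaves the other one lying.

    -- Invented example: the participant's name is stored redundantly in two tables.
    UPDATE dbo.Participant
    SET    participant_name = 'Smith, Jane'
    WHERE  participant_id = 42

    -- The copy in the reporting table was not touched and is now quietly wrong:
    SELECT participant_id, participant_name     -- still says 'Smyth, Jane'
    FROM   dbo.AbsenceReport
    WHERE  participant_id = 42

    -- With one residence for the fact, reports join to it instead of copying it:
    SELECT r.participant_id, p.participant_name, r.absence_date
    FROM   dbo.AbsenceReport r
    JOIN   dbo.Participant   p ON p.participant_id = r.participant_id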

    quote:


    The biggest is a table with over 21 million rows and a size of 15 GB. And we have 12,000+ empty tables.


    That's on a par with one of ours.

    Edited by - Lee Dise on 01/15/2003 08:18:49 AM

    Edited by - Lee Dise on 01/15/2003 08:27:23 AM
