Lookup Table Madness

  • OK, who has the time to design, code, and test a database that needs 300+ lookup tables within the time that management have promised the finished application?

    I haven't had any problems with my ID, Type, Description design.  Yes, the purist approach would be to have individual tables, but you cut your cloth to suit your pocket.

    I'm still not clear in my own mind what the problems are with my design, other than as an academic discussion.  It's like being back at college: do you give an answer that will work, or a suitably brown-nosed answer?

    When I use my ID, Type, Description design (and I don't always use it), Type 0 is always the list of available lookups.

    i.e.

    1, 0, Age Band

    2, 0, Geodemographics

    3, 0, Income Band
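
    Expressed as DDL, the shape is something like this (a minimal sketch; the exact names and types are assumed, not taken from the post):

    CREATE TABLE MUCKTABLE (
        Id          int IDENTITY(1,1) NOT NULL PRIMARY KEY,
        Type        int               NOT NULL,
        Description varchar(100)      NOT NULL
    );

    -- Type 0 rows enumerate the available lookup types themselves;
    -- the IDENTITY assigns Ids 1, 2, 3 as in the list above
    INSERT INTO MUCKTABLE (Type, Description) VALUES (0, 'Age Band');
    INSERT INTO MUCKTABLE (Type, Description) VALUES (0, 'Geodemographics');
    INSERT INTO MUCKTABLE (Type, Description) VALUES (0, 'Income Band');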

     

  • This is NOT a purist approach; it is fundamental!

    When you say 2 + 2 = 4, how do you know?  Not just because you've memorized it; how do you really know?

    That is the difference between primary and secondary education on the one hand and post-secondary education on the other: in the former you remember; in the latter you learn.  In advanced mathematics classes (Modern Algebra, Real Analysis, Theory of Functions, etc.) you learn why you can express the statements that you do, and you have PROOF!

    Likewise, relational theory is based on logic and PROOF!  It works because it has been proven.  There are alternative algebras because they too have been proven; however, there has been NO proven alternative to the Relational Model of Data for data management.  So, when you deviate from relational design, how do you guarantee correctness if you have no proof that your "alternative" will work?

    Jamie and David, that is why you do not take shortcuts: because there is no proof that you will always be guaranteed correctness.  And correctness of data, that is, data quality, is THE ONLY REASON for the existence of the database.  We work on performance at the physical level and at the hardware level to optimize it only AFTER the correctness of the design has been established.

    Thank you.

     

  • I am still in the dark as to why my MUCK table with a single-field primary key can break referential integrity.

    If I've got a contacts table with an AgeBandId and I do an INNER JOIN to MUCKTABLE.Id, then it works; it has no possibility of failure.
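
    For example (a minimal sketch; any Contacts column names beyond AgeBandId are assumed for illustration):

    SELECT c.ContactName, m.Description AS AgeBand
    FROM Contacts AS c
    INNER JOIN MUCKTABLE AS m
            ON m.Id = c.AgeBandId;  -- every AgeBandId resolves to exactly one MUCKTABLE row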

    1 MUCKTABLE entry can be referenced by MANY contacts.

    If you are arguing that a developer might get confused as to which Type value is correct for AgeBand entries in the MUCKTABLE, then you are going to get the same confusion when debating which individual lookup table is the correct one.

    Forget the specific example of AgeBand.  When you have many hundreds of tables, the naming of those tables becomes an issue.  I seem to remember that SAP actually has numeric names for its tables!

  • Very well said!

    I might add that the practical result of following the scientifically sound approach (the relational model) is that most of the complexity should be handled at the database level, and thus the amount of code that you have to write is greatly reduced.  The problem I see over and over again is that people think it's "just too much work" to deal with the complexities of real-world systems up front and thus get the data model right before the coding effort begins.  As a result they start coding, and the database design grows out of the coding requirements rather than the business requirements.  They start to use all sorts of poor database design practices and basically say "we'll write the code to deal with it."  Either way you must tackle the complexity.  There are two fundamental problems with the "code" approach:

    1.  I have never seen, even in very simple systems, the code written in such a manner that it can prevent bad data.  And since logical errors are the hardest kind of error to catch and fix, they most often go undiagnosed until it's too late.  How many times have you heard something like "we know we have bad data, but we'll just have to make the best of it"?  Untold man-hours and millions of dollars are spent on efforts to clean up data when it would have been immeasurably cheaper to prevent the bad data in the first place.

    2.  It destroys data independence.  Data means nothing outside of the rules in which it is created and maintained.  I might add that data which is created and maintained in the absence of consistent rules is also meaningless.  Application code comes and goes, data stays.  If your data quality is dependent on application code...well I hope I don't need to elaborate further.

    Last note: it is true that the current SQL DBMS products allow for only a minimal set of declarative integrity constraints, but this is not a failing of the relational model itself, as is sometimes claimed.  Rather, it is a failing of those products, and of the SQL standard itself.  What is the solution?  More database professionals need to be educated about the real issues and put pressure on the DBMS vendors to more fully support the relational model and stop wasting time with crap like XML.  In the meantime, those of us who actually understand data management will continue to learn how to do the best we can with what is available now.
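
    To make "declarative" concrete, here is a sketch (names assumed, not taken from the article): the rules live in the table definitions themselves and bind every application that touches the data, with no procedural code at all.

    CREATE TABLE AgeBand (
        AgeBandId int     NOT NULL PRIMARY KEY,
        LowAge    tinyint NOT NULL,
        HighAge   tinyint NOT NULL,
        CONSTRAINT CK_AgeBand_Range CHECK (LowAge <= HighAge)  -- enforced for every writer, not just one app
    );

    CREATE TABLE Contacts (
        ContactId int NOT NULL PRIMARY KEY,
        AgeBandId int NOT NULL
            CONSTRAINT FK_Contacts_AgeBand REFERENCES AgeBand (AgeBandId)  -- dangling references rejected by the DBMS itself
    );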

    /*****************

    If most people are not willing to see the difficulty, this is mainly because, consciously or unconsciously, they assume that it will be they who will settle these questions for the others, and because they are convinced of their own capacity to do this. -Friedrich August von Hayek

    *****************/

  • You are correct in saying that the JOIN to a MUCK table works, but what you are overlooking is that the result of the JOIN (or any other query) is only as good as the data that resides in the table(s) queried.  The problem with MUCKs and other non-normalized tables is that they make enforcement of data integrity much more difficult than it should be.  Since you can't possibly enforce even domain (data type) constraints, how can you possibly hope to enforce those that are more complex?  You can't, as I showed in the article.  You mention that your database would have around 300 "lookup tables" if you didn't use a MUCK; how in the world do you maintain any sort of integrity across 300 different types in a single table?  If you manage to pull it off, I'd hate to be in charge of maintaining those triggers or application code!

    At this point many folks say "well, my application code won't allow an employee code to be applied to a customer," or something like that, but at that point you are essentially going back to the days of application-managed data files.  You have also abandoned any data independence, and most likely your code won't adequately deal with every situation either, so you have also kissed data quality goodbye.
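
    To sketch the hole concretely (names assumed; suppose Contacts has an AgeBandId column and the MUCK is keyed on Id alone): a foreign key to the MUCK can only verify that SOME row exists, not that it is a row of the right type.

    ALTER TABLE Contacts
        ADD CONSTRAINT FK_Contacts_AgeBand
        FOREIGN KEY (AgeBandId) REFERENCES MUCKTABLE (Id);

    -- Nothing stops an 'Income Band' row (or even a Type 0 row)
    -- from being assigned as a contact's age band:
    UPDATE Contacts SET AgeBandId = 3 WHERE ContactId = 1;  -- Id 3 = 'Income Band'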

    /*****************

    If most people are not willing to see the difficulty, this is mainly because, consciously or unconsciously, they assume that it will be they who will settle these questions for the others, and because they are convinced of their own capacity to do this. -Friedrich August von Hayek

    *****************/

  • It seems a lot of this debate is about "ease of use".  People are challenged when faced with 300 tables.  However, I've found that, if you learn how to work with metadata, maintaining 300 tables of the same structure is not that hard.  I've had great success with compiling proc code based on data in tables.

    For example, say you have 300 files (or one file with 300 different types of auxiliary data) to import into the database.  The structure being the same (as this article assumes), build a "MUCK" table and import all the data into it.

    Then build a table that defines the table name for each auxiliary type.  Let's call it METAAuxTableName.  Loop through METAAuxTableName and use dynamic SQL to build a table for each auxiliary type*.  In the same loop, create dynamic SQL for each row, building insert/update SQL to move data from the MUCK into the new table**.  Then execute the Create Table SQL, then the Insert/Update SQL.  (A sketch of this loop appears after the footnotes below.)

    Piece of cake, and you can maintain referential integrity when referencing these tables.  The MUCK table is now just a staging table and isn't referenced.  The newly created and populated tables are what get referenced.

    BTW, the METAAuxTableName could also be used to automatically build the "Create View" SQL for a unified view of all 300 tables.  Of course, DON'T use this view in Joins...reference the underlying tables directly (for performance reasons).

    *If this is a recurring process, you can use a bit flag in METAAuxTableName and set it so you only build each table once.

    **If this is a recurring process, only build the dynamic SQL once, then compile it in a proc.  If you store the proc name in METAAuxTableName, you'll know to create the proc only the first time; after that, just execute it.
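
    A minimal sketch of that loop (the MUCK and METAAuxTableName column names here are assumptions for illustration):

    DECLARE @AuxType int, @TableName sysname, @sql nvarchar(4000);

    DECLARE aux CURSOR LOCAL FAST_FORWARD FOR
        SELECT AuxType, TableName FROM METAAuxTableName WHERE IsBuilt = 0;
    OPEN aux;
    FETCH NEXT FROM aux INTO @AuxType, @TableName;
    WHILE @@FETCH_STATUS = 0
    BEGIN
        -- Build one real lookup table per auxiliary type...
        SET @sql = N'CREATE TABLE ' + QUOTENAME(@TableName)
                 + N' (Id int NOT NULL PRIMARY KEY, Description varchar(100) NOT NULL);';
        EXEC (@sql);

        -- ...then move that type's rows out of the MUCK staging table.
        SET @sql = N'INSERT INTO ' + QUOTENAME(@TableName) + N' (Id, Description)'
                 + N' SELECT Id, Description FROM MUCK WHERE Type = @t;';
        EXEC sp_executesql @sql, N'@t int', @t = @AuxType;

        -- The bit flag from footnote *: mark the type so the table is built only once.
        UPDATE METAAuxTableName SET IsBuilt = 1 WHERE AuxType = @AuxType;
        FETCH NEXT FROM aux INTO @AuxType, @TableName;
    END
    CLOSE aux;
    DEALLOCATE aux;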

    cl

    Signature is NULL

  • I haven't ever had to maintain 300 lookup tables, but in an app that required around 150, each with between maybe 5 and 20 rows, it was pretty easy. The tables were all scripted along with INSERTs and the scripts were kept in Visual Source Safe. Any data updates were typed into the appropriate script. Come release time, all of the scripts were concatenated and run. Done. Of course, this process was started when there were only maybe 10 tables, so it was all incremental. Starting such a process after there are already lots of tables seems like a much bigger chore to the person who's faced with doing it, I suppose.
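
    A sketch of what one such versioned script might look like (the names and the rerunnable DROP/CREATE pattern are assumptions for illustration, not the actual scripts):

    -- AgeBand.sql : lives in Visual SourceSafe; concatenated with the rest at release
    IF OBJECT_ID('dbo.AgeBand', 'U') IS NOT NULL
        DROP TABLE dbo.AgeBand;
    CREATE TABLE dbo.AgeBand (
        AgeBandId   int         NOT NULL PRIMARY KEY,
        Description varchar(50) NOT NULL UNIQUE
    );
    INSERT INTO dbo.AgeBand (AgeBandId, Description) VALUES (1, '18-24');
    INSERT INTO dbo.AgeBand (AgeBandId, Description) VALUES (2, '25-34');
    -- any data update is typed into this script and re-released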

    --
    Adam Machanic
    whoisactive

  • Don,

    In your example the Id field is an IDENTITY field and therefore there is no way that EmployeeTypeId can return an entry for CompanyStatus or whatever.

    If you are talking about a self defined code, such as SIC number, or a single character key such as Marital Status then I can see the data integrity issues.

    I like your employee status "vacant"!  Ye Gods, that rings a bell!

    In the example within your article your primary key utilizes both your Id and Type fields, and so does your foreign key.

    Using this method it is possible for a developer to SET IDENTITY_INSERT ON and force a duff Id entry, thus breaking your data integrity.

    In my tables the primary key is simply the Id field, and I have a separate index on my Type field.  I may define a covering index over Type and Id, in that order, but I'm more likely to stick with just Type.

    For joins, the MUCK table row is retrieved by an INDEX SEEK.

    For returning the entire collection of entries for a particular type, an INDEX SCAN is used.
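
    In DDL terms (a sketch against the MUCKTABLE layout described earlier in the thread; index names assumed):

    -- The primary key on Id alone satisfies the joins; a separate index serves the per-type listings:
    CREATE NONCLUSTERED INDEX IX_MUCKTABLE_Type ON MUCKTABLE (Type);

    -- The covering alternative mentioned above would be:
    -- CREATE NONCLUSTERED INDEX IX_MUCKTABLE_Type_Id ON MUCKTABLE (Type, Id);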

    I have done this, as I say, because the deadlines to which my project was promised did not even allow a proper tech spec to be written, much less 300+ individual lookup tables.

    You would be correct in saying that having individual lookup tables would ultimately be more scalable, but, as I said, the time to develop everything that goes with maintaining 300+ tables would be excessive.

  • It's great trying to maintain hundreds of tables when the tables have meaningless names.  Anyone care to guess what A35020 holds?

  • Whether the key is "natural" or a "surrogate" like IDENTITY makes no difference to this particular subject.  Yes, it is always possible for someone who has direct INSERT/UPDATE access to the database to screw up your data.  This is true regardless of whether or not the key is an IDENTITY column.  Even with all the CHECK, data-type, and referential constraints done properly, the best one can do is to ensure that the data in a given table is at least plausible given the set of business rules that those constraints represent.  There is no way to prevent users from asserting facts (rows in a table) that aren't true.  (Maybe the subject of my next article...)

    I was NOT arguing that you should use the CodeID and CodeType method.  That was provided as one of the various means available to try to constrain the MUCK and references to it.  The primary reason you should not use this approach is that your tables cannot possibly be properly normalized with it.

    /*****************

    If most people are not willing to see the difficulty, this is mainly because, consciously or unconsciously, they assume that it will be they who will settle these questions for the others, and because they are convinced of their own capacity to do this. -Friedrich August von Hayek

    *****************/

  • I do not know why 300 tables has been the magic number in this discussion; it is irrelevant.  How do you design, create, and maintain, say, 3000 tables?  Answer: the EXACT same way you did the first!  You use the same rules that guided you in the first extraction.  You follow the relational principles.  The number only matters as a function of scale.  If the first is not done right, how can you expect the Nth one to even come close?

    What hyperbole!  Are some really trying to tell me that every project they are put on requires the use of such complexity?  God, I hope not.  If the simplest of designs cannot be managed, I would hate to be the recipient of any application they have created with even a modicum of complexity behind it...oh, wait a minute, I get the pleasure of working with those lovely, hapless, vended, abominable database designs every day.

    Maybe some day enough knowledgeable DBAs, in forums like this and at conferences like SQL PASS, will get the message across...but I doubt it.

    Forums like these also show how eager some are to jump on expediency in the guise of "elegant" solutions.

    I certainly hope they are blessed with the management and maintenance of such applications and database designs for years to come...maybe they will retire and let some real designers have a shot at it.

     

  • If you have 3,000 or more tables then having a meaningful naming scheme for your tables becomes VERY important; otherwise your developers get lost in the mire.  This was one of my reasons for having a global lookup table.

    I hear you about the Vendor databases from hell!

    There are a few where someone has come up with a bad MS Access prototype, run the upsizing wizard, and sold the resulting database as a £xx,xxxx product!

    There are others where the sales spiel is for a cross-platform, cross-database product, and therefore the database is neither fish nor fowl.

    There are a few where the design is too clever for its own good.

  • Yes, we have one vended app that CANNOT function--and I mean it will not run--unless you explicitly TURN OFF the auto create and auto update statistics options.  I guess they think they can do a better job than the engine.  Funny thing is, it is one of the worst-performing systems we have in our shop.
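
    For reference, those are the database-level options (a sketch; the database name is assumed):

    ALTER DATABASE VendorAppDb SET AUTO_CREATE_STATISTICS OFF;
    ALTER DATABASE VendorAppDb SET AUTO_UPDATE_STATISTICS OFF;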

    Anyway, 300 tables, much less 3,000, is clearly an exaggeration...not that there aren't a few applications that will require that many.  Nevertheless, I maintain that you handle that quantity exactly how you would handle 10 or 20.

    Everything starts with the Business Model--and I don't mean some hypothetical construct; I mean a real, on-paper (or electronic) model--from which you derive both the logical and physical database models.  You manage the names and associations of tables with the models...NOT with SQLEM (or you could at least use the database diagrams within SQLEM).  The point is that the models can be reduced to subject areas, so you can deal with a 10-20 table list related to some subset of the functionality.

    You can't tell me that you wouldn't do something similar with the code.  "Oh, my!  How am I going to manage the naming of 1,000, 10,000, 1,000,000 functions, subprocedures, and classes in this application?"  It is more likely that you will have quantity within the code than within the database--it's just easier to see in the database.

    The management of the project should never dictate the form of the design.

    Sincerely,

    Anthony L. Thomas

     

  • David,

    Enlighten us as to the contents of the mysterious A35020 table.

    As for naming schemes, I personally think the common-sense scheme is best: Name the table based on whatever it contains!

    --
    Adam Machanic
    whoisactive

  • In the particular vendor database (I stress vendor) it held a list of countries.

    The problems you face with a database of thousands of tables are:

    • Coming up with a meaningful, common-sense name whilst keeping the name reasonably short.
    • If you have a multi-thousand-table database then you are probably talking about an application with international appeal, in which case which language should you use for your naming scheme?

     
