I am not a Schema

  • I've started adopting schemas, and have migrated a system that was originally designed as multiple databases feeding a single data warehouse over to schemas, for two reasons:

    1. DRI across schemas - before schemas, I had the ugly option of a trigger, or would even sync multiple copies of the same data between databases on the same server to give each database a base to enforce DRI against. If a deletion or update in one database, or in a master set, conflicted, it was discovered at sync time rather than at the time of the change itself. In an OLTP application I generally avoided any architecture that required DRI to cross databases. In this OLAP-oriented solution, it has worked well to have a single schema containing measure metadata, providing validation for user entries in another schema and a base for the star-schema tables in a third.

    2. Database objects and scripts can be written polymorphically - in a multi-tenant architecture, multiple databases required any views, sprocs, user-defined functions, or ad hoc scripts to reference the databases they touched (e.g. "SELECT * FROM {Customer}SourceData.dbo.RawData", with the {Customer} parameter having to be filled in precisely and manually... each... and... every... time a script was run, and carefully too, to avoid accidental cross-pollination). With schemas, the SQL becomes "SELECT * FROM SourceData.RawData" and can be deployed across customer databases with no changes and no risk of accidentally cross-pollinating data.
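
    For illustration, here is roughly what that difference looks like (the tenant, database, and table names below are hypothetical):

    -- Before: one database per tenant, so every script had to be edited per customer
    SELECT * FROM AcmeSourceData.dbo.RawData;  -- {Customer} = Acme, filled in by hand

    -- After: schema-qualified names only; the same script deploys to every customer database unchanged
    SELECT * FROM SourceData.RawData;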

  • I believe that referential integrity constraints are important in a database. If you have data models for two or three tightly integrated applications with overlapping foreign keys and shared reference tables (ex: Customer, Billing, Workflow, Inventory), it's great to now have the option of containing them all in the same database, creating all appropriate constraints, and backing them up as a total unit, while still keeping the tables logically segregated using schemas.
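
    As a minimal sketch, cross-schema DRI inside a single database might look like this (the schema and table names are hypothetical):

    CREATE SCHEMA Reference;
    GO
    CREATE SCHEMA Billing;
    GO
    CREATE TABLE Reference.Customer (CustomerID int NOT NULL PRIMARY KEY);
    GO
    -- The foreign key crosses schemas but stays inside one database,
    -- so the engine enforces it and everything backs up as a single unit
    CREATE TABLE Billing.Invoice
    (
        InvoiceID  int NOT NULL PRIMARY KEY,
        CustomerID int NOT NULL
            CONSTRAINT FK_Invoice_Customer REFERENCES Reference.Customer (CustomerID)
    );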

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • I do not like different schemas within the same database, since they break the ownership chain. Joining different schemas in a proc forces me to grant permissions on the underlying objects rather than just on the proc.

  • Henry_Lee (5/10/2010)


    I do not like different schemas within the same database, since they break the ownership chain. Joining different schemas in a proc forces me to grant permissions on the underlying objects rather than just on the proc.

    Unless you're using SQL 2000 (where we really shouldn't be talking about using schemas, as they aren't fully implemented), I have to disagree with your assertion.

    When you use schemas for logical separation in SQL Server 2005+ you don't have to have different owners on the schemas. If both schemas are owned by DBO then you get ownership chaining just like if all your SPROCs were in the same schema, but you can organize your tables into logical groups.

    Personally I like to put all of my tables in a set of schemas and my SPROCs in a different schema (all owned by dbo). Then I just GRANT EXECUTE ON SCHEMA::MySchema and voila, no permissions chaining problems, execute permissions easily granted, and I don't have to worry about permissions leaking for all of the system SPROCs in DBO.
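
    As a rough sketch (the schema, table, proc, and role names here are made up):

    CREATE SCHEMA Data AUTHORIZATION dbo;
    GO
    CREATE SCHEMA Procs AUTHORIZATION dbo;
    GO
    CREATE TABLE Data.Orders (OrderID int NOT NULL PRIMARY KEY);
    GO
    CREATE PROCEDURE Procs.GetOrders AS SELECT OrderID FROM Data.Orders;
    GO
    -- One grant covers every proc in the schema; since both schemas are
    -- owned by dbo, ownership chaining means no SELECT is needed on the tables
    GRANT EXECUTE ON SCHEMA::Procs TO SomeAppRole;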

    -DW

  • If both schemas are owned by DBO then you get ownership chaining just like if all your SPROCs were in the same schema, but you can organize your tables into logical groups.

    Hmmm...pretty poor on my part...I flat out missed this. Looks like I have some cleaning up to do this week.

    Thanks for the correction, Darren.

  • eric_russell 13013 (5/10/2010)


    I believe that referential integrity constraints are important in a database. If you have data models for two or three tightly integrated applications with overlapping foreign keys and shared reference tables (ex: Customer, Billing, Workflow, Inventory), it's great to now have the option of containing them all in the same database, creating all appropriate constraints, and backing them up as a total unit, while still keeping the tables logically segregated using schemas.

    I'm not sure I follow. I can grant execute permission on a stored procedure, containing a cross schema join, to a public user without granting select on the underlying tables.

    use AdventureWorks;

    exec [dbo].[uspGetManagerEmployees] 3;

    go

    ManagerID  ManagerFirstName  ManagerLastName
    ---------  ----------------  ---------------
    3          Roberto           Tamburello
    158        Dylan             Miller
    263        Ovidiu            Cracium

    select top 1 * from [HumanResources].[Employee];

    select top 1 * from [Person].[Contact];

    go

    The SELECT permission was denied on the object 'Employee', database 'AdventureWorks', schema 'HumanResources'.

    The SELECT permission was denied on the object 'Contact', database 'AdventureWorks', schema 'Person'.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Using schemas in SQL 2005 definitely makes my job easier in developing the database used for our Portal system. Being able to logically separate objects using schemas, to match the various applications I build for the Portal, makes everything easier to organise and keep track of.

    I also think the built-in DRI within a database is more robust than trying to develop triggers for cross-database integrity, which can be error-prone depending on the skill of the developer.

    All things considered, having schemas be more powerful and flexible just gives us all more choice over how we structure our databases, which I think in this instance can only be a good thing!

    🙂

  • How would you handle best practices for database file structure if you merged databases into one and used multiple schemas? For instance, if I put 2 databases together and split them into 2 schemas, would it be worthwhile to create 2 file groups and assign each schema to its own file group? Or at least 2 files within the same file group?

    Link to my blog http://notyelf.com/

  • shannonjk (5/12/2010)


    How would you handle best practices for database file structure if you merged databases into one and used multiple schemas? For instance, if I put 2 databases together and split them into 2 schemas, would it be worthwhile to create 2 file groups and assign each schema to its own file group? Or at least 2 files within the same file group?

    A general best practice is to put your tempdb and transaction logs on separate drives from your user data files. You almost can't go wrong there, regardless of the application. However, I wouldn't consider splitting schemas into separate file groups a best practice so much as a physical architecture decision that depends on the data usage patterns of your specific application.
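
    For reference, relocating tempdb to its own drive is done by re-pointing its files (the drive letters and paths here are hypothetical; the change takes effect after a restart):

    ALTER DATABASE tempdb MODIFY FILE (NAME = tempdev, FILENAME = 'T:\TempDB\tempdb.mdf');
    ALTER DATABASE tempdb MODIFY FILE (NAME = templog, FILENAME = 'L:\TempLog\templog.ldf');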

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Yes, I do that with tempdb and the logs already :-D.

    I was just curious if there was any sort of intuitive file/filegroup structure one should follow if designing a complex database schema system.

    Link to my blog http://notyelf.com/

  • My rule of thumb is to break it out as follows (at minimum) as a starting point:

    Data

    Indexes for Data

    Read Only Data

    Indexes for Read Only Data

    "Read Only" can encompass seldom updated data as well. You can also go further and create files/groups for High IO data and indexes as well. If you have large tables that are commonly joined you can separate them also. You can also break up tables containing large amounts of varchar data into separate tables and put them into different files/groups as well.

  • shannonjk (5/12/2010)


    ...

    I was just curious if there was any sort of intuitive file/filegroup structure one should follow if designing a complex database schema system.

    This MSDN article describes some of the design considerations regarding multiple database files, along with a list of general recommendations. For example, would it be beneficial to perform separate filegroup backups?

    Using Files and Filegroups

    http://msdn.microsoft.com/en-us/library/ms187087.aspx

    One interesting experiment would be splitting a large primary key table and its foreign key tables onto separate disks, or splitting the indexes belonging to a large table into a filegroup on another disk.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • I can see the reason for keeping databases segregated when you have a number of databases that each services an individual client and the databases are of high visibility to those clients. If something goes wrong with 1 database you only have 1 client calling, but if all the clients shared 1 database… Well, I wouldn’t want to be in that situation.

  • Wow, great question. First, before I give my input, a little background on myself might help:

    My background is primarily in SQL Server 2000+ data warehousing (both relational and "cube"/dimensional), with about 12 years of experience, mostly as what I would call a senior-level database programmer. I also have some experience with Oracle 8, 9i, and 10g and with IBM DB2 UDB v6 and v8 from prior employers, plus Informix, DB2 (mainframe), Pervasive, IDMS (loved that "non-SQL" network data model!!), VSAM, yadda yadda, etc. In Oracle, a "schema" is essentially a user together with the objects it owns, while in SQL Server a schema is more like an "owner" or namespace, with the SQL instance itself containing the databases. Regardless, in both products schemas are fundamentally nothing more than containers.

    So for many years lately I worked in an environment where we (I, keep reading) created lots of databases for arranging and organizing stuff by functional business area. Worked great. On a single server we'd have: daily or log-shipped "restored" images of operational systems as databases; databases that were nothing more than a set of views onto those "restored" images (an abstraction layer); "staging" databases holding the staging tables, code, and ETL for each functional area; a "utility" database containing code to copy tables, drop tables, log, enable/disable jobs, analyze tables, create indices based on a config table, move data from server to server via linked servers, etc.; "reporting" databases containing reporting procs; and finally "consumption" databases that users would hit directly, containing either final denormalized "subject matter" tables and/or "operational data store" tables, which were simply smaller data marts drawing on several source systems.

    I was also the senior DBA for this same company "lately" (a small, privately held company with more than a billion dollars in revenue and just 2 DBAs, lol: me and one other mid-level DBA). Now we've been acquired by a public company of about the same size but with lots more "red tape"... compliance, policies, etc. out the ying-yang, lol.

    Another note: about 2 years ago I moved laterally from DBA to "Programmer III," doing mostly database programming for warehouses and lots of ETL/SSIS/DTS conversion, etc.

    Anyway, let's cut to the chase. This new company has chosen our primary system of record as the company-wide best practice, so I'm on a project to incorporate the data from this system into the acquiring company's existing datamart. Good for me, since I've worked with this data for several years. I'm in managed health care, by the way (a Medicare Advantage provider).

    So this existing datamart uses MANY "schemas" (and CTEs but don't even get me started there). At first glance, the system looks deceptively simple. A staging db, a "build" db, and then a consumption DB which is just views into the "build" db. That's it. 3 dbs on each server (dev, uat, prod). It's the first time I've worked outside of "dbo" land.

    While I understand the reason for doing it this way, and I've since caught up on the improvements and the reasons one would build a datamart this way, my first reaction was simply: why? I mean, 99% of us just stick to dbo and break things out either with different dbs or by simply naming things well (a good name goes a LONG LONG way). My gut reaction was one of unneeded complexity. Sure, there are some nice things you can do with schemas (especially permissions), but again, why? Properly divided databases give us nice separation of roles, and therefore of the permissions required.

    The more I work with this, the more I realize the developers who wrote this corporate datamart were the kind of people who, when SQL Server 2005 came out, sat down with the "What's New in SQL Server 2005" document and simply tried to use all the latest and greatest features, for no particular reason except experience and their resumes. Take CTEs, for example. They sound great, right? But these folks used them for EVERYTHING. Personally, I think they're good for a small subset of occasions but otherwise make code look MORE complicated than it needs to be and, more importantly, go against the grain of what 99% of current development looks like. So what happened? They recently tried to upgrade their warehouses to SQL 2008 (before the acquisition) and nothing would run, because apparently performance stinks for CTEs in SQL 2008 (or 2008 R2), or there is some bug or difference in the way SQL 2008 (R2) generates query plans for CTEs (and remember, we are talking about a datamart working with tens of millions of rows and tables that are dozens of GB). So here's another great example of why NOT to use the "latest and greatest." Microsoft's official solution: modify all the code to use join hints (I'm guessing hash or merge), lol. We all know join hints work, but they are a LAST RESORT way to fix something!
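
    For illustration only, a join hint of the sort I'm describing looks like this (the table and column names are made up; this is just the syntax, not Microsoft's actual fix):

    WITH BigCte AS
    (
        SELECT ClaimID, MemberID
        FROM dbo.Claims
    )
    SELECT c.ClaimID, m.MemberName
    FROM BigCte AS c
    INNER HASH JOIN dbo.Members AS m  -- forces a hash join instead of the optimizer's choice
        ON m.MemberID = c.MemberID;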

    I know this has little to do with the topic at hand. But my point is: all other things being roughly equal, I think it's best to do things the way most everyone else does them. Schemas are a db concept that falls into this bucket. I find them intriguing and interesting, but again, they land in that bucket of "why?" Just be consistent; don't be clever. Don't try to show off how good you are by doing things differently. Show how good you are with consistent deliverables and with code that can be easily maintained by whatever level of contractor your company might hire.

    We all love technology and we all love database engines (otherwise you wouldn't be reading this thread right now), but don't fall in love with things like schemas. Find some other technology to fall in love with 🙂

  • @Jeffery:

    Microsoft's official solution: modify all code to use join hints (I'm guessing hash or merge)

    Do you have any source available for that statement?



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question
    How to post performance related questions
    Links for Tally Table, Cross Tabs and Dynamic Cross Tabs, Delimited Split Function
