A Broken Quorum and Protecting the Witness

  • The witness

    This is very interesting and I think it's a bit of a flaw in the logic behind it. It does make sense, but it could be a huge problem. Imagine this scenario: you have a principal database running your CRM application. To protect it, you implement database mirroring to a second database at another location. Let's say you have the principal at a co-location facility and the mirror back at the office. Connectivity is good, no bandwidth issues, and you run in the high protection mode, so each transaction immediately moves to the mirror. You have users at the office going through the link and users coming into the data center from the Internet.

    Now you also implement a witness server to help ensure that you have a third server to watch the other two. This server lives at the office for cost reasons, and everything works great. So what happens if you lose your connection to the data center and the principal is cut off from the mirror?

    The principal goes down.

    It's actually an interesting problem because the mirror and witness see each other and form a quorum. Since they can't see the principal, they come up and serve your local clients in the office. The principal can't see either machine and thinks he's down, so he shuts down his mirrored database and your Internet clients can't access the application. If you have the witness with the principal, they keep going, but if there's really a problem with the data center, you can't get the mirror to come up. It seems like some sort of bug, and you might think that supporting more than one witness is the answer, but it's not.

    It's more of a limitation of the whole idea behind mirroring and I'm not sure there's a great solution. No matter where you put the witness you can have issues. I'm actually not even sure how I'd try to make this work if I could redesign it. Microsoft has done a good job of implementing this technology and I'm glad it's there, but you really should understand how it works and the limitations before selling to management as the best high availability solution.

    As much as SQL Server tries to be easier and easier for the average person to manage, you still need the skills to understand what you're doing. I don't view the changes in SQL Server so much as eliminating the need for a DBA as allowing one DBA to manage many more servers.

    By the way, I'm back in Denver as you read this, but if you're interested, check out the blogs. I was trying for some lengthy, real-time blogs on the sessions I attended. Some were quite interesting.

  • About a month ago we found ourselves wanting better availability, preferably with better scalability as well. As you may guess however, the machine & software budget is limited.

    I looked at each of the mssql options with a two machine topology. Mirroring was the first thing to get thrown out since having the witness on either machine means that any failure mode is catastrophic. It just doesn't work with two machines.

    At first I implemented a poor mans log shipping so I could use Express on a second machine as my warm failover. Not a bad solution, but of course it didn't gain us any scalability and didn't give us any automatic failover. I also wanted to be able to take a DB out of production to apply patches or replace drives from the raid array, and you can't really do this seamlessly with log shipping on a continuously updating backend (supporting a website).

    In the end we decided on bidirectional replication and I've been very happy with the results. I just have two Standard SKU installs so I can't use the peer-to-peer UI stuff for setting it up, but the tsql wasn't that bad. I devised a set of queries to add PK's and identity columns to where there were none already, so getting our current DB schema replication ready wasn't too bad. I also set up some simple identity range management so I can track what ranges I've assigned to each server, and easily set the current identity back to what it should be after reinitializing one of my boxes from a backup. Using the 'initialize from backup' option meant I was able to keep my indexed views in use even though they're definitions are not replicated (they're not supported in repl except Enterprise).

    Now we have both boxes in a cluster and I can easily take one out of service to apply patches or fix hardware. It takes me maybe an hour to blow away the database on one machine and get bidirectional replication back up and running from the current DB state on the other server (my DB is only about 500MB, so backup/restore is nice and quick).

    Of course the downside of NT Load Balancing is that the automatic failover only happens if the machine becomes unreachable. It is more common that the machine is still up but I wouldn't be able to connect to the database engine. Fortunately all of our middle tier code goes through one data access class so I just have one place I need to add some automatic failover retry stuff if connections start timing out.

  • "Microsoft has done a good job of implementing this technology"

    Do you really think so? Replication and clustering seems to work better on all other platforms. It's the only drawback of SQL Server. I have a feeling if they could get this right, they would pwn the market in a big way. Maybe I'm out of date, but I've never known mirroring to work right. Of course, the problems you mention here are universal, not just SQL Server problems. It just seems like there should be a better way.

    How about people connect to the witness and the application is broken if it loses a connection? I find it better to be 'down' than to try to stay up and possibly corrupt the database, but I can see cases where that's not an option. Is that the same as clustering though?

    It seems that caching and re-trying any failed transactions would make sense, but I can see how things could get out of sync pretty easily if one server was down for a long time.

  • [OT]

    What would have been really cool (instead of the image of the DVD box from the movie Witness)?

    An image of the *game* The Witness from Infocom.

    http://www.csd.uwo.ca/Infocom/witness.html

    Cheers,

    -m

  • I'm not sure where mirroring doesn't work right. I spoke with a number of consultants and it's worked great for them. Even automatic failover for those clients using the newest connection software. I think having failover of a db with commodity hardware is a good technology and it works well, but the overall architecture just has flaws. It makes sense since it's a hard problem to solve.

    I agree that other platforms have done a better job with clustering. Don't know enough about replication, but I've seen Oracle have problems there as well. I think if they could get a shared nothing/multi-server cluster, they'd really own the market.

  • Hehe... did you see that tagline on that? Totally NSFW...

    Did they misunderstand the meaning of the term Deep Six? Cuz it just sounds dirty that way. What kind of a game was that anyway? Like Clue or something?

  • There are three simple ways you could solve the stated problem.

    The first one is to have the witness at a third location.  I realize that the reason for putting it at the office was cost related, but really, how expensive is it compared to having your database server go down?  Having the witness at a third location ensures that no single location failure can fool the quorum.

    The second option is to have a network connection coming into your office.  The whole point of failover is that the mirror can become the primary and continue serving if the primary goes down.  Now you'd still be in a pickle if your office lost all connectivity, but it does cover off the problem as stated in the editorial without introducing new costs.

    The third option is to have all three servers at one location.  This doesn't provide any safety if the location suffers a connection failure, but it's the cheapest and easiest situation to manage.  In this case, you're just protecting against server-level failures, but those are the most common failure types, and it's the most appropriate solution for small to medium sized companies.

    I don't particularly like the idea of servers deciding for themselves when to go up or down, but if you implement it correctly it will work as advertised.

  • Good points Rick and I definitely picked a place where you could have issues. A third location doesn't help if the primary goes offline from both for some reason. It will still go down, but it definitely reduces the changes.

    Since the witness can be anything, even SQL Server Express, however, you can possibly move to another location with it relatively inexpensively.

  • I didn't realize we can use SQL Server Express for a witness, that's quite good to know!

    If your primary goes offline from both your mirror and your witness, doesn't that mean that it really is down, and your mirror needs to take over?  Presumably you'd have the witness communicating on the same channel as web users, so if it can't see the primary, then it's time to switch to the mirror.

    The one thing I did find really bad about the quorum when I first learned about it is that when you failover to your mirror and it becomes your primary, bringing back up your primary makes it become the mirror.  To make your original primary the primary again, you have to failover your mirror!  ie: A is primary, B is mirror, A fails, B becomes primary, A comes back up, A becomes mirror.  To make A the primary again, you have to failover B.

  • "The one thing I did find really bad about the quorum when I first learned about it is that when you failover to your mirror and it becomes your primary, bringing back up your primary makes it become the mirror.  To make your original primary the primary again, you have to failover your mirror!  ie: A is primary, B is mirror, A fails, B becomes primary, A comes back up, A becomes mirror.  To make A the primary again, you have to failover B."

    You need to failover B in order for A to become primary again.  You don't want A to become primary again, because of longer downtime, clients who are connected to B may experience connection problems if their transaction are pending as they will be rolled back.  Depending on the amount of transactions the failover may take several seconds if not minutes.  If your A and B are similar configuration then you should be fine serving clients from B without any issues for as long as you need.  Once the number of users drops to minimum then failover to original principal.

  • "You don't want A to become primary again, because of longer downtime, clients who are connected to B may experience connection problems if their transaction are pending as they will be rolled back."

    My point was that I wouldn't have designed it that the only way to change primaries was a failover.  Why should the primary need to go down to force the switch, couldn't you just manually say to A "Please finish applying the current transaction logs, then take over as primary" without taking down B?  No rollbacks, just business as usual, with the only delay being in the lag of applying the logs (basically the mirroring lagtime.)  It's the same behaviour as a manual failover, I just don't think you should have to drop a server out of the quorum to force that behaviour.

    Actually, that brings up a second point.  If I have quorum of 4 servers, primary A, mirrors B & C, and a witness D, do I have to failover A & B if I decide I want to make C the primary?  I assume the quorum decides on it's own which of the mirrors becomes the new primary if it has more than one option?  That leads back to your other point:

    "If your A and B are similar configuration then you should be fine serving clients from B without any issues for as long as you need."

    What if B is just your backup?  Most of us can't afford two powerful servers, the mirror B is often built with reliability as the primary trait, not speed.  Do I want to keep serving all my clients from B all day if A's only problem was that someone tripped on the power cord?

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply