Preferred RAID

  • DBA_Rob... it wasn't that our disks weren't managed well; replacements went in very quickly (and yes, we had hot spares too). The issue was more that the hardware guys had a batch of drives from the vendor with a manufacturing problem. Drives from the same batch started failing, and we lost two before the hot spare had finished reconstructing what the first failed drive held. We even attempted to back up the tlog before it all went pear-shaped (something like the sketch at the end of this post). Where possible, it's always recommended that drives come from mixed manufacturing batches, but I doubt that truly happens much.

    On the point about 'never lost a stripe, never had a double drive failure'... I remember a seminar given by Kimberly Tripp where she gave some sound advice: never say 'never', because one day it may come back to haunt you.

    At the end of the day, some people will never suffer a failure during their career, some will suffer many, and some will suffer an odd few. It's all about risk against cost, and the risk can be as high as the company going out of business, or costing it a small (or large) fortune in downtime. You just need to be aware of it and weigh the odds and the cost of these problems against the cost of the hardware. RAID is a form of insurance; it depends what sort of cover you want and what you're prepared to pay, bearing in mind the risk.
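
    For anyone curious, a tail-log backup attempt like the one described might look something like the sketch below; the database name and backup path are made up for illustration.

        -- Hypothetical names and paths: try to capture the tail of the log even though
        -- the data files are already damaged, so work committed up to the failure can
        -- still be restored elsewhere.
        BACKUP LOG SalesDB
        TO DISK = N'E:\Backups\SalesDB_taillog.trn'
        WITH NO_TRUNCATE, INIT;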

  • With direct-attached SCSI we've always tried to put tempdb, logs and data on three separate RAID sets, each on its own SCSI channel, to balance the I/O as much as possible (see the sketch at the end of this post).  For very high volume implementations we've tried to move the index data to a dedicated RAID set/SCSI channel as well.  The choice for logs and tempdb was always RAID 1, theorizing that it provides a good balance of read/write performance.  For data it was always RAID 5, favoring read over write.  Indexes - the jury is still out on whether RAID 1 or 5 would be better.

    What I haven't seen written about much at all is what impact, if any, SAS technology has on our choice of topologies.  With each drive having a dedicated I/O path, it is unclear to me how we might truly optimize the I/O across multiple RAID sets.  My hope is that the overall bandwidth of each SAS channel would support multiple RAID sets on the channel without significant blocking.  That said, with our latest SAS implementations we've been using RAID 10.

    Anyone with comments on SAS please chime in.
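
    To make the file placement concrete, here is a minimal T-SQL sketch of the separation described above. The database name, file names and drive letters are assumptions (D: = data RAID set, E: = index RAID set, F: = log RAID set), not a recommendation for any particular workload.

        -- Hypothetical database with data, indexes and log on separate RAID sets.
        CREATE DATABASE Sales
        ON PRIMARY
            (NAME = Sales_data, FILENAME = N'D:\SQLData\Sales_data.mdf'),
        FILEGROUP Indexes
            (NAME = Sales_ix,   FILENAME = N'E:\SQLIndexes\Sales_ix.ndf')
        LOG ON
            (NAME = Sales_log,  FILENAME = N'F:\SQLLogs\Sales_log.ldf');

        -- Nonclustered indexes can then be placed on the dedicated index RAID set, e.g.:
        -- CREATE NONCLUSTERED INDEX IX_Orders_Customer ON dbo.Orders (CustomerID) ON Indexes;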

  • Small company, cost limitations, I go RAID 5 with a hot spare. No way we can afford a SAN or RAID 10.

  • Hi Chris...

    Can't argue with the DAS setup you put forward. For tlogs you always want the best possible write performance, as that's what tlogs spend most of their time doing (most important in OLTP), and to isolate them where necessary to get the most out of sequential I/O. Of course, with RAID 5, if writes make up more than about 10% of the workload (SQL Server Admin Companion, plus many other articles) you will see write performance tail off compared with RAID 10. The same pretty much applies to indexes: as writes to indexed columns increase, performance tails off compared with RAID 10. Agreed that it is a bit more work to calculate, but it is still possible (one way to measure the read/write split is shown at the end of this post).

    With regard to SAS, we have just put in our first MD1000s. So far it looks promising, but we don't have the volumes of data to compare against baselines yet. Hopefully it won't be too long. I have seen a link to an article about potential problems with SAS throughput, but haven't had time to look at it. If I can find the link, I'll post it here for you.

    Of course, I should balance this out by saying that though I personally would prefer not to use RAID 5, I recognise that there are times when you have no choice, or when it is acceptable because the DB is not so critical - this might be some types of DSS system, a development box, or some type of test box where performance is not part of the test criteria. On the whole, though, if it's a production box and the DB is critical to the business, then I will do my utmost to steer away from RAID 5.
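
    If it helps, one rough way to check that 10% write threshold is to look at the cumulative file stats since the last restart (SQL 2005 and later); the query below is only a sketch.

        -- Approximate read/write split per database file since SQL Server last started.
        SELECT DB_NAME(vfs.database_id) AS database_name,
               mf.physical_name,
               vfs.num_of_reads,
               vfs.num_of_writes,
               CAST(100.0 * vfs.num_of_writes
                    / NULLIF(vfs.num_of_reads + vfs.num_of_writes, 0) AS decimal(5, 2)) AS write_pct
        FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
        JOIN sys.master_files AS mf
          ON mf.database_id = vfs.database_id
         AND mf.file_id = vfs.file_id
        ORDER BY write_pct DESC;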

  • Our server standards are as follows:

    Internal RAID 1 array (dual 36 GB 15K rpm disks) with a battery-backed controller - 3 logical drives.

    • C: for the OS, pagefile
    • E: system logs, SQL errorlogs, SQL Agent logs and maintenance plan logs
    • F: SQL Server executables

    Database disks are SAN-based, with dual fiber cards. Most are 2 Gb/s but we are upgrading to 4 Gb/s. Each fiber card is dual-channeled and dual-pathed, connected to dual storage processors on the SAN with lots of cache (I forget whether it's 8 or 16 GB).

    • G: Database Data - Meta-LUN striped across 3 LUNs - each LUN is a 5-disk RAID 5 array of 72 GB 10K rpm disks - each group of 3 RAID 5 arrays has one hot spare.
    • H: Transaction Logs - Meta-LUN striped across 3 LUNs - each LUN is a 5-disk RAID 5 array of 72 GB 15K rpm disks - each group of 3 RAID 5 arrays has one hot spare.
    • I: Backups - Meta-LUN striped across 3 LUNs - each LUN is a 5-disk RAID 5 array of 300 GB 7200 rpm SATA disks - each group of 3 RAID 5 arrays has one hot spare.

    We are also testing database backups going to NAS storage arrays similar to the SAN arrays, with the intent of performing primary site to DR site backup file replication with a target delay at the DR site of 15 minutes or less (a hypothetical backup-to-disk command is sketched below).
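
    As a rough illustration of the backup-to-disk piece of this, a full backup written to the dedicated backup Meta-LUN might look like the following; the database name, path and options are assumptions (WITH CHECKSUM assumes SQL 2005 or later).

        -- Write the full backup to the backup Meta-LUN (I:), from where the file
        -- can be replicated to the DR site.
        BACKUP DATABASE Sales
        TO DISK = N'I:\SQLBackups\Sales_full.bak'
        WITH INIT, CHECKSUM, STATS = 10;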

    Regards,
    Rudy Komacsar
    Senior Database Administrator
    "Ave Caesar! - Morituri te salutamus."

  • Here's my skinny on RAID 5 and then DR. Other than the performance gains with RAID 10 or some other exotic flavor of RAID, I don't feel one is better than the other.  Why?  Because there's a hell of a lot more that can go wrong with one server.

    Look, I use IBM servers, x346s and now the x3650s.  If it's a critical server it gets dual power supplies and a second set of fans.  I even have one system with a backup bank of RAM (memory sparing) in case the memory or the controller for that bank goes bad.  But OSes and hardware can take a dump regardless of the disks.  So what do I do?  I replicate SQL at the block level to another box that I can use as a backup.

    For my mission-critical SQL Server I use two software packages.  First, Backup Exec 11d with CPS.  I use CPS to replicate the SQL databases and log files at the block level to a second standby server that has some minor other purposes that SQL wouldn't pound on if it were needed.  SQL is installed but not active (you don't even need to buy a license for that server, because it's a backup and you wouldn't be using it unless your production box was down).  You could take this stance with all your servers: replicate all of them to one cold-spare system and be ready to attach the DBs (see the attach sketch at the end of this post).  Use DNS to point apps at a SQL Server name and you can easily redirect users to the new server and instance; there's still downtime, but it's minutes, not hours.

    Then I use a product called Double Take to do the same thing, except to a reporting server.  It replicates the same databases in just about the same way to this other server, except it lets me actually use the data (read-only) on the other side.  Here's how: it can take a snapshot of the replicated target, present the snapshot to the OS, and let SQL use the snapshot data.  So you replicate the data, use the software to snapshot it at a point in time, present it to the OS and SQL, attach it, and start using it for reporting.  Then when you want a fresh set of data, you take a new snapshot and start over.  The replication doesn't stop.  I thought this was pretty cool.  You could do both methods I described to the same server, but there's another reason not to do that.

    Double Take has a DR piece.  It's still not easy to use with named instances of SQL yet (their simple app controller doesn't support anything other than the default instance for easy setup).  You can use their full "hard to use" app to do it, though.  Anyway, the way it works is that if the source goes down, it can bring up the target automatically if you want.  It makes the target look like the source on the network, and users can pick up where they left off.

    ***************** OK, back to disks and RAID.

    RAID 5 should always be done with one hot spare.  Also, the age of your disks becomes an issue.  I actually try to refresh one disk from an array every year.  (What, you say?  Actually break the RAID on purpose?  Yes.  Why?  Because if you're going to keep a system past its 3-year warranty you should be a little proactive.  Disks tend to start failing around the same time.  I'm not talking about bad disks that go early in their life; the rest usually fail within the same window, and the older they are, the more likely they are to fail.)

    So what do you do?  Say you start with 3 drives and build your array.  If you're like me you most likely get your parts well before you go live, so get the 3 drives early and add the hot spare as late as possible, maybe 6 months in if you can.  Then, say a year later, get another drive and swap out one of the original 3.  You can do this: put in the new drive and make it a hot spare too, mark one of the 3 live drives bad, and force the failover.  When it's done, pull the original and reuse it as a cold spare for another system.  Do this once a year, removing the oldest disk.  You could make the oldest disk a hot spare by failing it over to the hot spare beforehand as you go along.  After 3 years you've got newer disks.

    If you're on the cheap, use your parts warranty and claim you have a bad disk so they send you a replacement.  Do the swap and you might be in better shape later, when the system is old (and should be retired) but you're still running it because you have to.  It's not as good as buying new disks, because warranty replacements are often older stock rather than recently manufactured drives.

    Any thoughts?  I'm not arguing the benefits of RAID 10 or others on performance, but RAID 5 is the most cost-effective, and it costs money to do DR.  I would rather spread that money around to cover more than disk failure - a VRM or RAID card can go too.
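
    For reference, bringing the replicated copies online on the standby box is basically an attach. Below is a minimal sketch, assuming the replicated data and log files land under D:\Replica and L:\Replica; the database name and paths are made up.

        -- Attach the block-replicated data and log files on the standby server
        -- (paths and names are assumptions).
        CREATE DATABASE Sales
        ON (FILENAME = N'D:\Replica\Sales_data.mdf'),
           (FILENAME = N'L:\Replica\Sales_log.ldf')
        FOR ATTACH;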

  • We are a small company. After a damaging RAID 5 failure involving our SQL database, we now only do RAID 1.

    K.I.S.S.

    I do keep a spare drive or two on hand for the critical servers, and yes, it's in a cabinet.

    Another point of failure I learned about the hard way is the controller. Last year, on one of my other servers, the PERC 2 controller (yes, it's old) failed and took out both RAID 1 drives with it. Complete loss of all data and programs. When it went, it was the same as putting a bullet through the drives.

    Drive redundancy isn't the only thing you need to plan for.

    Question:

    Along with your RAID choices, what brand of drives do you prefer?

    I vote Seagate.

    Even with one failure over the years, compared to the others I've tried in PCs, it's the only choice for servers.  I'm just now replacing a 6-month-old 300 GB Samsung under warranty.

  • I let IBM pick the drive.  That's what they're for.  So unless you're buying Dell servers, i.e. cheap pieces of crap that use the bottom-dollar drive available at the time, you should be golden.  I'm not a white-box person; I don't think that's a good use of manpower in a company, unless you're the OEM making the boxes.  I like IBM over HP, and unless you need Sun I don't think there are many other server makers out there.  I would rather be running a server that says Compaq on the side than a Dell.  I would buy Dell desktops by the boatload because they're cheap, but never again will I be responsible for a Dell server.  I won't even use an appliance if it's on Dell hardware.  No way.

  • Funny story about Dell.

    I use all Dell servers.

    One day, I wanted to upgrade my drives on one Server.  I took the drive out to get the information off it and discovered it was an IBM branded drive!

    A little further work revealed that the drive was really a Seagate drive.

    Dell/IBM price: much too much.

    Seagate replacement:  affordable

    It has worked fine ever since.

    All my Dells have Seagate drives.

  • I prefer to look at my I/O requirements to judge the RAID requirement... here are my rules:

    Rule 1: The OS and database logs always remain on RAID 1, or if I have the luxury I will go for RAID 1+0.

    Rule 2: Data may reside on RAID 1+0 or RAID 5 as long as there is not much I/O wait time (see the sketch after these rules).

    Rule 3: Tempdb can be on RAID 5 as long as tempdb is not used extensively by the application (which is normally the case, except for DWH).

    Rule 4: RAID partitioning is based on the RAID controller's battery-backed RAM option and the controller's read/write capacity; ideally you should have one RAID partition per controller to get optimal usage...
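
    A quick way to sanity-check Rules 2 and 3 is to look at the average I/O stall per read and per write for each file; the sketch below (SQL 2005 and later, rounded to whole milliseconds) is only a starting point.

        -- Rough average latency (ms per I/O) per database file since the last restart.
        SELECT DB_NAME(vfs.database_id) AS database_name,
               mf.physical_name,
               vfs.io_stall_read_ms  / NULLIF(vfs.num_of_reads, 0)  AS avg_read_ms,
               vfs.io_stall_write_ms / NULLIF(vfs.num_of_writes, 0) AS avg_write_ms
        FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
        JOIN sys.master_files AS mf
          ON mf.database_id = vfs.database_id
         AND mf.file_id = vfs.file_id
        ORDER BY avg_write_ms DESC;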

    Cheers...

    Prakash Heda
    Lead DBA Team - www.sqlfeatures.com
    Video sessions on Performance Tuning and SQL 2012 HA

  • First of all, I love your editorials, Steve, because you always bring some interesting perspectives.

    Recently I've been talking with our system administrator colleagues at my job about what they think regarding RAID implementations.

    I'm not so old, but I've been around a little while, which has given me the opportunity to be involved in the server admin area in multiple ways: as a System Administrator, DB Administrator, and even EMC SAN Administrator, all of them simultaneously. (Does that sound familiar to any of you?)

    But I wanted to know the opinions of other professionals because I am in a new job right now, and you should listen to different opinions about technology issues, since no one person's experience is authoritative. The first question to my fellow techs started with the point about RAID-5. My point was that it is not as reliable as it is promoted to be, according to my previous experience. But it turned out that none of them had had much experience with RAID-5 configurations, and my survey wasn't as rich as I expected it to be.

    Suddenly I became a sort of Ichabod Crane in my job, because in the few months after that conversation several RAID-5 configured SQL Servers crashed. Everybody who didn't have a specific opinion, or didn't agree with my perception of RAID-5 when I started the conversation, now looks at me as if to say: "Oh, that's what you were trying to tell us."

    A SAN is always a good option no matter the kind of RAID, because it gives you a second controller and a wider range of configurations.

    Generally speaking, I prefer to implement RAID-10 every time the budget supports it. RAID-1 is the one that most easily assures your paycheck. RAID-0 is not usually an option, even though a few years ago I was administering a DWH and the solution provider forced us to implement RAID-0. They even signed an agreement releasing the technical support department from responsibility in the case of a RAID failure. In the event of a failure they preferred the downtime, reinstalling the OS, restoring the 300 GB database from the last available backup, and running the ETL processes again, rather than dealing with the slightly slower response time of a more reliable RAID-1 configuration. It's incredible what people will do to sell software these days, don't you think?

    WELLINGTON,

    MCDBA, MCTS

  • All of the instances, where I am now, are on IBM boxes with a 3-drive RAID-5 config that is logically partitioned into C: and D: drives.  I hate the config.  Yuck!

    At present, one of the most important applications is resting on one of these servers with SS2K+SP3 (std) and I've been told that I have to wait .... and wait.... and wait.... before I can take the service down to do anything with it.  We have a log exploring program that is spitting out about 10 messages a day saying that there is a bad stripe on the controller.

    Thanks to this, I've managed to get some new dual-CPU boxes through with the full complement of 6 drives, which I plan to set up as 3 RAID-1 pairs.  I've got a few options in my head and I'm still trying to work out the best way to go.  Each will have SAN connectivity to cover backups and non-system DBs.

    There is the one-tempdb-per-CPU option (a T-SQL sketch for this option appears further down in this post):

    d0: O/S + swap + MSSQL + system DBs

    d1: tempdb

    d2: tempdb

    There was the option of:

    d0: O/S + MSSQL + system DBs

    d1: swap

    d2: tempdb

    or:

    d0: O/S + MSSQL

    d1: swap + system DB logs

    d2: tempdb + system DBs

    then again:

    d0: O/S + MSSQL + system DB logs

    d1: swap + system DBs

    d2: tempdb

    I'd like to keep the system DBs and logs local, with the non-system DBs and backups being SAN-based.
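
    If it helps with the deliberation, relocating tempdb onto one of those RAID-1 sets and adding a second file for the one-per-CPU option is only a couple of statements. The logical name tempdev is the default for tempdb's primary file, but the drive letters, second file name and size below are assumptions.

        -- Move the primary tempdb file to the d1 set (shown here as E:) and add a
        -- second file on the d2 set (F:). The move takes effect after the SQL Server
        -- service is restarted.
        ALTER DATABASE tempdb MODIFY FILE
            (NAME = tempdev, FILENAME = N'E:\TempDB\tempdb.mdf');
        ALTER DATABASE tempdb ADD FILE
            (NAME = tempdev2, FILENAME = N'F:\TempDB\tempdb2.ndf', SIZE = 2GB);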

    Wasn't it Vinnie Barbarino who used to say "I'M SO CONFUSED!!!"? (if you don't remember Welcome Back Kotter then that's your loss )

    A lack of planning on your part does not constitute an emergency on mine.

  • So what about RAID 6?  Then you can lose 2 disks of the minimum 4 and still be running.  Slow, mind you, but you could run.  You can do RAID 10, but what I don't know is whether you can do online disk expansion like you can with RAID 5 or 6.  What's that?  Put a disk in, the controller sees it, and blammo, you can add the disk to the array and now you have more disk space.  I am pretty sure you can't do that with RAID 10.  (Maybe I'm wrong; I just thought you couldn't do it.)

    I wish the last poster had a hot spare on his systems. 

    If you have to expand the number of disks, which you can do with some hardware RAID 10/50 PCI-X cards (I think they price out at around 600-700 dollars), you probably did not plan the total data usage cycle correctly.

    Either way, SANs fit excellently if cost is not a factor, but a couple of RAID 10 boxes run about 4k-5k US each in hardware costs using high-quality parts, for example Seagate drives with five-year warranties (my favourite) and Intel server boards. I've deployed a few from the ground up, and there's nothing like having 2-6 terabytes of storage for many business applications, including ISOs, mail backups, SQL Server backups, document archival and log files.

    At that point, tape or SAN backups would have less impact on the daily functions.

    If you clustered them, given the redundancy, you probably could run everything on them including the live apps, as I've not seen many MS server apps that easily allow for more than 2 GB of memory usage each.

    When you move outside that memory allowance with 64-bit apps, obviously the budget would allow for fibre and such for expansion.

    Which also applies if you are looking at the HP higher end data stores with blades running live and dev apps.

  • Though it gets technical, apparently RAID 5/6 have data corruption issues associated with the structure.  Also, on the RAID 5/6 boxes I've used I've found that a good deal of fragmentation occurs.  It seems that the structure, and the way Win2K/2K3 servers work in real-life situations, are really hard on the drives.
