Disk Configuration Dilemma

  • Assume a DB server with several databases. In those databases there are tables that are looked up directly via their clustered indexes; the tables are not joined to other tables. There are 3 or so databases, but they contain lots of data tables that get hit heavily. There are a few other indexes, but they are used infrequently relative to the clustered indexes. I have a disk array of 12 disks to configure. Assume RAID 10 only, so 6 are available for striping, etc.

    • On one hand, I assume that SQL Server has to calculate where to put data if I use multiple filegroups. Even if the filegroups are on separate stripe sets, I imagine the RAID controller will do that job faster given our usage (no joins, etc.), so multiple filegroups seem to offer little benefit. So it would seem; feel free to correct me.

    • On the other hand, I could create 2 log segments using 1 disk each and use the remaining 4 disks as a stripe set, where data reads and writes would be quite a bit faster. That said, I am still bound by seek time; once the seek completes, the data is written or read quite quickly. That is a viable option, and using 2 log segments lets me balance log usage: all the databases share the same 4-disk stripe set for data and use one log disk or the other, all on separate RAID channels.
    • Alternatively, I could create 3 separate single-disk data segments, each with a separate disk for its log. In this manner I could have 3 logs all working at once and independent data disks, all on separate RAID channels as well. The issue is that reads and writes will be slower on single data disks than on the striped sets. Seek time remains (basically) constant, but there is the ability to do multiple read/write operations in parallel, plus isolation when rebuilding indexes and the like. Still, I am not sure that outweighs the faster overall read/write performance of larger stripe sets.

    Are multiple separate data segments with separate log segments better, performance-wise, than multiple logs with a larger stripe set for data? The database is busy and has concurrent automation IDs hitting it quite heavily around the clock. Any opinions or suggestions on the subject? Even ones I have not considered.

  • Eric,

    A few things to consider, and it would be great if you could test a few configurations. First, if your logs will fit on 1 disk, use a RAID 1 set to protect them while eating up minimal disks, if the space works. I am not sure a RAID 10 benefits a log, and RAID 5 is worse unless you need the space; maybe then a RAID 5 (3-disk set) if the writes aren't too heavy. Modern systems are pretty quick, and the RAID controller can handle most of the calculations unless you are extremely OLTP write-heavy. Separating the log out is important because sequential operations have been shown to be dramatically faster than non-sequential ones, on the order of 20-25 times.
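    If the log does end up on its own RAID 1 volume, repointing it is straightforward. A minimal sketch, assuming SQL Server 2005 or later (where MODIFY FILE ... FILENAME works for user databases) and hypothetical database and path names:

        -- Repoint MyDB's log at the dedicated RAID 1 volume (hypothetical L: drive)
        ALTER DATABASE MyDB
        MODIFY FILE (NAME = MyDB_log, FILENAME = 'L:\SQLLogs\MyDB_log.ldf');
        -- Take the database offline, move the physical .ldf at the OS level,
        -- then bring it back online so the new path takes effect:
        ALTER DATABASE MyDB SET OFFLINE;
        ALTER DATABASE MyDB SET ONLINE;

    On SQL Server 2000 you would detach and re-attach the database instead.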

    For the data, if we assume 10 disks, you could have up to 5 RAID 1 sets. One thing you should really think about is using a 2-disk RAID 1 set for tempdb; this allows tempdb and database operations to run in parallel. If tempdb needs more disks, go to 4 rather than a RAID 5 arrangement. Other than that, it seems that a single RAID 0+1 set for everything else simplifies tuning and makes all the I/O available to SQL Server, unless you can try multiple configurations. I'd drop the remaining 8 disks into this config; the OS could share the tempdb array if needed.
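    If tempdb does get its own RAID 1 pair, a sketch of pointing it there (hypothetical T: path; tempdev and templog are the default logical names, and the move takes effect at the next restart):

        -- Relocate tempdb to the dedicated RAID 1 volume (hypothetical path)
        ALTER DATABASE tempdb
        MODIFY FILE (NAME = tempdev, FILENAME = 'T:\TempDB\tempdb.mdf');
        ALTER DATABASE tempdb
        MODIFY FILE (NAME = templog, FILENAME = 'T:\TempDB\templog.ldf');
        -- Restart SQL Server; the old files can then be deleted.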

    For filegroups, you can specify where tables and indexes reside, so consider this; a sketch follows. Be sure to test a few configs with a decent simulated load. Multiple files can help because a separate thread is used for each, so they can improve performance for SQL Server.
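    For illustration, a sketch of pinning a hot table to its own filegroup (all names, paths, and sizes hypothetical):

        -- Add a filegroup backed by a file on a specific array
        ALTER DATABASE MyDB ADD FILEGROUP HotData;
        ALTER DATABASE MyDB
        ADD FILE (NAME = MyDB_hot1, FILENAME = 'E:\SQLData\MyDB_hot1.ndf', SIZE = 1024MB)
        TO FILEGROUP HotData;

        -- Create the heavily-hit table (and its clustered index) on that filegroup
        CREATE TABLE dbo.Lookups (
            LookupID INT NOT NULL,
            Payload  VARCHAR(255) NOT NULL,
            CONSTRAINT PK_Lookups PRIMARY KEY CLUSTERED (LookupID)
        ) ON HotData;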

    Check out RAID 0+1 vs. RAID 1+0; they are different and perform differently.

    If this is a SAN, check out this: http://www.sql-server-performance.com/ew_san.asp

    Otherwise: http://www.microsoft.com/technet/prodtechnol/sql/2000/maintain/rdbmspft.mspx

  • Actually, I have 14 disks total: 2 are for tempdb and the system tables (15K rpm), 2 are for the log segment (also 15K rpm), and the remaining 10 disks (10K rpm) are for data.

    The I/O is split 50/50 between reads and writes and is random access. If it were sequential access, I would use parallel flat-file processing and not use a database at all. This is definitely random access, with lots of turnover of data and lots of concurrent connections hitting it, perhaps several hundred at a time for hours on end. Obviously the log is sequential, which is the biggest reason to break out data and log: the log is sequential while data access tends to be random. Additionally, random I/O tends to take priority, so mixing log and data on one disk can slow logging regardless of RAID level. For this case, let's assume I am speaking of the data segment and not the log.

    I assume that one larger RAID 10 array of 10 disks, making a 5-disk stripe, will on average perform better than 5 RAID 1 arrays. In fact, I suspect it should perform better all the time. When performing a single operation, the reads/writes will be roughly 5x faster (five spindles versus one). Even though 5 RAID 1 arrays let me read from multiple arrays at once, each individual read or write is that much slower. Pretty sure there is a formula to show this.

    Your feeling is to use a larger striped array, correct? That is my feeling too, but certainly others have faced this decision.

    I believe the RAID I/O calculations are similar to those below. (RAID 10 and RAID 0+1 work out the same for this arithmetic, though as you note they differ in fault tolerance.)

    RAID 0:  I/Os per disk = (reads + writes) / number of disks

    RAID 1:  I/Os per disk = [reads + (2 * writes)] / 2

    RAID 5:  I/Os per disk = [reads + (4 * writes)] / number of disks

    RAID 10: I/Os per disk = [reads + (2 * writes)] / number of disks
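    To make those concrete, here is a worked sketch with a hypothetical workload of 1,000 I/Os split 50/50, comparing the single 10-disk RAID 10 against one of five independent RAID 1 pairs (assuming the load happens to spread evenly across the pairs):

        -- Hypothetical workload: 1000 I/Os, split 50/50 reads/writes
        DECLARE @reads float, @writes float;
        SELECT @reads = 500, @writes = 500;

        -- RAID 10 across 10 disks: [reads + (2 * writes)] / 10
        SELECT (@reads + 2 * @writes) / 10 AS raid10_ios_per_disk;          -- 150

        -- One of 5 RAID 1 pairs, each seeing 1/5 of the load: [reads + (2 * writes)] / 2
        SELECT (@reads / 5 + 2 * (@writes / 5)) / 2 AS raid1_ios_per_disk;  -- 150

    The per-disk load only comes out equal if the five pairs are perfectly balanced; the single stripe set balances itself, which is the practical argument for the larger array.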

    Regards

    -Eric


  • RAID 5 degrades performance only where there are lots of writes, and to be fair you can, to a certain extent, offset this with lots of memory: if database operations don't have to do a lot of physical disk access, then you will get away with RAID 5 plus extra memory. The transaction logs are the exception: make those slow and your entire system goes slow.

    The tempdb issue depends on how heavily tempdb is used. If your mdf's sit on, say, 6 spindles, putting tempdb on 1 spindle may well degrade performance. I'd always say tempdb should have the same number of spindles as the database mdf's.

    Some RAID controllers can write to both sides of a RAID 10 simultaneously and read from either side, so you don't incur any I/O overhead for writes and reads actually double in performance.

    Disks are usually the main bottleneck, and memory can often more than offset this: if your data is in cache there is no disk issue (except for transaction logs).

    I prefer RAID 10, but if spindles are limited then you might do better with RAID 5. For a fixed number of physical disks, RAID 5 gives you more usable spindles and so can really perform better, but for an array with the same number of usable spindles RAID 10 will always win.

    e.g. a 6-disk RAID 5 set (5 usable spindles) requires a 10-disk RAID 10 array to match it; see the worked numbers below.
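    Plugging that example into the formulas above (same hypothetical 50/50 workload of 1,000 I/Os):

        DECLARE @reads float, @writes float;
        SELECT @reads = 500, @writes = 500;
        -- 6-disk RAID 5: [reads + (4 * writes)] / 6
        SELECT (@reads + 4 * @writes) / 6  AS raid5_ios_per_disk;   -- ~417
        -- 10-disk RAID 10: [reads + (2 * writes)] / 10
        SELECT (@reads + 2 * @writes) / 10 AS raid10_ios_per_disk;  -- 150

    Same five usable spindles, but on this mixed workload each RAID 5 disk does nearly three times the work; the gap narrows as the read share grows.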

    The GrumpyOldDBA
    www.grumpyolddba.co.uk
    http://sqlblogcasts.com/blogs/grumpyolddba/

  • I'd always say tempdb should have the same number of spindles as the database mdf's.

    Sure, that is a nice tip; it makes sense. One question: when using hardware RAID controllers, RAID 10 can be used for the transaction log, correct? I keep reading that you cannot stripe the log, but a hardware controller abstracts that issue. Is what I have read regarding log and RAID issues only about software RAID, like the Windows operating system's software RAID support, or am I just reading the information incorrectly?

    Where memory cannot help is when loading data and defragging clustered indexes, which we have to do several times per day; a sketch follows. If you are inserting and deleting a lot, memory will not help. RAID 5 is great for relatively static data, as you mention: where reads are the bulk of the volume, where data is in cache so disk means little, or where updates are not to index values. Warehousing comes to mind, actually. For high-volume OLTP I just do not see RAID 5 working regardless of memory; the actual data turns over too much. Our reporting database actually is RAID 5, as data is only read at that point.
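    A sketch of that kind of scheduled defrag, with hypothetical database, table, and index names (DBCC INDEXDEFRAG is the online option; DBCC DBREINDEX is the offline rebuild):

        -- Online defragmentation of a clustered index
        DBCC INDEXDEFRAG (MyDB, 'dbo.Lookups', PK_Lookups);
        -- Or a full offline rebuild, specifying a fill factor of 90
        DBCC DBREINDEX ('dbo.Lookups', 'PK_Lookups', 90);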

    Can anybody clarify this point? I just cannot see why a striped log would be a problem with a state-of-the-art hardware RAID controller. Any information on this? I would love to stripe the log. The last time I tried it, SQL Server did not complain, so I think the advice about not using RAID 0 pertains to software RAID, but I am very unclear on where it comes from. Perhaps I just thought I read it somewhere.


    Regards

    -Eric

  • Well, I used RAID 10 for transaction logs back in the days of SQL Server 6.0. Usually you just select, say, 4 disks in the hardware setup and click on mirror (or RAID 1), and the controller does the rest.

    Software RAID isn't as reliable and adds overhead. I've never tried RAID 10 in software.

    Striped, mirrored logs run quick!

    If you have lots of memory, 32GB or more, it seriously improves performance and offsets the disks. To improve index rebuilds, add processors; see the note below.
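    If you do add processors, the setting that governs how many CPUs an index rebuild can use is 'max degree of parallelism' (parallel index builds require Enterprise Edition); a sketch:

        EXEC sp_configure 'show advanced options', 1;
        RECONFIGURE;
        -- 0 = let SQL Server use all available CPUs
        EXEC sp_configure 'max degree of parallelism', 0;
        RECONFIGURE;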

    The GrumpyOldDBA
    www.grumpyolddba.co.uk
    http://sqlblogcasts.com/blogs/grumpyolddba/

  • Yes, I figured the controller would handle it. I am going to RAID 10 my log and use a separate RAID 10 for the data. Thanks.

    -Eric

