Thoughts on ways to improve performance of heavily used server

  • I thought I would probe the great minds on this site - we have a very heavily used server, and as the database has grown and the number of users has increased, the system response time has slowed to levels which are unacceptable to the users:

    Hardware - 8-way server with 8 GB RAM, clustered

    Software - SQL2000 Enterprise SP3, Windows 2000 Advanced Server

    Drives - Fiber connected SAN

     >Data - 8 x 146 GB x 15,000 RPM drives, RAID 5

     >Logs - 2 x 146 GB x 15,000 RPM drives, RAID 1

     >TempDB - 2 x 146 GB x 15,000 RPM drives, RAID 1

    Users - 500-600 users with avg 1200 concurrent connections

    NIC - 2 x 1 Gb

    Database size - 180 GB

    Data inserts - 700,000 rows/day

    Data updates - 500,000 rows/day

    All UDFs and views have been analyzed and are using appropriate indexes. The database is highly normalized, and the typical view builds reports on 50K records by joining 16-20 tables.

    I have been running a variety of performance counters on the system for weeks - CPU utilization hovers around 55% with spikes to 70%, cache memory averages 3-4 GB and never falls below 2 GB, and disk queues are 0 on the log and tempdb drives and run 2-12 on the data drives (within spec of 2 x 8 spindles = 16), except during reporting periods when we see sustained levels around 60-70.

    The problem seems to be the volume of reads and the number of joins required for reporting.  Structural changes to the database are unlikely to occur for the next 12 months due to the complexity of the system and the existing development queue.  I'm wondering what other means people have used to improve performance - here are a few of my thoughts:

    1.  Add additional drives to existing RAID5 array

    2.  Convert RAID5 to RAID10

    3.  Add another 8 GB of RAM to the server – requires a Windows 2003 migration (not likely in the next 4-6 months)

    4.  Install a second data drive array and place 1/2 the tables on the new array

    5.  Set up a 2nd server, do a nightly database restore from Production, and redirect all reports using data prior to the current date to the reporting server.  (Not likely due to cost)

    Any thoughts would be appreciated.

    Harley

  • First, start with the basics: defragment or rebuild the indexes, and capture the most intensive queries to see why they are that way. Work on streamlining the current load before you look at other options.
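
    On SQL 2000 it can help to gauge fragmentation first, so you know whether an online defrag or a full rebuild is worth the window - something like the following (the table and index names are just placeholders):

        -- check fragmentation for every index on a table
        DBCC SHOWCONTIG ('dbo.BigTable') WITH ALL_INDEXES, TABLERESULTS

        -- low Scan Density / high Logical Scan Fragmentation => worth defragmenting;
        -- DBCC INDEXDEFRAG is online but slower, DBCC DBREINDEX rebuilds from scratch
        DBCC INDEXDEFRAG (0, 'dbo.BigTable', 'IX_BigTable_SomeColumn')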

    Also, if reports get the same data you might consider what options your reporting tool may have, such as creating static reports from data daily, monthly or on another schedule, or even caching the data for several minutes before getting live data.

  • Maybe you should also look into updating the statistics.

    Is it just the reports that take time, or are you also experiencing problems loading or removing data?

    Is your data loading interfering in some way with the users' queries?

    //Hanslindgren

  • Here's my 'short list' for tuning (a scripted version of Rounds I and II follows the list):

    Round I

        DBCC UPDATEUSAGE

        UPDATE STATISTICS (with FULL scan) for all tables

        exec sp_recompile for all tables

        exec sp_refreshview for all views

    Round II

        DBCC DBREINDEX

        UPDATE STATISTICS (with FULL scan) for all tables

        exec sp_recompile for all tables

        exec sp_refreshview for all views

    Round III

        Profiler

        Query Hints
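
    If it helps, Rounds I and II can be scripted for the current database roughly like this on SQL 2000 - sp_MSforeachtable is undocumented but ships with the product, and there is no matching proc for views, so a cursor handles that part:

        -- Round I (Round II just adds the commented DBREINDEX before the statistics pass)
        DBCC UPDATEUSAGE (0)                                         -- correct space-usage counts

        EXEC sp_MSforeachtable 'UPDATE STATISTICS ? WITH FULLSCAN'   -- full-scan statistics
        EXEC sp_MSforeachtable 'EXEC sp_recompile ''?'''             -- force plan recompiles
        -- EXEC sp_MSforeachtable 'DBCC DBREINDEX (''?'')'

        -- refresh view metadata
        DECLARE @view nvarchar(300)
        DECLARE view_cur CURSOR FOR
            SELECT QUOTENAME(TABLE_SCHEMA) + '.' + QUOTENAME(TABLE_NAME)
            FROM INFORMATION_SCHEMA.VIEWS
        OPEN view_cur
        FETCH NEXT FROM view_cur INTO @view
        WHILE @@FETCH_STATUS = 0
        BEGIN
            EXEC sp_refreshview @view
            FETCH NEXT FROM view_cur INTO @view
        END
        CLOSE view_cur
        DEALLOCATE view_cur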

    Now let's turn to your SAN ... it appears that you are using LUNs. You may want to talk to your SAN admin to see if he can perform a little reconfiguration to give you Meta-LUNs instead of LUNs for the data only. Your logs and tempdb seem OK.

    Regards,
    Rudy Komacsar
    Senior Database Administrator
    "Ave Caesar! - Morituri te salutamus."

  • For your 180 GB database, are there multiple filegroups? You should look into presenting each filegroup with its own logical drive.
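
    If the SAN team can present another logical drive, the SQL side is straightforward - a rough sketch, with the database name, path and sizes made up:

        -- add a filegroup and place a file for it on the new logical drive
        ALTER DATABASE ProdDB ADD FILEGROUP FG_DATA2

        ALTER DATABASE ProdDB
        ADD FILE (NAME = ProdDB_data2,
                  FILENAME = 'G:\MSSQL\Data\ProdDB_data2.ndf',
                  SIZE = 20480MB,
                  FILEGROWTH = 1024MB)
        TO FILEGROUP FG_DATA2

        -- new or rebuilt objects can then be placed there explicitly, e.g.
        -- CREATE TABLE dbo.SomeTable (...) ON FG_DATA2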

    I would also upgrade RAM. For your database size, # of users, and joins, 8 GB is fairly small.

    We have a similar performance issue at my company. The DB is approx 120 GB, separated into five filegroups. We recently migrated from local storage to an HP EVA8000 SAN, which allowed us to present 5 LUNs, VRAID-5 like yours, with logs as VRAID-1. We have 32 GB of RAM available to SQL2k. Still, we encounter performance issues due to long-running queries that cause blocking.

    We have a mirror database on another server where we try to corral these long-running query users. This sounds like it's not an option for you, but if you have specific users who seem to always run long-running queries, it may be worthwhile to scrounge up a server just for them.

  • I haven't yet tried this myself but a consultant I know swears by it... if you have room in the server, find two relatively small SCSI or SATA drives and configure them with RAID 0, and then move your page file(s) to that drive. He says he can often get a significant performance boost doing so for less than $100.

    I also agree with Ivan and Antares... if you can find some way to offload even a portion of your data to a separate server to reduce load that should help. You might also see if there's a way to schedule some of these reports (depending on how dynamic they need to be) during non-production hours to reduce load.

    Brian

  • I think all of your own suggestions are good ideas. I like them roughly in the reverse order given...5-4-3-2-1. Sell number 5 to management by casting it as a high availability / disaster recovery solution as well.

  • Thanks for all of the thoughtful responses...

    In answer to some of the thoughts:

    >Statistics are updated nightly based on a 20% sample.

    >We just ran DBCC DBREINDEX on all the tables - no effect.

    >Inserts don't seem to be affecting the reports to a significant degree as report performance is poor even when no inserts are taking place.

    >I would like to increase memory but we can't without upgrading the OS to 2003, and the enterprise won't approve the upgrade until late 2006.

    >Reporting is in response to clients' needs and can't be rescheduled.

    It seems the core of the problem is the number of joins in the reports - typical queries pull together 20 tables, and while all joins are based on indexed fields, the sheer number of joins seems to preclude decent performance. The best performance increases have come from increasing the spindles in the array from 6 to 8 - I have a request in to take it to 12 in the next few weeks, even though we are only using a small percentage of the drive space.

    2 questions though -

    Rudy - I don't understand LUNs vs. Meta-LUNs - any good sources of info so I can discuss it from a reasonable position? What best-case impact could be expected?

    Ivan - I believe we are also on an HP SAN - 1000? It's in a remote data center so I don't have access to it. The RAID 5 array for the data is configured as one partition with the data in one file - what benefits would we see if we created additional partitions in the free space on the array and created additional data files on each logical drive? Since it is sitting on the same spindles, is there a SIGNIFICANT advantage in using multiple files? Are separate threads used for each file access?

    Thanks again for the help,

    Harley

  • You need to analyse database usage - I suggest you collect statistics for your server so that you can understand how it is performing. Idera Diagnostic Manager is good at this.

    One bottleneck you might have is tempdb - if tempdb usage is heavy, presenting 1 spindle vs. 8 for data could seriously impact performance. If you're going to split tempdb out, I'd always recommend the same number of spindles as the data drive, or more. As for changing to RAID 10: RAID 5 hurts writes but doesn't affect reads.

    You really need to collect stats for usage, otherwise you're just guessing. Disk I/O completion time will indicate if you have disk problems - forget queues, they never seem to make any sense on a SAN. Measure the transactions for each database, and capture the percentage of read vs. write ops.
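
    On SQL 2000 you can get a reasonable picture without extra tooling - fn_virtualfilestats reports cumulative I/O counts and stall (completion-delay) time per file, and sysperfinfo exposes the Perfmon counters. Both are cumulative, so sample twice and take the difference; the file id below is just the primary data file:

        -- per-file I/O and stall time for the current database (repeat for each FileId in sysfiles)
        DECLARE @db int
        SET @db = DB_ID()
        SELECT DbId, FileId, NumberReads, NumberWrites, IoStallMS
        FROM ::fn_virtualfilestats(@db, 1)

        -- transaction and page read/write activity from the exposed Perfmon counters
        SELECT object_name, counter_name, instance_name, cntr_value
        FROM master.dbo.sysperfinfo
        WHERE counter_name IN ('Transactions/sec', 'Page reads/sec', 'Page writes/sec')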

    Highly normalised databases often suffer poor reporting performance - consider some denormalisation to improve it.
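
    Since you're on Enterprise Edition, one middle ground short of restructuring is an indexed view that materialises a commonly reported join/aggregate. Very much a sketch - the tables and columns below are invented, and indexed views carry restrictions (SCHEMABINDING, two-part names, COUNT_BIG with GROUP BY, non-nullable SUM arguments, specific SET options at create time):

        -- hypothetical pre-aggregated reporting view (Qty and UnitPrice assumed NOT NULL)
        CREATE VIEW dbo.vOrderTotals
        WITH SCHEMABINDING
        AS
        SELECT  o.OrderID,
                o.OrderDate,
                COUNT_BIG(*)             AS LineCount,
                SUM(l.Qty * l.UnitPrice) AS OrderValue
        FROM dbo.Orders o
        JOIN dbo.OrderLines l ON l.OrderID = o.OrderID
        GROUP BY o.OrderID, o.OrderDate
        GO

        -- the unique clustered index is what materialises (and maintains) the result
        CREATE UNIQUE CLUSTERED INDEX IX_vOrderTotals
            ON dbo.vOrderTotals (OrderID)

    On Enterprise Edition the optimizer can use that index even when the reports still reference the base tables; on other editions the view has to be queried directly with the NOEXPAND hint.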

    I'd always add more memory given the chance. SQL Server is designed specifically not to page, so if you have paging then it's something else. The same applies to the page file as to tempdb: it's best placed on the fastest array you have, and putting it on a RAID 1 will likely bottleneck.

    Spindles = performance

    The GrumpyOldDBA
    www.grumpyolddba.co.uk
    http://sqlblogcasts.com/blogs/grumpyolddba/

  • Yes, separate threads are used for separate files. Yes, spindles = performance. The situation you are in is difficult to guess at; trial and measurement is the only reliable method. Sounds like you have a lot of extra space, plus a window of decreased load at night. If some of your reporting customers can be satisfied with data as of last night, then pre-performing some of those joins to form a reporting data mart would be one way to go.
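
    A minimal sketch of that kind of data mart - every object name here is invented - is just a nightly job that flattens the big join into one wide reporting table the reports hit directly:

        -- nightly refresh of a pre-joined reporting table
        TRUNCATE TABLE dbo.rpt_OrderDetail

        INSERT dbo.rpt_OrderDetail (OrderID, OrderDate, CustomerName, ProductName, Qty, LineValue)
        SELECT  o.OrderID, o.OrderDate, c.CustomerName, p.ProductName, l.Qty, l.Qty * l.UnitPrice
        FROM dbo.Orders o
        JOIN dbo.Customers  c ON c.CustomerID = o.CustomerID
        JOIN dbo.OrderLines l ON l.OrderID    = o.OrderID
        JOIN dbo.Products   p ON p.ProductID  = l.ProductID
        -- ... remaining joins from the reporting views ...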

    One easy thing to try would be to put a couple of additional tempdb files on that RAID 5 array. Another easy trial would be to create a couple of new filegroups for your main database, each with their own file(s) on the RAID 5 array, then put some of the non-clustered indexes, and perhaps a few of the bigger tables, on them, such that indexes and tables are on separate filegroups.
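
    Both trials are cheap to script and to reverse - the file paths, sizes and object names below are made up, and the DROP_EXISTING example assumes an index of that name already exists:

        -- extra tempdb data files on the RAID 5 array
        ALTER DATABASE tempdb
        ADD FILE (NAME = tempdev2, FILENAME = 'E:\MSSQL\Data\tempdb2.ndf', SIZE = 2048MB, FILEGROWTH = 256MB)
        ALTER DATABASE tempdb
        ADD FILE (NAME = tempdev3, FILENAME = 'E:\MSSQL\Data\tempdb3.ndf', SIZE = 2048MB, FILEGROWTH = 256MB)

        -- a new filegroup on the same array, then rebuild a nonclustered index onto it
        ALTER DATABASE ProdDB ADD FILEGROUP FG_INDEXES
        ALTER DATABASE ProdDB
        ADD FILE (NAME = ProdDB_ix1, FILENAME = 'E:\MSSQL\Data\ProdDB_ix1.ndf', SIZE = 10240MB)
        TO FILEGROUP FG_INDEXES

        CREATE NONCLUSTERED INDEX IX_BigTable_CustomerID
            ON dbo.BigTable (CustomerID)
            WITH DROP_EXISTING
            ON FG_INDEXES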

    It is certainly debatable whether distributing I/O is best done by manually placing database objects on different devices, or by just giving the server lots of spindles and letting the striping distribute the I/O. If you do not have a really good understanding of access patterns in your system, the manual method stands less chance of success, of course.

    You have faithfully executed the basic first rule of disk tuning - separating out data, log, and tempdb. If additional drives and servers are too far in the future, then don't be afraid to experiment a little with spreading out the I/O some more by violating that basic first rule - just make sure you can revert to a previous configuration. However, I wouldn't mess with the log array - you want to keep its sequential nature unaffected.

    I would also try running statistics updates with full scans - it's very possible that some queries could benefit significantly, and it can't hurt.

  • LUNs vs. Meta-LUNs ... to oversimplify ... a LUN is a fixed-size (e.g. 8 GB) stripe of data space on a RAID 5 array. A Meta-LUN is a fixed-size (e.g. 8 GB) stripe of data cast across a number of RAID 5 arrays - i.e. lots more spindles for I/O, which is definitely more efficient. From a SAN perspective it also helps eliminate 'hot' and 'cold' spots on the storage farm.

    Regards,
    Rudy Komacsar
    Senior Database Administrator
    "Ave Caesar! - Morituri te salutamus."

  • Thanks for all the well-considered responses - a lot of ideas to digest.

    Regards,

    Harley
