pageiolatch_SH troubleshooting

Question

pageiolatch_SH troubleshooting

mrTexasFreedom

Ten Centuries

Points: 1302
More actions
September 6, 2011 at 3:37 pm

#244276

We're experiencing some performance problems in SQL Server 2005 installed on Windows Server 2003. The queries are pretty static and haven't changed much in the course of a year. About two weeks ago, our application has begun running more slowly for users.
I've been using the helpful sp_whoisactive stored proc to take snapshots of where the locks are and which queries are involved. That has enabled me to find a few queries that are commonly associated with the pageiolatch_sh. Here's one:
SELECT TOP 1 x12a.match_id AS x12exist_match_id,
x12a.response_date AS payer_verifieddate,
ip.last_updated_date AS subchangeddate,
x12a.payer_id AS x12requestpayerid,
x12a.linked_id AS x12linnkid,
x12a.linked_id_type AS x12linnkidtype,
Isnull(x12a.aaa_reason_code, '') AS aaaresponse
FROM z3_insurance_plans ip
INNER JOIN z3_x12_audit x12a
ON ip.id = x12a.subscriber_id
AND response_271 IS NOT NULL
WHERE ip.id IN ( 2489856, -998877 )
AND x12a.payer_id = '103'
AND x12a.status_code <> 'FRG'
AND Isnull(x12a.aaa_reason_code, '') NOT IN ( '42' )
ORDER BY response_date DESC
The Estimated Execution Plan faults the AND x12a.status_code <> 'FRG' clustered index scan as taking the majority of the resources.
Additionally, I've run performance monitor on the DB server, which has 16gb of physical memory. In non-peak times, I'm seeing a pages/sec count averaging over 350 with a maximum as high as 6000. The % Disk Time can average 4500 with maximum at 16000.
Should I be worried about the pages/sec counts being high? Or is this a good thing for a database server? BTW- the paging file was set at 4gb, but I bumped it up to 16gb for good measure. I don't believe the pagefile is being used, though, as the system memory never reaches peak. SQl Server has 12gb of physical memory allocated, also.
My gut tells me one of the problems is that the data and transaction log files are on the same raid device. My boss rejects this on the basis of the track record of the server not suffering this type of performance problem with the files being configured in this manner.
Any directions you can recommend me take this troubleshooting would be appreciated.
mtf
Forum Etiquette: How to post Reporting Services problems

[/url]

Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
[/url]

How to Post Performance Problems - by Gail Shaw
[/url]

mrTexasFreedom Ten Centuries Points: 1302 More actions · Answer 1

I have continued to troubleshoot this issue. I noticed that the join is matching an int column in one table against a varchar(50) column in a larger (900,000+ rows) table. That varchar(50) column only contains integers, though, so I'm currently testing altering that column to be a smallint type to see if that will speed this query up.

Anybody have any thoughts on the transaction log and data files all being on the same RAID 1 device? How can I convince my boss this is worthy of an upgrade?

-- mtf

Evil Kraig F SSC Guru Points: 100851 More actions · Answer 2

Couple of things here.

Pages/sec are when you're resolving hard page faults, which since you mentioned your page file, seems to indicate that you're aware of that. However, yes, 350/sec is VERY high. It means the 12 gb assigned to SQL Server isn't currently enough for its standard operations. That doesn't mean I recommend taking away the 4gb you left free for the OS, though.

%DiskTime is ... well... misleading these days, as are a number of disk indicators. That's mostly because of RAID/SAN mechanisms. This are usually benchmarked when running well and then looked at during 'problem times' to help determine the cause. You CAN dig into these values to come up with intelligent results, but that's almost a skillset of its own.

Log and data on the same drive are never helpful, however, these are selects, not writes. You're dealing with read speed being the issue here. Now, is the log helping with that? No. Your disk heads are flying trying to deal with the random and non-random data access that the two are trying to resolve. On the other hand, getting a dedicated spindleset for a LUN for logs is almost unheard of, so unless you're shoving your own spindles into the rack, this really isn't something you're going to be able to do serious work with.

That leaves us what we CAN work with... the queries.

To help reduce the memory load and disk load, we need the queries to need less data to do the same work.

We're experiencing some performance problems in SQL Server 2005 installed on Windows Server 2003. The queries are pretty static and haven't changed much in the course of a year. About two weeks ago, our application has begun running more slowly for users.

This sounds like you hit a tipping point. Or you haven't kept up to date on fragmentation and statistics upkeep.

The first thing I'd do is make sure statistics are updated on the table(s) that are giving you problems during your next maintenance window. Make sure to use WITH FULLSCAN when you do it.

Next I'd look into fragmentation of the tables. Why? These are usually the significant causes of issues that affect multiple queries instead of a single problem child. To check on your fragmentation, here's a reasonable blog article to walk you through it:

http://www.techrepublic.com/blog/datacenter/detecting-index-fragmentation-in-sql-server-2005/251

Sorry I don't have anything better offhand, but someone will probably chime in with a few more links. 🙂

Moving from there, a <> 'FRG' is going to be a clustered index scan. You're pretty much hosed there, it's just not selective enough. However, the rest of the search (subscriber_ID and payer_ID) should be able to help you out with that. The ISNULL(x12a.aaa_reason_code) is completely non-SARGable, which means it can't be used in an index search. The reason for that is because the function has to alter the data before it can be checked to the condition, which means it can't bring the condition to the index, it has to bring the data to the condition.

The other piece you mention, the Varchar to INT, is a significant issue. It has to do an implicit conversion under the hood on one side or the other, so indexing is being ignored somewhere... same issue as above, it's non-SARGable that way.

To assist you in further troubleshooting this, I'd have to see DDL for the tables and the execution plan. If you look in my signature, there's a link for 'help with index/tuning'. It'll walk you through what I and others will need to truly help you here. But I'd start with the maintenance on statistics and fragmentation, since this has 'just started' on a static system.

- Craig Farrell

Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA

mister.magoo SSC-Forever Points: 47068 More actions · Answer 3

Just thinking about simple things to check here, because I am not as experienced as others on these forums...

Are you sure that you are not experiencing unexpected file growth in your dbs that is causing excessive disk activity and manifesting as pageio waits?

MM

select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

mrTexasFreedom Ten Centuries Points: 1302 More actions · Answer 4

Wow. These are great suggestions. Some definite edges for me to pry at this problem. I'll get back to this thread tomorrow with my relevant findings.

Appreciatively,

mtf