Tempdb pefformance problems

Question

Post reply

Tempdb pefformance problems

Dean Jones-454305

SSCarpal Tunnel

Points: 4804
More actions
December 8, 2009 at 6:28 am

#143435

I would like to know what to check for if I have tempdb contention problems, currently I don’t see any high alerts for the disk related counters on perfmon. At the same time, I want to rule out the tempdb configuration problem with the use of 8 tempdb files with equal sizes, I have learnt that this can sometimes cause problems and one should start with 1 core = 1 file first and work down the list, at the moment I have 8, as the server has 4 CPU’s and is a dual core machine.
I noticed that select * from #temptable takes around 2 seconds when there is only 1 thread running the query, but as soon as another thread starts, then it takes a ridiculous amount of time, I mean 30 – 40 seconds.
Are there any DMV’s that I can query, to nail out the root cause of the issue, I have looked at CPU and memory and they all look fine, I’m suspecting tempdb as being the bottleneck.

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply

Jonathan Kehayias One Orange Chip Points: 26778 More actions · Answer 1

Paul Randal has blogged about the 1 file per cpu with tempdb in the past. You can read his thoughts on this at:

In Recovery... | Search Engine Q&A #12: Should you create multiple ...

I wouldn't have more than 2 files with your stated configuration unless you have guaranteed PFS contention. To troubleshoot tempdb see the following whitepaper:

Working with tempdb in SQL Server 2005

It would take much more than 2 connections to cause tempdb contention unless tempdb is on a incorrectly configured IO subsystem. Ideally tempdb should be on its own dedicated disk array that is a RAID 1+0 to minimize the impact of write costs associated with RAID 5.

Dean Jones-454305 SSCarpal Tunnel Points: 4804 More actions · Answer 2

Thanks for the reply, so you saying that 0.5 x number of cores would be beneficial or is it more like 0.25 x no of cores ?

I guess one would have to try out both of them to see which one is suitable, but is there a way to run checks to track performance when using both options ?

Thanks.

Jonathan Kehayias One Orange Chip Points: 26778 More actions · Answer 3

There are a couple of things you could look at. If the bottleneck is not PFS contention and is instead a diskio problem, sys.dm_io_virtual_file_stats() would be a good thing to watch between the two. This will show you io_stall rates for the database files for tempdb as well as the write vs read io rates for each file. Beyond that, it would be monitoring tempdb as outlined in the whitepaper to see which configuration has the least amount of contention.

Dean Jones-454305 SSCarpal Tunnel Points: 4804 More actions · Answer 4

Jonathan Kehayias (12/9/2009)
There are a couple of things you could look at. If the bottleneck is not PFS contention and is instead a diskio problem, sys.dm_io_virtual_file_stats() would be a good thing to watch between the two. This will show you io_stall rates for the database files for tempdb as well as the write vs read io rates for each file. Beyond that, it would be monitoring tempdb as outlined in the whitepaper to see which configuration has the least amount of contention.

Are there any guidelines for checking sys.dm_io_virtual_file_stats, ie baseline figures etc.

Also, reading up MS information on PFS contention see http://support.microsoft.com/kb/328551, its does say it refers to SQL 2000, for SQL 2005/2005 are there any guidelines for this and what do I need to be looking at for detecting PFS contention.

Thanks.

Jonathan Kehayias One Orange Chip Points: 26778 More actions · Answer 5

For IO_Stalls, as close to zero as possible. Stalls = bad performance/high latency to disk.

For SQL 2005 look at the following Engine Team Blog posts:

TempDB Monitoring and Troubleshooting: Allocation Bottleneck

Dean Jones-454305 SSCarpal Tunnel Points: 4804 More actions · Answer 6

Thanks for the information. I have checked tempdb on system A and ran thesame code on Server B, Server B has a dedicated IO subsystem for tempdb.

SELECT substring(name,1,20) as 'Database',

case file_id when 1 then 'Data'

when 2 then 'Log'

end as 'Type',

num_of_reads,

num_of_bytes_read,

num_of_writes,

num_of_bytes_written,

size_on_disk_bytes,

io_stall_read_ms,

io_stall_write_ms,

io_stall

FROM sys.dm_io_virtual_file_stats(null,null),

sys.databases

where sys.databases.database_id = sys.dm_io_virtual_file_stats.database_id

and substring(name,1,20) = 'tempdb'

go

tempdb|Data|363945|22155657216|354290|22878412800|1073741824|650129|6007306|6657435

tempdb|Log|21|450560|107859|6620096000|5368709120|15|172421|172436

tempdb|NULL|362888|22146949120|353840|22856851456|1073741824|611836|6062488|6674324

tempdb|NULL|363025|22144933888|353933|22862233600|1073741824|629907|6034630|6664537

tempdb|NULL|362770|22139232256|353795|22854279168|1073741824|608596|5929837|6538433

tempdb|NULL|363005|22142525440|353961|22862700544|1073741824|610345|5971057|6581402

tempdb|NULL|363107|22149324800|354143|22863060992|1073741824|616765|6067918|6684683

tempdb|NULL|363044|22153854976|353945|22862241792|1073741824|617332|6043372|6660704

tempdb|NULL|363137|22153707520|353900|22861438976|1073741824|631008|5965039|6596047

Not sure how to read the data above, but it looks pretty bad doesnt it ?

Jonathan Kehayias One Orange Chip Points: 26778 More actions · Answer 7

Without knowing more information it is impossible to tell you good or bad. How long has the server been up? If those numbers are aggregate over 2 months, then they don't look as bad as if you restarted the server on Monday. You can get an idea of uptime by querying sys.databases for the create date of the tempdb database since it is created when the server starts.

What RAID level and how many disks is the tempdb drive? I'm going to guess RAID 5 or 6 based on the fact that writes are approximate 10 times worse than reads for stalls? One of the trade offs with RAID 5 or 6 is that there is a penalty for writing data due to parity calculation. What is the controller cache configured for ratio wise read/write? How much cache is on the controller. I'd reduce the number of files personally which should help reduce disk head contention trying to write across multiple files. You may have traded one problem, allocation contention in SQL for hardware contention at the disk array. Finding the balance between the two is what you are essentially trying to do here.

Jeff Moden SSC Guru Points: 1000539 More actions · Answer 8

Dean Jones-454305 (12/8/2009)
I noticed that select * from #temptable takes around 2 seconds when there is only 1 thread running the query, but as soon as another thread starts, then it takes a ridiculous amount of time, I mean 30 – 40 seconds.

For how many rows?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
"Change is inevitable... change for the better is not".

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)
Intro to Tally Tables and Functions

TheSQLGuru SSC Guru Points: 134017 More actions · Answer 9

Dean, there are a LOT of variables here. I recommend you get a performance tuning professional in for a day to review your system and make recommendations as well as mentor you on what to look for in the future.

Some initial questions that I would ask:

1) how many physical spindles underly your tempdb partition.

2) how many other things are on that partition?

3) tlog on same partition?

4) are the 8 cores physical or is hyperthreading enabled?

5) what is raid type if applicable?

6) is the partition sector aligned and is cluster size 64K?

That is just a start...

I also don't understand what you mean by a second query causing a 10X delay in response...

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service

Dean Jones-454305 SSCarpal Tunnel Points: 4804 More actions · Answer 10

TheSQLGuru (12/10/2009)
Dean, there are a LOT of variables here. I recommend you get a performance tuning professional in for a day to review your system and make recommendations as well as mentor you on what to look for in the future.
Some initial questions that I would ask:
1) how many physical spindles underly your tempdb partition.
2) how many other things are on that partition?
3) tlog on same partition?
4) are the 8 cores physical or is hyperthreading enabled?
5) what is raid type if applicable?
6) is the partition sector aligned and is cluster size 64K?
That is just a start...
I also don't understand what you mean by a second query causing a 10X delay in response...

1) how many physical spindles underly your tempdb partition.

2 spindles for system DB's, as there is a seperate LUN for this, and the size needed is small, hence 2 spindles were used here.

2) how many other things are on that partition?

Just other system DB MDF files, so only master,msdb and distribution for replication

3) tlog on same partition?

Transaction is on a different LUN as well, 2 spindles used here.

4) are the 8 cores physical or is hyperthreading enabled?

HT not sure, 8 cores. 4 physical socksets.

5) what is raid type if applicable?

RAID 10

6) is the partition sector aligned and is cluster size 64K?

Yes, it is aligned.

Jonathan Kehayias One Orange Chip Points: 26778 More actions · Answer 11

Dean Jones-454305 (12/12/2009)
1) how many physical spindles underly your tempdb partition.
2 spindles for system DB's, as there is a seperate LUN for this, and the size needed is small, hence 2 spindles were used here.
2) how many other things are on that partition?
Just other system DB MDF files, so only master,msdb and distribution for replication
3) tlog on same partition?
Transaction is on a different LUN as well, 2 spindles used here.
5) what is raid type if applicable?
RAID 10

If you have 2 spindles you have RAID 1, not RAID 10 and this is the problem with your tempdb configuration. A RAID 10 would require a minimum of 4 disks, 2 mirrored pairs, striped. You only have a mirrored pair, no striping, so your maximum IOPS is based off a single disk.

While tempdb may be a "system database", it is one of the most important databases configuration wise because it is used the most heavily, especially on a system that is memory constrained or that makes heavy use of worktables for interim results, sorts, hashes, or uses temp tables explictly. Tempdb should be moved to a true RAID 10 disk array that has the ability to do the IOPS required by your workload.