Can a Transaction Log delay Checkpoints?

  • Good Day,

    We are running SQL Server 2005 Standard Edition on Windows 2003 with 15K SAN storage. Our application vendor recommended we set the database to the simple recovery model, as they found that when the transaction log reaches a certain level, checkpointing to the MDF file falls behind, causing timeouts. I have never experienced this on any of my systems before. Has anybody experienced this?

  • Checkpoints flush dirty data pages from the buffer cache of the current database to disk. This minimizes the active portion of the log that must be processed during a full recovery of a database. During a full recovery, the following types of actions are performed:

    The log records of modifications not flushed to disk before the system stopped are rolled forward.

    All modifications associated with incomplete transactions, such as transactions for which there is no COMMIT or ROLLBACK log record, are rolled back.
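
    If it helps to see that in action, here is a minimal sketch (the database name MyDB is a placeholder): issue a manual checkpoint and watch log-space usage before and after.

        -- Show how much of each transaction log is currently in use
        DBCC SQLPERF(LOGSPACE);

        USE MyDB;
        -- Force a checkpoint: flushes the dirty pages of the current database to disk
        CHECKPOINT;

        -- Under the simple recovery model the used percentage should drop,
        -- since the checkpoint also allows the inactive log to be truncated
        DBCC SQLPERF(LOGSPACE);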

  • Good Day,

    Thank you for the reply. I am more interested in the vendor's observation that if the transaction log file gets too full, SQL Server cannot write the transactions to the MDF file fast enough, and that the database therefore has to be set to the simple recovery model. To me this sounds strange. Any ideas?

  • Yes, it won't be fast enough if you are using the full recovery model. You would first have to check whether the dirty pages have been flushed to disk, along with the other things I mentioned in my previous reply. That is why they have recommended the simple recovery model.
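
    For anyone following along, checking and changing the recovery model looks like this (a sketch; MyDB is a placeholder name):

        -- See the current recovery model of every database on the instance
        SELECT name, recovery_model_desc FROM sys.databases;

        -- Switch one database to the simple recovery model
        ALTER DATABASE MyDB SET RECOVERY SIMPLE;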

  • Thank you for the quick response, but this strategy basically destroys SQL Server's disaster recovery plan: you will not be able to take transaction log backups, and as a result will not be able to recover the system as close to the point of failure as possible. You will also not be able to use either log shipping or database mirroring. That puts my job on the line.
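
    You can see the loss directly; under the simple recovery model a log backup is simply refused (a sketch; MyDB and the backup path are placeholders):

        -- Fails with error 4208: "The statement BACKUP LOG is not allowed
        -- while the recovery model is SIMPLE."
        BACKUP LOG MyDB TO DISK = N'X:\Backups\MyDB_log.trn';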

  • I am still curious why there must not be any data in the transaction log. How does a transaction log containing committed transactions impact performance?
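
    For what it is worth, SQL Server will report exactly what is keeping log space tied up; a minimal sketch:

        -- log_reuse_wait_desc shows what is preventing log truncation,
        -- e.g. LOG_BACKUP, ACTIVE_TRANSACTION, CHECKPOINT, REPLICATION
        SELECT name, log_reuse_wait_desc FROM sys.databases;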

  • Sounds to me like your vendor is trying to blame what looks like an IO subsystem performance problem on the internal workings of SQL Server.

    The time between checkpoints is determined by the recovery interval and the number of records in the transaction log.

    SQL Server uses an algorithm to determine when to perform the next checkpoint. It issues checkpoints often enough to ensure that, in the event of a system failure, recovery will not take longer than the time specified in your recovery interval. This setting applies to the entire instance.
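
    The recovery interval can be inspected and changed with sp_configure (a sketch; the value 1 below is only an example, and 0 is the default, meaning SQL Server manages it itself with roughly a one-minute recovery target):

        -- 'recovery interval (min)' is an advanced option
        EXEC sp_configure 'show advanced options', 1;
        RECONFIGURE;
        EXEC sp_configure 'recovery interval (min)', 1;
        RECONFIGURE;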

    If the checkpoint process is "falling behind", it is possibly because the IO subsystem is struggling to keep up.
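
    One way to test that theory is to look at the IO-related waits the instance has accumulated since it last started (a sketch, not a full diagnosis):

        -- High PAGEIOLATCH_* waits point at slow data-file reads/writes;
        -- high WRITELOG waits point at a slow log drive
        SELECT wait_type, waiting_tasks_count, wait_time_ms
        FROM sys.dm_os_wait_stats
        WHERE wait_type LIKE 'PAGEIOLATCH%' OR wait_type = 'WRITELOG'
        ORDER BY wait_time_ms DESC;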

    I have a short post on checkpoints that you can read here: http://www.gethynellis.com/2010/02/sql-server-checkpoint.html

    If you split your transaction log from the data file, that is, place each on a different set of disk arrays, this will go some way toward reducing IO contention between the data and log files.
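
    Moving an existing log file onto its own drive is a short job (a sketch; MyDB, the logical file name and the L: drive path are placeholders):

        -- Point the log file at its new location in the system catalog...
        ALTER DATABASE MyDB MODIFY FILE
            (NAME = MyDB_log, FILENAME = N'L:\SQLLogs\MyDB_log.ldf');
        -- ...then take the database offline, move the physical .ldf at the
        -- OS level, and bring the database back online
        ALTER DATABASE MyDB SET OFFLINE;
        ALTER DATABASE MyDB SET ONLINE;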

    Gethyn Ellis
    www.gethynellis.com

  • Gethyn,

    Thank you for the feedback. Do you mean the overburdening of the single LUN, or possible bugs in SQL Server that could cause the checkpoint process to fall behind? The vendor claims that they also experienced it on 15K SAN disks. Does this mean that we might have to split the MDF, LDF and tempdb files onto separate LUNs?

  • lianv 90648 (12/29/2010)


    Gethyn,

    Thank you for the feedback. Do you mean the overburdening of the single LUN, or possible bugs in SQL Server that could cause the checkpoint process to fall behind? The vendor claims that they also experienced it on 15K SAN disks. Does this mean that we might have to split the MDF, LDF and tempdb files onto separate LUNs?

    I'd say it was the overburdening of the IO subsystem. Not a bug.

    Brent Ozar has some great material on SAN best practices; check out this post:

    http://www.brentozar.com/sql/sql-server-san-best-practices/

    It is normally recommended as best practice to split the log, data and tempdb files (and backups) onto separate drives, but as always there are exceptions; Brent has a post on when it's OK to bundle them together:

    http://www.brentozar.com/archive/2009/02/when-should-you-put-data-and-logs-on-the-same-drive/

    Gethyn Ellis
    www.gethynellis.com

  • On very active systems I would definitely separate the IO across multiple LUNs. Not only does it spread out the load on the SAN, but it gives the OS more fingers into the SAN. It will improve performance.

    You can create filegroups and put files on multiple LUNs to split the load. Separating primary, data, indexes, text/image and logs works for me. Data, index and text/image files can be striped; log files should not be striped.
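
    A sketch of that layout (the filegroup, file, table and path names below are placeholders):

        -- Add a filegroup and place its file on a separate LUN
        ALTER DATABASE MyDB ADD FILEGROUP Indexes;
        ALTER DATABASE MyDB ADD FILE
            (NAME = MyDB_ix1, FILENAME = N'I:\SQLData\MyDB_ix1.ndf', SIZE = 1024MB)
        TO FILEGROUP Indexes;

        -- New indexes can then be directed at the new LUN
        CREATE NONCLUSTERED INDEX IX_Orders_CustomerID
            ON dbo.Orders (CustomerID) ON Indexes;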

    Tempdb can be split up in many ways, but the log and data should each be split onto their own drives. If you have a data file per CPU (or some subset of this), then you can split the data files across disks and investigate using -T1118 as a startup parameter for SQL Server. Again, the log should never be striped.
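
    Adding tempdb data files is a one-off change and new files take effect immediately (a sketch; the file name, path and sizes are placeholders):

        -- One tempdb data file per CPU core (or some subset), pre-sized
        ALTER DATABASE tempdb ADD FILE
            (NAME = tempdev2, FILENAME = N'T:\TempDB\tempdb2.ndf',
             SIZE = 1024MB, FILEGROWTH = 256MB);
        -- Trace flag 1118 (allocate full extents only) is set as a service
        -- startup parameter, -T1118, rather than in T-SQL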

    There are a few other overlooked things that can hit the SAN hard.

    1) OS disk alignment to the SAN (perhaps the biggest and most overlooked).

    2) Data file growths without the Perform Volume Maintenance Tasks permission (see the sketch after this list).

    3) Log file growths and zeroing.

    4) Log file fragmentation from growths.
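
    On point 2, whether instant file initialization is active (i.e. whether data files skip zeroing) can be checked from the error log; a sketch using trace flags 3004 and 3605, with IFI_Test as a hypothetical throwaway database:

        -- 3004 reports file-zeroing operations, 3605 sends them to the error log
        DBCC TRACEON(3004, 3605, -1);
        CREATE DATABASE IFI_Test;
        -- If the log shows zeroing only for the .ldf (logs are always zeroed)
        -- and not for the .mdf, instant initialization is working
        EXEC xp_readerrorlog;
        DROP DATABASE IFI_Test;
        DBCC TRACEOFF(3004, 3605, -1);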

    The articles below are references to what I recommend above.

    http://www.sqlskills.com/blogs/kimberly/post/8-Steps-to-better-Transaction-Log-throughput.aspx

    http://www.sqlskills.com/blogs/kimberly/post/Instant-Initialization-What-Why-and-How.aspx

    http://sqlblog.com/blogs/linchi_shea/archive/2007/02/01/performance-impact-of-disk-misalignment.aspx

    http://www.sqlskills.com/blogs/kimberly/post/Understanding-TempDB-table-variables-v-temp-tables-and-Improving-throughput-for-TempDB.aspx
