Best Practices for a Reboot Schedule

  • Why reboot SQL Server 2000 on a schedule? Well, I found a reason this week. Luckily I didn't get burned. I have a Windows 2000 Advanced Server active/passive cluster whose sole use is SQL Server 2000 Enterprise Edition. NOTHING ELSE. My sysadmin and I were checking out the passive node and decided to reboot it (it hadn't been rebooted in 4 months). OOOPS, it never came back up. It would start to come up, give an error about a service or driver that wouldn't start, run checkdisk, blue-screen, reboot, and start all over again. We tried several different things. Nothing helped. I finally walked behind the racks and started pushing on all the connectors to see if one was loose. Nothing; all were seated properly. But amazingly enough, the problem stopped and the passive node came up properly. Maybe one of the fiber optic connectors was just a hair loose. We'll never really know.

    BUT... what if we hadn't rebooted the passive node? What if the active node had failed? There probably wouldn't have been a working node to fail over to, and we would have had a catastrophe (minor, but try explaining that to management).

    Solution: we are going to reboot the passive node once a month. If that works OK, we will fail the active node over, reboot it, and then fail back to the primary node. This will assure us that both nodes work and that failover works in both directions, AND it will clear out the tempdb database.

    -SQLBill

  • Have you considered the practical issues? For example:

    1. If a server is accessed 24/7, what happens during downtime? Can you warn all users in advance?

    2. Are your servers stand-alone or linked in some way, e.g. replication, distributed queries, file transfers?

    3. What services/applications run on your servers? Read up on any third-party software.

  • I have not seen any white paper from Microsoft on best practices for a reboot schedule.

    In our organization, we only reboot the servers when patches are applied or (rarely) when hardware fails and necessitates downtime.

    To enable that long-term reliability, only the bare minimum of services and software is installed on the dedicated SQL servers, and third-party driver versions are chosen carefully after lab testing. The end result was very reliable systems and the ability to convince management that a reboot schedule was not only unnecessary but would create downtime when the goal was to reduce it.

    David R Buckingham, MCDBA, MCSA, MCP

    SQL Database Administrator

    WebMD Corporation

  • Thanks to all for the great comments on this. SQLBill brings up a great point with the clustering issue, and my configuration falls into that realm. I'm still not sure whether I will reboot the active node, but I am inclined to reboot the passive one occasionally.

    Ultimately I agree with Steve and Andy on this one: OS and SQL Server patches will require reboots frequently enough that, coupled with any application problems accessing the databases on the server, we will probably be rebooting every 60 days at most anyway.

    I would still like to see a white paper on reboot schedules specific to SQL Server, so if anyone finds one, please post it.

    Thanks again.

    David

    @SQLTentmaker

    “He is no fool who gives what he cannot keep to gain that which he cannot lose” - Jim Elliot

  • On a side note to this topic, how do you track uptime statistics on your servers?

  • One way is to check the login time of spids 1-5, either in Enterprise Manager (EM) under Current Activity or by querying sysprocesses in the master database from Query Analyzer (QA) or EM. That is the time the SQL Server service was last started.

    As for me, I reboot when patching, and occasionally when remote machines leave a lot of connections behind (we have Notes connecting to us, and the connector periodically goes wacko for days on end).

    Other than that, we try not to reboot, and if something surfaces that calls for it, we spend our time researching the problem.
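
To make the login-time suggestion concrete, here is a minimal Python sketch that turns the startup login_time into an uptime figure. It assumes you have already run the query held in UPTIME_QUERY (against master..sysprocesses) through whatever client you use and have the result as a datetime; the constant and helper names are hypothetical, and the code itself never connects to a server.

```python
from datetime import datetime, timedelta

# Hypothetical query text for SQL Server 2000: the system spids log in when the
# service starts, so their login_time approximates the last service start.
# This sketch only stores the string; it does not execute it.
UPTIME_QUERY = "SELECT MIN(login_time) FROM master..sysprocesses WHERE spid BETWEEN 1 AND 5"


def sql_service_uptime(login_time: datetime, now: datetime) -> timedelta:
    """Return how long the service has been running, given its startup login_time."""
    if now < login_time:
        raise ValueError("now must not be earlier than login_time")
    return now - login_time


def format_uptime(delta: timedelta) -> str:
    """Render a timedelta as 'Nd Nh Nm' for a simple uptime report."""
    hours, remainder = divmod(delta.seconds, 3600)
    minutes = remainder // 60
    return f"{delta.days}d {hours}h {minutes}m"
```

For example, a service that started at 06:00 on 1 January 2004 and is checked at 18:30 on 30 April 2004 reports "120d 12h 30m".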

  • Like most respondents, we only reboot our servers when necessary. However, as an aside, this means you need a scheduled job to cycle your SQL Server error logs regularly (we do it once a week, but we keep more than the default 6 past logs). The first time you try to open the error log of a system that has been backing up transaction logs every 15 minutes for the last 6 months without cycling the log, you'll remember this point... The command is "DBCC ERRORLOG", for reference.
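
The arithmetic behind that advice is easy to sketch. Assuming transaction-log backups every 15 minutes (as in the post) and a SQL Server Agent job that runs DBCC ERRORLOG once a week, the hypothetical helpers below estimate how many backup messages land in one error log and how far back the archived logs reach. Keep in mind that each restart of the SQL Server service also starts a new error log, so on a server that is rarely restarted the scheduled job is the only thing cycling it.

```python
from datetime import timedelta

# The command from the post; a scheduled job step would run this T-SQL
# to close the current error log and start a new one.
CYCLE_COMMAND = "DBCC ERRORLOG"


def backup_entries_per_log(backup_interval: timedelta, cycle_interval: timedelta) -> int:
    """Approximate number of backup messages written to a single error log file."""
    # timedelta // timedelta yields an int (floor division).
    return cycle_interval // backup_interval


def history_retained(archived_logs: int, cycle_interval: timedelta) -> timedelta:
    """How far back the archived logs reach if only the scheduled job cycles them.

    Service restarts also start a new log, which shortens this window.
    """
    return archived_logs * cycle_interval
```

Under those assumptions, a weekly cycle leaves about 672 backup entries per log, versus about 17,472 for roughly six months (182 days) without cycling, and the default 6 archived logs cover only 42 days of history, which is why keeping more than 6 can be worthwhile.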
