SQLServerCentral Article

Starting and Stopping SQL Server Part 4

,

Well, this is part four of my comments on starting and stopping SQL and

related items (Part 1, Part 2, and Part 3). It's a seemingly small thing until you have to do it for the

first time in production, or something goes wrong. This article will wrap up my

efforts with a look at some ideas that may save you pain.

How long does it take to restart the service? How long to reboot the

computer? Knowing these two items will absolutely increase your comfort level

when you have to do it 'for real' and beyond that, is often critical in making

good decisions. For example, with my current employer we strive for 24x7

availability, but we can easily schedule as much down time as needed any evening

after 9 pm with only a few hours notice. We can also two fifteen minute and one

thirty minute windows (breaks and lunch) during the day that we can use if we

can't wait until 9 pm. Let's say that you have a spid that is hung and you'd

like to restart the service to clear it. Can you do it in 15 minutes? Can you

reboot the server and have it available again in 15 minutes? I'll argue that

this is worth sticking on every data server you have as you may not be the

person doing it - and wondering if something has gone wrong!

Along those lines, let's say that you decide that you're going to stop &

start the service at noon. You send out email letting everyone know that you'll

begin promptly at noon. At 11:55 you're sitting at your desk watching the clock

and playing Solitaire to pass the time. At 12 sharp you right click the server

in Enterprise Manager and select shutdown, then sit back to wait for the

shutdown to complete. At some point it completes, you right click and select

start, then wait for recovery to complete. Easy enough, right?

Yes and no. Yes, you can do it that way and no, you shouldn't do it that way.

Let's look at what BOL has to say about shutdown:

  • Disables logins (except for system administrators).
  • Performs a CHECKPOINT in every database. However, if you stop an

    instance of SQL Server using CTRL+C at the command prompt, it does not

    perform a CHECKPOINT in every database. Therefore, the next time the

    server is started, recovery time takes longer.

  • Waits for all Transact-SQL statements or stored procedures currently

    executing to finish.

Know imagine some knucklehead (ah, I mean user) has started at query at 11:55

that will take 20 minutes to run, what does that do to your schedule based on

the third bullet point? Checkpointing is a good idea, but again, you don't know

how long it will take. Together, they represent a fair amount of risk that you

won't meet your time window. Instead, I recommend the following strategy:

  • At 10 minutes prior to shutdown, turn off SQL Agent. It can take a

    minute or two to terminate depending on what is already in progress, and you

    don't want it starting something just before you initiate the shutdown.

  • Email the reminder that the server will be unavailable beginning at x.
  • At 5 minutes prior to shutdown, run a checkpoint manually in each

    database. In practice you may need to start it sooner if you've increased

    the recovery interval beyond one minute, or you may able to start it later

    if it doesn't take long to complete. You don't have to do this step, but

    it's an extra layer of insurance to make sure as much as possible has been

    written out.

  • Issue 'shutdown with nowait'. This will immediately shut things down

    without doing a checkpoint. Note that doing so doesn't mean you'll lose

    data, it means that rather than cleaning up on shutdown, you'll have to do

    it on startup. The checkpoint we did manually (and outside our window) will

    reduce that cleanup to some degree.

  • Immediately restart the service using whichever of the methods I've

    discussed that you pre fer.

  • Restart SQL Agent
  • Notify everyone that the server is again available. Document the down

    time and reason.

Neither of the above methods guarantees that users get a clean shutdown in

their applications. For example, let's say that your users have an OLTP type

system - order entry perhaps. Typically the application will connect to the

database, get what it needs, then close the connection. When the user saves the

changes, it reopens the connection and issues commands to persist the changes

(hopefully in a transaction). It's entirely possible that they will have started

their session at 11:59 successfully, work three minutes, then try to save at

12:02 and get an error. At that point you have two hopes. One is that users are

trained to not try to save until they are told it's ok (low tech yes, but valid)

or that the application is sturdy enough to give them a chance to retry the

operation until it completes. I'm not saying you shouldn't do the shutdown, just

making sure you see what else is occurring.

Let's change gears a little to a different scenario. Hopefully you're

familiar with how to restore the master database. Briefly, it involves putting

SQL into single user mode by starting it from the command line with the -m

switch. Do it on a practice machine and it's reasonably easy. Not so easy on a

server. Why? Well, they don't mention this in BOL, but you're not the only user

trying to connect to that server! Typically there are tens if not hundreds of

applications, jobs, reports, etc, all failing or being retried while the service

is down. When you start in single user mode there is no guarantee that you'll be

the single user! I ran into this for the first time just this year, replication

agents from another server were connecting as soon as the service came up. Easy

enough to stop them, but there may be other items that are not so easy to

control.

Hopefully some of this four part effort has been useful to you. I look

forward to reading your comments.

Rate

5 (1)

You rated this post out of 5. Change rating

Share

Share

Rate

5 (1)

You rated this post out of 5. Change rating