Server Hangs

  • I've got SQLServer 2000 Standard running on a Windows 2003 server.  4 GB memory, 2 CPU's.  The application running against the db as well as the db itself has been running and operational for several years.  The server & db must be available 24 x 7.  Latest MDAC, service packs etc. 

    Problem is at least once per week the server hangs and no one can access SQLServer (hince the application server - on another server) stops.  This almost always occurs around 1:22 A.M.  Last time was this Saturday.  There is NO SQL Server jobs running at that time (backups start 2:00 A.M.).  This time is the lightest traffic and load time.  And someone will call network support who will restart the SERVER.  Nothing in logs, nothing in event(s).  No trace.

    Any ideas?

  • Since your application has been running for so long and work activity is so light and the hang time so consistent, the following, I guess, comes under the heading of starting to rule out possible causes.

    1. Check to make sure you have no blocking transactions.

    2. Use DBCC INPUTBUFFER (spid) to see what is being processed.

     3. Use SQL Profiler

     4.  Use System Monitor

     5.  Can application server ping Sql Server and vice/versa (network?)

     6. CPU utilitization % (ditto memory, etc.)

    I guess I am saying basically you need to get a handle on what is happening when the hang occurs, as best as you can.

    Best Wishes

    GaryA

  • Thanks Gary, but of course I need to sleep also.  this thing is once per week, low network, CPU, etc. and at 1 AM not PM.  Have a DOS job that pings from application server to db server every minute which doesn't indicate anything wrong.  Stops pinging around the time the network person restarts the server (of course).

  • Create a black box trace as noted here:

    From SQL BOL

    -------------------------------------------------------------------------------------

    Reporting Errors to Your Primary Support Provider

    If you are unable to resolve a problem, contact your primary support provider for assistance. Anytime you receive a server internal error (for example, assertion or access violation (AV)), contact your primary support provider. If you experience an operating system or I/O error, it is most likely a hardware problem. Correct the hardware problem and restore your database. When reporting an error to your primary support provider, be sure to provide the Blackbox.trc and Sqldiag.txt files.

    For more information about resolving a 9002 or 1105 space-related error, see Troubleshooting Recovery, Error 9002, and Error 1105.

    Use sp_trace_create with the TRACE_PRODUCE_BLACKBOX option to define a trace that appends trace information to a blackbox.trc file in the \Data directory. Once the trace is started, trace information is recorded in the blackbox.trc file until the size of the file reaches 5 megabytes (MB). The trace then creates another trace file, blackbox_1.trc, and trace information is written to the new file. When the size of blackbox_1.trc reaches 5 MB, the trace reverts to blackbox.trc. Thus, up to 5 MB of trace information is always available.

    Use the sqldiag utility to collect information about server version and configuration, .dll file version, error logs, extended stored procedures, operating system, computer version, configuration data, and additional data, all of which is put into Sqldiag.txt (by default located in \Mssql\Log). Also when sqldiag is executed, the two trace files blackbox.trc and blackbox_1.trc (if it exists) containing trace information including any server exceptions, are copied to the same output directory as sqldiag.txt (by default \mssql\log).

    For example, the environment in which the error occurred includes this information:

    Microsoft® SQL Server™ version number (as reported to the error log or returned by SELECT @@VERSION). The first message written to the error log provides the SQL Server version number.

    DB-Library API version number (as reported by the SQL Server Client Network Utility), ODBC driver version (as reported by ODBC Driver Setup), or OLE DB provider information.

    Application version number (for example, SQL Server Enterprise Manager or osql.exe).

    Operating-system version number.

    Hardware platform.

    Production or development environment.

    This additional information is also helpful in troubleshooting a problem:

    The message number, message state, and complete error message text.

    Any variables (numbers, database object types, or database object names) included in the error message.

    The context in which the message was generated (what statement was running at the time).

    The number of users who were logged in to SQL Server when the error occurred.

    The frequency with which the error occurs.

    If you are using SQL Query Analyzer and you do not see more than 255 characters in the results pane, click Tools, and then click Options. Click the Results tab, and then increase the value for Maximum Characters Per Column.

    In addition, review the error logs and, if running Microsoft Windows® 2000 or Microsoft Windows NT® 4.0, review the Windows application log and the Sqlstp.log file, located in the \WINNT directory. You may also want to use SQL Profiler to monitor events.

    To help resolve your problem quickly, you may be asked to send your Sqldiag.txt and error log in e-mail to your primary support provider.

    When providing this information, provide the entire error log, including all messages displayed from startup to the very end of the log. Be sure to send all information beginning with the time of startup and ending with the error message in question.

    Although most startup messages are identical each time you start SQL Server, additional messages sometimes appear during startup, which can provide clues for solving problems that occur during or after startup. If additional messages appear during startup, write down these error messages to assist your support provider in diagnosing and resolving the problem.

    See Also

    Monitoring with SQL Profiler

    -------------------------------------------------------------------------------------

    This will catch a lot of the events right up to the failure which may give you a clue.

    Otherwise set up a profiler trace to monitor after hours, and make sure no other applications are running at the time that could be an effect. Also monitor the number of connections as you may be running out of available connections based on yuor licensing, or hardware limit.

  • Thanks.  I'll give this a try.  I've confirmed that this has happened every Sunday morning at around 1:20 A.M. for past several weeks (about the time I received the new hardware server).

    I have checked the server log files and didn't see any activity from users, of course seen several log in's after server was restarted.  Perhaps 5 users at 1:20 - 1:30 am.  Also, several weeks ago, the network on call support didn't restart the server until around 3 A.M.  and my backups start at 2:00 A.M.  log shows they started and completed! 

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply