SQL Agent jobs Not failing

  • On one of our servers (SQL 2k SP4), the SQL Server Agent starts successfully, but none of the jobs have any recorded job history for several weeks. (It's a development server, and I'm a new employee here.) The jobs initially appeared to run when executed but were not reporting 'Success' as the last job status, and in some instances there is no job history either. I have tried running the jobs under different user contexts (sa, the SQL Agent account, my user) to no effect, except under my domain account, where the job shows as failed and records a history entry. The SQL Agent account is a Windows account local to the server; it has user-level permissions in Windows and sysadmin rights on SQL Server.

    I tried cycling the SQL Server and the agent; I also tried switching the user the Agent runs under to the system Account to no effect. The Agent runs under the same user account as the SQL Server itself.

    I see the following errors in the SQL Agent Error Log each time a job is started:

    - [298] SQLServer Error: 6, Specified SQL server not found. [SQLSTATE 08001]

    - [298] SQLServer Error: 11004, ConnectionOpen (Connect()). [SQLSTATE 01000]

    - [382] Logon to server 'DEVINTSQL' failed (ConnAttemptCachableOp)

    If I happen to get a history entry for a job the error is:

    - 'The job failed. Unable to determine if the owner (domain\steblera) of job Delete Disapproved Template Mappings has server access (reason: Could not obtain information about Windows NT group/user 'domain\steblera'. [SQLSTATE 42000] (Error 8198)).'

    Now for the really weird part, I attached Profiler to watch the server when the job ran, and it actually executes the job steps successfully (I also manually verified this). The job doesn't report success or failure, and takes an extraordinarily long time to execute, something like 3 minutes for a sproc that executes in milliseconds.

    Additionally, I can connect to the server in Query Analyzer just fine from my local machine with a trusted account, but if I log into the server itself and open Query Analyzer, I get an error when the Object Browser tries to open: "Specified SQL server not found. ConnectionOpen (Connect())".

    Additional, possibly pertinent information about the server:

    This is a virtualized server. SQL Server is running on a nonstandard port. The network protocols in use are TCP/IP, Named Pipes, and Shared Memory.

    Any help on this would be great; it drives me crazy when any of my servers isn't running right.

  • Hi,

    Can you create a new, simple job to test whether the history gets updated? Also, try scripting out the existing job, deleting it, and recreating it from the script.

  • The job is already very simple: one step that executes a simple stored procedure.

    I did try scripting out the job and adding it back in. Same result as before, the job doesn't report success and no history is being created.

  • It doesn't answer all of your questions, but if you have a lot of jobs the history setting may be too low.

    If you go into Agent -> Properties -> Job System

    You will see the rules for job history retention there; maybe they are set too low? I have had to raise the row limit on servers with lots of jobs.

  • I checked the job history settings, the current number of job history rows are well within the limits.
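
    For anyone repeating this check, one rough way to compare against the limits is to count the rows directly. This is a minimal sketch that assumes the standard msdb tables `sysjobs` and `sysjobhistory`; the limits to compare against are whatever is set on the Job System tab.

    ```sql
    -- Total history rows currently stored by SQL Server Agent.
    SELECT COUNT(*) AS total_history_rows
    FROM msdb.dbo.sysjobhistory

    -- History rows per job, including jobs that have no history at all.
    SELECT j.name AS job_name,
           COUNT(h.instance_id) AS history_rows
    FROM msdb.dbo.sysjobs AS j
    LEFT JOIN msdb.dbo.sysjobhistory AS h
        ON h.job_id = j.job_id
    GROUP BY j.name
    ORDER BY history_rows DESC
    ```

    A job that runs regularly but shows zero rows here points away from retention limits and toward the Agent being unable to write history at all.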

  • astebler (12/15/2008)


    I see the following errors in the SQL Agent Error Log each time a job is started:

    - [298] SQLServer Error: 6, Specified SQL server not found. [SQLSTATE 08001]

    - [298] SQLServer Error: 11004, ConnectionOpen (Connect()). [SQLSTATE 01000]

    - [382] Logon to server 'DEVINTSQL' failed (ConnAttemptCachableOp)

    If I happen to get a history entry for a job the error is:

    - 'The job failed. Unable to determine if the owner (domain\steblera) of job Delete Disapproved Template Mappings has server access (reason: Could not obtain information about Windows NT group/user 'domain\steblera'. [SQLSTATE 42000] (Error 8198)).'

    .

    This looks like some form of permissions or name-resolution problem.

    Has the server been renamed? Can you run the following:

    select @@servername,serverproperty('machinename')

    Do they match? Has msdb or master been restored from somewhere else?

    the section

    "Could not obtain information about Windows NT group/user 'domain\steblera'. [SQLSTATE 42000] (Error 8198)"

    seems to indicate that the server is possibly dropping off the domain for some reason, or that the job is owned by a user whose account has been disabled.

    Just some thoughts...

    MVDBA
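
    The checks suggested above could be run together roughly as follows. This is only a sketch: it assumes the SQL Server 2000 system tables `master.dbo.sysservers` and `msdb.dbo.sysjobs`, and the column aliases are illustrative.

    ```sql
    -- Name SQL Server believes it has versus the Windows machine name.
    -- A mismatch, or a NULL @@SERVERNAME, suggests a rename or a restored master.
    SELECT @@SERVERNAME AS sql_server_name,
           SERVERPROPERTY('MachineName') AS machine_name

    -- The local server entry that @@SERVERNAME is read from (srvid 0).
    SELECT srvid, srvname
    FROM master.dbo.sysservers
    WHERE srvid = 0

    -- Owner of each Agent job, resolved from the stored SID.
    SELECT j.name AS job_name,
           SUSER_SNAME(j.owner_sid) AS owner_login
    FROM msdb.dbo.sysjobs AS j
    ORDER BY j.name
    ```

    If a job is owned by a domain account, the Agent has to look that account up on the domain before running the job, which is where error 8198 would surface.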

  • That's fairly close to what I was thinking. There is a definite date on which the failures start occurring; either that, or it was the last time someone cleared the job history. My own thinking is that the local user account doesn't have any rights on the domain to query Active Directory. But even if that were true, it would seem to explain only part of the problem.

  • I've managed to resolve the issue. I found a server alias in the Client Network Utility that was aliasing the server's name and pointing it to a network name of '.' (dot). As soon as I removed the erroneous alias, all became right again.

    Thanks to those who offered suggestions!
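
    For servers where opening the Client Network Utility is inconvenient, the aliases can probably also be listed from a query window. This sketch rests on two assumptions not stated in the thread: that the aliases are stored under the `ConnectTo` registry key shown below, and that the undocumented extended procedure `xp_regenumvalues` is available on the instance.

    ```sql
    -- List the client aliases defined on the machine hosting this instance.
    -- xp_regenumvalues is undocumented and unsupported; the registry path is
    -- an assumption about where the Client Network Utility stores aliases.
    EXEC master.dbo.xp_regenumvalues
        'HKEY_LOCAL_MACHINE',
        'SOFTWARE\Microsoft\MSSQLServer\Client\ConnectTo'
    ```

    An alias with the same name as the server itself is the thing to look for, since the Agent connects using that name.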

  • Congratulations!

    Amazing what can be found when something doesn't work.

  • Glad you found it, and thanks for the update. That helps others when they search for similar issues.

  • I'm having this exact same problem on 2 of my 4 SQL Servers (2000 with SP4). Jobs run fine but no history is being captured. The msdb job history table is empty on the 2 servers with no history and has records in it on the 2 servers that do show job history. There is a small number of jobs and plenty of space on the msdb data and log drives. Both servers quit recording history the same night. I checked the Client Network Utility and found no bad aliases. Any other ideas of things to check? Thanks.
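
    One further thing that could be compared between the working and non-working servers is what the Agent recorded for each job's last run, independent of the history table. A minimal sketch, assuming the standard msdb tables `sysjobs` and `sysjobservers`:

    ```sql
    -- Last recorded outcome per job, as stored outside sysjobhistory.
    SELECT j.name AS job_name,
           s.last_run_outcome,      -- 0 = failed, 1 = succeeded, 3 = canceled
           s.last_run_date,         -- integer in YYYYMMDD form
           s.last_run_time,         -- integer in HHMMSS form
           s.last_outcome_message
    FROM msdb.dbo.sysjobs AS j
    JOIN msdb.dbo.sysjobservers AS s
        ON s.job_id = j.job_id
    ORDER BY j.name
    ```

    If the last run dates here also stop on the night the history stopped, the Agent is likely failing to write back to msdb at all, which matches the connection errors described earlier in the thread.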
