SQL Agent job getting stuck

  • I have a job on a dev server that is scheduled to run every 5 minutes. It will run fine for a few days, and then it will get hung up. The status stays Executing Job -- it never finishes with either success or failure. I can stop the job, and it will immediately reschedule itself for the next interval and run fine again for a while.

    I found this thread that looks similar:

    http://qa.sqlservercentral.com/Forums/Topic10826-5-1.aspx

    However, the troubled job on this server doesn't have any notifications (which is why it sat hung for over 44 hours this last time), and it doesn't use xp_sendmail as discussed in that thread. It is a single-step DTSRun task for a package that gets the file modification date from one of the two input files and stores it in a global variable, truncates two staging tables, imports new data from two text files, and then does INSERT/UPDATE/DELETE for each of those into the actual production tables (a rough sketch of the per-file pattern appears at the end of this post). It usually takes about 8 seconds total.

    I looked at the other jobs on the server (there are only 10, and they are all things like "Cycle Error Logs" plus the DB Backup, Integrity Checks, Optimizations, and Trans Log backups for both SystemDBs and UserDBs), and none of them should have been running at the time this job last got stuck.

    I'm not sure if it's relevant or not, but the recovery model on the related database is "simple" and I had the database set to autoshrink, because this is a simple database with just a couple of tables used to let people search a course catalog that is updated every 5 minutes with data from the course database. (If the database got corrupted, it would be just as easy to rebuild everything and reload it from a text file as it would be to restore it from a backup.)

    Any ideas?

    Andrew

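    [Editor's note: for readers unfamiliar with the pattern described above, the per-file workflow boils down to something like the following T-SQL. This is only a minimal sketch; the table, column, and file names are hypothetical, and the actual package uses DTS tasks rather than BULK INSERT.]

        -- Refill the staging table from the text feed (hypothetical names)
        TRUNCATE TABLE dbo.course_staging;

        BULK INSERT dbo.course_staging
            FROM '\\fileserver\feeds\course_catalog.txt'
            WITH (FIELDTERMINATOR = '\t', ROWTERMINATOR = '\n');

        -- Insert rows that are new to the catalog
        INSERT INTO dbo.course (course_id, title, seats)
        SELECT s.course_id, s.title, s.seats
        FROM dbo.course_staging s
        WHERE NOT EXISTS (SELECT 1 FROM dbo.course c WHERE c.course_id = s.course_id);

        -- Update rows that already exist
        UPDATE c
        SET c.title = s.title,
            c.seats = s.seats
        FROM dbo.course c
        JOIN dbo.course_staging s ON s.course_id = c.course_id;

        -- Delete rows that no longer appear in the feed
        DELETE c
        FROM dbo.course c
        WHERE NOT EXISTS (SELECT 1 FROM dbo.course_staging s WHERE s.course_id = c.course_id);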

  • Autoshrink isn't a recommended option; it takes locks on the database while it shrinks the files.

    With the simple recovery model you don't have to worry about the database growing too big, because after a while it will hover around a certain size (empty space within the file gets reused).

    Perhaps the shrinking causes a timeout or an error path that the DTS job doesn't handle well, resulting in an infinite loop or an endless wait for resources. Or perhaps the path to the input files isn't always available (reboots, security token timeouts).

    I would turn autoshrink off (you already have the simple recovery model keeping the size down) and use a combination of:

    * basic logging before, during, and after the DTS package

    * SQL Profiler tracing focused on the DTS package (to see what the last completed step was)

    With sp_who2 you can see whether the process is blocked by another process (SPID).
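
    [Editor's note: a quick illustration of the blocking check suggested above, using standard SQL Server 2000-era system objects; nothing here is specific to Andrew's setup.]

        -- Show all sessions, including the BlkBy (blocking SPID) and Status columns
        EXEC sp_who2;

        -- Or query sysprocesses directly for anything that is currently blocked
        SELECT spid, blocked, waittype, lastwaittype, waittime, cmd, program_name
        FROM master.dbo.sysprocesses
        WHERE blocked <> 0;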

  • Thanks for the suggestions. I'm not sure what you have in mind for basic logging, but the rest makes sense. Results from sp_who2 show the process as sleeping and AWAITING COMMAND, but not blocked.

    For what it's worth, I turned autoshrink off right after I posted my original message, thinking it was probably not necessary. I don't think the change caused it, but the job got stuck again about an hour ago. It was running fine between 11:55AM and 1:45PM (23 executions) before it got stuck again, which is the fastest I've seen it happen.

    Network file access could be a problem, as I'm reading these files off a remote machine using a UNC path. It might be necessary to FTP them to the local system first (see the sketch below).

    Andrew

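    [Editor's note: one rough way to verify the UNC path before the import step runs, and to turn off autoshrink as mentioned above. The share, file, and database names are hypothetical; xp_fileexist is undocumented but ships with SQL Server 2000.]

        -- "File Exists" returns 1 when the feed file is reachable from the server
        EXEC master.dbo.xp_fileexist '\\fileserver\feeds\course_catalog.txt';

        -- Turn autoshrink off for the catalog database
        EXEC sp_dboption 'CourseCatalog', 'autoshrink', 'false';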

  • Well, the job froze again last night just before I went home. Looking at the Profiler trace, the job stopped after completing the TRUNCATE TABLE command for one of the files, and before calling the stored procedure that handles INSERT/UPDATE/DELETE for the other file. (The workflow in DTS for the two files runs in parallel since there aren't any referential integrity constraints between them.)

    One change I just made was to turn off the auto create/update statistics options, which were on by default. The only changes to these tables should happen when this job executes, so I added UPDATE STATISTICS after the INSERT/UPDATE/DELETE steps in each procedure (roughly as sketched below).

    Andrew

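    [Editor's note: for reference, the statistics changes described above would look roughly like this; the database and table names are hypothetical.]

        -- Turn off automatic statistics maintenance for the database
        EXEC sp_dboption 'CourseCatalog', 'auto create statistics', 'false';
        EXEC sp_dboption 'CourseCatalog', 'auto update statistics', 'false';

        -- At the end of each load procedure, refresh statistics explicitly
        UPDATE STATISTICS dbo.course;
        UPDATE STATISTICS dbo.section;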
