SQL Server log backup fails sometimes

  • We are running SQL Server 2008 R2 SP1. A scheduled transaction log backup runs as a SQL Server maintenance job at a set time each day. The job succeeds on some days and fails on others, with no specific pattern. I changed the schedule to run every 5 minutes; the log backups succeeded for the first 3 attempts and then started failing randomly.

    -- operating system error 53(the network path was not found.) --

    I am not sure why it fails sometimes and runs fine the rest of the time. The backups go directly to SAN storage over a UNC path, e.g. \\vnvx\SQL_backups\

    SQL Server errorlog has the following messages:

    Backup Error: 3041, Severity: 16, State: 1.

    Error: 18204, Severity: 16, State: 1.

    BackupDiskFile::CreateMedia: Backup device 'N:\Backups\Userdatabases\Database\db _backup_200911111511.bak' failed to create. Operating system error 53(the network path was not found.)

    If it failed consistently, I could check permissions, the SQL Server Agent startup account, etc. However, it fails some of the time and succeeds the rest. The job is set to retry once after 2 minutes, and sometimes the second attempt goes through.

    Please help. Thanks in advance.

  • Hi,

    The error message "Operating system error 53(the network path was not found.)" clearly indicates that the network path could not be found when SQL Server tried to access or create the file. You need to check whether the drive is intermittently disconnecting from the SAN.

    Regards,

  • The error message states that the destination path cannot be reached. This points to a problem at the network or SAN level.

    There could be too little bandwidth or too much latency on the network. If the problem is storage related, it could be that the buffer cache of the SAN can't handle the load or that there is too much I/O on the LUNs.

    ** Don't mistake the ‘stupidity of the crowd’ for the ‘wisdom of the group’! **
  • Thanks guys. How do we troubleshoot this problem? Is there a step-by-step diagnostic process we can set up to find the issue?

  • Also, if the issue were related to a SAN setting, we should see the same problem on the other SQL Servers that use it. That is not the case; we are seeing this issue on this machine only. Is the problem specific to this machine?

    One of the options that I am thinking of is to take the backups to a local drive and have a file copy move the files from the SQL Server host to the SAN drive. Shouldn't the file copy have the same problem?

    Sorry to be vague.

  • N Nara (8/21/2013)


    One of the options that I am thinking of is to take the backups to a local drive and have a file copy move the files from the SQL Server host to the SAN drive. Shouldn't the file copy have the same problem?

    Creating the backup on a local drive is a good idea; it will most likely even speed up the SQL backup process. Whether the copy task will have the same problem is unknown. It depends on how long the copy takes and on what else is happening on the network and SAN at the same time. The filesystem is designed to handle files, so I expect the file copy to be more efficient and maybe even faster.
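
As a rough illustration of the copy step, here is a minimal Python sketch that copies a local backup file to the share and retries when the operating system reports an error (error 53 would surface as an OSError). The function name `copy_with_retry`, the retry count and the 2-minute wait are assumptions made for the example, not settings taken from this thread, and it has not been tested against this environment.

```python
import os
import shutil
import time


def copy_with_retry(src, dest_dir, retries=3, wait_s=120.0):
    """Copy src into dest_dir, retrying on OS-level errors (hypothetical helper)."""
    dest = os.path.join(dest_dir, os.path.basename(src))
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            shutil.copy2(src, dest)
            # Cheap sanity check: the sizes must match after the copy.
            if os.path.getsize(dest) == os.path.getsize(src):
                return dest
            last_error = OSError(f"size mismatch after copying {src}")
        except OSError as exc:
            # An unreachable share ends up here; remember the error and retry.
            last_error = exc
        print(f"attempt {attempt} failed: {last_error}")
        if attempt < retries:
            time.sleep(wait_s)
    # All attempts failed: re-raise the last error so the scheduler sees a failure.
    raise last_error
```

Because each failed attempt is printed, the job output also doubles as a record of when the share was unreachable, which may help with the troubleshooting.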

  • Thanks Hanshi. I probably will go with the local backup option.

    However, I would like to troubleshoot the problem. Any pointers would really help.

  • Go for the local backup and ask the backup team to back up from that location.

    I hope the local drives are mounted SAN drives?

    Regards
    Durai Nagarajan

  • N Nara (8/21/2013)


    Thanks Hanshi. I probably will go with the local backup option.

    However, I would like to troubleshoot the problem. Any pointers would really help.

    First I would try to determine whether the destination is always reachable. Set up a process (a batch file) to continuously ping the destination server and capture the results. You can ask the network admin to monitor the available bandwidth and network throughput from the server to the SAN. Also have the SAN admin monitor the buffer utilisation and I/O activity. Combine all the monitoring results to see if anything is out of the ordinary at the time the problem occurs.
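
The reachability check could also probe the share itself, not just the host. A ping only shows that the host answers, while listing the share exercises the same file path the backup uses. Below is a minimal Python sketch under that assumption; the names `probe` and `monitor`, the CSV layout and the 5-second interval are all hypothetical choices for illustration. For comparable results it would probably need to run under the same account as the SQL Server service.

```python
import csv
import os
import time
from datetime import datetime


def probe(path):
    """Try to list the share; return (ok, elapsed_ms, error_text)."""
    start = time.perf_counter()
    try:
        os.listdir(path)
        ok, err = True, ""
    except OSError as exc:
        # On Windows, exc.winerror carries the OS error code (e.g. 53);
        # elsewhere the attribute is missing and errno is used instead.
        code = getattr(exc, "winerror", None) or exc.errno
        ok, err = False, f"{code}: {exc.strerror}"
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return ok, elapsed_ms, err


def monitor(path, log_file, interval_s=5.0, attempts=None):
    """Append one timestamped row per probe to a CSV file.

    attempts=None keeps probing until the process is stopped.
    Returns the number of probes performed.
    """
    done = 0
    with open(log_file, "a", newline="") as fh:
        writer = csv.writer(fh)
        while attempts is None or done < attempts:
            ok, elapsed_ms, err = probe(path)
            writer.writerow([
                datetime.now().isoformat(timespec="seconds"),
                path,
                "OK" if ok else "FAIL",
                f"{elapsed_ms:.1f}",
                err,
            ])
            fh.flush()  # keep the log current even if the process is killed
            done += 1
            if attempts is None or done < attempts:
                time.sleep(interval_s)
    return done
```

Calling something like `monitor(r"\\vnvx\SQL_backups", "share_probe_log.csv")` would log one row every 5 seconds; the FAIL timestamps can then be lined up with the failed backup attempts and with the network and SAN monitoring results.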

  • If the issue is specific to these backups, check the mount point and the LUN setup you are writing to. It could be that a disk within the LUN (or the LUN's only disk, if it has just one) is about to fail, causing the LUN to destabilise and drop on and off the network.

  • We ran hrping at millisecond intervals and it didn't show any packet loss, so I'm not sure now.
