Problems With Transactional Replication

  • Hi all,

    I'm no expert on replication, I just kind of fudge my way through it, so forgive me if some of this sounds fractured, out of whack, or missing info.

    Here's the setup: I'm using transactional replication between 2 SQL2K servers Standard Edition, SP3a) connected over a 256k ISDN line. The publishing server is in Canada, the subscriber is in the US. The ISDN line is not used exclusively for this replication process. The publication contains 3 articles using a push subscription to shoot over 3 indexed views as tables on the subscriber. The process is scheduled to run M-F, between 8 AM and 5 PM.

    Here's my issue: every 2 days or so I see a general network error crop up that causes replication to stop. I set the distribution agent to retry over the course of 2 hours or so. I chalked this up to network hiccups or someone on the subscribing server messing about (we don't have direct control over that server), and the retries seem to work for the most part. Sometimes when the retry limit is reached, I get a call and have to manually restart the distribution agent and things eventually synch up. Up to this point, I'm not really concerned cuz the network problems are out of my immediate sphere of influence, so whaddya gonna do?

    However, every couple of weeks or so when I attempt to manually restart the distribution agent, nothing happens. The agent stays in it's stopped state, starting it doesn't work, I can't even reinitialize the subscription. After trying to restart everything unsuccessfully for 20 minutes or so, I resort to zapping the publication and the subscription entirely, create a new publication and subscription, apply the initial snapshot, and set up the distribution agent again, etc.

    Network problems are network problems, but I don't understand why SQL Server is undergoing a cranial vapour lock over this. Has anyone experienced anything similar? Would running a SQL trace reveal anything?

    Thanks in advance for any input.

    Vik

  • I would suggest to start the distribution agent with the distrib.exe file.

    You can view all the parameters in BOL lookin for distribution agent\Starting.

    Executing the file with the required parameters will open a new DOS screen showing the progress of the distribution agent.

    in that way, you can see what the agent is doing. Also you should run sp_who2 sp in both the publisher and subscriber to find out what is happening. I don't think it is necesary to re apply the snapshot again.

  • Thanks for the suggestions. Unfortunately, I cannot run distrib.exe from the command line in this situation, but I can and will run sp_who2 when and if this happens again.

    Thanks again.

    Vik

  • You can accomplish the same thing by adding the logging switches to the job (-outputverboselevel 3 -output c:\repl.log I think). When it hangs up, you might check to see if there is blocking on either side.

    Definitely shouldnt have to kill the pub to fix this. Worst case, open the Win task manager and kill the distrib.exe.

    Andy

    http://qa.sqlservercentral.com/columnists/awarren/

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply