snapshotting a publication is very slow in transactional replication

  • Hi, I was having difficulty getting snapshots done. Some publications that only have a few small articles and that used to finish snapshotting in a few seconds just hang. I got 'The process is running and is waiting for a response from the server'. I got every other runnable processes disconnected, thinking it was because of the contention on the server, but did not work. Btw, our publisher and distributor are on the same server. Which I guess should not matter, because the snapshotting worked fine two days ago.

    Thank you for any insight on this.

    Kathleen

  • Kathy,

    Please check sql server memory and processor utilization(for sqlservr.exe as well as for os level processes) counters under perfmon.

    Manu

  • there are several replication agents

    LogReader Agentfrom LDF -> distribution db

    Snapshot Agentgenerate the flat-files (e.g. *.bcp)

    Distribution Agentdeploy initial snapshots, then subsequent changes ex distribution db

    where you are actually commenting on speed of the DA and not the Snapshot ??

    it is possible that you have hit a resource limit (check db+tlog sizes against AutoGrow max's)

    anyway, the agents are *.EXE that run at the Distributor and connect to the various db's

    - the DA can be quick to do a Bulk Insert to sub db, but then take ages to create PKs etc

    I suggest that you use sp_who2 or activity monitor to check again for blocking

    - failing that identify the DA's SPID and use DBCC inputbuffer(spid) to find what its doing

    BTW there is a dedicated form for replication matters so I suggest you post questions there to maximise received wisdom !

    HTH

    Dick

  • Thanks, Manu,

    I will do that then.

    But since it is a dedicated SQL box, with no other active connections to it, I would think the CPU should be low and memory for sql should stay where they were before ( I give 29 GB out of total 32GB to SQL, and I believe SQL memory stays at the max they are allowed).

    Thanks for the input.

  • Dick, thanks for the reply.

    I am pretty sure its the snapshot agent being slow, because viewing the agent detail, it says

    ....

    [0%] bulk copied snapshot data for article...

    ...

    [15%] bulk copied...

    ..

    [99%] the process is running is waiting for a response from the server

    [99%] the process is running is waiting for a response from the server

    [99%] the process is running is waiting for a response from the server

    or some other publication it might say

    [0%] the process is running is waiting for a response from the server

    and it will re-try and re-try.

    For those that eventually went thru, it took much more time than it usually is.

    I forgot to run my blocker process to check on blocking processes. Thank you for reminding me.

    I will post my message on the right forum. Thank you!

  • Hi Dick, I have a question about the database autogrowing. I understand that when snapshots happen, a bulk copy happen on the publisher, then the scripts get posted to snapshot folders, does the distribution agent read from the folders, in order to distribute the commands to the subscribers? If so, the distribution database should not grow much, since the scripts are stored in the file system, not the distribution database?

    Thanks,

    Kathleen

  • that is correct, the Snapshot writes flat-files on the Distributor(?) which the DA's then read

    - there are doubtless very few rows written to tables in distribution db

    worth checking that you have enough diskspace to accommodate all the .bcp files

    - you should also be running a respectable patch level (eg SP2 is 9.0.3159) and not RTM !

    best of luck checking blocking, space etc

    Dick

  • Thanks again, Dick. I just checked-- we do have enough space for the distribution database.

    I have one more question ( I am asking for too much). So the whole snapshot process include basically two steps, one to bulk copythe tables (run by snapshot agent), second to write the commands to the snapshot folder. If the whole process is slow, it could be either bulk copying is slow ( so memory is in demand?) or could be writing to the folder is slow ( could be disk io problem). So these two resources are potential bottlenecks?

  • Thanks again, Dick. I just checked-- we do have enough space for the distribution database.

    I have one more question ( I am asking for too much). So the whole snapshot process include basically two steps, one to bulk copythe tables (run by snapshot agent), second to write the commands to the snapshot folder. If the whole process is slow, it could be either bulk copying is slow ( so memory is in demand?) or could be writing to the folder is slow ( could be disk io problem). So these two resources are potential bottlenecks?

    It is the Snapshot process that reads the [entire] table(s) from published database and writes the flat-files containing

    .prepreparation (drop constraints, tables)

    .schtable definition and repl sprocs

    .bcpdata

    .idxindex

    and you should find this in folder like (here's one from my system as example)

    D:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\repldata\unc\UATDB03$ENV3_BOND_BOND FOR DB03\20080611063794

    and then will write related entries in distribution db

    it is the DA that actually

    reads those files

    delivers to each sub

    deletes the files [usually]

    and using replication monitor you would see [double-click the entries and set auto-refresh] what each is doing

    - DA probably says "waiting for Snapshot" or somesuch

    so in your case it does look like the Snapshot Agent is the one having trouble completing so DAs have nothing to do [yet]

    I suggest you look at flat-files and see what most recent file is (sort by modified date) as a clue where it's stuck.

    Check all NTFS permissions and that agent has permissions to write to distribution

    - I recommend making the agent's domain login to be sysadmin as TEMPORARY measure

    HTH

    Dick

  • Dick, thanks for all your help.

    I was able to get the snapshots done. I found the snapshot agent process got a waittype 'sos_scheduler_yield'. I figured the contention is on sql server agent scheduler threads, because the CPU was low between 10%-15%, and we have tons of jobs running (including logreader and distribution agents for other databases), so I disabled those running ones and snapshots went really fast.

    Thanks again. I learned a lot from you.

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply