Replication is extremely slow!

  • Hi,

    We have a large Prod SQL 2005 DB, ~350 GB, that is being replicated (transactional) to two subscribers.

    The app team performed some data archival procedures on the publisher DB over the weekend and deleted about 25 GB of data. These transactions are now replicating over to the subscribers.

    However, when I look at the Log Reader, transactions are still being scanned from the log and moved from the publisher to the distributor, and when I check the undistributed commands tab I see pending transactions:

    # of undistributed cmds: 13,618,842

    # estimated time: 157.15:00:42 (i.e. over 157 days!)
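
    Incidentally, the same numbers can be pulled with T-SQL by running sp_replmonitorsubscriptionpendingcmds in the distribution database; a sketch, where all the server/database/publication names below are placeholders:

    EXEC sp_replmonitorsubscriptionpendingcmds
        @publisher = N'PUBSERVER',    -- placeholder publisher server
        @publisher_db = N'ProdDB',    -- placeholder published database
        @publication = N'ProdPub',    -- placeholder publication name
        @subscriber = N'SUBSERVER1',  -- placeholder subscriber server
        @subscriber_db = N'ProdDB',   -- placeholder subscriber database
        @subscription_type = 0        -- 0 = push, 1 = pull
    -- returns pendingcmdcount and estimatedprocesstime (in seconds)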

    I tried changing some of the Log Reader and Distribution Agent profile settings and also increased the frequency of log backups, but nothing has helped.

    Would it be best to reinitialize the subscriptions and resync from a fresh snapshot at this stage? Will generating the snapshot impact the production tables and lock them for a considerable time?

    Is there anything else we can do to resolve this issue or speed up the replication?

    Any suggestions are appreciated.

  • Hi.

    Check the scan density of the replication system tables, such as

    MSmerge_contents, MSmerge_tombstone, etc.:

    dbcc showcontig ('MSMERGE_CONTENTS')

    - Scan Density [Best Count:Actual Count].......: 50.44% [26305:33534]

    If it's too low, then reindex it like:

    dbcc dbreindex ('MSMERGE_CONTENTS')

    Scan density should be close to 100%, or at least above 90%.

    Check it; it should help you a lot.
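
    If your publication is transactional rather than merge, the tables doing the heavy lifting live in the distribution database instead; a similar check there (assuming the default database name 'distribution') would be:

    USE distribution
    GO
    DBCC SHOWCONTIG ('MSrepl_commands')
    DBCC SHOWCONTIG ('MSrepl_transactions')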

    Thanks,

    Noman

  • Hi

    We had a similar performance issue, but with Log Shipping rather than replication; this is how we resolved it:

    We had our production system doing 30-minute log backups for use in Log Shipping to another server at the same site.

    Restoring these logs used to take at most 3 minutes per log.

    Then the log restore time started getting worse and ended up at almost 20 minutes.

    We identified that the destination server was taking a long time to apply the logs to the database, with one factor being that the database indexes were very fragmented.

    We did a re-index on the database and this brought the restore time back in line (we also cycled the server to flush the system).
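
    For what it's worth, on SQL 2005 something like this rough query is what you can use to find the worst indexes first (the 30% cutoff is just a rule of thumb):

    -- list indexes in the current DB that are more than 30% fragmented
    SELECT OBJECT_NAME(ps.object_id) AS table_name,
           i.name AS index_name,
           ps.avg_fragmentation_in_percent
    FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') ps
    JOIN sys.indexes i
      ON i.object_id = ps.object_id
     AND i.index_id = ps.index_id
    WHERE ps.avg_fragmentation_in_percent > 30
    ORDER BY ps.avg_fragmentation_in_percent DESC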

    Not sure if this applies here, but based on the amount of free space reclaimed, this may be something to consider?

    Let us know

    Thanks

    Kevin

  • I am not sure if you are aware, but by default replication replicates one row at a time.

    So, since you deleted some 13 million rows, the Distribution Agent needs to make 13 million individual calls to the replicated delete stored procedure.

    The default procedures are pretty simple, but say it takes 1/100th of a second per command to join the transaction and run the delete: 13,000,000 × 0.01 s ≈ 130,000 s, which is about 36 hours just to replicate the purge.
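
    For context, each replicated DELETE is applied by calling an auto-generated procedure on the subscriber that removes exactly one row by primary key. A simplified sketch of what such a procedure looks like (the table and column names here are made up):

    CREATE PROCEDURE [sp_MSdel_dboOrders]
        @pkc1 int    -- primary key of the single row to delete
    AS
    BEGIN
        DELETE FROM [dbo].[Orders]
        WHERE [OrderID] = @pkc1
        IF @@ROWCOUNT = 0
            RAISERROR ('The row was not found at the Subscriber.', 16, 1)
    END

    So a 13-million-row purge means 13 million of these single-row calls.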

    Transactional replication is not really designed for huge databases going through major purge operations. In the future, for something like this, you may want to disable your replication, do the deletes on your publisher and subscribers yourself, and then re-enable replication (being very careful, of course); a rough outline follows below.
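
    A very rough outline of that approach for a single article; all names are placeholders, and the exact steps (especially re-synchronizing without pushing a full snapshot again) would need careful testing in your environment:

    -- 1. Drop the subscription(s) and the article for the table being purged
    EXEC sp_dropsubscription @publication = N'ProdPub',
         @article = N'Orders', @subscriber = N'all'
    EXEC sp_droparticle @publication = N'ProdPub', @article = N'Orders'

    -- 2. Run the archival DELETE manually on the publisher AND on each subscriber

    -- 3. Re-add the article, telling the snapshot not to drop the subscriber copy
    EXEC sp_addarticle @publication = N'ProdPub', @article = N'Orders',
         @source_object = N'Orders', @pre_creation_cmd = N'none'

    -- 4. Re-create the subscriptions and synchronize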

  • Okay, I understand and agree completely.

    (It was the application team that did it! :w00t:)

    What is the next best alternative in our situation now?

    Or is there any other way to speed it up?

    Thanks for your input.

    For such a huge number of deletes,

    try using a stored procedure: set the article to replicate the stored procedure's execution (a sketch follows the link). For more information:

    http://technet.microsoft.com/en-us/library/ms152754.aspx
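
    Roughly, you mark the article as an "execute stored procedure" article, so only the EXEC statement is replicated instead of the millions of resulting row deletes. A sketch (publication and procedure names are placeholders):

    EXEC sp_addarticle
        @publication = N'ProdPub',
        @article = N'usp_PurgeOldData',
        @source_object = N'usp_PurgeOldData',
        @type = N'proc exec'   -- see also 'serializable proc exec' in the link above

    -- After this, EXEC dbo.usp_PurgeOldData on the publisher replicates as one
    -- command, and each subscriber runs the procedure locally.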

    -----------------------------
    www.cbtr.net
    .: SQL Backup Admin Tool :.

    The LogReader (LR) reads from the AppDb tranlog (.LDF) and writes to the distribution DB.

    If these App team changes are still going in, the supposition must be that some blocking and/or resource problem is holding things up. Specifically, you should:

    1. use Activity Monitor (or sp_who2, etc.) to see any blocking (see the quick check after this list)

    2. check the AutoGrow limits on your distribution DB (and that there is adequate disk space on D:, E:, etc.)
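
    For item 1, a quick blocking check that works on 2005 (sysprocesses is legacy but handy):

    SELECT spid, blocked, lastwaittype, waitresource, cmd, hostname, program_name
    FROM master..sysprocesses
    WHERE blocked <> 0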

    You should also eliminate the various Distribution Agents (DA) as a source of contention; maybe stop them until the LR finishes.

    I hope that your App team originally broke the DELETEs into small batches to avoid blowing out the AppDb tranlog, and such batches should be flowing out to the subscribers (visible with Profiler), albeit slowly, row by row, as you know; something like the sketch below.
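
    A batching sketch (table name, batch size and WHERE clause are placeholders), so each replicated transaction stays small:

    DECLARE @rows int
    SET @rows = 1
    WHILE @rows > 0
    BEGIN
        DELETE TOP (5000) FROM dbo.Orders
        WHERE OrderDate < '20070101'
        SET @rows = @@ROWCOUNT
    END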

    Grooming (aka culling) is a normal procedure to prevent PROD servers from getting overwhelmed, so you may be faced with this every week/month/year, and it is necessary to have a viable solution. Some sites use partitioning or have multiple publications (eg Jan08, Feb08, etc.), so it is easy to manage (eg just drop the Jan07 pub and all underlying tables everywhere).

    Tell us your solution [we promise not to larf]

    Dick
