Is Replication hitting a limit ?

  • Environment :

    SQL Server 2005 32 Bit on Windows 2003 R2 server

    64 GB - AWE Enabled

    Question :

    I have a few replications set up and all is good. Until....

    I created a new Publication P1 and then tried to create two Subscribers S1 and S2.

    I was able to create only one subscriber. doesnt matter which one I can create first, it allows me to create either S1 or S2.

    As I create the second subscription and look at the sync status all it says is "Starting Agent" for a long time and then stops.

    Am I hitting a limit here ?

    Note: My Publisher, distributor and Subscribers are all different servers

    I tried the heap changes on the publisher from the article "http://support.microsoft.com/kb/184802/". Does not work.

    Help much appreciated

    Thanks

    Shinoj

  • Any thought experts ?

  • try adding some logging on your agent job and output it to a text file, see how far the agent is getting and whether its running into any problems. it sounds like it could be a locking issue but that pretty hard to say with limited information.

    see this article on setting up logging if you are unsure how to do it http://support.microsoft.com/kb/312292

  • Thanks Guys... I guess I know whats happening now.

    I think I'm hitting the limit on maximum number for continuous subscriptions on a distribution server.

    I am planning to setup scheduled subscriptions

    Shinoj

  • interesting. what limit are you hitting? number of publications?

  • I would like to know that as well... We have 325 Plus tables being replicated to two subscribers. These articles are published by using 10 publications.

    -Roy

  • I have only about 10 publishers and 20 subscriptions in all but I kept hitting the limit

    as I start the snapshot agent after creating new publication and subscription.

    I delete an old subscription from some other publisher and then try the new one - that works

    I'm wondering is it because of my 32bit service?

    Scheduled subscriptions seems to be working... Here is an article I found...same issue

    http://ask.sqlteam.com/questions/1378/replication-limit-on-number-of-push-subsciptions

    Shinoj

  • Thanks for sharing that. It will be a good reference when our replication set up blows apart... 🙂

    -Roy

  • interesting, thanks Shinoj

  • When I have seen this it was down to exceeding the number of worker threads configured in the various SQL Agent subsystems

    To check this

    on the distributor

    select subsystem,max_worker_threads from msdb.dbo.syssubsystems

    where subsystem in ('Snapshot','LogReader','Distribution')

    or

    exec sp_enum_sqlagent_subsystems

    check the number of max_worker_threads against the number of agents running for each subsystem. Generally you also see a message in the logs indicating the stalled process is in fact waiting for a worker thread.

    There are two work rounds that I am aware of. Increase the number of worker threads per subsystem in the msdb.dbo.syssubsystems table or perhaps more safely set up a proxy and run additional distribution or snapshot agents under the proxy account.

    The latter approach has worked for us. We have 2 proxy accounts. 128 Distribution agents running under the SQL Agent account and a further 35 split across the two proxies

    This setting used to be held in the registry see http://support.microsoft.com/kb/306457 and while this is an old KB there is reference to SQL 2005 and manually adjusting the syssubsystems table

    also check out the msdb proc sp_verify_subsystems as this gives details of how SQL Server sets up the worker threads on installation. You need to run this proc manually if you have adjusted a server from 24 CPUs down to 1 to get round the SQL 2005 RTM bug that causes installation failure where the number of CPUs in the server is not a power of 2.

    When you finish the install and get SP2 on you can then switch the number of CPUs back to 24, however because the system was built with 1 CPU all the max worker threads are configured for 1 CPU

    Deleting all rows in the table then running sp_verify_subsystems resolves this for you.

  • My subsystem table has these value, seems high and I hardly have abt 10 publishers and about 20 subscribers. I wonder how I hit a limit

    Snapshot - 800

    LogReader - 200

    Distribution - 800

    Do you mean running sp_verify_subsystem will change these values? Will it need a distributor restart after i run sp_verify_subsystem?

    Shinoj

  • The numbers returned by that query are very suspect. Try running select * from MSdistribution_agents on the distributor. I'm curious but are you using shared distribution agents? The issue you are describing is something that is hit pretty easily and the only solid way that I know of around this is to change those agents to shared.

    To change to shared distribution agent you have to drop the subscription, go to publication properties, subscription options, and change to Independent Distribution Agent.

    You can quickly check which publications are set to independent agents by querying syspublications on the published database. Look at the independent_agent column. In most cases you can use a shared agent and not have problems. We only use independent agents when we have a REALLY active table and we do this so that other tables in replication are not negatively affected.

    Hope this helps.

    David

    @SQLTentmaker

    “He is no fool who gives what he cannot keep to gain that which he cannot lose” - Jim Elliot

  • 1)

    select * from MSdistribution_agents

    17 and 12 form two distribution agents a toal of 29

    2) checked all the publications and all are using independent agent (column value is 1)

    3) also checked the sysproxies table (SELECT * FROM msdb.dbo.sysproxies)

    Looks like there is a proxy for all the agents running on the server and the proxy subsystem shows me the proxy assigned to the subsystem. Also all the jobs run as their corresponding Proxy Account

    Noticed one thing though - all the jobs Run as the proxy account but the owner of the jobs is a sql account distributor_admin (but do not see the entry in sysproxylogin table for the sid and proxyid)

  • distributor_admin is a sysadmin. I guess that's why the use is not in sysproxylogin

  • Shinoj Ramanathan (8/10/2010)


    My subsystem table has these value, seems high and I hardly have abt 10 publishers and about 20 subscribers. I wonder how I hit a limit

    Snapshot - 800

    LogReader - 200

    Distribution - 800

    Do you mean running sp_verify_subsystem will change these values? Will it need a distributor restart after i run sp_verify_subsystem?

    Shinoj

    deleting the rows from the table then running sp_verify_subsystems will reset the number of worker threads depending on the number of cpus in the server

    if you do sp_helptext sp_verify_subsystems you can see exacly what this proc does but in essence there is a check for each subsystem with the following code extract being an example of what the proc does per subsystem

    IF NOT EXISTS(SELECT * FROM syssubsystems WHERE subsystem = N'Distribution')

    INSERT syssubsystems

    VALUES

    (

    6, N'Distribution', 14553, @InstRootPath + N'SQLREPSS.DLL', @ComRootPath + N'DISTRIB.EXE',N'ReplStart',N'ReplEvent',N'ReplStop',100 * @processor_count

    )

    If the server you are working with has 8 CPUs then the worker threads you have posted is correct as can be confirmed by the multipliers in the sp_verify_subsystems proc for each of the sub systems.

    The number of worker thread is not specifically related to number of publishers and subscribers rather and provided they are not shared agents each subscription to a publication will have its own agent so if you have 10 publications and each publication has 20 subscribers then you will have 200 distribution agents. This can be seen on the distributer in the SQL Agent Jobs and do a count of these with a job category of Repl-Distribution.

    select COUNT(*) from msdb.dbo.sysjobs sj join msdb.dbo.syscategories sc on sc.category_id = sj.category_id

    where sc.name = 'REPL-Distribution'

    If you try this in my experience all you need is a SQL Agent restart so that it picks up the new values.

Viewing 15 posts - 1 through 15 (of 15 total)

You must be logged in to reply to this topic. Login to reply