How to Pause Transactional Replication

  • Hi,

    I have a Datawarehouse which is fed by transactional replication from our CRM System around the clock.

    We use the replicated data as the basis for our Datawarehouse fact tables and want to pause the replication for the duration of the fact table creation. This gives us a "snapshot" of the CRM System as the basis for the fact tables.

    Up until now we have set the replication to run until 22:30, with the fact table creation starting at 23:00, giving the remaining transactions plenty of time to finish. We then had a "photo" of our CRM System as of 23:00. The fact tables were done around 04:00 the next day, but to be safe we left a two-hour buffer before starting the replication again.

    The problem with this was the long catch-up time required after the fact tables were created. Our CRM System runs some big batches around 03:00, so those transactions were stored up until the replication started again. The replication then takes 3-4 hours to catch up, which is not acceptable.

    I came up with the idea of adding two steps to the fact table process. The first should stop and deactivate the replication distribution agents before the fact tables are created, the second step should start the replication after the fact tables are created.

    In order to allow this type of dynamic stop and start I planned and implemented the following scripts:

    Procedure to stop and deactivate a chosen replication job:

    CREATE PROCEDURE [dbo].[proc_Disablejob] @job_name VARCHAR(255) AS
    DECLARE @jobid UNIQUEIDENTIFIER

    -- Find job_id in sysjobs
    SELECT @jobid = job_id FROM msdb..sysjobs WHERE name = @job_name

    -- Only do the work if the supplied job exists
    IF @jobid IS NOT NULL
    BEGIN
        -- Temp table to check which jobs are running
        CREATE TABLE #enum_job (
            Job_ID UNIQUEIDENTIFIER,
            Last_Run_Date INT,
            Last_Run_Time INT,
            Next_Run_Date INT,
            Next_Run_Time INT,
            Next_Run_Schedule_ID INT,
            Requested_To_Run INT,
            Request_Source INT,
            Request_Source_ID VARCHAR(100),
            Running INT,
            Current_Step INT,
            Current_Retry_Attempt INT,
            State INT
        )

        -- Fill temp table with all jobs
        INSERT INTO #enum_job
        EXEC master.dbo.xp_sqlagent_enum_jobs 1, garbage

        -- Stop the job if it is running
        IF EXISTS (SELECT * FROM #enum_job WHERE Running > 0 AND Job_ID = @jobid)
        BEGIN
            EXEC msdb..sp_stop_job @job_name = @job_name
        END

        -- Deactivate job
        UPDATE msdb.dbo.sysjobs
        SET enabled = 0,               -- 0 = deactivated
            date_modified = GETDATE()  -- set last modified to today
        WHERE job_id = @jobid AND enabled = 1 -- only the chosen job, and only if currently active
    END

    Procedure to activate a chosen replication job:

    CREATE PROCEDURE [dbo].[proc_Enablejob] @job_name VARCHAR(255) AS
    DECLARE @jobid UNIQUEIDENTIFIER

    -- Find job_id in sysjobs
    SELECT @jobid = job_id FROM msdb..sysjobs WHERE name = @job_name

    -- Only do the work if the supplied job exists
    IF @jobid IS NOT NULL
    BEGIN
        -- Activate job
        UPDATE msdb.dbo.sysjobs
        SET enabled = 1,               -- 1 = active
            date_modified = GETDATE()  -- set last modified to today
        WHERE job_id = @jobid AND enabled = 0 -- only the chosen job, and only if currently deactivated
    END

    I tried this out and everything seemed ok. The Distribution Agents were set to Deactivated and were shown as such in SSMS. However, the jobs continued to start even though they were deactivated!

    Can anyone tell me what I have done wrong or if I am going about this the wrong way? I have no problem with being told that I am a fool for doing it this way. Learning through mistakes is often the best way. 🙂

    Many thanks in advance!!

    Regards,

    WilliamD

  • Hi,

    I found the way to do it properly: I unticked the Enabled flag on one of the replication distribution agent jobs in SSMS and scripted out the change to a query window.

    Quite simple really:

    USE [msdb]

    GO

    EXEC msdb.dbo.sp_update_job @job_id=N'0579c737-f1b1-4c70-a0af-7ae8ec6df2f7',

    @enabled=0

    GO
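    Putting it together, a minimal sketch of what the disable/enable steps look like with sp_update_job by job name (the job name below is a placeholder). The important difference from my original procedures is that sp_update_job notifies the SQL Server Agent of the change, whereas a direct UPDATE of msdb.dbo.sysjobs only changes the table and leaves the Agent's cached view of the job untouched:

    ```sql
    -- Disable the distribution agent job before the fact table build;
    -- sp_update_job notifies SQL Server Agent, unlike a direct sysjobs UPDATE
    EXEC msdb.dbo.sp_update_job @job_name = N'MyDistributionAgentJob', @enabled = 0

    -- ... run the fact table creation here ...

    -- Re-enable the distribution agent job afterwards
    EXEC msdb.dbo.sp_update_job @job_name = N'MyDistributionAgentJob', @enabled = 1
    ```
    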

    Note to self.... stop trying to reinvent the wheel!!!!! 😀

    Regards,

    WilliamD

    Note that pausing the distribution agents is not exactly pausing "replication"; it is just pausing the subscriber activity. The publisher is still inserting into the distribution database (via the log reader).


    * Noel

    I would assume replication needs 3 to 4 hours to deal with the large CRM batch job regardless of when you start and stop your agents. You should drill into that process and try to improve it. I am not a fan of leaving articles in replication when they are subject to batch jobs affecting hundreds of thousands, if not millions, of rows at a time.

    I suggest having a routine that pulls the table(s) out of replication, lets the batch run, then adds the table back in and runs the snapshot agent. The distribution agent will then deliver bcp files for the table rather than individual commands. The only drawback is if the table is large - say 5 million rows and you are only affecting 500K - then you may want to look at an ETL tool to deliver just the changed rows. I often let the distribution agent deliver the entire table via bcp, which is always faster than delivering rows individually.

    My 2 cents.

    Viel Glueck,

    ChrisB MCDBA

    MSSQLConsulting.com

    Chris Becker bcsdata.net

  • Hi,

    @Noel: I know that it's only pausing the subscriber, but that is exactly what we need to get the "photo" of the system as it is.

    @chris-2: The problem with your suggestion is that we would basically have to drop the entire replication and reinitialise it (which takes more time than the "pause and play" method). The view of our Project Manager and Company Board is that we have to have the entire CRM System in the Datawarehouse so that we can do "live reporting" without slowing the real CRM System, as well as doing the "normal" Datawarehouse stuff.

    My plan still doesn't seem to work though. I changed my start and stop procedures to use sp_update_job, but when the job runs at 23:00 through a schedule, the subscriber agents still seem to be running. However, if I start the scheduled job by hand it works as it should: it pauses the agents, runs the fact table creation and restarts the agents afterwards.

    The strange thing is, when the scheduled job runs, it marks the repl agents as deactivated so they really shouldn't run any more.

    Am I missing something here or does this behaviour sound wrong to anyone else?

    Regards,

    WilliamD

    You don't have to drop the entire subscription. You can drop the subscription for one table at a time, then add it back in. Run the snapshot agent - this only creates a bcp file for that one table.

    -- drop table from replication
    sp_dropsubscription @publication = 'PubName'
        , @article = 'MyBatchTable'
        , @subscriber = 'all'

    sp_droparticle @publication = 'PubName'
        , @article = 'MyBatchTable'

    -- RUN BATCH JOB ON TABLE

    -- add table back (drop & recreate table on subscriber)
    exec sp_addarticle @publication = N'fxDB6_Pub', @article = N'MyBatchTable', @source_owner = N'dbo'
        , @source_object = N'MyBatchTable', @type = N'logbased', @description = N'', @creation_script = N''
        , @pre_creation_cmd = N'drop', @schema_option = 0x00000000080350DF, @identityrangemanagementoption = N'none'
        , @destination_table = N'MyBatchTable', @destination_owner = N'dbo', @status = 8
        , @vertical_partition = N'false', @ins_cmd = N'CALL [sp_MSins_dboMyBatchTable]'
        , @del_cmd = N'CALL [sp_MSdel_dboMyBatchTable]', @upd_cmd = N'SCALL [sp_MSupd_dboMyBatchTable]'
    GO

    -- IDENTITY: @identityrangemanagementoption = N'manual' / @schema_option = 0x000000000803509F

    exec sp_addsubscription @publication = 'PubName'
        , @article = 'MyBatchTable'
        , @subscriber = 'SVRNAME'
        , @destination_db = 'fxDB6'
        , @sync_type = 'automatic'
        , @reserved = 'internal' -- SQL2005
    GO

    -- run snapshot agent - will only create a snapshot for MyBatchTable
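    For completeness, one way to kick off the snapshot agent from T-SQL rather than SSMS is sp_startpublication_snapshot, run at the publisher on the publication database (publication name as in the scripts above):

    ```sql
    -- Starts the Snapshot Agent job for the publication;
    -- run at the publisher, on the publication database
    EXEC sp_startpublication_snapshot @publication = N'PubName'
    ```
    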

    ChrisB MCDBA

    MSSQLConsulting.com

    Chris Becker bcsdata.net

  • Hi Chris,

    I get the idea of the tables being removed and brought back in. The problem is, the batch operations at night make large changes to 90% of the tables that are replicated. (Project Lead - "Can't reduce the number of tables being taken, we never know what we want to look at in the Datawarehouse")

    On top of this, we are running our CRM System 24 hours a day during December with the idea of maybe switching to 24 hours forever some time next year.

    We will need to look at this again in the future, and my idea of a pause and play for the 3 hours that we need would be the best way to go forward.

    Do you maybe have an idea why the job works properly when I start it by hand in SSMS and why it doesn't work when scheduled?

    Thanks in advance

    Regards,

    WilliamD

  • Hi again,

    found the problem.

    A scheduled job will only run if the schedule is active. -- Bugfix already reported to M$ 😉
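    For anyone hitting the same thing, a quick sanity check against the SQL 2005 msdb tables shows whether a job's schedules are actually enabled as well as the job itself (the job name below is a placeholder):

    ```sql
    -- A job only runs on schedule if BOTH the job and its schedule are enabled
    SELECT j.name    AS job_name,
           j.enabled AS job_enabled,
           s.name    AS schedule_name,
           s.enabled AS schedule_enabled
    FROM msdb.dbo.sysjobs AS j
    JOIN msdb.dbo.sysjobschedules AS js ON js.job_id = j.job_id
    JOIN msdb.dbo.sysschedules AS s ON s.schedule_id = js.schedule_id
    WHERE j.name = N'MyFactTableJob'
    ```
    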

    One day when I'm a big boy I would like to be a Database Developer..... or maybe (looking at my schoolboy mistakes) just a dustbin man!

    Regards,

    WilliamD

  • Hi again,

    just a quick update about this.

    The job ran successfully last night, the replication paused and ran again after the fact table generation completed. All in all the system was happy too!!!

    I managed to time the job so that the big batch run in the CRM system starts half an hour after the replication is running again. The agents only needed 8 minutes to catch up with the CRM system instead of the 2-3 hours that it used to be..... WOOHOO

    Regards,

    WilliamD

  • If you are using Enterprise Edition you could create a snapshot of the subscriber, load the data warehouse from the snapshot, and drop the snapshot when the load is finished. No need to fiddle with replication.

  • Hi Scott,

    The replication is used for two reasons:

    1. To offload live reporting onto a separate machine.

    2. To use the replicated data to create our fact tables in the Datawarehouse.

    The first point requires transactional replication to keep the offloaded data up to date. The second point uses the transactional replication data because it is not practical to take snapshots of four DBs that are each around 45GB and growing fast.

    We have tried a few methods and found the transactional one to be the best.

    Thanks for all the tips, if there is anything you'd like to add, please do so. I am more than happy to learn from you all!

    Regards,

    WilliamD

    I think I understand what you're trying to do, but I'm not sure you understand what kind of snapshot I'm talking about. A database snapshot is not a replication snapshot; it is a new feature in SQL 2005.

    If you create database snapshots on the subscriber it doesn't copy the entire database. For the life of the snapshot, every page in the original database that is changed for the first time will first be copied to the snapshot. Queries on the snapshot will be filled from the original database for pages that haven't changed, and from the snapshot copy of the original for pages that were changed.

    So the snapshot starts off with very little space required, and will grow as the replication activity continues during the data warehouse ETL. The snapshot could grow to the original database size only if every page in the database is rewritten while you're trying to build the fact tables. If you want a static copy of the database for the duration of the load, while every page is being modified, and you don't have extra disk space for a database snapshot or the time for a new replication snapshot, then you have what I would call incompatible requirements.
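    For illustration, creating and dropping a database snapshot on the subscriber looks like this (database, logical file and path names are placeholders; every data file of the source database must be listed in the ON clause):

    ```sql
    -- Create a sparse-file database snapshot of the subscriber DB (Enterprise Edition)
    CREATE DATABASE CRM_Subscriber_Snap
    ON ( NAME = CRM_Subscriber_Data,                      -- logical name of the source data file
         FILENAME = 'D:\Snapshots\CRM_Subscriber_Snap.ss' )
    AS SNAPSHOT OF CRM_Subscriber
    GO

    -- ... build the fact tables by querying CRM_Subscriber_Snap ...

    -- Drop the snapshot once the load is finished
    DROP DATABASE CRM_Subscriber_Snap
    GO
    ```
    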

    I would think you'd rather find the disk space on the subscriber to hold the snapshot than find the space on the publisher to hold all the log data that will pile up while you have replication turned off. Maybe there's a way to put the burden on the distributor, but I've only used push subscriptions with the publisher and distributor on the same server.

  • Hi Scott,

    now I see where you're coming from.

    I can only suggest this approach to the Project Lead and see what he says.

    "Incompatible requirements" is the best phrase that I have read whilst working on this project that describes our whole internal Datawarehouse philosophy! I'm sure I'm not the only IT worker who experiences getting a project dumped on them where decisions made months beforehand now rear their heads and cause the most difficult working environments...... $$$$ it, not worth getting wound up.

    The Datawarehouse server here is the distributor and subscriber for the replications, so the commands are sent from the CRM system to the Distribution database on the Datawarehouse where they get stored while the repl. is paused. So the space problem in that direction is not a problem.

    Thanks for the help

    Regards,

    WilliamD

  • GermanDBA (12/5/2007)


    Hi again,

    found the problem.

    A scheduled job will only run if the schedule is active. -- Bugfix already reported to M$ 😉

    I don't think that's a bug. One job may have several schedules, some that you want to run, and some that are disabled.

    For example, I have some jobs that have a weekday (Mon-Fri) schedule, and a weekend (Sat,Sun) schedule. Although the job itself is enabled, I may want the weekend job to be disabled and not run.

  • Hi Homebrew,

    Sorry, I think you misunderstood... sarcasm is not an easy thing to convey when writing on a message board.

    I only meant that I made the mistake of not activating the schedule - but since that is a user error, the blame definitely lies with M$ for making me look stupid! Ergo, the bugfix request at M$.

    Regards,

    WilliamD
