Syncing data between two databases

  • Hi all,

    Hoping for some help with a SQL Server issue. I'm sure what I want to do is possible with DTS, scheduled stored procedures or a combination of both; I just don't know exactly how to do it. I don't necessarily need a hugely detailed answer (although the more detail the merrier) - a conceptual answer would probably suffice and I can figure out the specifics.

    Anyway, on to the problem:

    Our website uses a SQL Server 2000 database. As this database has limited space available, I need to archive older data to another database on our local SQL Server before removing the data from the website database. I'd like this process to happen automatically, once a day. The database is reasonably complicated, with about 30 tables and > 100 stored procedures.

    So, essentially I need to be able to automatically do a non-destructive sync of the website database to the local database, and then automatically delete older data from several of the tables in the website database. The older records that need to be deleted after doing a sync are date/time stamped so their age can be identified by that. There is an additional complication that the database contains lots of tinyint, smallint, int and bigint identity columns, so their values need to be preserved during the sync.

    Is this possible? If so, how would I go about doing it?

    Thanks very much for any help anyone can provide.

  • Hey Matthew,

    You've got a few options really.

    1) Log Shipping

    2) Replication

    3) DTS/Stored Procs

    Replication and Log Shipping will basically give you a direct copy, so you would have two copies of the data. You would then still need to remove the old data from the website production db, so you'd need to add a step to the stored procs to cater for this.

    You could build your own custom DTS to move data between the two servers and delete data based on a timestamp.  It's pretty simple to do actually.  I'd wrap it all in a transaction (the insert to the backup server and the deletion from the prod server) to ensure the whole process completes as one batch.
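
    As a very rough sketch, the guts of that step might look something like the following (the linked server, table and column names here are all made up - and note that on SQL Server 2000 you check @@ERROR after each statement, as there's no TRY/CATCH):

        DECLARE @cutoff datetime
        SET @cutoff = DATEADD(dd, -90, GETDATE())   -- e.g. keep the last 90 days on the website

        BEGIN TRAN

        -- copy the old rows across (ARCHIVESRV is a hypothetical linked server)
        INSERT INTO ARCHIVESRV.ArchiveDB.dbo.Orders (OrderID, CreatedOn, Detail)
        SELECT OrderID, CreatedOn, Detail
        FROM dbo.Orders
        WHERE CreatedOn < @cutoff

        IF @@ERROR <> 0 BEGIN ROLLBACK TRAN RETURN END

        -- only remove what was successfully copied
        DELETE FROM dbo.Orders
        WHERE CreatedOn < @cutoff

        IF @@ERROR <> 0 BEGIN ROLLBACK TRAN RETURN END

        COMMIT TRAN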

    Hopefully that will get you started, but please post back if you have any other questions.

    Clive

    The definition of database complexity varies depending on your point of view. We have one database that has 400+ tables, no stored procedures and no foreign key constraints. However, there are literally hundreds of constraints applied via the client application. Slightly more complex, wouldn't you say?

    You don't mention the sort of data volume, though if you're short on space and there are a lot of int type columns, I'm betting you've got millions of rows in a table or two. If this is the case you'll probably want to archive data out in small batches to limit any adverse effects on the website.

    The approach you take will also depend on the structure of the database. To give you an idea I'll simplify one of our website databases for an example.

    Four tables in a hierarchy - Users, Accounts, Sessions, History

    Each of these tables has a foreign key constraint to the table above it in the hierarchy and a datetime stamp for when the record was created.

    This makes archival a very simple four-step process.

    1. History - archive any records over a month old.
    2. Sessions - archive any records without History records.
    3. Accounts - archive any records without Session records.
    4. Users - archive any records without Account records.

    Each step is wrapped in its own transaction and has its own stored procedure. Basically, the stored procedures consist of an INSERT statement to copy records to the Archive database and a DELETE statement to remove the records just copied.  The whole process is run weekly by a scheduled job.
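
    To give you a feel for it, here's a cut-down sketch of what one of those procedures might look like (the column names are invented for the example; the SET IDENTITY_INSERT lines are only needed if you want to preserve identity values in the archive copy, which covers the concern in the first post):

        CREATE PROCEDURE dbo.usp_ArchiveHistory
        AS
        DECLARE @cutoff datetime
        SET @cutoff = DATEADD(mm, -1, GETDATE())   -- records over a month old

        BEGIN TRAN

        -- keep the original identity values in the archive copy
        SET IDENTITY_INSERT Archive.dbo.History ON

        INSERT INTO Archive.dbo.History (HistoryID, SessionID, CreatedOn, Detail)
        SELECT HistoryID, SessionID, CreatedOn, Detail
        FROM dbo.History
        WHERE CreatedOn < @cutoff

        IF @@ERROR <> 0 BEGIN ROLLBACK TRAN RETURN END

        SET IDENTITY_INSERT Archive.dbo.History OFF

        -- remove only the rows just copied
        DELETE FROM dbo.History
        WHERE CreatedOn < @cutoff

        IF @@ERROR <> 0 BEGIN ROLLBACK TRAN RETURN END

        COMMIT TRAN
        GO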

    --------------------
    Colt 45 - the original point and click interface

  • Hi guys,

    Thanks for your help so far, really appreciate the responses.

    Firstly, to clarify the volume of data - it is not that great. The two tables which are going to require regular archiving have about 350,000 and 150,000 records respectively (and growing at a rate of about 70,000 records/month and 30,000 records/month respectively). The database in question is actually quite small, but we have very limited space available.

    Secondly, I'm pretty sure the website database server is SQL Server Standard Edition, not Enterprise, so Log Shipping is out. I'm pretty sure Replication doesn't require EE, but I doubt it would be enabled/allowed as the server in question is a shared SQL hosting server and what you can and can't do is quite restricted. So, that leaves DTS and/or stored procedures.

    From the sounds of it a DTS approach may be best, as AFAIK you cannot connect to another SQL server inside a stored procedure (which we'd need to do to suck the database from one server to another with a select/insert - correct me if I'm wrong).

    My knowledge of DTS isn't great - I'm a much better DB designer /programmer than DBA. I have used DTS before, but never for anything more complicated than making single copies of databases between servers.

    I can see options in the DTS Wizard to copy data from one database to another while appending data, but without any sort of date control - how do I go about setting it up so that it only copies data between the last sync and a specified date/time? Also, how do I use DTS to delete the archived data from the website?

    Again, really appreciate any help provided, thanks.

    A DTS package is used for transforming data and copying it from source to destination. It is not used for deleting data, and moreover there will be no date control. So I think it would be better to use a Backup and Restore.

    ..hema

    Hema, a Backup and Restore won't be of any use in this situation because the data will still reside on the original server. That being the case, you might as well use DTS then.

    Matthew, as you're more of a "DB designer/programmer", stick with what you know works best. You can knock up a couple of stored procedures to do the data transfer without too much fuss. If you can set up a linked server that would be optimal; if not, take a look at OPENROWSET or OPENDATASOURCE.
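
    For example, OPENDATASOURCE lets you run an ad hoc query against the other server without any permanent setup, assuming ad hoc access is allowed there (the server name, login and table below are just placeholders):

        SELECT *
        FROM OPENDATASOURCE('SQLOLEDB',
                 'Data Source=sql.domain.com;User ID=webuser;Password=secret'
             ).WebDB.dbo.Orders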

    For the dates you could take the minimum date in the table as the "From Date". You could also maintain a checkpoint field in a separate table if that works better for you. Then, depending on how much data you want to keep, subtract x days from the current date to get the "To Date".
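
    In T-SQL terms, working out the window might look roughly like this (dbo.ArchiveCheckpoint and the column names are hypothetical - adjust to whatever you end up with):

        DECLARE @from datetime, @to datetime

        -- "From Date": where the last run got to, or the oldest record on the first run
        SELECT @from = ISNULL(MAX(LastArchived),
                              (SELECT MIN(CreatedOn) FROM dbo.Orders))
        FROM dbo.ArchiveCheckpoint

        -- "To Date": keep, say, the most recent 30 days on the website
        SET @to = DATEADD(dd, -30, GETDATE())

        -- archive rows WHERE CreatedOn >= @from AND CreatedOn < @to,
        -- then record @to back into dbo.ArchiveCheckpoint for the next run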

    Given the relatively small volume of records, you could get away with archiving each table in one go without having to worry about batching. 

     

    --------------------
    Colt 45 - the original point and click interface

  • Hi Matthew,

    You can reference a different SQL Server box from a stored procedure, but you will need to have set up a Linked Server first. I don't know if this option is available to you?
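
    If it is, the one-off setup is a couple of system procedure calls, after which four-part names work inside your stored procedures (the server name and login here are made up):

        EXEC sp_addlinkedserver @server = 'WEBSRV', @srvproduct = '',
             @provider = 'SQLOLEDB', @datasrc = 'sql.domain.com'

        EXEC sp_addlinkedsrvlogin @rmtsrvname = 'WEBSRV', @useself = 'false',
             @rmtuser = 'webuser', @rmtpassword = 'secret'

        -- then, from a stored procedure:
        SELECT COUNT(*) FROM WEBSRV.WebDB.dbo.Orders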

    Is there a column in your table structure that will hold the timestamp information or will this be a new column on the archive server?

    If the table structure will be different and there will be a new column for the timestamp, simply add a default value to that column.

    If the table structure is the same, then on the data transformation you could remove the transformation for the date field and set the destination column to its default value.
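
    Something along these lines (the archive table and column names are only examples):

        -- stamp each row with the time it arrived in the archive
        ALTER TABLE dbo.HistoryArchive
        ADD CONSTRAINT DF_HistoryArchive_ArchivedOn
            DEFAULT GETDATE() FOR ArchivedOn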

    Clive

    I am not sure what Hemasunder means; you can delete, copy etc. in a DTS package and add date control, no problem.

    If I were writing a DTS package for what you describe, its structure would probably be something like this:

    1) Create temp tables matching the tables I want to move

    2) Get the latest date(s)/ID(s) from the Destination DB

    3) Copy any later data from the source to the temp tables

    4) Copy the temp table data to the Destination tables (using a transform task is easiest)

    5) Delete old data from source tables
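
    To make that concrete, steps 2, 3 and 5 would typically be Execute SQL tasks containing something like the following (WEBSRV, the tables and the columns are made-up names, and in a real package each step would be its own task rather than one batch):

        -- step 2: find where the destination has got to
        DECLARE @latest datetime
        SELECT @latest = ISNULL(MAX(CreatedOn), '19000101')
        FROM dbo.HistoryArchive

        -- step 3: copy anything later from the source into the temp table
        INSERT INTO dbo.HistoryTemp (HistoryID, CreatedOn, Detail)
        SELECT HistoryID, CreatedOn, Detail
        FROM WEBSRV.WebDB.dbo.History
        WHERE CreatedOn > @latest

        -- step 5 (after step 4 has moved the temp data in): delete what was copied
        DELETE FROM WEBSRV.WebDB.dbo.History
        WHERE CreatedOn <= (SELECT MAX(CreatedOn) FROM dbo.HistoryTemp)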

    This is actually pretty simple compared to some of the things I've had DTS packages do; they are a particularly powerful part of SQL Server that I think is somewhat underused.

     


    "Don`t try to engage my enthusiasm, I don`t have one."

    (Marvin)

  • Given the recent rash of next to useless posts by Hemasunder, I'd say they're just trying to increase their post count.

     

    --------------------
    Colt 45 - the original point and click interface

    If you do decide to use replication, I believe that if you delete records from the publisher, that deletion will be replicated to the subscriber.

    I am in agreement with the others that DTS with stored procedures is a good approach. What we generally do is insert a set of records into another database, back up that database (we use Litespeed for this), pull down the backup, restore the database locally, and merge records using a stored procedure to get all the records that are not currently in place. The backup database is generally limited to x days of data, depending on the table.
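
    The merge at the end is essentially just an insert of whatever the local copy doesn't already have - something like this (made-up names again):

        -- copy across only the rows the local database is missing
        INSERT INTO LocalDB.dbo.Orders (OrderID, CreatedOn, Detail)
        SELECT r.OrderID, r.CreatedOn, r.Detail
        FROM RestoredDB.dbo.Orders r
        WHERE NOT EXISTS (SELECT 1 FROM LocalDB.dbo.Orders o
                          WHERE o.OrderID = r.OrderID)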

    Cliff

    Thanks for the tip on linked servers, OPENROWSET and OPENDATASOURCE - with that info I should think I can do what I want with stored procedures.

    Does OPENROWSET/OPENDATASOURCE require a DSN to be set up, or can the communication be done directly over TCP/IP (specify sql.domain.com and a database name)?

    Once I have the stored proc(s) written, I assume I then need to schedule them in a Server Agent job?

    I will also need to guard against failures - essentially, don't do the delete from the website DB unless the initial copy succeeded. What is the easiest way to do this - separate stored procs, and then use the job option "On failure action:" to stop the job if the first step fails?

    Thanks again for the help.

  • Thanks for the linked server tip, will check it out. I'm not sure if it is available and can't check right now, but I now also have OPENROWSET/OPENDATASOURCE as backup options.

    To answer your question, one of the main two tables I need to archive has a timestamp column, but the other doesn't. However, the data in the two tables is linked (primary key in one is foreign key in the other), and it should be possible to deduce which records I need to archive from the non-time-stamped table based on that relationship. I say "should" as it's a little more complicated than I describe - if I can't, I'll just add a timestamp to the currently-non-timestamped table.

  • Thanks for the info - DTS isn't my strong point.

    Are the five steps you describe the actual five steps you'd add to the DTS package, or do they need to be broken down further? I take it I need to create the packages manually to do things at this complexity level - no more wizard 🙂

    Also, why do you copy to a temp table first, rather than a direct copy? Is it simply a data protection thing - move the data to a disposable table locally first, and then move it to the "real" local DB over a much-more-reliable intra-server copy?

  • Dupe post, sorry.

  • What's the reason for doing it via a database backup, rather than a linked server (or something similar) copy?

    Regardless, this approach won't work for us unfortunately - as I said earlier in the thread, the website DB is on a shared SQL server we have very little control over - definitely no access to the backup files on the server's disk.
