To Replicate or Not To Replicate. THAT is the question

  • I have an upcoming project. I need to move a small amount of data from 200 remote SQL Servers to our central office SQL Server. The server and all the sites are all connected via a decent VPN, and all the remote sites have "linked server" connections established back to the central server.

    The data I need to move is very small. One table, each row is less than 200 bytes, and I would expect each remote site to send less than 100 of these rows per day.

    I'm just not sure which direction to go. My last project, was very similar, except it was only a 60 day promotion sort of thing. (This new one is permanent). It all went great. I created a simple sproc that sent the data from the remote sites to the central server using linked servers and 4 part names. I wrapped that sproc in a scheduled job and pushed it out to all the sites. Never had a single problem.

    My gut feeling here is that I should be using replication for this, but some say that 200 publishers/subscriptions is going to be a nightmare. I'm new to replication. I've studied it very carefully, BUT this would be my first actual implementation! I've done some testing to prove that our remote sites can in fact replicate to the server. Testing looks good, both our SQL2000 sites, and our SQL2005 sites are able to replicate test data to the server.

    It would be very easy to just chicken out use my many years of TSQL skills/experience and the linked servers to code my way out of this one, but I'd like to use the right tool for the right job and add replication to my toolbox.

    Cheers!

    .

  • I had a friend that had a multi-client scenario where each client had their own database. They had a central database with some common information which was replicated to each client database. This was all on one server and they had over 250 databases being replicated to.

    Replication will handle this, and it will handle down connections and restart things, something your linked server/sproc process will not necessarily do.

    What I'd suggest is that you set up your linked server code if you are under pressure to get this done. Then implement replication between 2 servers and learn how it works. If you have issues, you can fail back to the linked servers, but if you get it working, you can slowly move things in that direction.

    I'd also recommend that you document the project on a blog. It's a good career milestone to work through.

  • Awesome, thanks Steve. I'm feeling much better about this adventure having heard that vote of confidence from "The Man"!

    One more question. The one thing I'm having trouble grasping is this: Can I have all my publishers replicate to ONE and only ONE destination table on the subscriber? After all, my goal here is "consolidation". I don't want to have a separate table for each publisher. According to most of the docs I've read, it seems like each publisher must publish to a separate table on the subscriber. All the documentation I've read seem a bit ambiguous on this point.

    .

  • Replication does not restrict you from publishing from many servers to a single destination table. All you need to worry about is being able to make sure that the primary keys on each source table are unique across all servers.

  • Thanks. I get that. The thing is, the data really wasn't designed with replication in mind. The primary keys are "int identitiies" on the publishers. What I really need is a way to inject a user defined column in the article before it leaves the publisher, OR I'm going to have to change the schema on the publishers for the sake or replication.

    ...unless someone has a better idea.

    .

  • Are you planning to get all the data (200 odd rows) once a day (something like day end for that day's entire records)

    OR

    planning to get the data into your central server as and when the data is inserted in the remote SQL Servers.

    Where will your distributor be?

    The reason is, it can be possible to think about SSIS to do such a task.


    Bru Medishetty

    Blog -- LearnSQLWithBru

    Join on Facebook Page Facebook.comLearnSQLWithBru

    Twitter -- BruMedishetty

  • My plan is to "pull" once a day. The Distributor would live on the subscriber. I was thinking the same thing, this might be a job for SSIS. I'm pretty handy with SSIS for import/export stuff locally, but I'm not sure how to go about using it for this job. Would I do a push or a pull with SSIS?

    .

  • I'm begining to think that perhaps Replication is not practical unless the data can be replicated "as is". Is that a fair statement?

    .

  • Yes, unless the data is needed in a near real time and the volume is large, there is no point in setting up replication.

    With SSIS you will be pulling the data from all the servers.


    Bru Medishetty

    Blog -- LearnSQLWithBru

    Join on Facebook Page Facebook.comLearnSQLWithBru

    Twitter -- BruMedishetty

  • I'm not sure I'm The Man, but thanks.

    I wouldn't necessarily discount replication. As Bru mentioned you can have issues with pushing this data to the same destination. There are a couple ways people handle this. One is set your identity ranges in each publisher to separate groups. So you could have 0-.5billion on 1 server, .50001B to .7001 B on another, etc.

    The other thing you can do is customize the query to send data and add a column that contains the name of the source server. I've seen people do this and it allows you to track back a row to the source.

    Is SSIS better? I like SSIS because it's contained, and if it fails, I am aware of it and can re-run things. It also isolates a server and THAT particular ETL process if there is an issue. However replication is automatic. If the link drops or there is some other issue, the source holds the rows and restarts sending when it can. It doesn't require any "date" columns or other tracking that you might need to build in SSIS.

    Personally I'm more of an SSIS guy for data transfers, but replication can work, and work well.

  • I appreciate all the help guys.

    So, sounds like I should probably save the replication for a rainy day. Like I said, last time I was in this situation, I wrapped a sproc. in a scheduled job, pushed it out to the sites, and it all worked great. It is slightly awkward pushing that stuff out and executing it on 200 different sites, but I can certainly live with that. Can SSIS make my life any easier in this regard? Seems like if I could push out a real SSIS package instead of a bunch of loosely coupled scripts, that would be the ticket. I have quite a few SQL 2000 servers out there, I'm afraid they might get in my way. Would I have to have a DTS package for the SQL2000 boxes, and a dtsx for the >SQL2000 boxes?

    .

  • You need not have DTS packages on the SQL Server 2000 boxes.

    Though, I am not completely aware of the internals of the data and how complex / simple it is (your data). I would plan to do a have SSIS package to pull the data from the remote SQL 2000 / 2005 servers. I would not be required to have individual DTS packages for each SQL Server, rather, plan accordingly and have 10 packages to pull data from 20 SQL server each. Again, it can be a single package also, but for better manageability, I would choose to break down into multiple (10) SSIS packages all running on the Central SQL Server.

    With this I don't need to administer / troubleshoot the packages on all the remote SQL servers.

    Of course, it needs more plan and workout than what I say, but I assume you got my point.


    Bru Medishetty

    Blog -- LearnSQLWithBru

    Join on Facebook Page Facebook.comLearnSQLWithBru

    Twitter -- BruMedishetty

  • Great idea, I had not thought of that. This would give me the truly centralize management I've been looking for. Thank you Bru.

    .

  • If you have the same schema, you could have one package that pulls data and a second package that passes in a server name and calls the first package. It could be as simple as 2 packages.

    I'd dig into SSIS a little and see what you think. There's a great book by Brian Knight and company to get you started, and you can also look here: http://qa.sqlservercentral.com/articles/Integration+Services+%28SSIS%29/

    or here: www.sqlis.com

  • As a matter of fact, I have the McNight book. In fact I have two of them. A fat one and a skinny one 🙂 (I'm a book junky). I've just been getting into SSRS for importing/exporting files. I hadn't really considered it for the heavy lifting. I've always jumped into Visual Studio and churned out C# code for that sort of thing, but again. I think I'm making more work for myself than I need to.

    SO, I took your advice today, and pretty much got all the major functionality for this new project coded into a SSRS package in an afternoon. VERY AWESOME, SSIS will be my first stop for all things data from now on.

    What is it with you "published" DBA types? You're into Karate, and McKnight's into ultimate fighting. I guess I'm not violent enough to ever get published! 🙂

    Thanks for all the help!!

    .

Viewing 15 posts - 1 through 14 (of 14 total)

You must be logged in to reply to this topic. Login to reply