Merging individual databases into a single database

  • I was trying to think of a title for this subject and came up blank, so here it goes.

    Case:

    We have around 90+ databases, with an average size of 700 MB, and we have been merging some of the data from them into a single database to provide a special service for our clients.

    Access to the individual databases is set up so that only particular users can access each database.

    So far, we have been merging the data with no problem. Now, because of the amount of data being merged, the task takes around 24+ hours to complete, so the data in the production system is a day behind (data from Monday will not show up in the production system until Wednesday morning at 4:00 AM).

    Of course we cannot do this merge on the production system, so we are using a secondary machine to run the update into a repository system. After the task has been completed, we back up this database and restore it onto the production machine.

    So the question is: can someone out there give me a better way to do this and cut the time it takes to complete the task?

    We have been researching this issue for quite some time and have come up with a few options.

    First: run the update (e.g., with BCP) into a plain database with no PKs, FKs, indexes, etc., and then rebuild them all at the end of the update process (a rough sketch is at the end of this post).

    Second: change the RAID 5 to RAID 0. This could speed up the process, but how much time would it actually take off? Because we are merging data from other systems, we are not too worried about data integrity; all the checks are done in the 90+ source databases, and we can always go back to the individual databases to make corrections.

    Third: upgrade the server, or better yet, buy a machine with more power than the one we are using now. The current server has two 900 MHz+ processors, 350 GB of disk, 2 GB of RAM, and RAID 5. We thought we could bring in a new machine with the following specifications: eight processors at 1 GHz each, 580 GB of disk, and 36 GB of RAM. Could this more powerful machine speed up the process enough to cut the overall time? Investing the amount of money such a machine would cost will not make sense if the process remains basically the same. Say, if we cut the time from 24 hours to 20, that won't justify the cost. If it improves the process and cuts the time to, just to pick a number, 12 hours, it would be a cost we would be willing to take on in order to improve client service.

    I use a compare utility from Red Gate, which I really recommend, but it does not cope with tables this large (one of our tables has 113+ million records, and the database is a little over 50 GB), so using that utility to make the changes is not an option.

    Sorry for writing such a long request, but there was no shorter way to explain the whole process. I just hope to find a better solution and improve our process. Thank you.
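
    Just to illustrate what I have in mind for the first option, here is a rough sketch; the table, column, file, and server names are only placeholders, not our real schema:

    -- Option 1 sketch: load into a bare staging table (a heap: no PK, FK, or indexes),
    -- then build the keys and indexes once, after all of the loads have finished.
    CREATE TABLE dbo.Stage_Case
    (
        CaseNumber varchar(20)  NOT NULL,
        CaseDate   datetime     NULL,
        CaseData   varchar(255) NULL
    );

    -- Bulk load each exported flat file from the command line, e.g.:
    --   bcp MergedDB.dbo.Stage_Case in D:\exports\client01.dat -c -T -S REPOSITORYSERVER -h "TABLOCK"
    -- (-c = character data, -T = trusted connection; the TABLOCK hint on a heap
    --  allows minimally logged loads under the simple or bulk-logged recovery model)

    -- Rebuild everything at the end of the update process.
    ALTER TABLE dbo.Stage_Case
        ADD CONSTRAINT PK_Stage_Case PRIMARY KEY CLUSTERED (CaseNumber);

    CREATE NONCLUSTERED INDEX IX_Stage_Case_CaseDate
        ON dbo.Stage_Case (CaseDate);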

  • Instead of restoring the database to the production server, why don't you think about setting up some kind of replication?

  • Thank you.

    That was my first approach: log shipping or replication. The problem is that some databases arrive in their original format and are then translated into SQL.

    Others are received in a ZIP file; these are full databases and are then translated into SQL.

    Some others come from different vendors as flat text files.

    These individual databases have their own PKs and constraints.

    In the merged database, I need to add a code column to differentiate a record in one client system from a record in another (it could be case 0010003 in North Carolina and case 0010003 in Chicago); there is a rough sketch at the end of this post.

    To make it short: replication or log shipping could be the best option, but in our environment replication and log shipping have not worked before.

    In any event, your suggestion is greatly appreciated.
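
    To illustrate the code column (the table, column, and database names here are made up, not our actual schema), the merged table ends up looking something like this:

    -- The SourceCode column tells the client systems apart, so case 0010003 from
    -- North Carolina never collides with case 0010003 from Chicago.
    CREATE TABLE dbo.MergedCase
    (
        SourceCode char(4)      NOT NULL,  -- e.g. 'NC01' or 'CHI1' (placeholder codes)
        CaseNumber varchar(20)  NOT NULL,  -- the case number as it exists in the client db
        CaseData   varchar(255) NULL,
        CONSTRAINT PK_MergedCase PRIMARY KEY CLUSTERED (SourceCode, CaseNumber)
    );

    -- During the merge, each client's rows are stamped with that client's code.
    INSERT INTO dbo.MergedCase (SourceCode, CaseNumber, CaseData)
    SELECT 'NC01', CaseNumber, CaseData
    FROM   NorthCarolinaDB.dbo.ClientCase;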

  • We have a saying where I come from: "You can't put 10 pounds of S**T into a 5 pound bag."

    Processing 50 GB of data is going to take some time, no matter how you slice it. RAID 0 will probably INCREASE your elapsed time, not decrease it. With this much data, you are I/O bound. I would say two or more smaller (read: cheaper) servers with faster internal data paths would give you better throughput.

    I think you're on the right track by exporting them all to flat files, combining them into a single source, and then bulk loading them as one table (a rough sketch of the load is at the end of this post). You need to provide as much parallel I/O as you can. A multi-CPU machine spends more time coordinating its CPUs; eight single-CPU servers would give you better throughput.

    Your only challenge then is the network to pump all the data to a single location. Consider compressing the files on the source and expanding them at the destination.

    It would be interesting to hear your solution.
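
    Something like this is what I have in mind for the final load; the file path and table name are just placeholders, and it assumes a bare staging table (no indexes) like the one you described:

    -- One big, minimally logged load of the combined file into the heap,
    -- batched so the transaction log stays manageable.
    BULK INSERT dbo.Stage_Case
    FROM 'D:\loads\all_clients.dat'
    WITH (TABLOCK, BATCHSIZE = 100000, DATAFILETYPE = 'char');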
