General ETL question - ETL from unknown number of tables ?

  • AFAIK, ETL integrates data from a fixed number of sources/tables. Is it possible that ETL can integrate data from an unknown number of tables which might be added in the future, given that these tables might be of different types (customer, shipping, employee, accounting etc.) ? Are there any solutions in SSIS which handle these kinds of scenarios ?

    EDIT -

    Let A,B,C be tables. We can have the following ETL scenarios - A + B + C = 1 row or A + B + C = 3 rows

    I have the first one. Later, we can add tables D,E,F,G...etc all of which might be different from each other in terms of the data they contain.

  • I would recommend looking into BIML. There is a Stairway for it.

    Doesn't do exactly what you are asking, but probably does what you want in a better way.

  • Nevyn (1/14/2014)


    I would recommend looking into BIML. There is a Stairway for it.

    Doesn't do exactly what you are asking, but probably does what you want in a better way.

    Can you please tell me how BIML could be helpful in this case ?

    Thanks.

  • this is sort of similar to a question i asked the other day:

    http://qa.sqlservercentral.com/Forums/Topic1530825-363-1.aspx

    i wanted to grab 1-N queries and use SSIS to send them to a multi sheet excel document;

    because my SSIS familiarity is limited, i ended up doing it in a programming language instead, as it seemed to me that SSIS just introduced an extra layer to something i already knew how to do.

    anyway, this project on codeproject grabs every table in a database and sends them to an excel worksheet;

    i would think that the basic logic, of getting the list of tables, and dynamically defining and then doing something with them might be similar enough to investigate:

    http://www.codeproject.com/Articles/301542/Creating-Excel-File-and-ExcelSheets-dynamically-fr

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • Lowell (1/14/2014)


    this is sort of similar to a question i asked the other day:

    http://qa.sqlservercentral.com/Forums/Topic1530825-363-1.aspx

    I think you might have wanted to put a different question. That one is mine.

  • A simple queue system should work for you. There are plenty of SSIS queue frameworks on the internets. 😀

  • SQL Padawan (1/14/2014)


    A simple queue system should work for you. There are plenty of SSIS queue frameworks on the internets. 😀

    Please tell me more about this and how it could be helpful in my situation. Google search for SSIS queue gave me many confusing terms. Thanks.

  • blasto_max (1/14/2014)


    AFAIK, ETL integrates data from a fixed number of sources/tables. Is it possible that ETL can integrate data from an unknown number of tables which might be added in the future, given that these tables might be of different types (customer, shipping, employee, accounting etc.) ? Are there any solutions in SSIS which handle these kinds of scenarios ?

    EDIT -

    Let A,B,C be tables. We can have the following ETL scenarios - A + B + C = 1 row or A + B + C = 3 rows

    I have the first one. Later, we can add tables D,E,F,G...etc all of which might be different from each other in terms of the data they contain.

    Basically, SSIS out of the box cannot handle this. SSIS needs fixed metadata while designing the package.

    You can work around this by scripting either in .NET tasks or by using dynamic SQL. In those cases SSIS is simply reduced to a framework for executing scripts.

    BIML is a viable option because it can generate SSIS packages on the fly using metadata. All you have to do is put metadata about the to be loaded table in a table. BIML will take this metadata and use it to construct a package. Of course you need the write the BIML script, but there are excellent tutorials on the net.

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • blasto_max (1/14/2014)


    SQL Padawan (1/14/2014)


    A simple queue system should work for you. There are plenty of SSIS queue frameworks on the internets. 😀

    Please tell me more about this and how it could be helpful in my situation. Google search for SSIS queue gave me many confusing terms. Thanks.

    Well..I am describing a queue where work is queued (pushed) and de-queued(pop'ed). Typically you have metadata in tables for things such as Packages,Package Order, Sources, etcs... Then you usually have something called a queue table (ie dbo.WorkQueue). Work is selected from this table and execution results are logged. As you add packages to the system(package creation can be done through BIML) it can be added to the metadata tables through stored procedures. I am basing this off the fact that you know how to create simple packages to move data from source A to B with a data flow.

    A good and excellent resource: https://www.simple-talk.com/sql/ssis/ssis-and-parallelism-the-unseen-minions[/url]

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply