Best way to graph/draw an ETL data flow

  • I am re-posting this stackoverflow question on behalf of a poster -

    I found the following Version 1 in one of our detailed design documents. If I were going to create it, I would have probably displayed it as shown in Version 2

    However, I'm thinking that there probably is a more common standard way of showing the flow of data by programs acting on it and that, while I prefer version 2, it probably is unnconventional and non-standard.

    In Version 1, programs are shown as boxes. In Version 2, programs are depicted as arrows indicating that they are moving data from one table to another.

    Q: What is the better more accepted way? Sequence Diagrams? If Sequence Diagrams were used, I think it wouldn't show the flow of data from one table to the next as well...

    Version 1:

    Version 2:

  • PLEASE stop double posting on this site. Many of us monitor all forums on this site and your duplicate posts just split up answers.

    No replies here, please. Please see the duplicate thread at the following URL.

    http://qa.sqlservercentral.com/Forums/Topic1532241-363-1.aspx

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Sorry about that. I put a link to this post in the duplicate post. Is that okay ?

  • I'm developing my first ETL system, so take this with a grain of salt.

    The way I planned my ETL system was according to "The Data Warehouse Toolkit" by Ralph Kimball. Many developers have recommended the book and just from reading/applying, I feel it's a great guide to planning my ETL systems.

    I broke down my high-level workflow of the ETL system similar to your approach. This did not include every process of the entire workflow. This was a general direction of the path that I wanted to take. That's what I kind of pick up from your second diagram. There feels like missing process from STAGING and SCRUBBING that's not outlined in the flow.

    When I had that completed, I began working on a more detailed flow. This included every process in that flow to not only ensure I had a blueprint of what I was about to design down to the last procedure, but a guide for the next developer who may work on my system.

    From there, I designed the work flow for each of those processes. For example, if I had a procedure call for scrubbing data, then I ensured every step of that process was designed in a separate flow.

    So, to answer your question and summarize. All of the above. I had a high-level overview of the work flow (like your second diagram), a granular view of that same work flow and drilldowns of every process within that workflow. That way management could view the high-level overview where they may not understand all the specifics, developers could view the more detailed overview and drill down to see the more specifics on data flow for each process if needed.

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply