ETL process SSIS or VB.Net

  • Currently we do not have a standard in place for what platform to develop ETL processes. Some developers have been using VB.Net to rewrite old dts packages. I and other developers have been using SSIS to rewrite dts packages. It has come to a point of which would be more efficient and the reason we should choose one over the other.

    Thanks πŸ˜€

  • gnewkirk (10/23/2009)


    Currently we do not have a standard in place for what platform to develop ETL processes. Some developers have been using VB.Net to rewrite old dts packages. I and other developers have been using SSIS to rewrite dts packages. It has come to a point of which would be more efficient and the reason we should choose one over the other.

    Thanks πŸ˜€

    The BI end in SQL Server 2005 use VB but SQL Server 2008 uses both VB and C# so it depends on your needs because you can consume C# for one task and consume SQL in another step that is almost like ADO.NET using the SSIS pipeline instead of data access.

    Kind regards,
    Gift Peddie

  • Hi,

    If I understand properly - you dont mean what langauge to use in SSIS but rather whether to go with pure .NET vs SSIS. Am I correct.

    We are a pure .NET shop and dont use SSIS. I am in constant deliberation as to whether we should use a commercial ETL tool (SSIS).

    for us, the main reason for not using a commercial ETL tool is because it’s abstract from a specific SQL server product version release (i.e. like dts which was re-tired) and simply the fact of its flexibility. It is appreciated that SSIS can make for quicker application development and has specific modules which are require for which are very helpful but we do implement re-usable custom class code which goes some way to balancing this out. My view is that a properly architected .NET solution is easier to deploy, more reliable, more flexible, easier to understand and can also achieve very good performance.

    I would be very keen to hear POV here.

  • You are correct. What I am trying to ascertain are people's point of view on whether to use SSIS or develop their own code from .Net Framework for ETL processes and the reasons.

    You have made some good points about using .Net. You must of spent some time creating those common classes that everyone could use for your ETL processes.

    My POV is use the tool that was developed for that purpose. Why should I go and spend time developing an ETL tool when one already exists. SQL Server used DTS now SSIS for ETL processes. SSIS is much more flexible then DTS. Granted moving packages from DTS to SSIS is daunting, so was migrating code from vb6 to vb.NET.

    SSIS has a lot of strengths to it; going across servers to access data, insert the same data in multiple destination, execute SQL statements, execute tasks outside of SSIS, using VB.Net as the scripting language, view your data through the data flow process, built in aggregate functions and a lot more.

    I guess a more relevant subject would be when to use SSIS. Thanks for your reply.

  • gnewkirk (12/15/2009)


    You are correct. What I am trying to ascertain are people's point of view on whether to use SSIS or develop their own code from .Net Framework for ETL processes and the reasons.

    You have made some good points about using .Net. You must of spent some time creating those common classes that everyone could use for your ETL processes.

    My POV is use the tool that was developed for that purpose. Why should I go and spend time developing an ETL tool when one already exists. SQL Server used DTS now SSIS for ETL processes. SSIS is much more flexible then DTS. Granted moving packages from DTS to SSIS is daunting, so was migrating code from vb6 to vb.NET.

    SSIS has a lot of strengths to it; going across servers to access data, insert the same data in multiple destination, execute SQL statements, execute tasks outside of SSIS, using VB.Net as the scripting language, view your data through the data flow process, built in aggregate functions and a lot more.

    I guess a more relevant subject would be when to use SSIS. Thanks for your reply.

    There is nothing in .NET that compares to even DTS so I don't understand what you are talking about and I am talking as a .NET expert. ADO.NET comes with the bulk copy class but if you know what we do with DTS and now SSIS in the Asp.net response object then you may not think .NET tool you develop can compare to either DTS or SSIS for ETL job of any size.

    Kind regards,
    Gift Peddie

  • Hi,

    Being an avid developer of both toold (VB.NET and SSIS) I'm one to use the tool that best suits the need. I think that SSIS should be used for ETL as that what it was designed to do. Can one do it in VB.NET or C#? Sure, but in most cases and because of how SSIS handles data movement, it is almost always more efficient for moving and transforming data.

    Programming to me is meant for developing applications so that what I uses VB.NET and C# for.

    Thanks,

    Strick

  • I think you also need to look at the skill set. In our organization we have several individuals who can write SQL code to extract a result set and dump it out to a file using the GUI in SSIS. They picked up the necessary SSIS skills in about a week. They have very limited to no .Net skills but they are experts in their business areas.

  • da-221236 (12/15/2009)


    Hi,

    If I understand properly - you dont mean what langauge to use in SSIS but rather whether to go with pure .NET vs SSIS. Am I correct.

    We are a pure .NET shop and dont use SSIS. I am in constant deliberation as to whether we should use a commercial ETL tool (SSIS).

    for us, the main reason for not using a commercial ETL tool is because it’s abstract from a specific SQL server product version release (i.e. like dts which was re-tired) and simply the fact of its flexibility. It is appreciated that SSIS can make for quicker application development and has specific modules which are require for which are very helpful but we do implement re-usable custom class code which goes some way to balancing this out. My view is that a properly architected .NET solution is easier to deploy, more reliable, more flexible, easier to understand and can also achieve very good performance.

    I would be very keen to hear POV here.

    With arguments like this, many people would ask why MS wasted all these resources and time to create SSIS.

    Here is my opinion about the custom .NET code PROS you have listed:

    easier to deploy - not true

    more reliable - not true

    more flexible - true

    easier to understand - not true

    good performance - not true

    Now here come the CONS you didn't mention:

    - harder to maintain

    - more time to design properly and implement, meaning more expensive.

    - less documentation and people skilled enough to understand and use

    - need programmers to maintain and use. SSIS at least have the possibility of being useful to administrators.

    ---
    SSIS Tasks Components Scripts Services | http://www.cozyroc.com/

  • CozyRoc (12/16/2009)


    With arguments like this, many people would ask why MS wasted all these resources and time to create SSIS.

    Here is my opinion about the custom .NET code PROS you have listed:

    easier to deploy - not true

    more reliable - not true

    more flexible - true

    easier to understand - not true

    good performance - not true

    Now here come the CONS you didn't mention:

    - harder to maintain

    - more time to design properly and implement, meaning more expensive.

    - less documentation and people skilled enough to understand and use

    - need programmers to maintain and use. SSIS at least have the possibility of being useful to administrators.

    True - but then why is SSIS and SQL server 2005 is rubbish at importing and exporting CSVs.

    Perhaps I agree on the easier to understand point and a couple of your cons (less documentation and people skilled enough to understand and use, need programmers to maintain and use.). Dont agree with you on the "Easier to deploy" bit. Not sure what this means: SSIS at least have the possibility of being useful to administrators"

    The main point here is that .NET is it abstract from a specific SQL Server product version release (i.e. like SQL Server 2000 DTS which was retired and then the fact of having to migrate 400 dts packages on to newer platforms). And the main reason - flexibility. Each to their own - depends on company and function and skill set in house.

  • True - but then why is SSIS and SQL server 2005 is rubbish at importing and exporting CSVs.

    Perhaps I agree on the easier to understand point and a couple of your cons (less documentation and people skilled enough to understand and use, need programmers to maintain and use.). Dont agree with you on the "Easier to deploy" bit. Not sure what this means: SSIS at least have the possibility of being useful to administrators"

    The main point here is that .NET is it abstract from a specific SQL Server product version release (i.e. like SQL Server 2000 DTS which was retired and then the fact of having to migrate 400 dts packages on to newer platforms). And the main reason - flexibility. Each to their own - depends on company and function and skill set in house.

    Every solution has limitations. SSIS supports CSV import/export, but I think you are referring to the dynamic data flows where you can support variable number of columns. If that is the case, I agree it is limited and CozyRoc does in fact provide solution for this limitation.

    Regarding the deployment in SSIS, if you don't use third-party components the only thing you have to do is:

    - Install SSIS on your server. This is usually pretty straightforward and is done once.

    - Deploy your DTSX package to the server - either file or import in SQL Server.

    Deployment of third-party SSIS components is not so complicated. You have to deploy in GAC and deploy in a couple of SQL Server sub-folders. This is it!

    When I said SSIS is possible to be used by administrators is that there are certain pieces like the scripts, which require programming skills. But many common tasks can be accomplished visually and doesn't require programming.

    The SSIS is complete rewrite and you can regard it as totally different application compared to DTS. MS didn't design SSIS to be compatible with DTS in any way. It tries to solve similar problems, only better. This is probably one of the main reasons why so many companies are still running their DTS packages. None is forcing you to move to another platform and you can even run your existing DTS packages side-by-side with SSIS packages.

    ---
    SSIS Tasks Components Scripts Services | http://www.cozyroc.com/

  • Head Banging!!!

    You guys danced around the issue but I didn't see where you got to the point! Bottom Line is Money$$$$ and Impact to Organization

    As the Boss Man what's "rarely" in charge of my programming group (which is rarely amongst programmers/developers); proper tool for the proper job is my RULE.

    As a Business Manager sometimes, I hate to say this, but I pay 1/2 the salary for an SSIS ETL Developer and the code and tool are "exposed" for easy revision "on the fly." I don't need a fancy VB.Net or C# program to 1) Move Data, 2)Transform Data 3)Load Data or more important "Display Data".

    If I use VB. NET and C# for ETL and this guy quits, gets mad, or goes on to "Bigger and Better" VB.NET/C# Salary, I'm Hosed!!! If I had a nickel for every time it happened πŸ™‚ I would have retired 10 years a:sick:go. Here's something you need to appreciate; we hired a Gentleman in 1994, we sent him to school and he developed Apps in VB and he's still with us. When he passes on we will have a severe impact. He was 57 at the time, he's now 72 and "most" of his "CODE" isnt' documented. It's in the process of it but as you can see this is critical. No one wants to be held hostage by their programmers but it happens so very often.

    Programming languages are made for programming just like the one used to create SQL, SSIS and SSRS in the first place. I need VB.NET/C# programmers to provide me strong tools (Classes, Modules, Function etc..) things that are not on the shelf at the market.

    Yes you can use VB.NET/C# to do ETL but only under these conditions: 1) Source data never changes type or fields are not rearranged frequently. 2) Little validation and manipulation to ensure data integrity is adhered to "within" the program (no critical delivery issues, as in, all or nothing processing) 3) Output conventions remain with design change that is destination table or Excel or ASP or SharePoint etc.

    Hope that helps.

  • Hey Jack, I definately agree with your view point! Right on the Money!!!

  • No tool or method is perfect.

    There are other tools and methods instead of using SSIS or c#:

    Take a look at BCP - been around for a long time, included with SQL Server.

    Its very fast, can import or export using delimited files or fixed width by creating a simple format file.

    BCP can be run from a shell script or can be run from T-SQL using xp_cmdshell. (Note that other OS commands like ftp, copy are available).

    Once you get the data imported into a table, you can write transformation code in SQL. This can be a lot faster than trying to do tasks like data cleaning in SSIS. This technique has been referred to as ELT (Extract, Load, Transform)

    Bulk Insert is another way of loading data into SQL Server.

    Choosing the correct tool may be dependent on the skills of the developers available to the company. Good rule is to try to keep it simple.

    I have written SQL scripts using BCP for years. One advantage for all of my clients is that they never had to convert DTS scripts to SSIS scripts using these techniques as the code runs fine on SQL Server 2000, 2005, and 2008. Anyone with decent SQL skills would be able to read and modify my scripts.

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply