Converting columns to rows..

  • I have an extract file that I am reading from a legacy system that contains the products that each client has in this format.

    account_id, product1,.....product30,support1,.....support30

    Each client can have up to 30 different product types and the option to pay for support for each product. My destination database looks like this:

    account_id, product_id, support

    In VB I could do this easily by looping through the recordset then iterating through each column of each row and performing an insert using that data.

    I am trying to learn some .net but I just don't get it so far. I can't figure out how to pass a recordset to a script task. Or how to connect to a database and execute a stored proc within a script task. The looping logic would of course be the same and easy.

    I guess I could have an ugly stored proc and pass the entire row of 61 fields to the proc but that is messy. Anyone have any ideas on the most elegant solution? If I could use VB and pass in a recordset I could do this in about 20 lines of code. The product can be different for each product field so this is not an unpivot situation.

    Thanks,

    Chris

  • If you import it in to a temp denormalized, then your renormalization process is simply a series of UNION statements

    Something like

    Insert myfinalTable (account_id, product_id, support)

    select account_id, product1, support1 from #temp

    UNION ALL

    select account_id, product2, support2 from #temp

    UNION ALL

    select account_id, product3, support3 from #temp

    ...etc...

    should be substantially faster than any looping process.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • How about using the unpivot component?

    Kindest Regards,

    Frank Bazan

  • That is an excellent idea Matt. Thanks. One more question. All of my logic right now resides in a single data flow. To be honest, I don't really understand the control flow or see how it is that useful. I mean, if you wanted to do the same data flow actions against multiple databases I see the usefulness but not otherwise. I wish I could call a stored proc that would create a temp table, bulk load the data, perform the insert that you described and then drop the temp table. But I don't think I can do that from the data flow. It seems that I would need to create a permanent table, do a row by row insert and then call a stored proc to do my final insert using the union select.

    I just don't understand why the bulk insert and other control flow options aren't just available at the data flow level. And I don't understand how you pass data out of the data flow to the control flow.

  • You're in SSIS! (sorry - missed what forum you posted to) That does change things a little. Frank is right - you probably should use the unpivot component. I was assuming you were hitting this from a stored procedure.

    You COULD use my recommendation - but within SSIS - it's actually harder to use it in that context.

    The thing to remember with SSIS is that during the data flow - you can conceptually think of each stage you connect as dumping data into a temp table. Meaning - it converts/merges/(whatever that stage does) the data, and makes the transformed stuff available IN ADDITION TO the original data.

    Each of the steps has error controls (so that you can control what to do when it screws up along the way). You connect the steps by dragging the green (success) or red (failure) arrows from the first step onto the next, and then from there to the next, etc... Again - you can also in each step go in a tell it what to do with a problem state (like - don't cause an error if something is going to get truncated).

    As far as creating the SSIS package (from scratch)

    - start by adding a new data flow, then right-click on it and Edit. this will flip you into the data flow view.

    - drag in a flat file object from the source. Edit that to open up your source file. Make sure that the data types being detected are correct: correct if need be.

    - drag in the UNPIVOT component from the transformations area. As mentioned before, drag the green arrow from the flat file source onto this thing.

    - right-click, edit the UNPIVOT transform. select all of the product fields and all of the support fields.

    put "PRODUCT_ID" as a literal next to each of the product_x fields, and put the num of the instance into the pivot value. Repeat the same process for the support_x fields, using the literal "SUPPORT".

    so you'd end up with something like this:

    Input column Destination column Pivot key value

    Product_1 ProductID 1

    Product_2 ProductID 2

    Product_3 ProductID 3

    Product_4 ProductID 4

    etc...

    Support_1 Support 1

    Support_2 Support 2

    Support_3 Support 3

    Support_4 Support 4

    etc...

    - Finally - drag in a SQL Server destination object. Connect the green arrow from the UNPIVOT component to it.

    - Edit its properties and either set it to an existing table, or click New to have the table built for you. If you click new, be sure to go through and a. put a "real name" on the table and b. to remove all of the Product_x and Support_x, but not the PRODUCTID and SUPPORT fields.

    - simply make sure the column mappings match the right fields up with the right columns.

    And voila - you're done. Run it and test it out. It takes longer to describe it than to do it. No permanent junk table, and while there might be some temp tables in there - you didn't build them so they're not something you need to clean up (SSIS takes care of its own mess).

    Of course - if you do still prefer to NOT do it using SSIS, we can talk creating a stored proc that will do what you need.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • I tried messing with unpivot for a while. I just don't understand it. I eventually thought maybe this is not an unpivot situation.

    I set the options as you explained but I have a few things that aren't accounted for. I have account_id that needs to fit in somewhere. Also, there is the PivotKey flag that needs to be set to True or False for each Output columns.

    Right now I have 4 Output columns

    - Pivot Key Value

    - account_id

    - product_id

    - support

    Under Custom Properties PivotKey is set to True for "Pivot Key Value" Output. I don't understand what that output means or where it came from. The error I get is:

    "PivotKeyValue is not valid. In an UnPivot transform with more than one unpivoted DestinationColumn, the set of PivotKeyValues per destination must match exactly"

    I really appreciate your help.

    Thanks,

    Chris

  • hmm...This might be fun without seeing what you're looking at...

    A few comments:

    - the accountID should be in your final output because it was in the original output. You didn't define /mention it during the unpivot process, did you?

    - in the "regular editor" - there are two sets of check boxes in the top area. One is the one popping items up, the other is called "pass through". Leave all of the pass-throughs alone (meaning checked).

    - you can output the pivot key if you want to. That's a choice.

    The rules about pivot key values:

    - because we're creating 2 "virtual columns", we need to have the SAME amount of each column to unpivot (so we need 30 product columns and 30 support columns). The key value itself doesn't matter, but a. it needs to be UNIQUE within that specific virtual column, and b. it needs to be the same on the "matching" value in the OTHER virtual column. So if you want PRODUCT_1 and SUPPORT_1 to be on the same row during the unpivot, you need to put the same value in their pivot key.

    So in the example below - if this makes it any easier, the ONLY things you should have in the unpivot transform screen would be thirty or so pairs as highlighted below (notice that the pivot keys match):

    Input column Destination column Pivot key value

    Product_1 ProductID 1

    Support_1 Support 1

    ------------------------------------------------------

    Product_2 ProductID 2

    Support_2 Support 2

    ------------------------------------------------------

    Product_3 ProductID 3

    Support_3 Support 3

    ------------------------------------------------------

    Product_4 ProductID 4

    Support_4 Support 4

    ------------------------------------------------------

    The lines are just for show on here.

    Again - you could use bob, shirley, fred,wilma, or red, blue, green. just as long as each pair has a matching value, and that no other pair has that value you're okay.

    And again - remove the account from this list (don't try to unpivot it). Make sure the "pass through" is check on for it.

    It will still be available in the final output.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • I am damn close to having everything working now. I was including account_id in the unpivot and that was messing me up. I can't tell you how much I appreciate the help. The only thing I need to figure out now is how to insert only the rows that do not contain Null values. The old process that did this task I am working on took 2 hours and 15 minutes. Mine is at 8 minutes right now and my dev SQL server is in Pittsburgh and I am running the package from my laptop in Atlanta.

    But I am going to try tweak some stuff next week. Maybe there is a way to exclude rows with null product_ids within SSIS? It would be nice to be able to do a bulk insert. The slowest part of my process right now is where I update a credit_hold flag for a customer for about 2400 accounts.

    But again, thank you so much!!!

    Edit: Just used a conditional split. Duh. It completely works now!

  • Look at inserting a CONDITIONAL SPLIT task (after the unpivot, but before the final destination). your condition will need to determine which rows don't have nulls ......

    This would also be where you will need to configure error handling (meaning - don't fail the package when the productid is null - just throw it away).

    Glad it's working for you.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Yep, figured out the conditional split and ninja edited my post. Hehe. Thanks again. You live near Atlanta, GA? If so, I owe you some beer.

  • I'm currently one state north (Raleigh), but I have family around Atlanta.

    Careful - you never know when some crazed carolinian might show up demanding beer:)

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply