Union All returns duplicates???

  • I am trying to build a dimension for a cube using SSIS.  The dimension consists of contract IDs and other data associated with a contract.  The list of contracts is pulled from our business application, but the transaction (fact) data may have contract IDs that aren't in the business application.  I want to explicitly add "Unknown" members to the dimension if a transaction contains a contract ID that is not already in the dimension table.  I have set this up as follows:

    Select distinct Contract ID from one fact table (one partition) using an OLE DB data source.  Select distinct Contract ID from another fact table (another partition) using an OLE DB Data source.  Merge the two Contract ID lists together using a Union All transformation (I expect distinct Contract IDs at this point).  For each Contract ID from the fact tables, check for existing Contract ID in dimension table using a Lookup to the dimension table.  If doesn't exist (i.e. error output from lookup), add record to dimension table.

    What I find is that the Union All doesn't return distinct results.  The only input columns are Contract ID from each of the two data sources, and the only output should be Contract ID, but if both data sources contain a particular Contract ID, I am getting two instances (rows) of that Contract ID in the result from the Union All.  Am I misunderstanding how Union All is supposed to work?

    By the way, I have also tried this with a Merge transform, with the same results.

  • Union ALL returns ALL rows.

     

    Union returns distinct rows only.

     

    Hope this helps.

  • Thanks, I understand how that works in a SQL statement.  However, there doesn't appear to be a SSIS transform called "Union".  Merge doesn't appear to do what I want either.  Is there a single transform that would do what I expect, or would it be easiest to just slap on an Aggregate transform after the Union All that groups by Contract ID?

  • Can't help you there.  I'm not an ssis expert... not a ssis user for that matter.  I'll let the other DBAs figure this one out.  Good luck with this project.  Let us know if you find a usefull solution before someone else posts it.

  • Instead of creating multiple OLE DB Sources and trying to merge the results using transforms, I created a single OLE DB Source and wrote the SQL to do what I want (union results from three tables).  One of the reasons I'm trying to use SSIS is to AVOID writing custom SQL, but oh well...

  • You can do this is SSIS in two steps...

    1. Use a merge transform (as you mentioned above)
    2. Use a SORT transform, and sort the data on ContractID, making sure you check the box which says "Remove rows with duplicate sort values".

    Hope this helps

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply