Importing Data from Excel to SQL via SSIS minus duplicate records

  • I have a basic SSIS package that uploads data in an Excel format to a table in SQL. The problem is data in Excel may contain duplicate data already in the SQL table; is there a way to upload the data but if it the record already exists not to import it.

    On the table the primary key is the materialno but I try to import the subsequent files and it just fails the package all together.

    I know with SQL if I am trying to insert data from one table to the other I can do EXISTS but didn't know if the same concept could be applied some way using SSIS.

    Thanks!

  • Use a merge join or a lookup on the sql table. With merge join, set your excel recordset as the left input and use a conditional split where the joined value in the right input is null and send that output to your destination table. With a lookup, use the nomatch output.

  • Yup, basically what christina said.

    Use the lookup unless you're really running into memory restrictions. Lookup is *way* faster, and it's designed for this exact purpose. Just make sure that you set the error handler to direct only the rows which find no lookup to your destination, again, as christina said.

    Two things to keep in mind though. First, this will work to prevent you from inserting records which already exist, but it won't prevent you from inserting records which are duplicated in your source, the Excel file. To avoid doing this, you'll need to use the SORT transformation. Select all the rows to output, and choose the eliminate duplicates option.

    Secondly, the Lookup transform is *CASE SENSITIVE*. This has caused me no end of problems in the past. MAKE SURE if you are doing any string comparison, you LTRIM, RTRIM, and UPPER/LOWER both sides of the transformation, or else you're going to run into problems.

  • To expand further:

    if you use the MERGE JOIN, you need to sort on the join keys first, using the SORT component.

    If memory or performance is an issue, consider staging the data from the Excel file in a table in SQL Server, and do your validation with TSQL statements.

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • christina-518819 (11/8/2011)


    Use a merge join or a lookup on the sql table. With merge join, set your excel recordset as the left input and use a conditional split where the joined value in the right input is null and send that output to your destination table. With a lookup, use the nomatch output.

    How do I do a lookup on the sql table through SSIS?

  • Use the Lookup Transform

    In the properties, use a SQL Query instead of a table.

    Your query should be as basic as possible. For your purposes, since you're looking just to check existence, it should be something like:

    SELECT LookupID

    FROM LookupTable

    In the second tab, join the column from your data source to the LookupID in the Lookup table.

    Then, go to the Error Manager, and change the error handler to Redirect Row on Lookup failure.

    Finally, attach the Error output of your Lookup to the destination you want to output to.

  • kramaswamy (11/8/2011)


    Use the Lookup Transform

    In the properties, use a SQL Query instead of a table.

    Your query should be as basic as possible. For your purposes, since you're looking just to check existence, it should be something like:

    SELECT LookupID

    FROM LookupTable

    In the second tab, join the column from your data source to the LookupID in the Lookup table.

    Then, go to the Error Manager, and change the error handler to Redirect Row on Lookup failure.

    Finally, attach the Error output of your Lookup to the destination you want to output to.

    Worked great!!

    Thanks!

  • kramaswamy (11/8/2011)


    In the second tab, join the column from your data source to the LookupID in the Lookup table.

    Then, go to the Error Manager, and change the error handler to Redirect Row on Lookup failure.

    Finally, attach the Error output of your Lookup to the destination you want to output to.

    If you are working with SSIS 2008, you don't need to set-up error handling, you can just choose to direct the rows to the no match output. (this was not possible in SSIS 2005, hence the error handling)

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • Oh cool. Good change, glad they did that, was kinda silly needing to use the error handler to redirect no matches.

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply