SSIS package failure due to PK violation

  • Hi,

    I have an SSIS package that reads from a flat file source and writes to a SQL Server destination, but because of some duplicate rows in the flat file, the package fails with a primary key violation. How can this be handled in the package? The flat file has nearly 4 million records and 150 columns in it.

    Creating a temp table and removing duplicates will cause performance issues. Any help?

  • Hi.

    In this case you can use a Lookup transformation in the package, which will tell you which rows are duplicates and which are new.

    First dump all the flat file data into one staging table. This staging table will be the input to your Lookup.

    Once you know a row is new, you can insert it into the target table, for example with an Execute SQL Task.

    And if a row is a duplicate, either overwrite the existing row or just ignore it.
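
    To make this concrete, here is a minimal T-SQL sketch of the set-based equivalent of the Lookup, loading only new rows from the staging table into the destination. All table and column names here (dbo.FlatFileStaging, dbo.Destination, BusinessKey) are hypothetical, and duplicates within the staging table itself would still need removing first, otherwise this insert can hit the same PK violation:

        -- Insert only the rows whose key is not already in the destination
        INSERT INTO dbo.Destination (BusinessKey, Col001, Col002 /* ...more columns... */)
        SELECT s.BusinessKey, s.Col001, s.Col002 /* ...more columns... */
        FROM dbo.FlatFileStaging AS s
        WHERE NOT EXISTS
        (
            SELECT 1
            FROM dbo.Destination AS d
            WHERE d.BusinessKey = s.BusinessKey
        );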

  • Or, if the duplicates are definitely all within the text file (and not duplicates of data that's already in the destination table) then you don't need a lookup - you just need to eliminate duplicates from your staging table.
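
    For example, one way to do that on SQL Server 2005 and later is ROW_NUMBER over the would-be key; a rough sketch, with the same hypothetical names as above:

        -- Keep one row per key in the staging table and delete the rest
        WITH NumberedRows AS
        (
            SELECT ROW_NUMBER() OVER
                   (PARTITION BY BusinessKey ORDER BY (SELECT NULL)) AS rn
            FROM dbo.FlatFileStaging
        )
        DELETE FROM NumberedRows
        WHERE rn > 1;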

    John

  • How do you get a PK violation?

    Are you inserting the PK values yourself?

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • Hi,

    Duplicates occur for two reasons:

    1. Duplicate rows in the flat file (source)

    2. Trying to insert a row that already exists in the SQL destination

    Steps I followed:

    1. Temp table (same as the destination but without the PK), removed duplicates, then inserted into the destination

    2. Lookup against the existing SQL table before inserting from the flat file

    My question is about the performance issue.

  • Aarathy (8/20/2010)


    My question is about the performance issue.

    Well, I don't think you have any choice. Data integrity is more important than performance. Experiment with temp tables, staging tables, outer joins, lookups and anything else you can think of, and see which combination gives you the best performance.
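
    For that experiment, the outer-join form of the "new rows only" insert is a natural candidate to time against the NOT EXISTS version sketched earlier (hypothetical names as before):

        -- Anti-join: staging rows with no match in the destination are the new ones
        INSERT INTO dbo.Destination (BusinessKey, Col001 /* ...more columns... */)
        SELECT s.BusinessKey, s.Col001 /* ...more columns... */
        FROM dbo.FlatFileStaging AS s
        LEFT JOIN dbo.Destination AS d
               ON d.BusinessKey = s.BusinessKey
        WHERE d.BusinessKey IS NULL;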

    John

  • John Mitchell-245523 (8/20/2010)


    Or, if the duplicates are definitely all within the text file (and not duplicates of data that's already in the destination table) then you don't need a lookup - you just need to eliminate duplicates from your staging table.

    Eliminate the rows!!! :-):-):-)

    An approach could be to change the primary key constraint... introduce a candidate key composed of two or more columns to identify a record...
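
    A rough sketch of that change, assuming the existing single-column key can safely be replaced, and using hypothetical constraint and column names:

        -- Drop the existing single-column PK...
        ALTER TABLE dbo.Destination DROP CONSTRAINT PK_Destination;

        -- ...and re-create it as a composite key over the columns that
        -- together identify a record
        ALTER TABLE dbo.Destination
            ADD CONSTRAINT PK_Destination
            PRIMARY KEY CLUSTERED (BusinessKey, LoadDate);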

    Raunak J

  • Hi,

    If you do not want to get into lookup tables and such, try implementing the bad-data redirection technique.

    This way, when you execute the package, all the duplicate rows get diverted to either a text file or another staging table.
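
    In the data flow, this is the destination's error output set to "Redirect row". If the load runs from a staging table instead, a rough T-SQL analogue (hypothetical names again) is to capture the offending rows into a reject table before the clean insert:

        -- Divert rows whose key already exists into a reject table for auditing
        INSERT INTO dbo.RejectedRows (BusinessKey, Col001 /* ...more columns... */)
        SELECT s.BusinessKey, s.Col001 /* ...more columns... */
        FROM dbo.FlatFileStaging AS s
        WHERE EXISTS
        (
            SELECT 1
            FROM dbo.Destination AS d
            WHERE d.BusinessKey = s.BusinessKey
        );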

    Hope this is helpful.

    Thanks,

    Mmohsin

    I'd suggest the redirect as my favoured approach, but if you're using bulk insert this wouldn't be possible in just one task.

    Personally, I'd be looking at why the duplicates are occurring and trying to stop them at source in the first place. However, if that's not possible, then you might have to bulk load into a staging table and then use a data flow into your final destination, in which case I'd redirect errors so they can be audited and reported on.
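
    For the first half of that, a rough sketch of the bulk-into-staging step, with a hypothetical file path and delimiters that would need adjusting to the real file layout:

        BULK INSERT dbo.FlatFileStaging
        FROM 'C:\Loads\SourceFile.txt'    -- hypothetical path
        WITH
        (
            FIELDTERMINATOR = ',',        -- match the file's column delimiter
            ROWTERMINATOR = '\n',
            FIRSTROW = 2,                 -- skip a header row, if there is one
            TABLOCK                       -- bulk update lock for a faster load
        );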
