setting up DTS package to not import duplicates

  • i want to set up a DTS package usign the SQL Server 2000 Enterprise Manager that will import data from a fixed width text file into a database table.  and i knwo how to do that part.  but what i need is to make it so that it will only import a record if it is not a duplicate record.  how would i do that?

  • Three possibilities come to mind:

    1) Insert trigger on the database to check for dupes and reject.

    2) Make the transform task a VB script, run a sql query to check for dupes and only DTS those that aren't.

    3) DTS to a "staging" table, then run a SQL command / stored proc to copy from the staging table to the desired table.

  • You have two problems when you import text SQL Server sees null values because of the nature of text so if you import into a table with primary key your import will fail.  So first import into a temp table then destination make sure the destination table is set up with unique constraint or even better unique index because you can use IGNORE_DUP_KEY with your create index statement.  Your import will continue but the duplicate will not be inserted.  Run a search for UNIQUE Constraint and IGNORE_DUP_KEY with limitations in the BOL (books online). One more thing unique constraint is nullable while unique index is not.   Hope this helps.

    Kind regards,

    Gift Peddie

    Kind regards,
    Gift Peddie

  • interesting idea for the trigger - but what woudl the code look liek to check for dupes and reject?

  • anyone?

  • Have a look at the examples here:

    http://msdn.microsoft.com/library/default.asp?url=/library/en-us/tsqlref/ts_create2_7eeq.asp

    Essentially, you'd do a select on inserted and the table in question, joining on the PK / unique data. 

    Very rough pseudo-code:

    If EXISTS(select.....)

    begin

    RAISERROR ('Non-unique insert.', 16, 1)

    ROLLBACK TRANSACTION

    end

     

  • "Essentially, you'd do a select on inserted and the table in question, joining on the PK / unique data. "

    thanks for the reply but thats the part i am really having trouble with.

    lets just say my data looked like this, both in the database and the text file i am importing from:

    ID  PK

    FNAME 

    LNAME

    then how would i make my select statement?

    i know that the following would find duplicates

    SELECT ID

    FROM my_table

    WHERE ID= the_id_being_imported

    but how do i access the id of the record its trying to import?  is there a special variable or something for that?

  • inserted is the name of the "table" in the insert trigger.

    An example from the link:

    SELECT @min_lvl = min_lvl,

       @max_lvl = max_lvl,

       @emp_lvl = i.job_lvl,

       @job_id = i.job_id

    FROM employee e INNER JOIN inserted i ON e.emp_id = i.emp_id

  • ok i'm sorry if i am being dumb but i dont get what your doing there:

    SELECT @min_lvl = min_lvl,

       @max_lvl = max_lvl,

       @emp_lvl = i.job_lvl,

       @job_id = i.job_id

    FROM employee e INNER JOIN inserted i ON e.emp_id = i.emp_id

    i dont really get what you are selecting here.  could you please explain it?

     

     

  • "employee" is the table in the database

    "inserted" is the 'table' containing the data to be inserted.

    so, for your example:

    SELECT ID

    FROM my_table

    WHERE ID= the_id_being_imported

    it would look like:

    SELECT ID

    FROM my_table t

    INNER JOIN inserted i

    on t.ID = i.ID

  • ok you must have posted as i was writign this post - so it looks liek i am on the right track.  thank you so much for your help!

  • ok now one more question - the data i am importign into my table froim the text file does not have a single column that is unique.  so i am appentign an identity int column to each record inserted.  that identity int is the PK. 

    but i think that a combination fo 3 feilds from the incoming data could be used to determine if the record is unique or not. 

    so the question is, is there a way to do the inner join on 3 feilds?

  • sure

    INNER JOIN inserted i ON

    t.Col1 = i.col1

    and

    t.col2 = i.col2

    and

    t.col3 = i.col3

    There's also the CHECKSUM function that you might want to look into.

     

  • If you are importing via a DTS package and have a text field as PK but no Identity column, the trigger solution WONT WORK.

     

    Here is what you can do to reject duplicate rows during bulk insert (assuming you have a text primary key or no PK column). Select the column you want to function as a unique identifier or the PK column. Create an Index on the PK column and a UNIQUE Constraint with IGNORE_DUP_KEY on. This will eliminate all duplicate rows during DTS bulk copy. (you wont need transformation logic, etc to handle this. yes it will throw an error but it will not stop at the first duplicate and will run all the records)

Viewing 14 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply