how to load distinct rows

  • HI, i have 7000 rows in my excel file and some of them are repeated rows. i want to transform the data from excel source to oledb destination. to get the distinct rows into my destintion which transformation is better or is there any other way to get the data without repeating rows?

  • priya.pk20 (11/15/2011)


    HI, i have 7000 rows in my excel file and some of them are repeated rows. i want to transform the data from excel source to oledb destination. to get the distinct rows into my destintion which transformation is better or is there any other way to get the data without repeating rows?

    Two options:

    * use a SORT component. It has a checkbox at the bottom that you can select to filter out duplicate values. This can have bad performance, but for 7000 rows you should be fine.

    * stage the data into a staging table in SQL Server and use TSQL to delete the duplicate rows. This has the best performance, but for 7000 rows this might be overkill.

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • Thank you koen:-),

    from your answer, i have another question, if the data is larger than 7000 rows , what would be the best way..:satisfied:

    thanks

    priya

  • Use the suggested staging table (may be temporary) and SELECT DISTINCT.

  • is there any other ways to remove redundant data in loading using ssis?

  • Staging data into a table is the best and cleanest way as suggested above.

    But if for some reason you dont want to use this approach, you could try reading distinct rows in the

    source editor itself. In the excel source editor choose SQL Command as the data access mode.

    Write a query similar to the following:

    select distinct col1,col2,col3

    from [sheet1$]

  • priya.pk20 (11/17/2011)


    is there any other ways to remove redundant data in loading using ssis?

    Not with the generic SSIS Excel Source. You would need a C# program that would use Interop to read the Excel sheet.

  • samvanga (11/17/2011)


    Staging data into a table is the best and cleanest way as suggested above.

    But if for some reason you dont want to use this approach, you could try reading distinct rows in the

    source editor itself. In the excel source editor choose SQL Command as the data access mode.

    Write a query similar to the following:

    select distinct col1,col2,col3

    from [sheet1$]

    Does the JET provider (or ACE for 2007 and up) recognise the DISTINCT keyword? If it does, that would be great.

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • Koen Verbeeck (11/17/2011)[hrDoes the JET provider (or ACE for 2007 and up) recognise the DISTINCT keyword? If it does, that would be great.

    Turns out it doesn't. I just learned this

    Earlier i was using 97-2003 version of excel. And it worked with a JET 4.0 provider. Below is the complete connection sting i was using.

    Provider=Microsoft.Jet.OLEDB.4.0;

    Data Source=D:\Documents and Settings\New Microsoft Excel Worksheet.xls

    ;Extended Properties="Excel 8.0;HDR=YES";

  • Here is an article on how to do this:

    http://merlecarr.wordpress.com/2011/03/29/ssis-select-distinct-from-excel-data-source/

    Thanks,

    Fitsum

  • fitsumkh (11/18/2011)


    Here is an article on how to do this:

    http://merlecarr.wordpress.com/2011/03/29/ssis-select-distinct-from-excel-data-source/

    Thanks,

    Fitsum

    Wow! Very smart!

    Thanks -- I learned something...

  • thanks for all, for ur sharings..

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply