Read flat file but ignore everything in filename after character plus insert date into SQL Server table

  • I am fairly new to SSIS but not to SQL Server. Here are the conditions of the flat files:

    1) there will only be one flat file (.csv) per destination table in a designated directory at a time

    2) the directory path will always be the same

    3) there will be multiple flat files but each is input for its own table in SQL Server

    4) The part of the flat file that varies comes after an underscore character in the filename. It is the first underscore of two in the filename

    5) Here are two files and their destination tables for an example:

    ProdSmallCellCarcinoma_10-29-2015_PCP.csv

    ProdSquamousCellCarcinoma_10-29-2015_PCP.csv

    each of these to be imported into tables named SmallCellCarcinoma and SquamousCellCarcinoma, respectively.

    I would like to get the flat file connector to just look at the part of the string before the first _ so I don't have to hard code the exact file name each time.

    I would also like to store the date part of the filename in a sql server column and stop before the _PCP part.

    I don't need to loop through files in a directory because I have a separate flat file connection for each file type and there is only one file of each type in the directory each month.

    I do have the hard-coded filename version working to import the data out now I need to take the next step.

    Thanks in advance.

  • pharmkittie (11/3/2015)


    I am fairly new to SSIS but not to SQL Server. Here are the conditions of the flat files:

    1) there will only be one flat file (.csv) per destination table in a designated directory at a time

    2) the directory path will always be the same

    3) there will be multiple flat files but each is input for its own table in SQL Server

    4) The part of the flat file that varies comes after an underscore character in the filename. It is the first underscore of two in the filename

    5) Here are two files and their destination tables for an example:

    ProdSmallCellCarcinoma_10-29-2015_PCP.csv

    ProdSquamousCellCarcinoma_10-29-2015_PCP.csv

    each of these to be imported into tables named SmallCellCarcinoma and SquamousCellCarcinoma, respectively.

    I would like to get the flat file connector to just look at the part of the string before the first _ so I don't have to hard code the exact file name each time.

    I would also like to store the date part of the filename in a sql server column and stop before the _PCP part.

    I don't need to loop through files in a directory because I have a separate flat file connection for each file type and there is only one file of each type in the directory each month.

    I do have the hard-coded filename version working to import the data out now I need to take the next step.

    Thanks in advance.

    To handle the varying file names, use FOREACH containers. An example file spec would be ProdSmallCellCarcinoma_*.csv. The fact that it will iterate over only one file is not a problem.

    The FEL can be configured to assign the name of the file which is being processed to a variable.

    A derived column within a data flow can be used to carve out the bit of the filename you are interested in – ie, between two underscores.

    If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

  • To handle the varying file names, use FOREACH containers. An example file spec would be ProdSmallCellCarcinoma_*.csv. The fact that it will iterate over only one file is not a problem.

    The FEL can be configured to assign the name of the file which is being processed to a variable.

    A derived column within a data flow can be used to carve out the bit of the filename you are interested in – ie, between two underscores.

    Thanks Phil! I understand (I think) the first two steps but could you be a little more specific about how to do the derived column within a data flow to carve out the token between underscores please? Do I use a regular expression?

    Thanks again.

  • If you have 2012 or later. an easy way is to use TOKEN.

    Assuming the file name is held in a variable called Filename (yes, I'm feeling creative today), your expression would look something like this:

    TOKEN( @[User::Filename] , "_" ,2 )

    A more convoluted method using a combination of FINDSTRING and SUBSTRING would be needed for earlier SSIS versions.

    If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

  • Thanks very much Phil. I'm going to implement what you've told me and I will post my results here.

  • I just want to thank you again Phil. I did as you advised for all the flat file and Excel file types I have to import and everything worked great, I was able to parse out the type of file and date string from the filename to store in the DB table destinations. From that point I made some computed columns in the SQL tables to get the year and the quarter of the data.

    I especially liked the part where you said I could go ahead and use a ForEach container even if I only had to "loop through" 1 file.

    Thanks again.

  • I appreciate the fact that you took the time to post back with your success story 🙂

    And it's great to hear that you were able to build a working solution; well done.

    If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

  • hi i am having the same issue.. can you please check this

    I am using ssis component "Azure Source Blob" there is a file in blob storage with name "Material123445.csv"

    i want to fetch this file with its frist name "Material" and rest of the name "123445" should be ignore because these digits will be changing on daily basis. so these digits cant be hardcoded

    how to handle this thing please check the image attached

  • waqarbaig777 - Wednesday, August 29, 2018 1:08 AM

    hi i am having the same issue.. can you please check this

    I am using ssis component "Azure Source Blob" there is a file in blob storage with name "Material123445.csv"

    i want to fetch this file with its frist name "Material" and rest of the name "123445" should be ignore because these digits will be changing on daily basis. so these digits cant be hardcoded

    how to handle this thing please check the image attached

    The solution I outlined earlier in this thread should work perfectly well in this case: please try it.

    If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply