Importing a CSV file using a DTS

  • Hi

I am having a problem importing data from a CSV file using a DTS package. Typically when I import data from a CSV it works successfully, but this particular CSV file contains columns with commas as part of the data. When SQL Server imports directly from the CSV file, additional columns are created, knocking the rest of the row off track.

    ROW 1

    "THOMAS J","Person","4 street","town","city","country",000,000,000,9999

    ROW 2

    "person2","person2","59 street hall,","Mount Saint Annes,",000,000,000,0000

    ROW 1 creates ten columns, which is what is expected; however, ROW 2 creates twelve columns.

    Not all of the cells in the CSV have double quotes as text qualifiers, so an error is thrown when the double quote is specified as the text qualifier (invalid delimiter). The file is far too large to adjust manually (replace all), and the data cannot be adjusted prior to import. The file cannot be converted to, for example, an Excel document. Is there a possible workaround for this problem?

  • You just need to specify " as your string delimiter.

    ZenDada

  • Wow, offhand I'm not sure what to do here. Since some items are not in quotes, I'm not sure the above will work.

    I think I'd write some VBScript or a small Perl program to rip through the file and clean out the commas between quotes; a sketch follows below. It's a little tedious to get working, but once you do, you don't have to worry about it.

    Are the commas always before the "?
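
    For what it's worth, a minimal VBScript sketch of that cleanup: it reads the file line by line, toggles an in-quotes flag at every double quote, and drops any comma that falls inside a quoted field. The paths are placeholders, and it assumes quotes never appear inside the data itself.

    ' CleanCsv.vbs: strip commas that fall inside double-quoted fields
    Dim fso, fIn, fOut, sLine, sClean, ch, i, bInQuotes
    Set fso = CreateObject("Scripting.FileSystemObject")
    ' paths are placeholders; point these at your real file
    Set fIn = fso.OpenTextFile("C:\import\File.csv", 1)            ' 1 = ForReading
    Set fOut = fso.CreateTextFile("C:\import\File_clean.csv", True)

    Do While Not fIn.AtEndOfStream
        sLine = fIn.ReadLine
        sClean = ""
        bInQuotes = False
        For i = 1 To Len(sLine)
            ch = Mid(sLine, i, 1)
            If ch = """" Then
                bInQuotes = Not bInQuotes    ' entering or leaving a quoted field
            ElseIf ch = "," And bInQuotes Then
                ch = ""                      ' a comma inside the data: drop it
            End If
            sClean = sClean & ch
        Next
        fOut.WriteLine sClean
    Loop

    fIn.Close
    fOut.Close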

  • Ah hah. Maybe those are really sets of single quotes indicating empty strings? That gives you ten valid fields. Turn off all string delimiters and try it. You don't need them if all of your commas are valid field delimiters. Try that first; if it fails, you'll have to clean the file up first and replace double quotes and single quotes with nothing. Sorry, I misunderstood you the first time!

    ZenDada

  • ROW 1

    "THOMAS J","Person","4 street","town","city","country",000,000,000,9999

    ROW 2

    "person2","person2","59 street hall,","Mount Saint Annes,",000,000,000,0000

    ROW 3

    "person2","person3","99, main street,","Some place, again,",000,000,000,0000

    All the quotes are double quotes. Turning off all string delimiters creates one column. This would require the result to be parsed after it has been loaded in order to break it into the different columns, which could get messy. The delimiter type (comma) cannot be changed either. Removing all the quotes does not work; it returns rows with varying numbers of columns.

    About 5,000 to 15,000 rows need to be inserted each time the DTS package is run. Currently it is being done by a series of inserts, which takes forever to run! Any ideas?

  • Well, ideally the person responsible for providing you with the CSV should be asked to clean it up.  It's really a piece of junk.  But if you are stuck, clean it up in this order:

    replace ,", with ,"", where the characters to either side are not " or ,

    (see BOL for pattern matching/regular expressions)

    then remove the commas within literal strings,

    then remove the quotes. 

    That will leave you with the 10 expected fields, at least based on your example.
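
    A rough VBScript sketch of the second and third steps (the function name CleanLine is made up; run it over each line of the file): it uses the RegExp object to find each double-quoted field, strips the commas inside it, and drops the quotes, leaving only the genuine field-delimiter commas.

    Function CleanLine(sLine)
        Dim re, m, sOut, iPos
        Set re = New RegExp
        re.Pattern = """[^""]*"""      ' one double-quoted field
        re.Global = True
        sOut = ""
        iPos = 0
        For Each m In re.Execute(sLine)
            ' copy the unquoted stretch before this field unchanged
            sOut = sOut & Mid(sLine, iPos + 1, m.FirstIndex - iPos)
            ' strip commas inside the field, then drop its quotes
            sOut = sOut & Replace(Replace(m.Value, ",", ""), """", "")
            iPos = m.FirstIndex + m.Length
        Next
        CleanLine = sOut & Mid(sLine, iPos + 1)
    End Function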

    ZenDada

  • Hi,

    You can solve the problem as follows:

    - Write a stored procedure to:

          . copy the table you want to export to another table

          . look through each field of the table and change any comma in any record to a space

          . export that table to CSV format


  • Hi,

    I am sorry, I accidentally posted my answer before I was done. (Continued)

         . delete the temporary table

    All four of the steps above can go in one stored procedure.

    If you want, you can even:

    - modify step #2 to replace any comma with ~

    - create a macro in Excel to convert all '~' back to ',' and set it up to run against the output file; a sketch follows below. If you are not familiar with macros, you can carry out this step manually.
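
    Something like this hypothetical macro would do for that step (it assumes the output file is already open as the active sheet in Excel):

    Sub RestoreCommas()
        ' convert every '~' on the active sheet back to a comma
        ActiveSheet.Cells.Replace What:="~", Replacement:=",", LookAt:=xlPart
    End Sub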

     

    Another solution:

    . Export the original table to an xls file

    . Replace all ',' with '~' or a space

    . Save the xls as CSV format

    If you need further help, please email me at lamd@vmcmail.com

    VMC-WED-MIS-DL

  • Thanks for the suggestions, guys. I used some VB code to treat the file the same way as an Excel file. For some reason Excel is able to identify all the delimiters in the CSV file correctly; why Excel can do this and SQL Server cannot, I don't know, but it works perfectly. This also allows me to extract the specific data I need...

    strOpen = mvarDirectoryPath & "\File.csv"

    ' Open the CSV read-only; Excel works out the quoted fields itself
    Set oEmpWorkbook = Workbooks.Open(strOpen, , True)

    ' Select columns A-J, from row 1 down to where the first column's data ends
    Set oEmpRng = Range("A1:J1", Range("A1").End(xlDown))

    iTotRows = oEmpRng.Rows.Count

    For iRow = 1 To iTotRows
        ' SQLText is the poster's own helper (not shown here), which
        ' presumably makes a cell value safe to embed in a T-SQL statement
        strForeName = SQLText(oEmpRng.Cells(iRow, 4))
        strSurName = SQLText(oEmpRng.Cells(iRow, 5))
        strAddr1 = SQLText(oEmpRng.Cells(iRow, 6))
        strAddr2 = SQLText(oEmpRng.Cells(iRow, 7))
        strAddr3 = SQLText(oEmpRng.Cells(iRow, 8))
        strAddr4 = SQLText(oEmpRng.Cells(iRow, 9))
        strPostCode = SQLText(oEmpRng.Cells(iRow, 10))
        ' ...each row's values then feed the INSERT for that record...
    Next
