Best approach to import text file?

  • I need to import into MS SQL DB text file with text:

    *RECORD*

    *FIELD* NO

    1

    *FIELD* TI

    ...

    *FIELD* TX

    ...

    *RECORD*

    *FIELD* NO

    2

    *FIELD* TI

    ...

    *FIELD* TX

    ...

    *THEEND*

    What is the best approach to do this? Use bcp, SSIS package or something else?

  • That's like asking which is the best way to eat ice cream. Pick the method that you are most comfortable with.

  • BCP with a format file that uses the end of line characters as delimiters (as well as some others).

    In actuality, that's not the "best" way... the absolute best way would be for whoever is providing the data to provide it in a more normal fashion instead as an unverifiable EAV-like format.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Jeff Moden (10/1/2011)


    BCP with a format file that uses the end of line characters as delimiters (as well as some others).

    In actuality, that's not the "best" way... the absolute best way would be for whoever is providing the data to provide it in a more normal fashion instead as an unverifiable EAV-like format.

    Do you mean as a well-formatted CSV file, or something else?

    http://sqlvince.blogspot.com/[/url]

  • This may be one of the simplest ways to import a text file:

    DECLARE @TxtData VARCHAR(MAX);

    SELECT @TxtData = BulkColumn FROM OPENROWSET(BULK 'C:\textfile.txt', SINGLE_CLOB) AS T;

    SELECT @TxtData;

    But, as other have mentioned, you must have some sort of separator like a comma in order to get the correct values out of it. You could also use BCP as mentioned.

    Best regards,

    Best regards,

    Andre Guerreiro Neto

    Database Analyst
    http://www.softplan.com.br
    MCITPx1/MCTSx2/MCSE/MCSA

  • Now if you need to pass the filename as a variable you won't be able to use OPENROWSET that way. It needs a constant value as filename.

    But you could do this:

    IF OBJECT_ID('tempdb.dbo.#TMP_TXTDATA', 'U') IS NOT NULL

    DROP TABLE dbo.#TMP_TXTDATA;

    GO

    CREATE TABLE dbo.#TMP_TXTDATA(DataFromFile NVARCHAR(MAX));

    GO

    DECLARE @filename NVARCHAR(MAX) = N'C:\\textfile.txt';

    DECLARE @Query NVARCHAR(MAX) = N'INSERT INTO dbo.#TMP_TXTDATA SELECT BulkColumn FROM OPENROWSET(BULK ''' + @filename + ''', SINGLE_CLOB) AS T';

    EXEC sp_executesql @Query;

    SELECT DataFromFile FROM dbo.#TMP_TXTDATA;

    But be aware that you must check and double-check for SQL injection as the dynamic sql is not properly parameterised.

    Best regards,

    Best regards,

    Andre Guerreiro Neto

    Database Analyst
    http://www.softplan.com.br
    MCITPx1/MCTSx2/MCSE/MCSA

  • vince_sql (10/1/2011)


    Jeff Moden (10/1/2011)


    BCP with a format file that uses the end of line characters as delimiters (as well as some others).

    In actuality, that's not the "best" way... the absolute best way would be for whoever is providing the data to provide it in a more normal fashion instead as an unverifiable EAV-like format.

    Do you mean as a well-formatted CSV file, or something else?

    I was thinking "TSV" (Tab Separated Values) so you don't have to worry so much about embedded quotes.

    Personally, I'd prefer it if folks sent me files delimited with ASCII character 31 (Unit Separator) for delimiters and ASCII character 30 (Record Separator) for "end of line"... that way, I don't ever have to worry about characters that folks might embed. Most people don't like that, though, because you can't open them in a spreadsheet, as many people seem to like to do.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • codebyo (10/1/2011)


    Now if you need to pass the filename as a variable you won't be able to use OPENROWSET that way. It needs a constant value as filename.

    But you could do this:

    IF OBJECT_ID('tempdb.dbo.#TMP_TXTDATA', 'U') IS NOT NULL

    DROP TABLE dbo.#TMP_TXTDATA;

    GO

    CREATE TABLE dbo.#TMP_TXTDATA(DataFromFile NVARCHAR(MAX));

    GO

    DECLARE @filename NVARCHAR(MAX) = N'C:\\textfile.txt';

    DECLARE @Query NVARCHAR(MAX) = N'INSERT INTO dbo.#TMP_TXTDATA SELECT BulkColumn FROM OPENROWSET(BULK ''' + @filename + ''', SINGLE_CLOB) AS T';

    EXEC sp_executesql @Query;

    SELECT DataFromFile FROM dbo.#TMP_TXTDATA;

    But be aware that you must check and double-check for SQL injection as the dynamic sql is not properly parameterised.

    Best regards,

    A question on that, Andre... what do you do with the data once you've imported it into a table since you're not reassembled the rows of individual elements?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Jeff Moden (10/2/2011)


    codebyo (10/1/2011)


    Now if you need to pass the filename as a variable you won't be able to use OPENROWSET that way. It needs a constant value as filename.

    But you could do this:

    IF OBJECT_ID('tempdb.dbo.#TMP_TXTDATA', 'U') IS NOT NULL

    DROP TABLE dbo.#TMP_TXTDATA;

    GO

    CREATE TABLE dbo.#TMP_TXTDATA(DataFromFile NVARCHAR(MAX));

    GO

    DECLARE @filename NVARCHAR(MAX) = N'C:\\textfile.txt';

    DECLARE @Query NVARCHAR(MAX) = N'INSERT INTO dbo.#TMP_TXTDATA SELECT BulkColumn FROM OPENROWSET(BULK ''' + @filename + ''', SINGLE_CLOB) AS T';

    EXEC sp_executesql @Query;

    SELECT DataFromFile FROM dbo.#TMP_TXTDATA;

    But be aware that you must check and double-check for SQL injection as the dynamic sql is not properly parameterised.

    Best regards,

    A question on that, Andre... what do you do with the data once you've imported it into a table since you're not reassembled the rows of individual elements?

    I used to have some code that would split the values using a separator and a tally table or some other technique. I can't remember exactly now. My code is all at work.

    But another (easier) way to do that would probably be:

    IF OBJECT_ID('tempdb.dbo.#TMP_TXTDATA', 'U') IS NOT NULL

    DROP TABLE dbo.#TMP_TXTDATA;

    GO

    CREATE TABLE dbo.#TMP_TXTDATA(DataFromFile NVARCHAR(MAX));

    GO

    DECLARE @filename NVARCHAR(MAX) = N'C:\\textfile.txt';

    DECLARE @Query NVARCHAR(MAX) = N'BULK INSERT dbo.#TMP_TXTDATA FROM ''' + @filename + ''' WITH (FIELDTERMINATOR = '';'', ROWTERMINATOR = '''')';

    EXEC sp_executesql @Query;

    SELECT DataFromFile

    FROM dbo.#TMP_TXTDATA;

    That way we would handle with the field and row terminators at once.

    I used to use OPENROWSET with the BULK option when the data was not correctly formatted or was too hard to deal by the time of the import.

    🙂

    EDIT: for some reason I can't display the ("\ n") character correctly in the ROWTERMINATOR = '''' part.

    Best regards,

    Best regards,

    Andre Guerreiro Neto

    Database Analyst
    http://www.softplan.com.br
    MCITPx1/MCTSx2/MCSE/MCSA

  • Look at the example the OP gave, Andre... each element is on a single row with row indicators between "whole records". You don't need a splitter for this. Think about what would be used as a delimiter and an "end of record" marker for this case. 🙂

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Jeff Moden (10/2/2011)


    Look at the example the OP gave, Andre... each element is on a single row with row indicators between "whole records". You don't need a splitter for this. Think about what would be used as a delimiter and an "end of record" marker for this case. 🙂

    Hmm, you're right.

    I wasn't thinking about how he would manipulate data after importing it.

    Maybe if the OP could attach an example file or even the actual file would be easier to work on.

    I have made a simple test changing the code above in my post with this part I was able to import the right amount of rows and fields:

    DECLARE @Query NVARCHAR(MAX) = N'BULK INSERT dbo.#TMP_TXTDATA FROM ''' +

    @filename + ''' ' +

    ' WITH (FIELDTERMINATOR = ''\*FIELD\*'', ' +

    ' ROWTERMINATOR = ''\*RECORD\*'')';

    Perhaps it's an ugly way but after a few REPLACE()s it worked.

    Could someone post an example using BCP with that format?

    I'd love to see that. 🙂

    Best regards,

    Best regards,

    Andre Guerreiro Neto

    Database Analyst
    http://www.softplan.com.br
    MCITPx1/MCTSx2/MCSE/MCSA

  • As it turned out, some fields can have null values. So, the correct sample will be:

    *RECORD*

    *FIELD* NO

    1

    *FIELD* TI

    ...

    *FIELD* TX

    ...

    *RECORD*

    *FIELD* NO

    2

    *FIELD* TI

    ...

    *FIELD* TX

    ...

    *FIELD* CN

    ...

    *THEEND*

    I've tried to make a format file but now getting error: Bulk load data conversion error (truncation) for row 1, column 4 (CN).

  • Can you attach the file that you're trying to import? Please... make sure there's no one's private information if you do.

    Also, thank you for posting the BCP error but I can't tell a thing about it without seeing the format file and the target table definition. In fact, I'd need to see the import file for that, as well.

    Also... for any given file, do all of the "full records" have the same number of "fields". There may actually be an even simpler way to do this.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • codebyo (10/2/2011)


    Jeff Moden (10/2/2011)


    Look at the example the OP gave, Andre... each element is on a single row with row indicators between "whole records". You don't need a splitter for this. Think about what would be used as a delimiter and an "end of record" marker for this case. 🙂

    Hmm, you're right.

    I wasn't thinking about how he would manipulate data after importing it.

    Maybe if the OP could attach an example file or even the actual file would be easier to work on.

    I have made a simple test changing the code above in my post with this part I was able to import the right amount of rows and fields:

    DECLARE @Query NVARCHAR(MAX) = N'BULK INSERT dbo.#TMP_TXTDATA FROM ''' +

    @filename + ''' ' +

    ' WITH (FIELDTERMINATOR = ''\*FIELD\*'', ' +

    ' ROWTERMINATOR = ''\*RECORD\*'')';

    Perhaps it's an ugly way but after a few REPLACE()s it worked.

    Could someone post an example using BCP with that format?

    I'd love to see that. 🙂

    Best regards,

    The BCP format file would use similar delimiters and row terminators as you have in your BULK INSERT example (which is very nicely done, BTW). Perhaps if the OP attaches a data file, we can give it a whirl although I believe that the BULK INSERT you've created will be just as fast, if not faster. It's just the nature of BULK INSERT to be fast.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • WaitingWonder2 (10/3/2011)


    As it turned out, some fields can have null values. So, the correct sample will be:

    *RECORD*

    *FIELD* NO

    1

    *FIELD* TI

    ...

    *FIELD* TX

    ...

    *RECORD*

    *FIELD* NO

    2

    *FIELD* TI

    ...

    *FIELD* TX

    ...

    *FIELD* CN

    ...

    *THEEND*

    I've tried to make a format file but now getting error: Bulk load data conversion error (truncation) for row 1, column 4 (CN).

    So... we've got lots of ideas posted for how to handle all of this. You haven't posted the format file as requested and we don't know what else you've tried. If you were to attach a file of data that more closely matches what you're really trying to do, then we might be able to solve it for sure, for you. Right now, we're pretty much relegated to just making suggestions.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

Viewing 15 posts - 1 through 15 (of 20 total)

You must be logged in to reply to this topic. Login to reply