BULK INSERT with UNICODE files

  • My ultimate goal is to import some mix of Chinese and English from

    a set of files into a database. Since I have multiple files and since

    I'm not good at writing iterations in DTS, I'd like to use stored

    procs with BULK INSERT. The input format will be csv with quoted

    strings and there is no guarantee that there are no commas in the

    strings. Therefore I'll need a format file to strip the quotes. But

    for now, I'm trying very simple things...

    To make the long story short: as long as I am using ASCII files, I

    don't have any problem, but if I save the input file as UNICODE,

    things start falling apart. Here are the details:

    /* Contents of the the input files xx-a.txt and xx-u.txt: */

    one,two,three

    un,deux,troix

    eins,zwei,drei

    /* Contents of the the input files yy-a.txt and yy-u.txt: */

    one two three

    un deux troix

    eins zwei drei

    /* Input files were written in Notepad,

    -a saved as ANSI and -u saved as UNICODE */

    /* Contents of format file xx.fmt: */

    8.0

    3

    1 SQLCHAR 0 10 "," 1 a SQL_Latin1_General_CP1_CI_AS

    2 SQLCHAR 0 10 "," 2 b SQL_Latin1_General_CP1_CI_AS

    3 SQLCHAR 0 10 "\r\n" 3 c SQL_Latin1_General_CP1_CI_AS

    /* Contents of format file yy.fmt: */

    8.0

    3

    1 SQLCHAR 0 10 "" 1 a SQL_Latin1_General_CP1_CI_AS

    2 SQLCHAR 0 10 "" 2 b SQL_Latin1_General_CP1_CI_AS

    3 SQLCHAR 0 10 "" 3 c SQL_Latin1_General_CP1_CI_AS

    /* the test table */

    CREATE TABLE [dbo].[xx] (

    [a] [char] (10) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,

    [char] (10) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,

    [c] [char] (10) COLLATE SQL_Latin1_General_CP1_CI_AS NULL

    )

    /* test 1: ASCII input with FORMATFILE */

    BULK INSERT xx

    FROM 'E:\test\xx-a.txt'

    WITH (

    FORMATFILE = 'e:\test\xx.fmt'

    )

    /* test 2: UNICODE input with FORMATFILE */

    BULK INSERT xx

    FROM 'E:\test\xx-u.txt'

    WITH (

    DATAFILETYPE = 'widechar',

    FORMATFILE = 'e:\test\xx.fmt'

    )

    /* test 3: UNICODE input with FIELDTERMINATOR */

    BULK INSERT xx

    FROM 'E:\test\xx-u.txt'

    WITH (

    DATAFILETYPE = 'widechar',

    FIELDTERMINATOR = ','

    )

    /* test 4: ASCII input, fixed length */

    BULK INSERT xx

    FROM 'E:\test\yy-a.txt'

    WITH (

    FORMATFILE = 'e:\test\yy.fmt'

    )

    /* test 5: UNICODE input, fixed length */

    BULK INSERT xx

    FROM 'E:\test\yy-u.txt'

    WITH (

    DATAFILETYPE = 'widechar',

    FORMATFILE = 'e:\test\yy.fmt'

    )

    /*------------------------------------------------------------------*/

    Results:

    Test 1, Test3 and Test 4 work fine.

    Test 2 gives the following error message:

    Bulk insert data conversion error (truncation) for row 1, column 2 (b).

    Test 5 produces a checker board:

    line 1: a='one', b='', c='two'

    line 2: a='', b='three', c=''

    etc

    System used: English Windows 2000 AS SP4, SQL Server 2000 EE 8.00.760 (SP3)

    /*------------------------------------------------------------------*/

  • This was removed by the editor as SPAM

  • For the archive:

    This here is not a solution either, but it maybe a hint that what I want is simply not possible:

    I tried Test 2 (that's the only relevant one anyway) in bcp, and I got the following:

    --------------------------------------------------

    E:\test>bcp testDB..xx in xx-u.txt -f xx.fmt -Psecret -w

    Warning: -w overrides -f.

    Starting copy...

    SQLState = S1000, NativeError = 0

    Error = [Microsoft][ODBC SQL Server Driver]Unexpected EOF encountered in BCP dat

    a-file

    0 rows copied.

    --------------------------------------------------

    That warning clearly tells us that in bcp you can't have Unicode data and format files at the same time. That leads me to the next question: Are BULK INSERT and bcp two different implementations or just different interfaces for the same thing? If it is the latter, then I can give up. In DTS, on the other hand, I don't have any problem doing the imports...

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply