INSERT INTO versus SELECT INTO data import strategy

  • We have a temporary table defined as:

    CREATE TABLE #tmp_wages(
        [ssn] [varchar](9) NULL,
        [first_name] [varchar](32) NULL,
        [last_name] [varchar](32) NULL,
        [date_of_birth] [datetime] NULL,
        [emp_id] [varchar](9) NULL,
        [emp_name] [varchar](40) NULL,
        [yr_qtr] [varchar](6) NULL,
        [yr_qtr_dec] [float] NULL,
        [wages] [money] NULL,
        [import_date] [datetime] NULL
    )

    We have another temporary table, #tmp_import, defined as:

    CREATE TABLE #tmp_import (import_text varchar(1024))

    We then use a user-defined function that creates XML records, one record per line of text, whose values are inserted into #tmp_import. That gives us one long line of text per row that needs to be parsed. Today we use the following:

    INSERT INTO #tmp_wages
    SELECT SUBSTRING(import_text, 1, 9) AS ssn,
        NULL AS first_name,
        NULL AS last_name,
        CASE WHEN ISDATE(SUBSTRING(import_text, 50, 2) + '/' + SUBSTRING(import_text, 52, 2) + '/' + SUBSTRING(import_text, 46, 4)) = 1
             THEN CONVERT(datetime, SUBSTRING(import_text, 50, 2) + '/' + SUBSTRING(import_text, 52, 2) + '/' + SUBSTRING(import_text, 46, 4))
             ELSE NULL END AS date_of_birth,
        SUBSTRING(import_text, 59, 9) AS emp_id,
        RTRIM(SUBSTRING(import_text, 68, 40)) AS emp_name,
        SUBSTRING(import_text, 54, 4) + '-' + SUBSTRING(import_text, 58, 1) AS yr_qtr,
        CONVERT(float, SUBSTRING(import_text, 54, 4)) + (CONVERT(float, SUBSTRING(import_text, 58, 1)) - 1) / 4 AS yr_qtr_dec,
        CONVERT(money, ISNULL(SUBSTRING(import_text, 108, 10), 0)) / 100 AS wages,
        GETDATE() AS import_date
    FROM #tmp_import
    WHERE LEN(LTRIM(RTRIM(SUBSTRING(import_text, 1, 9)))) = 9

    CREATE CLUSTERED INDEX index_temp_wages ON #tmp_wages(ssn, emp_id, yr_qtr)

    This works well until you have a file with some 6,000+ rows. It always seems to get hung up on the INSERT INTO #tmp_wages step and just spins indefinitely. However, if we use a SELECT INTO without defining #tmp_wages first, it completes without a problem.

    I guess I have two questions: 1) Why does SELECT INTO work fine for inserting a large number of rows where INSERT INTO does not, and 2) Is there a better way of doing this without using bcp or something similar?
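    For reference, the SELECT INTO form we tried looks like this. Note that the untyped NULL constants need explicit CASTs here, because SELECT INTO infers each column's type from the select list (and SELECT INTO can be minimally logged under the simple and bulk-logged recovery models, which may be part of why it behaves so differently):

    SELECT SUBSTRING(import_text, 1, 9) AS ssn,
        CAST(NULL AS varchar(32)) AS first_name,   -- CAST so the created column is varchar(32), not int
        CAST(NULL AS varchar(32)) AS last_name,
        CASE WHEN ISDATE(SUBSTRING(import_text, 50, 2) + '/' + SUBSTRING(import_text, 52, 2) + '/' + SUBSTRING(import_text, 46, 4)) = 1
             THEN CONVERT(datetime, SUBSTRING(import_text, 50, 2) + '/' + SUBSTRING(import_text, 52, 2) + '/' + SUBSTRING(import_text, 46, 4))
             ELSE NULL END AS date_of_birth,
        SUBSTRING(import_text, 59, 9) AS emp_id,
        RTRIM(SUBSTRING(import_text, 68, 40)) AS emp_name,
        SUBSTRING(import_text, 54, 4) + '-' + SUBSTRING(import_text, 58, 1) AS yr_qtr,
        CONVERT(float, SUBSTRING(import_text, 54, 4)) + (CONVERT(float, SUBSTRING(import_text, 58, 1)) - 1) / 4 AS yr_qtr_dec,
        CONVERT(money, ISNULL(SUBSTRING(import_text, 108, 10), 0)) / 100 AS wages,
        GETDATE() AS import_date
    INTO #tmp_wages           -- table is created on the fly; no CREATE TABLE beforehand
    FROM #tmp_import
    WHERE LEN(LTRIM(RTRIM(SUBSTRING(import_text, 1, 9)))) = 9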

    Thank you in advance,

    Jim

  • Do you have any control over the XML that is coming to you?

    We had a case where the app was creating XML for us to consume in the database and parse into a relational model. We changed the app to send the data via table-valued parameters (TVPs) instead and saw a significant improvement.
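    As a rough sketch (all object names here are made up, not from your system), the TVP approach looks like this on the database side:

    -- Hypothetical table type describing one already-parsed wage row
    CREATE TYPE dbo.WageRowType AS TABLE (
        ssn      varchar(9)  NULL,
        emp_id   varchar(9)  NULL,
        emp_name varchar(40) NULL,
        yr_qtr   varchar(6)  NULL,
        wages    money       NULL
    );
    GO

    -- The app calls this once, passing all rows in a single round trip
    CREATE PROCEDURE dbo.usp_ImportWages
        @rows dbo.WageRowType READONLY   -- TVP parameters must be READONLY
    AS
    BEGIN
        INSERT INTO dbo.wages (ssn, emp_id, emp_name, yr_qtr, wages, import_date)
        SELECT ssn, emp_id, emp_name, yr_qtr, wages, GETDATE()
        FROM @rows;
    END;

    From ADO.NET the app passes a DataTable or DbDataReader as a parameter with SqlDbType.Structured, so all the string slicing happens in the app rather than row by row in T-SQL.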

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw
    Learn Extended Events

  • The file uploaded to us is just a text file: one line of text is a record, and each column is (x) characters long. Using a user-defined function in SQL Server, we create the XML and dump that into a single table column.

    It would almost make more sense to me to do all the parsing of the text file into XML on the application side, then just dump it into a table and go on from there.

  • Does the application require that it be XML in order to use that info? It seems like a backwards way of doing this to me. You receive the text file and dump it into a relational database, then have the application read the relational data. If the application really needs it in XML, the application would be faster at converting it.

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw
    Learn Extended Events
