flat file import

  • Hi, I'm new to the forum and this is my first post, so please go easy on me.

    I need to sort out a data import from a flat file into the SQL database. I've got the basic swing of importing into a table if the flat file is laid out for that table - however, it's not all that simple. The flat file I have to import looks like this;

    tbl_table_name_1

    data¦data¦data¦data¦data¦data¦data¦data¦data¦data¦

    tbl_table_name_2

    data¦data¦data¦data¦data¦data¦data¦data¦data¦data¦

    tbl_table_name_3

    data¦data¦data¦data¦data¦data¦data¦data¦data¦data¦

    and to be honest I'll have very little say (if any at all) over that format. What I need to do is have a script/DTS package that will read the first line as the destination table, then read the second line as the data - update/create as needed - and then move on to the next table name, and so on.

    Can anyone offer any pointers here? A helping nudge in the right direction (or the completed script) would be most helpful.

    Thank you.

    Martin.

  • Import the flat file to one table with one column. Write a stored procedure to insert the data¦ lines into separate tables as the next SQL Task. Then move the data accordingly.
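
    A minimal sketch of that first step, assuming hypothetical names (a staging table called tbl_import_stage and a file at C:\import\feed.txt):

    --BEGIN SQL CODE

    -- One-column staging table; every raw line of the feed lands here.
    CREATE TABLE tbl_import_stage (Line varchar(4000) NULL)

    BULK INSERT tbl_import_stage
    FROM 'C:\import\feed.txt'        -- hypothetical path
    WITH (ROWTERMINATOR = '\n')      -- one line per row, whole line into the single column

    -- Add a row number afterwards so the split can rely on file order
    -- (the usual caveat: the order identity values are assigned isn't strictly guaranteed).
    ALTER TABLE tbl_import_stage ADD RowID int IDENTITY(1,1) NOT NULL

    -- END SQL CODE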

    Derrick Leggett
    Mean Old DBA
    When life gives you a lemon, fire the DBA.

  • I had a flat file that was composed of four tables, where each line had a prefix A, B, C or D denoting which table the line belonged to. There could be many lines per table before the next table was reached, plus the FK was only in A, not B, C or D, so there was no way to link the data in the last three tables back to the first table.


    So what I did first was write a program in C# that read through this one file and created 4 separate text files, one for each table, writing the PK or FK to each appropriate file. Then in SQL Server I wrote a package that imported the 4 files into their respective tables. It works fast and was easy to do.

  • I'd go with Derrick's suggestion. I've always found that data manipulation was much faster and cleaner once the data was in a SQL database.

    Doing things like setting PK/FK is easy when you can update rows in sets rather than one at a time.
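
    For instance (table and column names here are purely illustrative), stamping every child row with its parent's key in one set-based statement:

    --BEGIN SQL CODE

    UPDATE c
    SET    c.ParentID = p.ParentID
    FROM   ChildStage c
    INNER JOIN ParentStage p ON p.BatchRef = c.BatchRef

    -- END SQL CODE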

    --------------------
    Colt 45 - the original point and click interface

  • If each table's data is in a separate file, your task becomes a thousand times easier.

    With a flat file, you can import using the BULK INSERT command or BCP. (Hint: BULK INSERT is faster and better.) But both of these commands expect separate files, one for each table.

    At some point you need to split your data stream into the separate tables. If it's a flat file now, separate it BEFORE putting it into the database. It'll be easier if your data is sequential.
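
    Once the feed is split, each per-table load is a one-liner (table name and path here are illustrative):

    --BEGIN SQL CODE

    BULK INSERT tbl_table_name_1
    FROM 'C:\import\tbl_table_name_1.txt'               -- hypothetical path
    WITH (FIELDTERMINATOR = '¦', ROWTERMINATOR = '\n')  -- watch the trailing ¦ in your sample;
                                                        -- it will read as an extra empty column
    -- END SQL CODE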



    Julian Kuiters
    juliankuiters.id.au

  • I would also go with Derrick on this, but here's more detail.  Anyone have a better idea?

    1.  Import the data into a table with 2 fields.  Field one is RowID: int / not null / identity.  Field 2 is varchar(4000) (or whatever the max length of your data row can be); I gave it the great name of Field1.  I named this table Test2.

    2.  Move THAT data into a table that has 3 fields: RowID (same as above, but not an identity), tblName, Field1.  I named this table Test3.

    The script below has 4 parts:

    1.  Select all of the rows that have a tablename in them - the criterion is Field1 like 'tbl_%'.

    2.  Select all of the rows that don't have Field1 like 'tbl_%'.

    3.  Join the two sets, using data row > tablename row AND data row < next tablename row, or, if there are no more tablename rows, < the last row number + 1.

    4.  Insert the result into Test3.

    Here's the script to move the data into a nice flat table like I described. 

    --BEGIN SQL CODE

    select DataRow.RowID, TableNameCol.tblName, DataRow.Field1
    into Test3
    from (select TableNameRow.RowID, TableNameRow.Field1 as tblName
          from Test2 as TableNameRow
          -- a row is a tablename row when it is the highest 'tbl_%' row at or below itself
          where RowID = (select max(MaxRowTable.RowID)
                         from Test2 as MaxRowTable
                         where Field1 like 'tbl_%'
                           and MaxRowTable.RowID <= TableNameRow.RowID)) as TableNameCol
    inner join Test2 as DataRow
       on DataRow.RowID > TableNameCol.RowID
      and DataRow.RowID < isnull((select min(MaxRowTable.RowID)
                                  from Test2 as MaxRowTable
                                  where Field1 like 'tbl_%'
                                    and MaxRowTable.RowID > TableNameCol.RowID),
                                 (select max(RowID) + 1 from Test2))

    -- END SQL CODE

    3.  You can then select which records need to be inserted based on some part of Field1 (substring(field1,11,10)).  Let's just say that's the key for your insert table.  That would lead to the following script (replace <tablename> with the actual tablename into which you'll be inserting/updating records).

    --BEGIN SQL CODE

    INSERT INTO <tablename>
    select substring(field1,1,10)  as NewField1,
           substring(field1,11,10) as NewKeyField,
           substring(field1,21,10) as NewField2 -- (yada, yada)
    FROM Test3
    where substring(Field1,11,10) not in (select NewKeyField from <tablename>)
      and tblName = '<tablename>'

    -- END SQL CODE

    For updates, it would be similar:

    -- BEGIN SQL CODE

    UPDATE <tablename>
    SET NewField1 = substring(Test3.field1,1,10),
        NewField2 = substring(Test3.field1,21,10)
    FROM <tablename>, Test3
    WHERE substring(Test3.field1,11,10) = <tablename>.NewKeyField
      AND Test3.tblName = '<tablename>'

    -- END SQL CODE

  • Thank you everyone for your posts - it would seem, though, that I may have left off an important detail that could have an impact upon your suggestions;

    this would be more indicative of the data that needs to be imported;

    tbl_table_name_1

    data¦data¦data¦data¦data

    tbl_table_name_1

    data¦data¦data¦data¦data

    tbl_table_name_1

    data¦data¦data¦data¦data

    tbl_table_name_2

    data¦data¦data¦data¦data¦data¦data¦data¦data¦data¦

    tbl_table_name_2

    data¦data¦data¦data¦data¦data¦data¦data¦data¦data¦

    tbl_table_name_3

    data¦data¦data¦data¦data¦data¦data

    tbl_table_name_3

    data¦data¦data¦data¦data¦data¦data

    tbl_table_name_3

    data¦data¦data¦data¦data¦data¦data

    tbl_table_name_3

    data¦data¦data¦data¦data¦data¦data

    tbl_table_name_3

    data¦data¦data¦data¦data¦data¦data

    this can cover many table_names, and many items (100s of rows) may need to be added/updated in a given table while only 1 row is needed in another.

    Your examples should get me on the right track, but I realise that my data example was not accurate.

    Thank you. Martin

  • Martin,

    Have you considered using a text editor such as PFE to manipulate your data before importing into SQL Server? You can download PFE for free from http://www.lancs.ac.uk/people/cpaap/pfe/

    This does depend on the volume and consistency of the data within your text file, but it allows you to run a macro to move the "table name" row of data to the beginning of the "data" row. The data can then be sorted, and separated into separate table-specific text files.

    Hope this is of use.

    Thanks,

    Liz

  • The best approach, if there's a lot of data and the distribution of table names is large and not always sequential, would be to combine records n and n+1 into a single row with two columns: Table_Name and Table_Data. Once you have a table with one row per table/data pair, it should be fairly easy to read the table and perform inserts of the Table_Data portion into the appropriate Table_Name using a separate routine for each Table_Name. Something along these lines, sketched below.
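
    (A sketch against the kind of staging table described earlier - Test2 with RowID/Field1 - assuming name and data rows strictly alternate, as in your second sample; TablePairs is an illustrative name.)

    --BEGIN SQL CODE

    select NameRow.Field1 as Table_Name,
           DataRow.Field1 as Table_Data
    into TablePairs
    from Test2 as NameRow
    inner join Test2 as DataRow on DataRow.RowID = NameRow.RowID + 1
    where NameRow.Field1 like 'tbl[_]%'   -- [_] escapes the LIKE wildcard

    -- END SQL CODE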

    Make sense?


  • One way is to use the age-old approach. It may be OK in terms of performance if the number of records to be loaded is in the 100s or thousands, but may not be if it's in the millions.

    Load the data to a temp table which has columns equal to the maximum number of columns.

    Create a stored proc with a cursor. Go through the data rows using the cursor and move the data into the appropriate table based on the header row. A skeleton of the idea is sketched below.
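
    (A sketch of that cursor skeleton, shown against a one-column staging table for brevity; Staging and RawRows are illustrative names.)

    --BEGIN SQL CODE

    DECLARE @line varchar(4000), @tbl sysname

    DECLARE cur CURSOR FAST_FORWARD FOR
        SELECT Line FROM Staging ORDER BY RowID

    OPEN cur
    FETCH NEXT FROM cur INTO @line
    WHILE @@FETCH_STATUS = 0
    BEGIN
        IF @line LIKE 'tbl[_]%'
            SET @tbl = @line                           -- header row: remember the target table
        ELSE
            INSERT INTO RawRows (TableName, DataRow)   -- data row: file it under the current header
            VALUES (@tbl, @line)
        FETCH NEXT FROM cur INTO @line
    END
    CLOSE cur
    DEALLOCATE cur

    -- END SQL CODE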

    Good luck.


  • Yikes! Data like that can be pre-processed with Perl really easily. Then use BULK INSERT / BCP to get it into SQL Server.

    Personally though, I'd ask for a better separated feed from the data provider, and save myself a lot of hassle.


    Julian Kuiters
    juliankuiters.id.au

  • Thanks for your replies - we have been looking into these methods and whereas they may work, they do seem a very long way round.

    Is there a way that we could use an ActiveX script to read the tbl_name line, then move to the next line and grab that as the data? We could then construct an If/Else, Update, Insert etc. procedure based upon the tbl_name for that line of data, which could return a complete or failure message that we can error-manage; it then moves on to the next line (the next tbl_name) and repeats.

    As for a separated feed - I have been back to the data provider and for many (valid) reasons this will not be an option.

    Thank you for your continued suggestions/support; I am new to this but this needs to be cracked asap.

    Thanks again - Martin.

  • The method I suggested still works with your data structure.  The ActiveX script would definitely work, but it would probably take literally thousands of times longer (although with hundreds of records, we're still talking about seconds).  The method I've suggested uses set-based manipulation, which is orders of magnitude (not sure what that actually means, but it sounds very powerful) faster than reading 1 line and then acting on it, reading another...

    The biggest problem with vbscript is that it would require a lot of coding know-how and it's not that easy to work with. 

    The advantage of vbscript is that you wouldn't have to worry about handling new tablenames. 

    The method I suggested has hard-coded names, although you could (relatively) easily write dynamic scripts to do the same thing.

  • Right, our developer has picked up the baton on this one and, using bits of your suggestions, some of his own research, and bits that I had found, has come up with a method that is working for us using ActiveX - sample to follow.

    What we have is a DTS package that drops tbl_temp_Import (if it exists) and then creates a table called tbl_temp_Import with just 1 column; we then connect to the flat file and dump all of the data from it into tbl_temp_Import. We then use a custom ActiveX script to interrogate the data in tbl_temp_Import and act upon it.

    Here is a sample of the ActiveX script containing 1 CASE and its corresponding SUB:

    Dim objConn

    Set objConn = CreateObject("ADODB.Connection")
    objConn.ConnectionTimeout = 5 ' Sets a 5 second timeout for connecting to the database
    objConn.Open "Provider=SQLOLEDB.1;Persist Security Info=;SERVER=;UID=;PWD=;DATABASE="

    Function Main()
        Dim rsData, strTempTableVar, rsCheck, arrItems

        Set rsData = objConn.Execute("SELECT * FROM [temp_import]")
        If Not rsData.EOF Then
            Do While Not rsData.EOF
                'Current row holds a table name
                strTempTableVar = rsData("col001")
                Set rsCheck = objConn.Execute("SELECT COUNT(*) As 'count' FROM SysObjects WHERE name = 'tbl_" & strTempTableVar & "'")
                If rsCheck("count") > 0 Then
                    'The table exists - MOVE TO DATA ROW
                    rsData.MoveNext
                    If rsData.EOF Then
                        'CANNOT MOVE TO DATA ROW - ERROR REPORT AND EXIT
                        Main = DTSTaskExecResult_Success
                        Exit Function
                    End If
                    arrItems = Split(rsData("col001"), "¦")

                    '### CALL ASSOCIATED SUB
                    Select Case UCase(strTempTableVar)
                        Case "CUSTOMER"
                            Customer arrItems
                        Case "CUSTOMER-DELIVERY-ADDRESS"
                            customer_delivery_address arrItems
                    End Select
                Else
                    'The table doesn't exist - do something
                End If
                arrItems = ""
                If Not rsData.EOF Then
                    'MOVE TO NEXT ROW CONTAINING TBL NAME
                    rsData.MoveNext
                End If
            Loop
        Else
            'NO DATA!
        End If

        objConn.Close
        Main = DTSTaskExecResult_Success
    End Function

    Function fnCheckRecord(tblName, tblCell, idValue)
        'Returns 1 if a row with this id value already exists in the given table
        Dim intReturnCode, rsCheck
        intReturnCode = 0
        Set rsCheck = objConn.Execute("SELECT [" & tblCell & "] FROM [" & tblName & "] WHERE [" & tblCell & "] = " & idValue)
        If Not rsCheck.EOF Then
            intReturnCode = 1
        End If
        fnCheckRecord = intReturnCode
    End Function

    Sub sbCheckDeleteDate(tblName, tblCell, idValue, tblIdCell)
        'Deletes the row if its delete-date column holds a real value
        Dim rsCheck
        Set rsCheck = objConn.Execute("SELECT [" & tblCell & "] FROM [" & tblName & "] WHERE [" & tblIdCell & "] = " & idValue)
        If Not rsCheck.EOF Then
            If rsCheck(tblCell) <> "?" Then
                'NEED TO REMOVE
                objConn.Execute "DELETE FROM [" & tblName & "] WHERE [" & tblIdCell & "] = " & idValue
            End If
        End If
    End Sub

    Sub customer_delivery_address(arrItems)
        Dim intAction, strSQL
        'CHECK THAT ARRAY ITEMS MATCH COL INPUTS
        If UBound(arrItems) = 16 Then
            intAction = fnCheckRecord("tbl_customer_delivery_address", "delivery-address-number", arrItems(11))
            'fnValidateForSQL (defined elsewhere in the full script) escapes each value for use in SQL
            If intAction = 1 Then
                '### RECORD ALREADY EXISTS
                strSQL = "UPDATE tbl_customer_delivery_address SET " & _
                    "[address] = '" & fnValidateForSQL(arrItems(0)) & "'," & _
                    "[address2] = '" & fnValidateForSQL(arrItems(1)) & "'," & _
                    "[address3] = '" & fnValidateForSQL(arrItems(2)) & "'," & _
                    "[address4] = '" & fnValidateForSQL(arrItems(3)) & "'," & _
                    "[address5] = '" & fnValidateForSQL(arrItems(4)) & "'," & _
                    "[country-code] = '" & fnValidateForSQL(arrItems(5)) & "'," & _
                    "[customer-fname] = '" & fnValidateForSQL(arrItems(6)) & "'," & _
                    "[customer-number] = " & fnValidateForSQL(arrItems(7)) & "," & _
                    "[customer-sname] = '" & fnValidateForSQL(arrItems(8)) & "'," & _
                    "[daytime-tel] = '" & fnValidateForSQL(arrItems(9)) & "'," & _
                    "[delete-date] = '" & fnValidateForSQL(arrItems(10)) & "'," & _
                    "[email-address] = '" & fnValidateForSQL(arrItems(12)) & "'," & _
                    "[friendly-name] = '" & fnValidateForSQL(arrItems(13)) & "'," & _
                    "[house-name-number] = '" & fnValidateForSQL(arrItems(14)) & "'," & _
                    "[notes] = '" & fnValidateForSQL(arrItems(15)) & "'," & _
                    "[post-code] = '" & fnValidateForSQL(arrItems(16)) & "' " & _
                    "WHERE [delivery-address-number] = " & arrItems(11)
            Else
                '### INSERT NEW RECORD
                strSQL = "SET IDENTITY_INSERT tbl_customer_delivery_address ON " & _
                    "INSERT INTO tbl_customer_delivery_address " & _
                    "([address],[address2],[address3],[address4],[address5],[country-code],[customer-fname],[customer-number],[customer-sname],[daytime-tel],[delete-date],[delivery-address-number],[email-address],[friendly-name],[house-name-number],[notes],[post-code]) " & _
                    "VALUES ('" & fnValidateForSQL(arrItems(0)) & "','" & fnValidateForSQL(arrItems(1)) & "','" & fnValidateForSQL(arrItems(2)) & "','" & fnValidateForSQL(arrItems(3)) & "'," & _
                    "'" & fnValidateForSQL(arrItems(4)) & "','" & fnValidateForSQL(arrItems(5)) & "','" & fnValidateForSQL(arrItems(6)) & "'," & fnValidateForSQL(arrItems(7)) & "," & _
                    "'" & fnValidateForSQL(arrItems(8)) & "','" & fnValidateForSQL(arrItems(9)) & "','" & fnValidateForSQL(arrItems(10)) & "'," & fnValidateForSQL(arrItems(11)) & "," & _
                    "'" & fnValidateForSQL(arrItems(12)) & "','" & fnValidateForSQL(arrItems(13)) & "','" & fnValidateForSQL(arrItems(14)) & "','" & fnValidateForSQL(arrItems(15)) & "','" & fnValidateForSQL(arrItems(16)) & "') " & _
                    "SET IDENTITY_INSERT tbl_customer_delivery_address OFF"
            End If
            objConn.Execute strSQL
            '### CHECK DELETE DATE
            sbCheckDeleteDate "tbl_customer_delivery_address", "delete-date", arrItems(11), "delivery-address-number"
        Else
            'WRONG NUMBER OF ARRAY ELEMENTS/REPORT ERROR
        End If
    End Sub

    As I'm sure you can see, at a basic level what this code does is connect to tbl_temp_Import and get the first row (a table name). If that table exists, it moves to the next row (the delimited data) and acts upon it by selecting the correct CASE based upon the table name, then runs the SUB which processes the data; if it contains the delete flag then the row is deleted, otherwise it is inserted or updated. Then it moves on to the next row, and so on.

    As you can see there are many lines of code here (this sample contains the processing for 1 table - in total we have about 20, some more complex than others); however, now that the full script is completed we have tested it and we know that it works. The next step is to see if we can streamline this in any way, and we need some pointers/best practice concerning error management, so if anyone has any suggestions then please feel free.

    We are looking to have emails constructed containing the details of the errors, which can be punted out to us at the end of the process so that we can act upon them. I am looking into SQL Server Agent at the moment but as yet am unsure whether I can use it to grab the errors from the ActiveX script; we may need to include the email stuff within the script, so any code examples/pointers/conventions would be great. In the DTS Designer I can see that there is an email task, so I am going to look into that soon also.

    Things are moving forward now so thank you for your help so far, further input would be most appreciated.

    Thank you, Martin & Karl

  • I'm glad your solution works, though I say this with full acknowledgement of my bias against vbscript, but YIKES - all that in-line SQL!

    Are you sure it's not easier to break your original file down into separate files (one per table) beforehand and then use DTS to do a simple file-to-table mapping, using its designer to handle the SQL stuff?
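
    On the email question from the previous post: one SQL 2000-era option (assuming SQL Mail is configured on the server; the ImportErrors table and address here are purely illustrative) is to have the script log failures to a table, then mail the contents at the end of the package with xp_sendmail:

    --BEGIN SQL CODE

    -- Mail out any rows the error-logging step collected; ImportErrors is a hypothetical table.
    IF EXISTS (SELECT * FROM ImportErrors)
        EXEC master.dbo.xp_sendmail
            @recipients = 'dba@yourdomain.example',
            @subject    = 'Flat file import errors',
            @query      = 'SELECT * FROM YourDB.dbo.ImportErrors'

    -- END SQL CODE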

