Help needed to decode comma delimited string and insert into tables

  • JJH82 (2/22/2011)


    hi kumar20, the script cannot process some name characters, e.g. '&', '<'

    ',2 Column_Names "FIELD_NAME" "READ_FLG" ,2 Old_Column_Values "Rural & Remote" "N" ,2 New_Column_Values "" ""'

    '+2 Column_Names "ATTRIB_VALUE" "SCORE" +2 New_Column_Values "<5 volunteers:1 staff" "0" +2 Old_Column_Values "" ""'

    is there a way to overcome that?

    Sorry I didn't get to the code last night. The above smacks of another question that I need to ask... will the values ever have embedded commas? Will they ever have embedded double quotes?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • There will not be any comma in the strings, but the strings are wrapped in double quotes

    Jeff Moden (2/23/2011)


    JJH82 (2/22/2011)


    hi kumar20, the script cannot process some name characters, e.g. '&', '<'

    ',2 Column_Names "FIELD_NAME" "READ_FLG" ,2 Old_Column_Values "Rural & Remote" "N" ,2 New_Column_Values "" ""'

    '+2 Column_Names "ATTRIB_VALUE" "SCORE" +2 New_Column_Values "<5 volunteers:1 staff" "0" +2 Old_Column_Values "" ""'

    is there a way to overcome that?

    Sorry I didn't get to the code last night. The above smacks of another question that I need to ask... will the values ever have embedded commas? Will they ever have embedded double quotes?

  • Ok... thanks. I'll be back.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • @JJH82,

    Dang... I've just not been able to devote the time to this problem. Do you have a solution, yet, or do you still need help on this one?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • JJH82 (2/17/2011)


    a table called CLOB contains ROW_ID (pk)

    PARSE_CLOB (where the strings sit)

    Are you using OPENROWSET BULK SINGLE_CLOB to load the initial table from a file? If so, best performance would come from combining the reading and decoding into a single step. If you can provide a sample text file and a table definition to hold the final parsed output, I'd be happy to provide a solution.

  • SQLkiwi (2/24/2011)


    JJH82 (2/17/2011)


    a table called CLOB contains ROW_ID (pk)

    PARSE_CLOB (where the strings sit)

    Are you using OPENROWSET BULK SINGLE_CLOB to load the initial table from a file? If so, best performance would come from combining the reading and decoding into a single step. If you can provide a sample text file and a table definition to hold the final parsed output, I'd be happy to provide a solution.

    That'll be worth the price of admission... please see attached. 🙂

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • to decode 9632 records into 32,000+, it took 9 hours and 45 minutes using kumar20's method.

    I created a loop to process 1000 Records at a time.

    any comments or suggestions please?

    SET NOCOUNT ON

    DECLARE @T_Data TABLE

    (

    ID INT IDENTITY(1, 1), Data NVARCHAR(MAX)

    )

    INSERT INTO @T_Data

    SELECT ',2 Column_Name "Col1" "Col2" , 2 New_Column_Value "A" "B", 2 Old_Column_Value "a" "b"'

    UNION

    SELECT ',2 Field_Name "F1" "F2", 2 New_Field_Name "F1" "F2", 2 Old_Field_Name "f1" "f2" ,1 Column_Name "Col3" ,1 New_Column_Value "C", 1 Old_Column_Value "c"'

    UNION

    SELECT ',1 Field_Name "F3", 1 New_Field_Name "F3", 1 Old_Field_Name "f3" ,1 Column_Name "Col6" ,1 New_Column_Value "1<", 1 Old_Column_Value "2&"'

    DECLARE @T_FirstLevel TABLE

    (

    Lvl1ID INT IDENTITY(1, 1), ID INT, Lvl1Data NVARCHAR(MAX)

    )

    INSERT INTO @T_FirstLevel

    SELECT ID, LTRIM(SUBSTRING(Data, number+1, CHARINDEX(',', Data, number+1)-number - 1) ) AS 'Data'

    FROM (SELECT ID, Data + ',' AS Data FROM @T_Data) A

    INNER JOIN master..spt_values N ON N.number < LEN(A.Data)

    AND N.type = 'p'

    WHERE LTRIM(SUBSTRING(Data, number+1, CHARINDEX(',', Data, number+1)-number - 1) ) <> ''

    AND SUBSTRING(Data, number, 1) = ','

    DECLARE @T_SecondLevel TABLE

    (

    Lvl2ID INT IDENTITY(1, 1), Lvl1ID INT, ID INT, Lvl2Data NVARCHAR(MAX),

    ColRow INT

    )

    INSERT INTO @T_SecondLevel

    SELECT Lvl1ID, ID, LTRIM(SUBSTRING(Lvl1Data, number+1, CHARINDEX(' ', Lvl1Data, number+1)-number - 1) ) AS 'Data'

    , ROW_NUMBER() OVER (PARTITION BY Lvl1ID ORDER BY Lvl1ID, ID) AS ColRow

    FROM (SELECT Lvl1ID, ID, ' ' + Lvl1Data + ' ' AS Lvl1Data FROM @T_FirstLevel) A

    INNER JOIN master..spt_values N ON N.number < LEN(A.Lvl1Data)

    AND N.type = 'p'

    WHERE LTRIM(SUBSTRING(Lvl1Data, number+1, CHARINDEX(' ', Lvl1Data, number+1)-number - 1) ) <> ''

    AND SUBSTRING(Lvl1Data, number, 1) = ' '

    DECLARE @T_Final TABLE

    (

    Lvl2ID INT, Lvl1ID INT, ID INT, Lvl2Data NVARCHAR(MAX), ColRow INT,

    Cnt INT, GrpCnt INT, Batch INT

    )

    INSERT INTO @T_Final

    SELECT T.*, (N.number + 1) AS GrpCnt, NULL AS Batch

    FROM (SELECT

    Lvl2ID, Lvl1ID, ID, Lvl2Data

    , CASE WHEN ColRow > 3 THEN 3 ELSE ColRow END AS ColRow

    , COUNT(CASE WHEN PATINDEX('%"%', Lvl2Data) > 0 THEN 1 ELSE NULL END) OVER (PARTITION BY Lvl1ID) AS Cnt

    FROM @T_SecondLevel) T

    LEFT JOIN master..spt_values N ON (N.type = 'p' AND N.number <= (T.Cnt - 1) AND T.ColRow < 3)

    ORDER BY T.Lvl2ID

    --create a table for collection final results

    IF EXISTS (

    SELECT *

    FROM sysobjects

    WHERE id = object_id(N'[dbo].T_Final'))

    drop table [dbo].T_Final

    CREATE Table T_Final

    (

    RowID int,

    FieldType nvarchar(8),

    FieldName nvarchar(4000),

    NewValue nvarchar(4000),

    OldValue nvarchar(4000)

    )

    --creating a loop to run 1000 records at a time

    DECLARE @y int

    DECLARE @x int

    DECLARE @Count int

    DECLARE @start int

    DECLARE @end int

    SET @x = 1000

    SET @Count = 0

    set @y = (select max(ID) from @T_Final)

    --print @y

    --print @y/@x+1

    while (@Count < @y/@x+1)

    begin

    set @start = @x*@Count+1

    if @x*(@Count+1)>@y

    set @end = @y

    else

    set @end = @x*(@Count+1)

    --print 'Start from '+convert(nvarchar(12),@start)+' End at '+convert(nvarchar(12),@end)

    --

    --set @Count = @Count+1

    --end

    SET @T1 = getdate()

    UPDATE F1

    SET GrpCnt = F2.Row

    FROM @T_Final F1

    INNER JOIN (SELECT *, ROW_NUMBER() OVER (PARTITION BY Lvl1ID ORDER BY Lvl2ID) AS Row

    FROM @T_Final WHERE GrpCnt IS NULL ) F2 ON F1.Lvl2ID = F2.Lvl2ID

    WHERE F1.GrpCnt IS NULL AND F1.ID between @start AND @end

    /*

    UPDATE F1

    SET Batch = F2.Lvl2Data

    FROM @T_Final F1

    INNER JOIN @T_Final F2 ON F1.Lvl1ID = F2.Lvl1ID

    AND F2.ColRow = 1

    */

    SET @T2 = getdate()

    print 'Runtime of 1st updating @T_Final table is '+convert(nvarchar(12),datediff(ss,@T1,@T2))+' seconds.'

    SET @T1 = getdate()

    UPDATE F1

    SET Batch = N.Number

    FROM @T_Final F1

    INNER JOIN master..spt_values N ON F1.Lvl1ID BETWEEN N.number AND (N.number * 3)

    WHERE N.number <> 0 AND F1.ID between @start AND @end

    --SELECT * FROM @T_Final

    SET @T2 = getdate()

    print 'Runtime of 2nd updating @T_Final table is '+convert(nvarchar(12),datediff(ss,@T1,@T2))+' seconds.'

    SET @T1 = getdate()

    INSERT INTO T_Final

    SELECT F1.ID

    , MAX(CASE WHEN F1.Lvl2Data LIKE 'Column%' THEN 'Column'

    WHEN F1.Lvl2Data LIKE 'Field%' THEN 'Field' END) as 'FieldType'

    , MAX(CASE WHEN F1.Lvl2Data IN ('Column_Names', 'Field_Names') AND F2.ColRow = 3 THEN F2.Lvl2Data END) as 'FieldName'

    , MAX(CASE WHEN F1.Lvl2Data LIKE 'New%'/*IN ('New_Column_Values', 'New_Field_Names')*/ AND F2.ColRow = 3 THEN F2.Lvl2Data END) AS 'NewValue'

    , MAX(CASE WHEN F1.Lvl2Data Like 'Old%'/*IN ('Old_Column_Values', 'Old_Field_Names')*/ AND F2.ColRow = 3 THEN F2.Lvl2Data END) AS 'OldValue'

    FROM @T_Final F1

    INNER JOIN @T_Final F2 ON F1.Lvl1ID = F2.Lvl1ID AND F1.GrpCnt = F2.GrpCnt AND F1.Batch = F2.Batch AND F1.ID between @start AND @end

    GROUP BY F1.ID, F1.GrpCnt, F1.Batch

    ORDER BY 1, 2

    SET @T2 = getdate()

    print 'Runtime of producing final table is '+convert(nvarchar(12),datediff(ss,@T1,@T2))+' seconds.'

    set @Count = @Count+1

    end

  • JJH82 (2/27/2011)


    to decode 9632 records into 32,000+, it took 9 hours and 45 minutes using kumar20's method.

    I created a loop to process 1000 Records at a time. any comments or suggestions please?

    Are you able to use CLR code on this server? The task at hand is procedural by nature, and would benefit from being written in a language that is optimized for that. The reason I haven't responded with a solution so far is that the majority of the cost is in decoding the records, not bulk loading the CLOB in the first place - the optimization I had in mind just combines the bulk load with the decoding.

  • yes, i believe so

  • JJH82 (2/27/2011)


    yes, i believe so

    Specifically, I am asking if it is OK to use a SQLCLR stored procedure or function on this SQL Server. Do you have any experience in a .NET language like C#?

  • yes, I can create or use stored procedures on the server.

    :crazy:but i have not .Net programming experience

  • JJH82 (2/27/2011)


    to decode 9632 records into 32,000+, it took 9 hours and 45 minutes using kumar20's method.

    I created a loop to process 1000 Records at a time.

    any comments or suggestions please?

    Just out of curiosity, what was the maximum length of any of those records in bytes?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • nvarchar(4000) is set to the column, so I assume that no string is longer than 4000.:-)

  • JJH82 (2/27/2011)


    nvarchar(4000) is set to the column, so I assume that no string is longer than 4000.:-)

    Very cool. I'm sure that Paul will come up with something great on the CLR side of the house. It's been proven many times that a properly written CLR will beat any other kind of splitter/parser in T-SQL.

    Still, I'll give it a try on the T-SQL side.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • thanks guys

Viewing 15 posts - 16 through 30 (of 40 total)

You must be logged in to reply to this topic. Login to reply