create merged strings from a source table

  • --Source table :

    Create table T

    (attr_id int,

    property_id varchar (255),

    value varchar (255))

    GO

    Insert into T (attr_id, property_id, value)

    Select 11, 'ENT' , '-23'

    UNION ALL

    Select 22, 'HSD' , 'more than 532'

    UNION ALL

    Select 11, 'ENT' , '>34'

    UNION ALL

    Select 22, 'HSD' , '-23'

    UNION ALL

    Select 11, 'ENT' , '<3456'

    UNION ALL

    Select 22, 'HSD' , '-23'

    GO

    Select * from T

    --the Source table looks like:

    attr_id property_id      value [varchar]

    11       ENT                -23

    22       HSD                more than 532

    11       ENT                   >34

    22       HSD                -23

    11       ENT                <3456

    22      HSD                   -23

    --and 100000 more records like this

    --Task: Create another table in which all values for every unique pair of attr_id and property_id would be MERGED; separate them by '; '; truncate the trailing ';'

    --result table needs to look like:

    attr_id property_id      valueMerged

    11       ENT               -23; <3456; >34

    22       HSD               -23; -23; more than 532

    --?: does anyone know an efficient way of doing this?

    --What I ended up doing was creating a finction:

    CREATE  FUNCTION GetConcatString_v2(@attr_id int, @property_id varchar(255))

    RETURNS varchar(1000)

    AS

    BEGIN

        DECLARE @S varchar(1000)

        SET @S = ''

    SELECT @S = @S + value + '; '

        FROM (select value from T WHERE attr_id = @attr_id and property_id = @property_id

     AND value is not null) S1

        ORDER BY value

        if len(@s) = 0  set @S = 'empty string'

        SELECT @S = left(@s, len(@s)-1)

        RETURN @S

    END

    --and then using it in SELECT:

    select attr_id, property_id, dbo.getConcatString_v2(attr_id, property_id) as valueMerged

    --INTO _D_src_Tindep_merged

    from T WHERE value is not null

    group by attr_id, property_id

       order by attr_id, property_id

    --Could it be done with less code?

    --the 'real' source table contains 100000 rows. It took the above code 15 min to precess the source data into merged strings.

    Thanx

  • It's pretty much the only way to do this.

    Check the execution plan and make sure that the function's query is as optmizied as possible.

    I'd also make sure that there's a compound index on those colums :

    value, attr_id, property_id (in the best order possible). This will allow to make an index seek each time with having to go to the clustered index, speeding the query greatly (especially if it's doing some sort of scan now).

  • thanks for yr response. I'm glad I was on a right track after all.

     

  • Did you check for the execution plan? Did it use a scan? has the compound index done anything to speed this up?

  • i haven't had a chance yet; at work now, busy w/ smth else 

  • K... please post the results.. I'm curious to see if I'm right about this one.

  • I was able to introduce a clustered key 'ckey' in the source table

    chemid   prop_id    value     ckey   pkey

    203       VC          1.71      1        95359

    1655     TPT        160.75      2        24700

    1290      TC            552     3        36531

    1290      TC          552       3        41925

    1347     HSTD      -570.9     4        35296

    1347     HSTD      -569.024   4         37084

    for every unique couple (chemid , prop_id&nbsp

    when I run the above SELECT with a slightly modified function , I got the performace enhancment of ~300X! Instead of 15 min in ran 3 sec!

    thanks for the tip Remi!

    ______________________

    select ckey, dbo.GetConcatString_clusteredTest(ckey) as valueMerged

    INTO _D_src_Tindep_clusterTest_del

    from _D_src_Tindep_del where value is not null --and ckey < 100

    group by ckey

     

     

  • correction: 'chemid' in the previous post sh've been representing 'attr_id'

    I slightly misrepresented the data structure , but the bottom line is clustered index helped a lot!

  • The compound index could have had the exact same effect. This would have been the case because all the values needed to generate the select would have been retrieved directly from the index, therefore not reading any data page from the table.

  • just realized that I should not have introduced that new field 'ckey' (and 'pkey' for that matter). I could simply create a clustered (compound?) index for ('chem_id' and 'property_id) and then run ver1 of the code with the same (can i say 'blazing' ?) speed.

    Remi, could u pls explain in brief difference between clustered and compound indexes?

    Thanks

  • in the prevous post I meant ' create clustered index in Enterprise Manager interface'

  • Doesn't even need to be clustered (but I'd like to see the speed difference, if any).

    Check this for information on covering (compound) index (covering being the best term for this technic).

    http://www.sql-server-performance.com/nb_index_intersection.asp

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply