Parsing data

  • Hi all

    i have a rek.i have a column >>102392-293,98390283-109,1928374-495

    now i want the o/p as

    102392-293

    98390283-109

    1928374-495

    Its parsing the column based on the comma.The Comma seperated values here are jus 3,but it may range upto 10 values.

    I have done this in sql server user defined function.In my proj i have to parse 80 million rows of these kind.SO many suggested me to go for c# parsing.So can anyone tell me the code for this funciton in c# and how can i use that function in sql server 2005.

    I want to call this function from a stored proc and do a cross apply.

    Thanks

    MohanV

  • You might look into creating a CLR function involving the "String.Split()" method.

  • If you don't use CLR on a regular base or you'd like to see an alternative and still extremely fast way to split a string please have a look at the Tally Table link in my signature or search this site for "string split function".

    There are some solutions out there that won't need a while loop or a cursor and still don't require a CLR.



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • The CLR solution is faster than T-SQL solution in all but a very few cases - and even then there's not much in it.

    Full code, by Adam Machanic, is available here: http://sqlblog.com/blogs/adam_machanic/archive/2009/04/28/sqlclr-string-splitting-part-2-even-faster-even-more-scalable.aspx

    Paul

  • Paul White (2/19/2010)


    The CLR solution is faster than T-SQL solution in all but a very few cases - and even then there's not much in it.

    Paul

    That's probably true (I trust your judgement 😉 ).

    But if that would be the only CLR within all my databases I most certainly wouldn't use it, just because of consistency. But as soon as CLR starts to be part of my SQL code, the function you mentioned will be part of it.



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • lmu92 (2/20/2010)


    That's probably true (I trust your judgement 😉 ).

    But if that would be the only CLR within all my databases I most certainly wouldn't use it, just because of consistency. But as soon as CLR starts to be part of my SQL code, the function you mentioned will be part of it.

    No need to trust my judgement (thanks though!) - a very full battery of tests was performed here not so long ago. You can see a comprehensive and clear summary of the results on Flo's blog.

    I often hear the objection about not wanting to use CLR just for one thing 🙁

    The trouble is, until you add one, the next one will always be the first one 😉

    CLR stuff is not a panacea, but anyone omitting it from their tool kit is likely making life a little harder than it needs to be...:-D

    Paul

  • Paul White (2/20/2010)


    ...

    I often hear the objection about not wanting to use CLR just for one thing 🙁

    The trouble is, until you add one, the next one will always be the first one 😉

    CLR stuff is not a panacea, but anyone omitting it from their tool kit is likely making life a little harder than it needs to be...:-D

    Paul

    I'm not really refusing to use CLR's... It's more that I haven't been in a situation where I couldn't find a pure SQL solution that performed good enough.

    (Un)Fortunately, in my job I have to deal with a few million rows per table at most and we're not dealing with that heavy traffic on our Server (a few 100k rows per day across all tables).

    But you're right, I should start looking deeper into CLR programming.



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • lmu92 (2/20/2010)


    Paul White (2/20/2010)


    ...

    I often hear the objection about not wanting to use CLR just for one thing 🙁

    The trouble is, until you add one, the next one will always be the first one 😉

    CLR stuff is not a panacea, but anyone omitting it from their tool kit is likely making life a little harder than it needs to be...:-D

    Paul

    I'm not really refusing to use CLR's... It's more that I haven't been in a situation where I couldn't find a pure SQL solution that performed good enough.

    (Un)Fortunately, in my job I have to deal with a few million rows per table at most and we're not dealing with that heavy traffic on our Server (a few 100k rows per day across all tables).

    But you're right, I should start looking deeper into CLR programming.

    You could'nt find one here? Sure the CLR solutions are generally faster, but some of the SQL solutions are pretty good too.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • RBarryYoung (2/21/2010)


    You could'nt find one here? Sure the CLR solutions are generally faster, but some of the SQL solutions are pretty good too.

    That's the super-long thread that prompted both Flo's blog and Adam Machanic's definitive CLR solution.

    Both those a rather more accessible than the thread 🙂

    I just don't find the T-SQL solutions to be as neat, and they aren't as consistent.

    T-SQL just seems like the wrong tool for the job. You can cut boards with a hammer (T-SQL) but a CLR saw might be better.

    Paul

  • For me, T-SQL is almost always good enough for something as simple as a non-quoted CSV string. It just doesn't get called enough (in a good relational design) to me trying to push a customer over the CLR hump, and then handle the follow-up justifications to the Sox auditors.

    But CSV's (with no quoting) are as far as I'll normally go with T-SQL. Anything more complicated than that is "parsing", not "string-splitting" and then I'll push for CLR pretty hard.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • Well I think we can agree on this much: an occasional requirement to split a string isn't a good enough reason to enable CLR

    But to get the most out of SQL Server you should probably be using CLR stuff anyway, so you might as well use the best method for string splitting while you're at it 😀

    Paul

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply