Calling SQL Magicians

  • In a table there is field "txt1". In one record, this field contains "Profit:The return received on a business undertaking after all operating expenses have been met".

    In another record, the field contains "Profit: The return received on an investment after all charges have been paid. Profit is a good thing."

    Notice the word "profit" is used once in the first record and twice in the second record.

    Here is the question: Can I write a query that will count the instances of the word "profit" and give me a result set like...

    Record     Count

    1            1

    2            2

         

     

  • Yes you can. Use this as an example:

    declare @STR nvarchar(1000)

    set @STR = 'profit profit'

    select len(replace(@str, 'profit', 'profit_'))-len(@str)

    -Mike


    Michael Levy
    ma_levy@hotmail.com

  • Do a search in the script section here. I'm pretty sure you'll find something useful there.

    --
    Frank Kalis
    Microsoft SQL Server MVP
    Webmaster: http://www.insidesql.org/blogs
    My blog: http://www.insidesql.org/blogs/frankkalis/[/url]

  • The LEN(Replace.... finds the instances - even if it is used in another word (i.e. profitability).

    How can I get only EXACT matches??

    Any ideas

  • I'm not a huge fan of the LEN(REPLACE) method (it fails for strings over 8000 characters); instead, I'd recommend:

    First create a table of numbers:

    SELECT IDENTITY(int, 1, 1) AS Number

    INTO #Numbers

    FROM pubs..authors a,

    pubs..authors b,

    pubs..authors c

    Then:

    SELECT YourTable.PK, COUNT(*) AS NumberOfProfit

    FROM #Numbers N

    CROSS JOIN YourTable

    WHERE

    SUBSTRING(YourTable.Col, N.Number, LEN('profit')) = 'profit'

    AND PATINDEX('%[a-z0-9]%', SUBSTRING(YourTable.Col, N.Number + LEN('profit'), 1)) = 0

    GROUP BY YourTable.PK

    ... This query assumes that any non-alphanumeric character is either punctuation or white space. You might want to modify it if that's not the case...

    --
    Adam Machanic
    whoisactive

  • By the way, that query only looks at characters AFTER the word 'profit' ...

    So it would find a match (if it were a word) on 'unprofit'.

    If that's a problem, change it to:

    SELECT YourTable.PK, COUNT(*) AS NumberOfProfit

    FROM #Numbers N

    CROSS JOIN YourTable

    WHERE

    SUBSTRING(YourTable.Col, N.Number, LEN('profit')) = 'profit'

    AND PATINDEX('%[a-z0-9]%', SUBSTRING(YourTable.Col, N.Number + LEN('profit'), 1)) = 0

    AND PATINDEX('%[a-z0-9]%', SUBSTRING(YourTable.Col, N.Number - 1, 1)) = 0

    GROUP BY YourTable.PK

    --
    Adam Machanic
    whoisactive

  • I turned this into a UDF, which you can find here:

    http://sqljunkies.com/WebLog/amachanic/articles/CountSubstring.aspx

    --
    Adam Machanic
    whoisactive

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply