Can someone help me parse a free text field

  • Can someone please help me? I've been looking at this data and my eyes and head now hurt like hell. Here is my dilema:

    I have a free text field, that I need to parse out to various possible usable values.

    These values I need to extract are DrugName, Strength, Form.

    Here is a sample of the data I am looking at (Table: DrugNameTable):

    DrugName

    CIPROFLOXACIN 500 MG TAB

    METOPROLOL TARTRATE 25 MG TAB

    VINORELBINE TARTRATE 35 MG IN SODIUM CHLORIDE 0.9% INJ 50 ML

    VITAMIN E 30 IU/GM OINT TUBE

    What I need to try to accomplish with respect to record #1 would look like the following:

    DrugName = CIPROFLOXACIN

    Strength = 500MG

    Form = TAB

    What I need to try to accomplish with respect to record #2 would look like the following:

    DrugName = METOPROLOL TARTRATE

    Strength = 25MG

    Form = TAB

    What I need to try to accomplish with respect to record #3 would look like the following:

    DrugName = VINORELBINE TARTRATE

    Strength = 35MG

    Form = INJ 50ML

    What I need to try to accomplish with respect to record #4 would look like the following:

    DrugName = VITAMIN E

    Strength = 30IU/GM

    Form = OINT TUBE

    Can someone please help me with this. I have tried parsing it by space, which doesnt work obviously. I replaced the ' MG' with 'MG' which simplifies part of the issue. I have just spent a day looking for any possible solution to this other than someone else doing manual work in Excel.

    Thanks in advance,

    Steve

  • Try this:

    CREATE TABLE DrugNameTable (DrugName varchar(100))

    GO

    INSERT INTO DrugNameTable (DrugName)

    SELECT 'CIPROFLOXACIN 500 MG TAB'

     UNION SELECT 'METOPROLOL TARTRATE 25 MG TAB'

     UNION SELECT 'VINORELBINE TARTRATE 35 MG IN SODIUM CHLORIDE 0.9% INJ 50 ML'

     UNION SELECT 'VITAMIN E 30 IU/GM OINT TUBE'

    GO

    SELECT CASE WHEN DrugName LIKE '%MG%' THEN

      RTRIM(LEFT(DrugName,CHARINDEX('MG',DrugName)-5))

      ELSE RTRIM(LEFT(DrugName,CHARINDEX(' ',DrugName,9)))

      END AS DrugName

     , CASE WHEN DrugName LIKE '%MG%' THEN

      LTRIM(RTRIM(SUBSTRING(DrugName,CHARINDEX('MG',DrugName)-4,7)))

      ELSE LTRIM(RTRIM(SUBSTRING(DrugName,CHARINDEX(' ',DrugName,9),9)))

      END AS Strength

     , CASE WHEN DrugName LIKE '%TAB' THEN

      LTRIM(RIGHT(RTRIM(DrugName),3))

      ELSE LTRIM(RTRIM(RIGHT(RTRIM(DrugName),9)))

      END AS Form

    FROM DrugNameTable

    GO

    Andy

  • Hi Steve,

    I was preparing a parser for you while Andy posted the better solution.

    Anyhow, if you can't predict the literals each time, here is the parser I wrote: (assuming no drug name contains numeric characters and UOMs (Units of Measure) have no spaces in the name and that there is always a space after UOM name.) You may have to trim the "Form" field a bit.

    Afterward you might select distinct on each field to identify those which broke parser assumptions. Hopefully not too many.

    CREATE TABLE #DrugTable ([TextField] varchar(500),[DrugName] varchar(255),[Strength] varchar(50),[Form] varchar(255))

    GO

    INSERT INTO #DrugTable (TextField,DrugName,Strength,Form)

    --------------------------------------------------------------------------------LIVE

    --SELECT Drugname,'','','' FROM DrugNameTable

    --------------------------------------------------------------------------------TEST

    SELECT 'CIPROFLOXACIN 500 MG TAB','','',''

     UNION SELECT 'METOPROLOL TARTRATE 25 MG TAB','','',''

     UNION SELECT 'VINORELBINE TARTRATE 35 MG IN SODIUM CHLORIDE 0.9% INJ 50 ML','','',''

     UNION SELECT 'VITAMIN E 30 IU/GM OINT TUBE','','',''

    ------------------------------------------------------------------------------------

    GO

    declare @MYFIELD varchar(4000),

    @MYCHAR integer,

    @MYSUBSTR varchar(1000),

    @MYCONTROL integer

    DECLARE @MYFREETEXTFIELD varchar(4000)

    DECLARE mytable CURSOR FOR

    select TextField from #DrugTable

    OPEN mytable

    FETCH NEXT FROM mytable INTO @MYFREETEXTFIELD

    WHILE (@@FETCH_STATUS = 0)

    BEGIN

    set @MYFIELD = @MYFREETEXTFIELD

    set @MYCHAR = 0

    set @MYCONTROL = 0

    set @MYSUBSTR = ''

      

    while @MYCHAR < len(@MYFIELD)

    begin

    set @MYCHAR = @MYCHAR + 1

    --SET CONTROL PHASES

    --set control to rest of field

    if @MYCONTROL = 2 and substring(@MYFIELD,@MYCHAR,1) = ' '

    begin

    update #DrugTable set Strength = @MYSUBSTR where TextField = @MYFIELD

    set @MYSUBSTR = ''

    set @MYCONTROL = 3

    end

    --set control to quantity unit

    if @MYCONTROL = 1 and substring(@MYFIELD,@MYCHAR,1) = ' '

    begin

    set @MYCONTROL = 2

    end

    --set control to quantity

    if @MYCONTROL = 0 and isnumeric(substring(@MYFIELD,@MYCHAR,1)) = 1

    begin

    update #DrugTable set DrugName = @MYSUBSTR where TextField = @MYFIELD

    set @MYSUBSTR = ''

    set @MYCONTROL = 1

    end

    --DO OUTPUT BUILDS

    --build drug name

    if @MYCONTROL = 0

    begin

    set @MYSUBSTR = @MYSUBSTR + substring(@MYFIELD,@MYCHAR,1)

    end

    --build quantity and unit

    if @MYCONTROL = 1 or @MYCONTROL = 2

    begin

    set @MYSUBSTR = @MYSUBSTR + substring(@MYFIELD,@MYCHAR,1)

    end

    --build rest of field

    if @MYCONTROL = 3

    begin

    set @MYSUBSTR = @MYSUBSTR + substring(@MYFIELD,@MYCHAR,1)

    end

    end

    update #DrugTable set Form = right(@MYSUBSTR,len(@MYSUBSTR) - 1) where TextField = @MYFIELD

    FETCH NEXT FROM mytable INTO @MYFREETEXTFIELD

    END

    CLOSE mytable

    DEALLOCATE mytable

    GO

    --------------------------------------------------------------------------------LIVE

    --insert or append to actual table from #DrugTable as required

    --------------------------------------------------------------------------------TEST

    select * from #DrugTable

    ------------------------------------------------------------------------------------

    GO

    DROP TABLE #DrugTable

    GO

  • Here's a set-based version of Ian's suggestion. It's not pretty, but I don't think anything you do will be pretty.

    As with Ian's suggestion, I haven't removed 'IN SODIUM CHLORIDE 0.9% '. You'll have to figure out your own rules for that, but the techniques you've seen in the replies should give you an idea of how to do the parsing.

    --data

    declare @DrugNameTable TABLE (DrugName varchar(100))

    INSERT INTO @DrugNameTable (DrugName)

          SELECT 'CIPROFLOXACIN 500 MG TAB'

    UNION SELECT 'METOPROLOL TARTRATE 25 MG TAB'

    UNION SELECT 'VINORELBINE TARTRATE 35 MG IN SODIUM CHLORIDE 0.9% INJ 50 ML'

    UNION SELECT 'VITAMIN E 30 IU/GM OINT TUBE'

    --calculation

    select s, DrugName,

        left(FirstNumberOnwards, FirstSpaceAfterFirstNumber + SecondSpaceAfterFirstNumber - 1) as Strength,

        substring(FirstNumberOnwards, FirstSpaceAfterFirstNumber + SecondSpaceAfterFirstNumber + 1, 100) as Remainder

    from (

        select s, DrugName, FirstNumberOnwards, FirstSpaceAfterFirstNumber,

            patindex('% %', substring(FirstNumberOnwards, FirstSpaceAfterFirstNumber + 1, 100)) as SecondSpaceAfterFirstNumber

        from (

            select s, DrugName, FirstNumberOnwards,

                patindex('% %', FirstNumberOnwards) as FirstSpaceAfterFirstNumber

            from (

                select s, left(s, FirstNumber-2) as DrugName, substring(s, FirstNumber, 100) as FirstNumberOnwards

                from (

                    select DrugName as s, patindex('%[0-9]%', DrugName) as FirstNumber

                    from @DrugNameTable) a) b) c) d

    /*results

    DrugName              Strength  Remainder

    --------------------- --------- ---------------------------------

    CIPROFLOXACIN         500 MG    TAB

    METOPROLOL TARTRATE   25 MG     TAB

    VINORELBINE TARTRATE  35 MG     IN SODIUM CHLORIDE 0.9% INJ 50 ML

    VITAMIN E             30 IU/GM  OINT TUBE

    */

    Ryan Randall

    Solutions are easy. Understanding the problem, now, that's the hard part.

  • Thanks all for helping out here. I'm going to give them all a try today and see which one will yeild the best results re manual work for the Data Entry folks.

    Thanks again for your time on this.

    Steve

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply