Awkward Text Parsing

  • Hi,

    I'm working with some text fields that I need to parse into their elements. An example is:

    ULogEMGUser2009811125653781.log

    Which I need to split into:

    ULog (this is easy as it is always the first 4 characters)

    EMGUser (this represents a user-name and is of varied length and can include numbers)

    2009811125653781 (this represents a datetime)

    The datetime element is particularly problematic as the month can be one digit (8 = August), or two digits (12 = December). The time portion is similarly awkward and not a consistent length, i.e 926 = 9:26 am, and 1431 = 14:31pm, with the remaining digits being seconds and milliseconds.

    The sample data is below:

    create table #temp_log (logfield varchar(50), username varchar(50))

    INSERT INTO #TEMP_LOG (LOGFIELD, USERNAME)

    SELECT 'ULogleaseway2009618122842937.log','leaseway' UNION ALL

    SELECT 'ULogsviveash200961273419432.log','sviveash' UNION ALL

    SELECT 'ULogsmitdav20098484032114.log','smitdav' UNION ALL

    SELECT 'ULogmhealy20096109751394.log','mhealy' UNION ALL

    SELECT 'ULogmcowan20096375218427.log','mcowan' UNION ALL

    SELECT 'ULogsmitdav200971145922925.log','smitdav' UNION ALL

    SELECT 'ULogBen H2009611134749343.log','Ben H' UNION ALL

    SELECT 'ULogN918020097675036275.log','N9180' UNION ALL

    SELECT 'ULogsmitdav200983133757806.log','smitdav' UNION ALL

    SELECT 'ULogsmitdav200971141152369.log','smitdav' UNION ALL

    SELECT 'ULogu0374602009717132353999.log','u037460' UNION ALL

    SELECT 'ULogN9180200976135935619.log','N9180' UNION ALL

    SELECT 'ULogflintp20096285228921.log','flintp' UNION ALL

    SELECT 'ULog91756520096191538115.log','917565' UNION ALL

    SELECT 'ULogAndrew Holderness20096973641281.log','Andrew Holderness' UNION ALL

    SELECT 'ULogtimh20098394052161.log','timh' UNION ALL

    SELECT 'ULogandrew holderness2009628756492.log','andrew holderness' UNION ALL

    SELECT 'ULogflintp20096285228921.log','flintp' UNION ALL

    SELECT 'ULogmhealy2009612741436.log','mhealy' UNION ALL

    SELECT 'ULogleaseway200961963938296.log','leaseway' UNION ALL

    SELECT 'ULogNashp12009714124954828.log','Nashp1' UNION ALL

    SELECT 'ULog11000741720098572636844.log','110007417' UNION ALL

    SELECT 'ULogdaslee20096116331310.log','daslee' UNION ALL

    SELECT 'ULogflintp20097318123924.log','flintp' UNION ALL

    SELECT 'ULogJames Traer20098491743218.log','James Traer' UNION ALL

    SELECT 'ULogmidwoodb2009627104425908.log','midwoodb' UNION ALL

    SELECT 'ULog11000741720098572636844.log','110007417' UNION ALL

    SELECT 'ULogMatthew.DYkes20096483535952.log','Matthew.DYkes' UNION ALL

    SELECT 'ULogmlawrence2009723102343452.log','mlawrence' UNION ALL

    SELECT 'ULog1100074172009848410834.log','110007417' UNION ALL

    SELECT 'ULogN9180200976135935619.log','N9180' UNION ALL

    SELECT 'ULogmdeighton20095271048862.log','mdeighton' UNION ALL

    SELECT 'ULogScanner20096475654859.log','Scanner' UNION ALL

    SELECT 'ULogsmitdav2009728926962.log','smitdav' UNION ALL

    SELECT 'ULogAllieG200961073227926.log','AllieG' UNION ALL

    SELECT 'ULogMSuffling2009625143153875.log','MSuffling' UNION ALL

    SELECT 'ULog501124799200972972243773.log','501124799' UNION ALL

    SELECT 'ULog5014399312009611114512678.log','501439931' UNION ALL

    SELECT 'ULog5014399312009716124332604.log','501439931' UNION ALL

    SELECT 'ULogsmitdav200971122646671.log','smitdav' UNION ALL

    SELECT 'ULogScanner20096475654859.log','Scanner' UNION ALL

    SELECT 'ULog917564200965123257614.log','917564' UNION ALL

    SELECT 'ULogandrear2009714123018812.log','andrear' UNION ALL

    SELECT 'ULogbrian.armstrong20096259134360.log','brian.armstrong' UNION ALL

    SELECT 'ULogJames Traer200961683645812.log','James Traer' UNION ALL

    SELECT 'ULog91756420096510952610.log','917564' UNION ALL

    SELECT 'ULog5014399312009721144226491.log','501439931' UNION ALL

    SELECT 'ULogmcowan200961775429629.log','mcowan' UNION ALL

    SELECT 'ULogmcowan20097373458961.log','mcowan' UNION ALL

    SELECT 'ULogSteve200952610181453.log','Steve'

    I've included the Usernames that I happen to know as a separate field just to illustrate how varied they can be. However, I do not know all usernames so need to be able to parse these from the string.

    I'm trying to parse into the 3 fields as detailed earlier, but with the datetime portion actually returned as datetime datatype.

    Any help greatly appreciated.

    Many thanks....Jason

  • I can't think of a way to do this without knowing something static after the username. Here is what I did and I guess it will work for the rest of this year but not sure what to do after that. I used the string '2009' as the static part of the logfield after the username. Maybe you could at least collect all of these user names so you can use that in the future. Also, I haven't converted to datetime yet.

    create table #temp_log (logfield varchar(50), username varchar(50))

    INSERT INTO #TEMP_LOG (LOGFIELD, USERNAME)

    SELECT 'ULogleaseway2009618122842937.log','leaseway' UNION ALL

    SELECT 'ULogsviveash200961273419432.log','sviveash' UNION ALL

    SELECT 'ULogsmitdav20098484032114.log','smitdav' UNION ALL

    SELECT 'ULogmhealy20096109751394.log','mhealy' UNION ALL

    SELECT 'ULogmcowan20096375218427.log','mcowan' UNION ALL

    SELECT 'ULogsmitdav200971145922925.log','smitdav' UNION ALL

    SELECT 'ULogBen H2009611134749343.log','Ben H' UNION ALL

    SELECT 'ULogN918020097675036275.log','N9180' UNION ALL

    SELECT 'ULogsmitdav200983133757806.log','smitdav' UNION ALL

    SELECT 'ULogsmitdav200971141152369.log','smitdav' UNION ALL

    SELECT 'ULogu0374602009717132353999.log','u037460' UNION ALL

    SELECT 'ULogN9180200976135935619.log','N9180' UNION ALL

    SELECT 'ULogflintp20096285228921.log','flintp' UNION ALL

    SELECT 'ULog91756520096191538115.log','917565' UNION ALL

    SELECT 'ULogAndrew Holderness20096973641281.log','Andrew Holderness' UNION ALL

    SELECT 'ULogtimh20098394052161.log','timh' UNION ALL

    SELECT 'ULogandrew holderness2009628756492.log','andrew holderness' UNION ALL

    SELECT 'ULogflintp20096285228921.log','flintp' UNION ALL

    SELECT 'ULogmhealy2009612741436.log','mhealy' UNION ALL

    SELECT 'ULogleaseway200961963938296.log','leaseway' UNION ALL

    SELECT 'ULogNashp12009714124954828.log','Nashp1' UNION ALL

    SELECT 'ULog11000741720098572636844.log','110007417' UNION ALL

    SELECT 'ULogdaslee20096116331310.log','daslee' UNION ALL

    SELECT 'ULogflintp20097318123924.log','flintp' UNION ALL

    SELECT 'ULogJames Traer20098491743218.log','James Traer' UNION ALL

    SELECT 'ULogmidwoodb2009627104425908.log','midwoodb' UNION ALL

    SELECT 'ULog11000741720098572636844.log','110007417' UNION ALL

    SELECT 'ULogMatthew.DYkes20096483535952.log','Matthew.DYkes' UNION ALL

    SELECT 'ULogmlawrence2009723102343452.log','mlawrence' UNION ALL

    SELECT 'ULog1100074172009848410834.log','110007417' UNION ALL

    SELECT 'ULogN9180200976135935619.log','N9180' UNION ALL

    SELECT 'ULogmdeighton20095271048862.log','mdeighton' UNION ALL

    SELECT 'ULogScanner20096475654859.log','Scanner' UNION ALL

    SELECT 'ULogsmitdav2009728926962.log','smitdav' UNION ALL

    SELECT 'ULogAllieG200961073227926.log','AllieG' UNION ALL

    SELECT 'ULogMSuffling2009625143153875.log','MSuffling' UNION ALL

    SELECT 'ULog501124799200972972243773.log','501124799' UNION ALL

    SELECT 'ULog5014399312009611114512678.log','501439931' UNION ALL

    SELECT 'ULog5014399312009716124332604.log','501439931' UNION ALL

    SELECT 'ULogsmitdav200971122646671.log','smitdav' UNION ALL

    SELECT 'ULogScanner20096475654859.log','Scanner' UNION ALL

    SELECT 'ULog917564200965123257614.log','917564' UNION ALL

    SELECT 'ULogandrear2009714123018812.log','andrear' UNION ALL

    SELECT 'ULogbrian.armstrong20096259134360.log','brian.armstrong' UNION ALL

    SELECT 'ULogJames Traer200961683645812.log','James Traer' UNION ALL

    SELECT 'ULog91756420096510952610.log','917564' UNION ALL

    SELECT 'ULog5014399312009721144226491.log','501439931' UNION ALL

    SELECT 'ULogmcowan200961775429629.log','mcowan' UNION ALL

    SELECT 'ULogmcowan20097373458961.log','mcowan' UNION ALL

    SELECT 'ULogSteve200952610181453.log','Steve'

    SELECT LEFT(logfield,4), SUBSTRING(logfield,5,CHARINDEX('2009',logfield)-5),

    SUBSTRING(logfield,CHARINDEX('2009',logfield),CHARINDEX('.',logfield,CHARINDEX('2009',logfield))-CHARINDEX('2009',logfield)),

    * FROM #temp_log

    DROP TABLE #temp_log

  • Where are these logs coming from?

    Is there any way that you or the developer can "improve" the log files by including a delimiter?

    Building on Matt's earlier example I can overcome the '2009' issue (also when the username starts with 2009)

    I can get the Yr and Mth but the day/time has me stumped.

    Is any part of the sec / millisecs a fixed length?

    Anyway, for what it is worth...some non satisfactory code :hehe:

    create table #temp_log (logfield varchar(50), username varchar(50))

    INSERT INTO #TEMP_LOG (LOGFIELD, USERNAME)

    SELECT 'ULogleaseway20091218122842937.log','leaseway' UNION ALL --- amended

    SELECT 'ULogsviveash2009111273419432.log','sviveash' UNION ALL --- amended

    SELECT 'ULogsmitdav200910484032114.log','smitdav' UNION ALL --- amended

    SELECT 'ULogmhealy20096109751394.log','mhealy' UNION ALL

    SELECT 'ULogmcowan20096375218427.log','mcowan' UNION ALL

    SELECT 'ULogsmitdav200971145922925.log','smitdav' UNION ALL

    SELECT 'ULogBen H2009611134749343.log','Ben H' UNION ALL

    SELECT 'ULogN918020097675036275.log','N9180' UNION ALL

    SELECT 'ULogsmitdav200983133757806.log','smitdav' UNION ALL

    SELECT 'ULogsmitdav200971141152369.log','smitdav' UNION ALL

    SELECT 'ULogu0374602009717132353999.log','u037460' UNION ALL

    SELECT 'ULogN9180200976135935619.log','N9180' UNION ALL

    SELECT 'ULogflintp20096285228921.log','flintp' UNION ALL

    SELECT 'ULog91756520096191538115.log','917565' UNION ALL

    SELECT 'ULogAndrew Holderness20096973641281.log','Andrew Holderness' UNION ALL

    SELECT 'ULogtimh20098394052161.log','timh' UNION ALL

    SELECT 'ULogandrew holderness2009628756492.log','andrew holderness' UNION ALL

    SELECT 'ULogflintp20096285228921.log','flintp' UNION ALL

    SELECT 'ULogmhealy2009612741436.log','mhealy' UNION ALL

    SELECT 'ULogleaseway200961963938296.log','leaseway' UNION ALL

    SELECT 'ULogNashp12009714124954828.log','Nashp1' UNION ALL

    SELECT 'ULog11000741720098572636844.log','110007417' UNION ALL

    SELECT 'ULogdaslee20096116331310.log','daslee' UNION ALL

    SELECT 'ULogflintp20097318123924.log','flintp' UNION ALL

    SELECT 'ULogJames Traer20098491743218.log','James Traer' UNION ALL

    SELECT 'ULogmidwoodb2009627104425908.log','midwoodb' UNION ALL

    SELECT 'ULog11000741720098572636844.log','110007417' UNION ALL

    SELECT 'ULogMatthew.DYkes20096483535952.log','Matthew.DYkes' UNION ALL

    SELECT 'ULogmlawrence2009723102343452.log','mlawrence' UNION ALL

    SELECT 'ULog1100074172009848410834.log','110007417' UNION ALL

    SELECT 'ULogN9180200976135935619.log','N9180' UNION ALL

    SELECT 'ULogmdeighton20095271048862.log','mdeighton' UNION ALL

    SELECT 'ULogScanner20096475654859.log','Scanner' UNION ALL

    SELECT 'ULogsmitdav2009728926962.log','smitdav' UNION ALL

    SELECT 'ULogAllieG200961073227926.log','AllieG' UNION ALL

    SELECT 'ULogMSuffling2009625143153875.log','MSuffling' UNION ALL

    SELECT 'ULog501124799200972972243773.log','501124799' UNION ALL

    SELECT 'ULog5014399312009611114512678.log','501439931' UNION ALL

    SELECT 'ULog5014399312009716124332604.log','501439931' UNION ALL

    SELECT 'ULogsmitdav200971122646671.log','smitdav' UNION ALL

    SELECT 'ULogScanner20096475654859.log','Scanner' UNION ALL

    SELECT 'ULog917564200965123257614.log','917564' UNION ALL

    SELECT 'ULogandrear2009714123018812.log','andrear' UNION ALL

    SELECT 'ULogbrian.armstrong20096259134360.log','brian.armstrong' UNION ALL

    SELECT 'ULogJames Traer200961683645812.log','James Traer' UNION ALL

    SELECT 'ULog91756420096510952610.log','917564' UNION ALL

    SELECT 'ULog5014399312009721144226491.log','501439931' UNION ALL

    SELECT 'ULogmcowan200961775429629.log','mcowan' UNION ALL

    SELECT 'ULogmcowan20097373458961.log','mcowan' UNION ALL

    SELECT 'ULog2009Steve200952610181453.log','TestforCurrentyear' --- amended entry

    DECLARE @CurYear as varchar(4)

    SET @Curyear = DATEPART(yyyy,GETDATE())---- wont work if you import files on 01/01/2009 for logs created 31/12/2008

    UPDATE #temp_log --- in caae you are unlucky to find a username that starts with current year

    SET logfield = LEFT (logfield, 4) + REPLACE(SUBSTRING(logfield, 5, 4), @Curyear, 'xxxx') + RIGHT (logfield, LEN(logfield) - 8)

    SELECT LEFT(logfield,4) AS L1, SUBSTRING(logfield,5,CHARINDEX(@CurYear,logfield)-5) AS L2,

    SUBSTRING(logfield,CHARINDEX(@CurYear,logfield),CHARINDEX('.',logfield,CHARINDEX(@CurYear,logfield))-CHARINDEX(@CurYear,logfield)) AS L3

    INTO #temp_log2

    FROM #temp_log

    SELECT L1, L2, LEFT(L3, 4) AS Year,

    CASE WHEN SUBSTRING(L3, 5, 2) IN (10, 11, 12) THEN SUBSTRING(L3, 5, 2) ELSE '0' + SUBSTRING(L3, 5, 1) END AS Mth,

    L3

    from #temp_log2

    DROP TABLE #temp_log

    DROP TABLE #temp_log2

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day

  • Thanks Guys.

    The logs are generated in-house would you believe :-P, so I will be

    asking that they are improved through use of delimiters, but I've been tasked

    with stripping out the elements in the mean-time.

    Your suggestions are pretty much how I was going to go - use the year as the

    identifier of the date (but check for the year in usernames first).

    I shall of course, be spekaing to the guy who created the logs 😉

    Thanks again,

    Jay

    --

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply