[Tutor] Matching on multiple log lines
Kent Johnson
kent37 at tds.net
Mon Oct 30 12:55:08 CET 2006
wesley chun wrote:
>> so it's guaranteed that 'Writing Message to'
>> will always be followed by 'TRANSPORT_STREAM_ID'
>> before the next occurrence of 'Writing Message to'
>> and all text between can be ignored,
>> and we increment the counter if and only if
>> there is a newline immediately after 'TRANSPORT_STREAM_ID'
>> yes?
>
>
> just throwing this out there... would anyone do something like a
> open('log.txt', 'w').write(str(len(re.split(r'Writing Message
> to([\w\d\s:/\.]+?)TRANSPORT_STREAM_ID Parameter value:
> 0160\r?\n'))), or is this unseemly due to the fact that the file may be
> very large?
If the log file fits in memory, a regex-based solution might work
well, though your code looks a bit scrambled to me: re.split() is
given a pattern but no string to search. Rather than re.split() I
would use re.findall() and take the length of the result.
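Something along these lines is what I have in mind. It is only a rough
sketch: the pattern is adapted from the one-liner quoted above, and
SAMPLE_LOG and count_matches() are made-up names with made-up data,
since I am guessing at what the real log looks like.

```python
import re

# Hypothetical sample data; the real log layout is an assumption.
SAMPLE_LOG = (
    "Writing Message to /dev/a\n"
    "some other text\n"
    "TRANSPORT_STREAM_ID Parameter value: 0160\n"
    "Writing Message to /dev/b\n"
    "TRANSPORT_STREAM_ID Parameter value: 0160 extra\n"
    "Writing Message to /dev/c\n"
    "TRANSPORT_STREAM_ID Parameter value: 0160\r\n"
)

# Adapted from the quoted one-liner: a non-greedy run of word,
# whitespace, ':', '/' and '.' characters, ending at a value line
# that is followed immediately by a newline.
PATTERN = re.compile(
    r"Writing Message to[\w\s:/.]+?"
    r"TRANSPORT_STREAM_ID Parameter value: 0160\r?\n"
)

def count_matches(text):
    """Count the matches of PATTERN in text held fully in memory."""
    return len(PATTERN.findall(text))

print(count_matches(SAMPLE_LOG))
```

Note that a single match may run across a record whose value line has
trailing text, but each match still ends at exactly one qualifying
line, so the count comes out right as long as every qualifying line is
preceded by its own 'Writing Message to'.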
To solve this line-by-line I would make a simple state machine that
looks for lines of interest and moves through the states Begin,
Found_Writing_Message and Found_Transport_Stream_Id.
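A minimal sketch of such a state machine follows. The state names come
from the paragraph above; count_messages(), SAMPLE_LOG and the exact
text of the value line are my own assumptions about the log, so adjust
them to the real data.

```python
# States of the line-by-line scanner.
BEGIN, FOUND_WRITING_MESSAGE, FOUND_TRANSPORT_STREAM_ID = range(3)

# Assumed text of a value line that should be counted.
TARGET = "TRANSPORT_STREAM_ID Parameter value: 0160"

def count_messages(lines):
    """Count messages whose TRANSPORT_STREAM_ID line ends right after 0160."""
    state = BEGIN
    count = 0
    for line in lines:
        line = line.rstrip("\r\n")  # drop the line ending before comparing
        if "Writing Message to" in line:
            # A new message starts; wait for its TRANSPORT_STREAM_ID line.
            state = FOUND_WRITING_MESSAGE
        elif state == FOUND_WRITING_MESSAGE and "TRANSPORT_STREAM_ID" in line:
            # Only the first TRANSPORT_STREAM_ID line after each
            # 'Writing Message to' is examined; later ones are ignored.
            state = FOUND_TRANSPORT_STREAM_ID
            if line.endswith(TARGET):
                count += 1
    return count

# Hypothetical sample data; the real log layout is an assumption.
SAMPLE_LOG = (
    "Writing Message to /dev/a\n"
    "some other text\n"
    "TRANSPORT_STREAM_ID Parameter value: 0160\n"
    "Writing Message to /dev/b\n"
    "TRANSPORT_STREAM_ID Parameter value: 0160 extra\n"
    "Writing Message to /dev/c\n"
    "TRANSPORT_STREAM_ID Parameter value: 0160\r\n"
)

print(count_messages(SAMPLE_LOG.splitlines(keepends=True)))
```

For a real file you could pass the open file object itself, for
example count_messages(open('log.txt')), so only one line is held in
memory at a time.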
Kent
>
> advantages i see here include: no counter to maintain since you get
> the one answer at the end, your python code is not iterating thru the
> file one line at a time (the faster C code in 're' is), you
> automatically skip the TRANSPORT_STREAM_IDs that are *not* followed by a
> NEWLINE, etc.
>
> just wondering,
> -- wesley
> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> "Core Python Programming", Prentice Hall, (c)2007,2001
> http://corepython.com
>
> wesley.j.chun :: wescpy-at-gmail.com
> python training and technical consulting
> cyberweb.consulting : silicon valley, ca
> http://cyberwebconsulting.com