[Tutor] Matching on multiple log lines

Kent Johnson kent37 at tds.net
Mon Oct 30 12:55:08 CET 2006


wesley chun wrote:
>> so it's guaranteed that 'Writing Message to'
>> will always be followed by 'TRANSPORT_STREAM_ID'
>> before the next occurrence of 'Writing Message to'
>> and all text between can be ignored,
>> and we increment the counter if and only if
>> there is a newline immediately after 'TRANSPORT_STREAM_ID'
>> yes?
> 
> 
> just throwing this out there... would anyone do something like a
> open('log.txt', 'w').write(str(len(re.split(r'Writing Message
> to([\w\d\s:/\.]+?)TRANSPORT_STREAM_ID    Parameter value:
> 0160\r?\n'))), or is this unseemly due to the fact that the file may be
> very large?

If the log file can be read into memory, then a regex-based solution
might work well, though your code looks a bit scrambled to me. Rather
than re.split() I would use re.findall(): it returns one item per match,
so len() of its result is the count you want.
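A minimal sketch of the re.findall() route. It assumes, following wesley's regex rather than a real log, that each block starts with 'Writing Message to' and that the line of interest reads 'TRANSPORT_STREAM_ID    Parameter value: 0160' and ends right at the newline. The names PATTERN, count_messages and count_in_file are made up for illustration.

```python
import re

# Assumed log format (borrowed from wesley's regex, not from a real log):
# a block starts with 'Writing Message to' and the line we want reads
# 'TRANSPORT_STREAM_ID    Parameter value: 0160', ending at the newline.
# (?:(?!Writing Message to).)*? lazily skips text but never crosses into
# the next block, so each match stays inside a single message.
PATTERN = re.compile(
    r"Writing Message to(?:(?!Writing Message to).)*?"
    r"TRANSPORT_STREAM_ID\s+Parameter value:\s*0160\r?$",
    re.DOTALL | re.MULTILINE,
)


def count_messages(text):
    """Count message blocks whose TRANSPORT_STREAM_ID line matches."""
    # findall() returns one string per non-overlapping match.
    return len(PATTERN.findall(text))


def count_in_file(path):
    # Reads the whole log into memory, so only suitable if it fits in RAM.
    with open(path) as f:
        return count_messages(f.read())
```

Because the whole file is read at once, this answers wesley's size concern only for logs that comfortably fit in memory; the matching itself does run in the C code of the re module.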

To solve this line by line, I would write a simple state machine that
looks for lines of interest and moves through the states Begin,
Found_Transport_Stream_Id and Found_Writing_Message.
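One possible reading of that state machine, as a sketch: the state names are the ones above, while count_messages, its value parameter and the 'Parameter value: 0160' line format are assumptions borrowed from wesley's regex. Only one line is examined at a time, so the file never has to fit in memory.

```python
# State names taken from the suggestion above; the line format is assumed.
BEGIN = "Begin"
FOUND_WRITING_MESSAGE = "Found_Writing_Message"
FOUND_TRANSPORT_STREAM_ID = "Found_Transport_Stream_Id"


def count_messages(lines, value="0160"):
    """Count blocks whose first TRANSPORT_STREAM_ID line ends with value."""
    state = BEGIN
    count = 0
    for line in lines:
        line = line.rstrip("\r\n")
        if "Writing Message to" in line:
            # A new message block starts, whatever state we were in.
            state = FOUND_WRITING_MESSAGE
        elif state == FOUND_WRITING_MESSAGE and "TRANSPORT_STREAM_ID" in line:
            # First TRANSPORT_STREAM_ID after a 'Writing Message to'.
            state = FOUND_TRANSPORT_STREAM_ID
            # Count only if the value is the last thing before the newline.
            if line.endswith("Parameter value: " + value):
                count += 1
        # Every other line is ignored, in every state.
    return count
```

A file object yields one line per iteration, so it can be passed in directly, e.g. with open('log.txt') as f: count = count_messages(f). Further TRANSPORT_STREAM_ID lines seen in the Found_Transport_Stream_Id state are ignored until the next 'Writing Message to'.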

Kent
> 
> advantages i see here include: no counter to maintain since you get
> the one answer at the end, your python code is not iterating thru the
> file one line at a time (the faster C code in 're' is), you
> automatically skip the TRANSPORT_STREAM_IDs that are *not* followed by a
> NEWLINE, etc.
> 
> just wondering,
> -- wesley