[Tutor] Logfile Manipulation

Wayne Werner waynejwerner at gmail.com
Mon Nov 9 13:37:29 CET 2009

On Sun, Nov 8, 2009 at 11:41 PM, Stephen Nelson-Smith <sanelson at gmail.com> wrote:

> I've got a large amount of data in the form of 3 apache and 3 varnish
> logfiles from 3 different machines.  They are rotated at 0400.  The
> logfiles are pretty big - maybe 6G per server, uncompressed.
> I've got to produce a combined logfile for 0000-2359 for a given day,
> with a bit of filtering (removing lines based on text match, bit of
> substitution).
> I've inherited a nasty shell script that does this but it is very slow
> and not clean to read or understand.
> I'd like to reimplement this in python.
> Initial questions:
> * How does Python compare in performance to shell, awk etc in a big
> pipeline?  The shell script kills the CPU
> * What's the best way to extract the data for a given time, eg 0000 -
> 2359 yesterday?
> Any advice or experiences?

Go here and download the PDF!

Someone posted this the other day. I read through it and played around a
bit, and it's exactly what you're looking for. It even has one slide
comparing Python vs. awk.

I think you'll find the PDF highly useful and right on target.
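
For the "extract a given day and filter" part of the question, a minimal
sketch with plain generators might look like the following. It is not taken
from the PDF. The names (parse_ts, stamped, combine), the timestamp regex,
and the drop/subs arguments are illustrative assumptions. It assumes each
input is already in time order and that every line carries an Apache-style
[08/Nov/2009:04:00:01 +0000] timestamp; the UTC offset is ignored, and the
varnish logs may need a different pattern.

```python
import heapq
import re
from datetime import date, datetime

# Assumed timestamp layout: [08/Nov/2009:04:00:01 +0000]
# (the UTC offset after the space is ignored in this sketch)
TS_RE = re.compile(r"\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2})")


def parse_ts(line):
    """Return the line's timestamp as a datetime, or None if none is found."""
    m = TS_RE.search(line)
    if m is None:
        return None
    return datetime.strptime(m.group(1), "%d/%b/%Y:%H:%M:%S")


def stamped(lines):
    """Yield (timestamp, line) pairs, skipping lines without a timestamp."""
    for line in lines:
        ts = parse_ts(line)
        if ts is not None:
            yield ts, line


def combine(sources, day, drop=(), subs=()):
    """Merge time-ordered sources and yield the filtered lines for one day.

    sources: iterables of lines (e.g. open files), each already time-ordered
    day:     datetime.date to keep (00:00:00 through 23:59:59)
    drop:    substrings; a line containing any of them is removed
    subs:    (regex pattern, replacement) pairs applied to each kept line
    """
    # heapq.merge pulls lazily from each source, one pending item per source.
    merged = heapq.merge(*(stamped(s) for s in sources))
    for ts, line in merged:
        if ts.date() != day:
            continue
        if any(text in line for text in drop):
            continue
        for pattern, repl in subs:
            line = re.sub(pattern, repl, line)
        yield line
```

Pass open file objects as sources and write the yielded lines to the
combined file. Because everything is lazy, only one pending line per input
is held in memory at a time, so the size of the logs should not be a memory
concern. How the speed compares with the shell pipeline is something to
measure on the real data.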
