[Tutor] Simple Stats on Apache Logs

Lao Mao laomao1975 at googlemail.com
Thu Feb 11 10:56:51 CET 2010


Hi,

I have 3 servers which generate about 2 GB of webserver logfiles a day.
These are available on my machine over NFS.

I would like to draw up some stats which show, for a given keyword, how
many times it appears in the logs, per hour, over the previous week.

So the behavior might be:

$ ./webstats --keyword downloader

Which would read from the logs (which it has access to) and produce
something like:

Monday:
0000: 12
0100: 17

etc

I'm not sure how best to get started.  My initial idea would be to filter
the logs first, pulling out the lines that contain the keyword, then check
the timestamp on each one - maybe incrementing a dictionary entry if the
line falls within a certain hour?
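
For illustration, the filter-then-count idea above might be sketched in
Python roughly as follows.  This is only a sketch under assumptions: it
assumes the logs use the default Apache timestamp format (for example
[11/Feb/2010:10:56:51 +0100]) and an English/C locale for month and day
names.  The function names count_keyword and format_report are made up
for the example.

```python
import re
from collections import defaultdict
from datetime import datetime

# Assumption: default Apache timestamp, e.g. [11/Feb/2010:10:56:51 +0100].
TIMESTAMP = re.compile(r"\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2}) [+-]\d{4}\]")


def count_keyword(lines, keyword):
    """Return {(date, hour): hits} for the lines that contain keyword."""
    counts = defaultdict(int)
    for line in lines:
        if keyword not in line:      # cheap substring filter first
            continue
        match = TIMESTAMP.search(line)
        if match is None:            # skip lines without a parsable timestamp
            continue
        # The UTC offset is ignored, so hours are in the server's local time.
        stamp = datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S")
        counts[(stamp.date(), stamp.hour)] += 1
    return counts


def format_report(counts):
    """Render the counts as a day heading followed by 'HH00: n' lines."""
    out = []
    current_day = None
    for (day, hour), hits in sorted(counts.items()):
        if day != current_day:
            out.append(day.strftime("%A") + ":")
            current_day = day
        out.append("%02d00: %d" % (hour, hits))
    return "\n".join(out)
```

Because count_keyword only iterates over its input, an open file object
can be passed in directly, so a large logfile is read line by line rather
than loaded into memory all at once.  Hours with no matches are simply
absent from the report.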

I'm not looking for people to write it for me, but I'd appreciate some
guidance as to the approach and algorithm, and also on what the simplest
presentation model would be, or even whether it would make sense to stick
the data in a database!  I'll post back my progress.

Thanks,

Laomao