[Tutor] Simple Stats on Apache Logs

Fri Feb 12 06:15:33 CET 2010

On Thu, Feb 11, 2010 at 4:56 AM, Lao Mao <laomao1975 at googlemail.com> wrote:
> Hi,
>
> I have 3 servers which generate about 2G of webserver logfiles in a day.
> These are available on my machine over NFS.
>
> I would like to draw up some stats which shows, for a given keyword, how
> many times it appears in the logs, per hour, over the previous week.
>
> So the behavior might be:
>
> $ ./webstats --keyword downloader
>
> Which would read from the logs (which it has access to) and produce
> something like:
>
> Monday:
> 0000: 12
> 0100: 17
>
> etc
>
> I'm not sure how best to get started.  My initial idea would be to filter
> the logs first, pulling out the lines with matching keywords, then check the
> timestamp - maybe incrementing a dictionary if the logfile was within a
> certain time?
>
> I'm not looking for people to write it for me, but I'd appreciate some
> guidance as to the approach and algorithm.  Also what the simplest
> presentation model would be.  Or even if it would make sense to stick it in
> a database!  I'll post back my progress.
>
> Thanks,
>
> Laomao

You may also find this article on parsing large log files efficiently
with Python useful:
http://effbot.org/zone/wide-finder.htm
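
As for the approach you describe (filter on the keyword, then bucket
the matching lines by timestamp in a dictionary), here is a minimal
sketch of how that could look. It assumes your logs use the standard
Apache common/combined format, with timestamps like
[10/Feb/2010:13:55:36 +0100]; the function names are my own invention
for illustration, and restricting the report to the previous week is
left out.

```python
import re
from collections import defaultdict
from datetime import datetime

# Assumption: timestamps look like [10/Feb/2010:13:55:36 +0100].
# The timezone offset is matched but ignored.
TIMESTAMP = re.compile(r"\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2}) [+-]\d{4}\]")


def count_keyword_by_hour(lines, keyword):
    """Return {(date, hour): count} for the lines that contain keyword."""
    counts = defaultdict(int)
    for line in lines:
        # Cheap substring test first, so most lines are never parsed.
        if keyword not in line:
            continue
        match = TIMESTAMP.search(line)
        if match is None:
            continue  # skip lines without a recognisable timestamp
        # Note: %b (month abbreviation) is locale-dependent.
        stamp = datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S")
        counts[(stamp.date(), stamp.hour)] += 1
    return counts


def format_report(counts):
    """Render the counts as a day heading followed by 'HH00: n' lines."""
    out = []
    current_day = None
    for (day, hour), n in sorted(counts.items()):
        if day != current_day:
            out.append(day.strftime("%A") + ":")
            current_day = day
        out.append("%02d00: %d" % (hour, n))
    return "\n".join(out)
```

Because count_keyword_by_hour accepts any iterable of lines, you can
pass it an open file object directly and the log is read one line at a
time rather than loaded into memory, which matters at 2G a day. Whether
a database is worth it probably depends on how often you want to re-run
the query with different keywords.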