better way to do this in python

nn pruebauno at latinmail.com
Mon Apr 4 12:10:56 EDT 2011


On Apr 3, 8:06 am, Mag Gam <magaw... at gmail.com> wrote:
> Thanks for the responses.
>
> Basically, I have a large file with this format,
>
> Date INFO username command srcipaddress filename
>
> I would like to do statistics on:
> total number of usernames and who they are
> username and commands
> username and filenames
> unique source ip addresses
> unique filenames
>
> Then I would like to bucket findings with days (date).
>
> Overall, I would like to build a log file analyzer.
>
> On Sat, Apr 2, 2011 at 10:59 PM, Dan Stromberg <drsali... at gmail.com> wrote:
>
> > On Sat, Apr 2, 2011 at 5:24 PM, Chris Angelico <ros... at gmail.com> wrote:
>
> >> On Sun, Apr 3, 2011 at 9:58 AM, Mag Gam <magaw... at gmail.com> wrote:
> >> > I suppose I can do something like this.
> >> > (pseudocode)
>
> >> > d={}
> >> > try:
> >> >  d[key]+=1
> >> > except KeyError:
> >> >  d[key]=1
>
> >> > I was wondering if there is a pythonic way of doing this? I plan on
> >> > doing this many times for various files. Would the python collections
> >> > class be sufficient?
>
> >> I think you want collections.Counter. From the docs: "Counter objects
> >> have a dictionary interface except that they return a zero count for
> >> missing items instead of raising a KeyError".
>
> >> ChrisA
>
> > I realize you (Mag) asked for a Python solution, but since you mention
> > awk... you can also do this with "sort < input | uniq -c" - one line of
> > "code".  GNU sort doesn't use as nice an algorithm as a hashing-based
> > solution (like you'd probably use with Python), but for a sort, GNU sort's
> > quite good.
>
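
The quoted try/except counter and the two suggested alternatives can be compared side by side. The sketch below is illustrative, and the function names are made up for this example. It shows the original KeyError pattern, the `collections.Counter` version Chris suggests, and a `defaultdict(int)` version. `Counter` needs Python 2.7 or 3.1+. `defaultdict` has been available since 2.5. A rough Python counterpart of Dan's `sort < input | uniq -c` is also included. Its sort order follows Python string comparison, so it may differ from GNU sort under a non-C locale.

```python
from collections import Counter, defaultdict

def count_with_dict(keys):
    """The quoted try/except KeyError pattern, made runnable."""
    d = {}
    for key in keys:
        try:
            d[key] += 1
        except KeyError:
            d[key] = 1          # first time this key is seen
    return d

def count_with_counter(keys):
    """collections.Counter: missing keys read as 0, so no exception handling."""
    return Counter(keys)

def count_with_defaultdict(keys):
    """defaultdict(int) creates a 0 entry on first access."""
    d = defaultdict(int)
    for key in keys:
        d[key] += 1
    return d

def uniq_c(lines):
    """Rough analogue of `sort < input | uniq -c`: one formatted
    'count line' string per distinct line, sorted by line."""
    return ["%7d %s" % (n, line) for line, n in sorted(Counter(lines).items())]

if __name__ == "__main__":
    users = ["alice", "bob", "alice", "carol", "alice"]
    print(count_with_counter(users).most_common(1))
    for row in uniq_c(users):
        print(row)
```

`Counter` also offers `most_common()`, which handles the "top N users" kind of question without any extra sorting code.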

Take a look at:
http://code.activestate.com/recipes/577535-aggregates-using-groupby-defaultdict-and-counter/

for some ideas on how to group and count things.
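
As a starting point for the log analyzer Mag describes, here is a minimal sketch that combines `Counter` and `defaultdict`. It also offers an `itertools.groupby` variant for day buckets, since the linked recipe's title names those three tools. It assumes that each line has exactly the six whitespace-separated fields `Date INFO username command srcipaddress filename` and that no field contains spaces. The field names and helper functions are hypothetical and are not taken from the recipe.

```python
from collections import Counter, defaultdict
from itertools import groupby
from operator import itemgetter

# Assumed layout: date INFO username command srcipaddress filename,
# whitespace-separated, no spaces inside any field (hypothetical names below).
FIELDS = ("date", "level", "user", "command", "srcip", "filename")

def parse(lines):
    """Yield one dict per well-formed line; skip lines with the wrong field count."""
    for line in lines:
        parts = line.split()
        if len(parts) == len(FIELDS):
            yield dict(zip(FIELDS, parts))

def analyze(records):
    """Single pass over the records, filling all the requested statistics."""
    stats = {
        "users": Counter(),                     # username -> number of entries
        "user_commands": defaultdict(Counter),  # username -> Counter(command)
        "user_files": defaultdict(Counter),     # username -> Counter(filename)
        "src_ips": set(),                       # unique source IP addresses
        "filenames": set(),                     # unique filenames
        "per_day": defaultdict(Counter),        # date -> Counter(username)
    }
    for r in records:
        stats["users"][r["user"]] += 1
        stats["user_commands"][r["user"]][r["command"]] += 1
        stats["user_files"][r["user"]][r["filename"]] += 1
        stats["src_ips"].add(r["srcip"])
        stats["filenames"].add(r["filename"])
        stats["per_day"][r["date"]][r["user"]] += 1
    return stats

def entries_per_day(records):
    """groupby alternative for day buckets. Only correct if the records
    are already sorted by date: groupby starts a new group whenever the
    key changes, so an unsorted date would overwrite an earlier bucket."""
    return dict((day, sum(1 for _ in group))
                for day, group in groupby(records, key=itemgetter("date")))

if __name__ == "__main__":
    sample = [
        "2011-04-02 INFO alice ls 10.0.0.1 a.txt",
        "2011-04-02 INFO bob cat 10.0.0.2 b.txt",
        "2011-04-03 INFO alice cat 10.0.0.1 b.txt",
    ]
    s = analyze(parse(sample))
    print("%d users: %s" % (len(s["users"]), sorted(s["users"])))
    print(sorted(s["src_ips"]))
    print(entries_per_day(parse(sample)))
```

Because `parse` is a generator, a large file can be streamed without loading it into memory, e.g. `with open("access.log") as f: stats = analyze(parse(f))`, where `access.log` is a placeholder name. The `defaultdict` version is order-independent. The `groupby` version only fits input that is already sorted by date, which log files usually are.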
