[Tutor] counting problem

Mon Aug 1 22:41:36 CEST 2005

On Mon, 1 Aug 2005, Kent Johnson wrote:

> cgw501 at york.ac.uk wrote:
> > hi,
> >
> > I have large txt file with lines like this:
> >
> > ['DDB0216437']	1166	1174	9     ZZZ   100
> >
> > What I want to do is quickly count the number of lines that share a
> > value in the 4th column and 5th (i.e. in this line I would count all
> > the line that have '9' and 'ZZZ'). Anyone got any ideas for the
> > quickest way to do this? The solution I have is really ugly. thanks,

A dictionary approach may also be useful.  The following example should
help illustrate the technique:

######
>>> def histogram(iterable):
...     """Returns a list of (element, count) pairs for iterable."""
...     d = {}
...     for x in iterable:
...         d[x] = d.get(x, 0) + 1
...     return d.items()
...
>>> histogram("this is a test of the emergency broadcast system "
...           "this is only a test")
[('a', 4), (' ', 13), ('c', 2), ('b', 1), ('e', 7), ('d', 1), ('g', 1),
 ('f', 1), ('i', 4), ('h', 3), ('m', 2), ('l', 1), ('o', 3), ('n', 2),
 ('s', 9), ('r', 2), ('t', 9), ('y', 3)]
######

This is a fairly straightforward way of doing letter-frequency stuff.  We
can see from the histogram that the letters ['b', 'd', 'f', 'g', 'l'] are
solitary and occur only once.
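
Rather than reading the once-only letters off by eye, we could filter the
tallies.  The following is just a sketch: this variant of histogram()
returns the dictionary itself instead of its items, and the names 'text'
and 'singles' are made up for the illustration.

```python
def histogram(iterable):
    """Return a dict mapping each unique element to its count."""
    d = {}
    for x in iterable:
        d[x] = d.get(x, 0) + 1
    return d

text = ("this is a test of the emergency broadcast system "
        "this is only a test")
counts = histogram(text)

# Keep only the elements whose tally is exactly one, sorted for display.
singles = sorted(ch for ch, n in counts.items() if n == 1)
print(singles)    # ['b', 'd', 'f', 'g', 'l']
```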

This tallying approach can also be applied to the original poster's
question with the columns of a text file, as long as we figure out what we
want to tally up.
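
For instance, if what we want to tally is the pair of values in the 4th and
5th columns, one possible sketch looks like the following.  I'm assuming
here that the columns are separated by tabs or spaces and never contain
whitespace themselves; the function name count_by_columns() and the sample
lines are made up for illustration.

```python
def count_by_columns(lines, columns=(3, 4)):
    """Tally lines by the values in the given zero-based columns."""
    counts = {}
    for line in lines:
        fields = line.split()        # splits on any run of tabs or spaces
        if len(fields) <= max(columns):
            continue                 # skip blank or too-short lines
        key = tuple(fields[i] for i in columns)
        counts[key] = counts.get(key, 0) + 1
    return counts

# Made-up sample data shaped like the line in the original question.
sample = [
    "['DDB0216437']\t1166\t1174\t9     ZZZ   100",
    "['DDB0216438']\t2000\t2008\t9     ZZZ   98",
    "['DDB0216439']\t3000\t3011\t12    YYY   100",
]
counts = count_by_columns(sample)
print(counts[('9', 'ZZZ')])    # 2
```

Since a file object can be iterated line by line, passing an open file in
place of 'sample' should work the same way, and it makes only one pass over
the data.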

Good luck!