[Tutor] counting problem
Danny Yoo
dyoo at hkn.eecs.berkeley.edu
Mon Aug 1 22:41:36 CEST 2005
On Mon, 1 Aug 2005, Kent Johnson wrote:
> cgw501 at york.ac.uk wrote:
> > hi,
> >
> > I have large txt file with lines like this:
> >
> > ['DDB0216437'] 1166 1174 9 ZZZ 100
> >
> > What I want to do is quickly count the number of lines that share a
> > value in the 4th column and 5th (i.e. in this line I would count all
> > the line that have '9' and 'ZZZ'). Anyone got any ideas for the
> > quickest way to do this? The solution I have is really ugly. thanks,
A dictionary approach may also be useful. The following example should
help illustrate the technique:
######
>>> def histogram(iterable):
...     """Returns a list of counts of each unique element in iterable."""
...     d = {}
...     for x in iterable:
...         d[x] = d.get(x, 0) + 1
...     return d.items()
...
>>> histogram("this is a test of the emergency broadcast system this is only a test")
[('a', 4), (' ', 13), ('c', 2), ('b', 1), ('e', 7), ('d', 1), ('g', 1),
('f', 1), ('i', 4), ('h', 3), ('m', 2), ('l', 1), ('o', 3), ('n', 2),
('s', 9), ('r', 2), ('t', 9), ('y', 3)]
######
This is a fairly straightforward way of computing letter frequencies. We
can see from the histogram that the letters {'b', 'd', 'f', 'g', 'l'} are
solitary and occur only once.
This tallying approach can also be applied to the original poster's
question with the columns of a text file, as long as we figure out what we
want to tally up.
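As a rough sketch of how that might look for the file in question: if we assume the columns are separated by whitespace, then the 4th and 5th columns are fields 3 and 4 after a split(), and the pair of them can serve as the dictionary key. The name tally_columns and its col_a/col_b parameters are made up for illustration, and only the first sample line comes from the original post; the others are invented so there is something to count.

```python
def tally_columns(lines, col_a=3, col_b=4):
    """Count how many lines share each (col_a, col_b) pair of fields.

    Assumes whitespace-separated columns; indices are zero-based, so the
    defaults 3 and 4 pick out the 4th and 5th columns.
    """
    counts = {}
    for line in lines:
        fields = line.split()
        if len(fields) <= max(col_a, col_b):
            continue  # skip blank lines and lines with too few columns
        key = (fields[col_a], fields[col_b])
        counts[key] = counts.get(key, 0) + 1
    return counts


# Only the first line is from the original post; the rest are invented.
sample = [
    "['DDB0216437'] 1166 1174 9 ZZZ 100",
    "['DDB0216438'] 2001 2009 9 ZZZ 98",
    "['DDB0216439'] 3050 3056 7 YYY 100",
    "",
]

counts = tally_columns(sample)
for key in sorted(counts):
    print(key, counts[key])
```

Since iterating over an open file yields its lines one at a time, the same function should work if handed a file object instead of a list, which avoids reading a large file into memory all at once.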
Good luck!