Best/better way? (histogram)
Bernard Rankin
berankin99 at yahoo.com
Wed Jan 28 09:08:46 EST 2009
>
> The simplest. That would be #3, cleaned up a bit:
>
> from collections import defaultdict
> from csv import DictReader
> from pprint import pprint
> from operator import itemgetter
>
> def rows(filename):
>     infile = open(filename, "rb")
>     for row in DictReader(infile):
>         yield row["CATEGORIES"]
>
> def stats(values):
>     histo = defaultdict(int)
>     for v in values:
>         histo[v] += 1
>     return sorted(histo.iteritems(), key=itemgetter(1), reverse=True)
>
> Should you need the inner dict (which doesn't seem to offer any additional
> information) you can always add another step:
>
> def format(items):
>     result = []
>     for raw, count in items:
>         leaf = raw.rpartition("|")[2]
>         result.append((raw, dict(count=count, leaf=leaf)))
>     return result
>
> pprint(format(stats(rows("sampledata.csv"))), indent=4, width=60)
>
> By the way, if you had broken the problem into steps like above you could have
> offered four different stats() functions, which would have been a bit
> easier to read...
>
Thank you. The code reorganization does make it easier to read.
I'll have to look up the docs on itemgetter().
:)
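
For reference, a minimal sketch of what itemgetter(1) does when used as the
sort key in stats(). The sample pairs below are made up for illustration and
are not taken from sampledata.csv; the names pairs, get_count, first_count and
by_count are likewise hypothetical.

```python
from operator import itemgetter

# Hypothetical (category, count) pairs, shaped like the items stats() sorts.
pairs = [("a|b", 2), ("c", 5), ("a|d", 3)]

# itemgetter(1) returns a callable that fetches index 1 of its argument,
# i.e. it behaves like: lambda pair: pair[1]
get_count = itemgetter(1)
first_count = get_count(pairs[0])

# Used as a sort key, it orders the pairs by count; reverse=True puts the
# most frequent category first, as in stats().
by_count = sorted(pairs, key=itemgetter(1), reverse=True)
```

So key=itemgetter(1) is just a shorter way to say "sort by the second element
of each pair".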