Candidate for a new itertool

Raymond Hettinger python at
Mon Mar 9 23:55:28 CET 2009

> The data often contains objects with attributes instead of tuples, and
> I expect the new namedtuple datatype to be used also as elements of
> the list to be processed.
> But I haven't found a nice generalized way for that kind of pattern
> that aggregates from a list of one datatype to a list of key plus
> output datatype that would make it practical and suitable for
> inclusion in the standard library.

Looks like you've searched the possibilities thoroughly and no single
aggregation function seems to meet all needs.  That's usually a cue
not to build one and instead to let simple Python loops do the
work for you (that also saves the awkward itemgetter() calls in your
examples).  To my eyes, all three examples look like straightforward,
easy-to-write, easy-to-read, fast plain Python:

>>> from collections import defaultdict
>>> d = defaultdict(int)
>>> for color, n, info in data:
...	d[color] += n
>>> d.items()
[('blue', 6), ('yellow', 3), ('red', 4)]

>>> d = defaultdict(list)
>>> for color, n, info in data:
...	d[color].append(n)
>>> d.items()
[('blue', [5, 1]), ('yellow', [3]), ('red', [2, 2])]

>>> d = defaultdict(set)
>>> for color, n, info in data:
...	d[color].add(n)
>>> d.items()
[('blue', set([1, 5])), ('yellow', set([3])), ('red', set([2]))]
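
For readers who want to run the three loops above as a script, here is a
minimal sketch.  The contents of `data` are not shown in this message, so
the records below are an assumption chosen to agree with the outputs
above; the names `totals`, `groups` and `distinct` are likewise only
illustrative.  It is written for Python 3, where sets print as {1, 5}
and d.items() returns a view rather than a list.

```python
from collections import defaultdict

# Hypothetical sample records of the form (color, n, info).  The real
# `data` is not shown in the message; these values are an assumption
# picked to be consistent with the outputs shown above.
data = [
    ("blue", 5, "a"),
    ("yellow", 3, "b"),
    ("red", 2, "c"),
    ("blue", 1, "d"),
    ("red", 2, "e"),
]

totals = defaultdict(int)     # sum of n per color
groups = defaultdict(list)    # every n per color, in input order
distinct = defaultdict(set)   # unique n values per color

# One pass over the data feeds all three aggregations.
for color, n, info in data:
    totals[color] += n
    groups[color].append(n)
    distinct[color].add(n)

print(dict(totals))
print(dict(groups))
print(dict(distinct))
```

Each aggregation is a single line inside the loop, which is the point
being made: the varying part is small enough to write inline.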

I don't think you can readily combine all three examples into a single
aggregator without the obfuscation and awkwardness that comes from
parameterizing all of the varying parts:

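from collections import defaultdict
from operator import itemgetter

def aggregator(default_factory, adder, iterable, keyfunc, valuefunc):
    d = defaultdict(default_factory)
    for record in iterable:
        key = keyfunc(record)
        value = valuefunc(record)
        # adder must mutate d[key] in place; its return value is discarded
        adder(d[key], value)
    return d.items()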

>>> aggregator(list, list.append, data, itemgetter(0), itemgetter(1))
[('blue', [5, 1]), ('yellow', [3]), ('red', [2, 2])]
>>> aggregator(set, set.add, data, itemgetter(0), itemgetter(1))
[('blue', set([1, 5])), ('yellow', set([3])), ('red', set([2]))]

Yuck!  Plain Python wins.
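
The quoted concern about records that are objects with attributes (for
example namedtuples) rather than plain tuples can be handled by the same
plain loop.  The sketch below assumes a hypothetical `Record` type with
fields `color`, `n` and `info`; none of these names come from the
original thread.

```python
from collections import defaultdict, namedtuple

# Hypothetical record type; the field names are assumptions made for
# illustration only.
Record = namedtuple("Record", ["color", "n", "info"])

records = [
    Record("blue", 5, "a"),
    Record("red", 2, "b"),
    Record("blue", 1, "c"),
]

totals = defaultdict(int)
for rec in records:
    # Attribute access replaces tuple unpacking; no itemgetter() or
    # attrgetter() call is needed.
    totals[rec.color] += rec.n

print(dict(totals))
```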


P.S. The aggregator doesn't work so well for:

>>> import operator
>>> aggregator(int, operator.iadd, data, itemgetter(0), itemgetter(1))
[('blue', 0), ('yellow', 0), ('red', 0)]

The problem is that operator.iadd() doesn't have a way to both
retrieve and store back into a dictionary:  ints are immutable, so
iadd() returns the new total and the aggregator throws it away.
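
To make the failure concrete, the sketch below first shows that
operator.iadd() leaves an int untouched and only returns the sum, then
tries one possible workaround: a variant that stores the combiner's
return value back into the dictionary.  The name `reducing_aggregator`,
the sample `data` and the lambda are assumptions for illustration, not
part of the original proposal.

```python
import operator
from collections import defaultdict
from operator import itemgetter

# ints are immutable: iadd() cannot change x, it can only return the sum.
x = 0
result = operator.iadd(x, 5)
print(x, result)

# Hypothetical variant that stores the combined value back under the key.
def reducing_aggregator(default_factory, combine, iterable, keyfunc, valuefunc):
    d = defaultdict(default_factory)
    for record in iterable:
        key = keyfunc(record)
        d[key] = combine(d[key], valuefunc(record))
    return list(d.items())

# Assumed sample records of the form (color, n, info).
data = [
    ("blue", 5, "a"),
    ("yellow", 3, "b"),
    ("red", 2, "c"),
    ("blue", 1, "d"),
    ("red", 2, "e"),
]

sums = reducing_aggregator(int, operator.add, data,
                           itemgetter(0), itemgetter(1))
# list.append returns None, so the list case now needs its own combiner.
lists = reducing_aggregator(list, lambda acc, v: acc + [v], data,
                            itemgetter(0), itemgetter(1))
print(sums)
print(lists)
```

This fixes the int case but breaks the earlier calls that passed
list.append and set.add, since those return None.  Every caller now has
to supply a value-returning combiner, which is more parameterizing of
the varying parts, not less.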
