Candidate for a new itertool

Tue Mar 10 11:35:48 EDT 2009

On Mar 9, 6:55 pm, Raymond Hettinger <pyt... at rcn.com> wrote:
> [prueba]
>
> > The data often contains objects with attributes instead of tuples, and
> > I expect the new namedtuple datatype to be used also as elements of
> > the list to be processed.
>
> > But I haven't found a nice generalized way for that kind of pattern
> > that aggregates from a list of one datatype to a list of key plus
> > output datatype that would make it practical and suitable for
> > inclusion in the standard library.
>
> Looks like you've searched the possibilities thoroughly and no one
> aggregation function seems to meet all needs.  That's usually a cue
> to not try to build one and instead let simple python loops do the
> work for you (that also saves the awkward itemgetter() calls in your
> examples).  To my eyes, all three examples look like straight-forward,
> easy-to-write, easy-to-read, fast plain python:
>
> >>> d = defaultdict(int)
> >>> for color, n, info in data:
> ...     d[color] += n
> >>> d.items()
>
> [('blue', 6), ('yellow', 3), ('red', 4)]
>
> >>> d = defaultdict(list)
> >>> for color, n, info in data:
> ...     d[color].append(n)
> >>> d.items()
>
> [('blue', [5, 1]), ('yellow', [3]), ('red', [2, 2])]
>
> >>> d = defaultdict(set)
> >>> for color, n, info in data:
> ...     d[color].add(n)
> >>> d.items()
>
> [('blue', set([1, 5])), ('yellow', set([3])), ('red', set([2]))]
>
> I don't think you can readily combine all three examples into a single
> aggregator without the obfuscation and awkwardness that comes from
> parameterizing all of the varying parts:
>
> def aggregator(default_factory, adder, iterable, keyfunc, valuefunc):
>    d = defaultdict(default_factory)
>    for record in iterable:
>        key = keyfunc(record)
>        value = valuefunc(record)
>        adder(d[key], value)
>    return d.items()
>
> >>> aggregator(list, list.append, data, itemgetter(0), itemgetter(1))
>
> [('blue', [5, 1]), ('yellow', [3]), ('red', [2, 2])]
>
> >>> aggregator(set, set.add, data, itemgetter(0), itemgetter(1))
>
> [('blue', set([1, 5])), ('yellow', set([3])), ('red', set([2]))]
>
> Yuck!  Plain Python wins.
>
> Raymond
>
> P.S. The aggregator doesn't work so well for:
>
> >>> aggregator(int, operator.iadd, data, itemgetter(0), itemgetter(1))
>
> [('blue', 0), ('yellow', 0), ('red', 0)]
>
> The problem is that operator.iadd() doesn't have a way to both retrieve
> and store back into a dictionary.

Yes, thinking about this more, one probably needs two code paths,
depending on whether the type returned by default_factory is mutable or
immutable. But you are probably right that the ratio of redundancy to
variability is pretty low for such a function, and the plain written-out
for loop is not too painful. The only redundancy is the creation
and manipulation of the dictionary and the explicit looping.
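
For what it's worth, the two code paths might collapse into one if the
combining function returns the new accumulator and the loop stores it back
into the dictionary. This is only a rough sketch under that assumption, not
a proposal for the standard library. The names aggregate() and appended()
are made up for illustration, and the sample data (including the third
"info" field) is hypothetical, chosen to reproduce the outputs quoted above.

```python
from collections import defaultdict
from operator import add, itemgetter


def aggregate(default_factory, combine, iterable, keyfunc, valuefunc):
    """Group records by key, folding values with combine(old, new) -> result."""
    d = defaultdict(default_factory)
    for record in iterable:
        key = keyfunc(record)
        # Store the result back, so immutable accumulators (int) work too.
        d[key] = combine(d[key], valuefunc(record))
    return list(d.items())


def appended(lst, value):
    # A mutating helper has to return the container so it can be stored back.
    lst.append(value)
    return lst


# Hypothetical sample data; the third field stands in for "info".
data = [('blue', 5, 'a'), ('yellow', 3, 'b'), ('red', 2, 'c'),
        ('blue', 1, 'd'), ('red', 2, 'e')]

totals = aggregate(int, add, data, itemgetter(0), itemgetter(1))
groups = aggregate(list, appended, data, itemgetter(0), itemgetter(1))
```

The cost is that list.append and set.add can no longer be passed in
directly, since they return None, so each mutable type needs a small
wrapper. That arguably confirms the point that the plain loop is simpler.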