Candidate for a new itertool

pruebauno at latinmail.com pruebauno at latinmail.com
Tue Mar 10 16:35:48 CET 2009


On Mar 9, 6:55 pm, Raymond Hettinger <pyt... at rcn.com> wrote:
> [prueba]
>
> > The data often contains objects with attributes instead of tuples, and
> > I expect the new namedtuple datatype to be used also as elements of
> > the list to be processed.
>
> > But I haven't found a nice generalized way for that kind of pattern
> > that aggregates from a list of one datatype to a list of key plus
> > output datatype that would make it practical and suitable for
> > inclusion in the standard library.
>
> Looks like you've searched the possibilities thoroughly and no one
> aggregation function seems to meet all needs.  That's usually a cue
> to not try to build one and instead let simple python loops do the
> work for you (that also saves the awkward itemgetter() calls in your
> examples).  To my eyes, all three examples look like straight-forward,
> easy-to-write, easy-to-read, fast plain python:
>
> >>> d = defaultdict(int)
> >>> for color, n, info in data:
> ...     d[color] += n
> >>> d.items()
>
> [('blue', 6), ('yellow', 3), ('red', 4)]
>
> >>> d = defaultdict(list)
> >>> for color, n, info in data:
> ...     d[color].append(n)
> >>> d.items()
>
> [('blue', [5, 1]), ('yellow', [3]), ('red', [2, 2])]
>
> >>> d = defaultdict(set)
> >>> for color, n, info in data:
> ...     d[color].add(n)
> >>> d.items()
>
> [('blue', set([1, 5])), ('yellow', set([3])), ('red', set([2]))]
>
> I don't think you can readily combine all three examples into a single
> aggregator without the obfuscation and awkwardness that comes from
> parameterizing all of the varying parts:
>
> def aggregator(default_factory, adder, iterable, keyfunc, valuefunc):
>    d = defaultdict(default_factory)
>    for record in iterable:
>        key = keyfunc(record)
>        value = valuefunc(record)
>        adder(d[key], value)
>    return d.items()
>
> >>> aggregator(list, list.append, data, itemgetter(0), itemgetter(1))
>
> [('blue', [5, 1]), ('yellow', [3]), ('red', [2, 2])]
>
> >>> aggregator(set, set.add, data, itemgetter(0), itemgetter(1))
>
> [('blue', set([1, 5])), ('yellow', set([3])), ('red', set([2]))]
>
> Yuck!  Plain Python wins.
>
> Raymond
>
> P.S. The aggregator doesn't work so well for:
>
> >>> aggregator(int, operator.iadd, data, itemgetter(0), itemgetter(1))
>
> [('blue', 0), ('yellow', 0), ('red', 0)]
>
> The problem is that operator.iadd() doesn't have a way to both retrieve
> and store back into a dictionary.

Yes, thinking about this more, one probably needs two code paths,
depending on whether the type returned by default_factory is mutable
or immutable. But you are probably right that the ratio of redundancy
to variability is pretty low for such a function, and the plain
written-out for loop is not too painful. The only redundancy is the
creation and manipulation of the dictionary and the explicit looping.
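
For what it's worth, here is a minimal sketch of the two-code-path idea,
written for current Python 3 rather than the 2.x session quoted above. It
assumes that an in-place adder such as list.append returns None, while an
adder such as operator.iadd on ints returns the new value, which then has
to be stored back. The data list is made up to match the outputs quoted
above; this is an illustration only, not a proposal for itertools.

```python
from collections import defaultdict
from operator import itemgetter
import operator

# Hypothetical sample records (color, n, info), chosen to reproduce the
# results shown in the quoted session; the original data was not posted.
data = [
    ('blue', 5, 'a'),
    ('yellow', 3, 'b'),
    ('red', 2, 'c'),
    ('blue', 1, 'd'),
    ('red', 2, 'e'),
]

def aggregator(default_factory, adder, iterable, keyfunc, valuefunc):
    d = defaultdict(default_factory)
    for record in iterable:
        key = keyfunc(record)
        result = adder(d[key], valuefunc(record))
        if result is not None:
            # Immutable path: adder returned a new object (e.g. int + int),
            # so it must be stored back into the dictionary.
            d[key] = result
        # Mutable path: adder changed d[key] in place and returned None
        # (e.g. list.append, set.add), so there is nothing to store.
    return list(d.items())

print(aggregator(int, operator.iadd, data, itemgetter(0), itemgetter(1)))
# [('blue', 6), ('yellow', 3), ('red', 4)]
print(aggregator(list, list.append, data, itemgetter(0), itemgetter(1)))
# [('blue', [5, 1]), ('yellow', [3]), ('red', [2, 2])]
print(aggregator(set, set.add, data, itemgetter(0), itemgetter(1)))
# [('blue', {1, 5}), ('yellow', {3}), ('red', {2})]
```

The "returns None" test is only a heuristic: an adder that legitimately
returns None as its result would be mistaken for an in-place one. That
fragility, on top of the five parameters, is one more point in favor of
the plain loop.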




More information about the Python-list mailing list