[Python-Dev] Re: "groupby" iterator

Raymond Hettinger python at rcn.com
Mon Dec 1 14:22:32 EST 2003


[David Eppstein]
> My implementation will skip or partially consume the subiterators, with
> only a very temporary additional use of memory, if you don't keep a
> reference to them.
> But I can see your arguments about visible vs silent failure modes and
> code complexity.

I'll add a note saying that the behavior may change.  Then, if I turn
out to be wrong, we can add support for delinquent subiterators.
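
For reference, here is a minimal sketch of what skipping or partially
consuming a subiterator looks like.  It assumes the itertools.groupby()
that later shipped in the standard library, which may differ from the
draft discussed in this thread; the sample data is made up.

```python
from itertools import groupby

data = "aabbbc"  # hypothetical input, already ordered by key

# Skip every subiterator entirely: the keys still come out right.
keys = [k for k, g in groupby(data)]

# Partially consume one subiterator, then move on to the next group.
outer = groupby(data)
k1, g1 = next(outer)
first_item = next(g1)     # take a single item from the 'a' group
k2, g2 = next(outer)      # advancing the outer iterator skips the rest of 'a'
second_group = list(g2)

print(keys, first_item, k2, second_group)
```

The groups share the underlying iterator, so a group that is needed
later has to be saved (for example with list(g)) before the outer
iterator advances.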



[Guido]
> Have you thought about the name yet?  Will it be grouped() or
> groupby()?

This question is still open.

The grouped() name fits well alongside list.sorted() and reversed().
Also, it is more natural in the context of a default identity function.

The groupby() name better suggests what's going on under the hood.  The
strong association with SQL is a plus because the analogy is accurate.

I'm leaning away from grouped() because it seems vague and lifeless.
Other tools like izip() and the prospective window() and roundrobin()
functions could also be said to return groups.
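
To make the two readings concrete, here is a small sketch, assuming the
key= signature used elsewhere in this thread and the groupby() that
later shipped in itertools.  The sample data is hypothetical.

```python
from itertools import groupby

# With no key function the identity is used, so runs of equal items
# are grouped: this is the case where "grouped" reads naturally.
runs = [(k, len(list(g))) for k, g in groupby("aaabbc")]

# With a key function the call reads like SQL's GROUP BY first_letter.
# The input here is already sorted by that key; groupby only merges
# adjacent items, so unsorted input would need a sort first.
words = ["apple", "avocado", "banana", "blueberry", "cherry"]
by_letter = {k: list(g) for k, g in groupby(words, key=lambda w: w[0])}

print(runs)
print(by_letter)
```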


[After Guido suggests introducing an SQL-like itertools.count()]
I think this belongs in a new module for accumulator functions including
average, standard deviation, and such.  In the meantime, len(list(grp))
will suffice.

That is even more reasonable when multiple accumulation functions are
used because a list has to be created anyway:

for k, g in groupby(s, key=keyfunc):
    data = list(g)
    print k, len(data), sum(data), average(data)
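
A self-contained variant of the loop above, as a sketch only: average()
is not an existing function in this message, so statistics.mean (added
to the standard library much later) stands in for it, and s and keyfunc
are made-up sample values.

```python
from itertools import groupby
from statistics import mean  # stand-in for the hypothetical average()

s = [1, 3, 5, 2, 4, 11, 13]  # hypothetical data, ordered by keyfunc


def keyfunc(n):
    """Hypothetical key: bucket numbers by tens."""
    return n // 10


rows = []
for k, g in groupby(s, key=keyfunc):
    data = list(g)  # materialize once, then reuse for every accumulator
    rows.append((k, len(data), sum(data), mean(data)))

for row in rows:
    print(*row)
```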


[Guido on chained attribute lookups]
> Anyway, it's all moot -- Raymond just added operator.itemgetter
> and operator.attrgetter.

The implementation does not preclude someone adding support for chained
attribute lookups.  IMO, the non-chained version covers the most common
use cases simply and directly.  For complex structures, lambda already
has plenty of versatility: 

    key = lambda r: r.roster[3].personalinfo.physicaladdress.zipcode
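
As a rough illustration of the split between the two approaches, the
sketch below uses itemgetter and attrgetter for single-level lookups
and a lambda for a nested one.  The Person and Address record types and
all of the data are hypothetical.

```python
from collections import namedtuple
from itertools import groupby
from operator import attrgetter, itemgetter

# Hypothetical record types, for illustration only.
Address = namedtuple("Address", "city zipcode")
Person = namedtuple("Person", "name address")

people = [
    Person("Ann", Address("Durham", "27701")),
    Person("Bob", Address("Durham", "27701")),
    Person("Cy", Address("Irvine", "92697")),
]

# Single-level attribute lookup: attrgetter is enough.
names = [p.name for p in sorted(people, key=attrgetter("name"), reverse=True)]

# Single-level item lookup: itemgetter is enough.
pairs = [("x", 1), ("x", 2), ("y", 3)]
grouped_pairs = [(k, [v for _, v in g])
                 for k, g in groupby(pairs, key=itemgetter(0))]

# Nested lookup: a lambda spells out the whole path.
by_zip = [(k, [p.name for p in g])
          for k, g in groupby(people, key=lambda p: p.address.zipcode)]

print(names, grouped_pairs, by_zip)
```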

IIRC, Tim recently yearned for the days when the lack of support for
nested structures pushed people towards flatter designs.  With him
already looking forward to death, we dare not add to his angst.



Raymond Hettinger




