[Python-Dev] Re: "groupby" iterator

Raymond Hettinger python at rcn.com
Sun Nov 30 23:35:56 EST 2003


[Guido van Rossum]
> > But I don't think there's a good use case
> > for what he wants to do instead: save enough state so that the
> > subiterators can be used in arbitrary order.  An application
> > that saves the subiterators for later will end up saving a copy of
> > everything, so it might as well be written so explicitly

[David Eppstein]
> I don't have a good explicit use case in mind, but my objective is to be
> able to use itertools-like functionals without having to pay much
> attention to which ones iterate through their arguments immediately and
> which ones defer the iteration until later.

Okay, I've decided on this one.

Though David's idea is attractive in its generality, the use cases favor
the previous implementation.  IOW, there is a reasonable use case for
skipping or partially consuming the subiterators (e.g. "sort s | uniq"
and "sort s | uniq -d").  For the delinquent subiterators, the user can
just convert them to a list if they are going to be needed later:

groups = []
for k, g in groupby(seq, keyfunc):
    groups.append(list(g))
    # ... do something with k ...
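
As a rough illustration of the two shell pipelines mentioned above, here is a sketch that assumes the groupby(iterable, key=None) signature that itertools eventually shipped.  The helper names uniq and uniq_d are made up for this example.

```python
from itertools import groupby

_MISSING = object()  # sentinel, so that None is still a legal data value


def uniq(seq):
    """Rough analogue of "sort s | uniq": the subiterators are skipped."""
    return [k for k, _ in groupby(sorted(seq))]


def uniq_d(seq):
    """Rough analogue of "sort s | uniq -d": the subiterators are only
    partially consumed (at most two items each)."""
    dups = []
    for k, g in groupby(sorted(seq)):
        next(g)                               # a group always has one item
        if next(g, _MISSING) is not _MISSING:  # is there a second one?
            dups.append(k)
    return dups
```

Neither helper reads a whole group, so nothing beyond the sorted input is held in memory.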

With respect to the principle of least surprise, it is the lesser evil
between having a delinquent subiterator turn up empty or having an
itertool unexpectedly fall into a memory-intensive mode.

The first can be flagged so it won't pass silently.  The second is more
problematic because it is silent and because it is inconsistent with the
memory-friendly nature of itertools.
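
To make the first case concrete, here is a small sketch run against the groupby that later shipped in itertools (an assumption relative to the draft discussed here).  In that version a subiterator left behind by the outer iterator simply yields nothing.

```python
from itertools import groupby

groups = groupby("AAABB")

k1, g1 = next(groups)   # group for "A", not consumed yet
k2, g2 = next(groups)   # advancing the outer iterator skips the rest of "A"

stale = list(g1)        # the delinquent subiterator turns up empty
fresh = list(g2)        # the current group is still intact
```

Wrapping the first group in list() before advancing, as in the loop shown earlier, is what keeps its items available.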

Another minor argument against David's version is that the pure Python
version (which will be included in the docs) is longer and harder to
follow.
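
For a sense of scale, here is a generator-based sketch of the skip-ahead behaviour.  It is not the code from the patch under discussion, only an illustration; the names groupby_sketch and _grouper are hypothetical, and it assumes a groupby(iterable, key=None) argument order.

```python
def groupby_sketch(iterable, key=None):
    """Yield (key, subiterator) pairs for runs of items with equal keys."""
    keyfunc = (lambda x: x) if key is None else key
    iterator = iter(iterable)
    exhausted = False

    def _grouper(target_key):
        # Shares the underlying iterator and current item with the outer loop.
        nonlocal curr_value, curr_key, exhausted
        yield curr_value
        for curr_value in iterator:
            curr_key = keyfunc(curr_value)
            if curr_key != target_key:
                return              # first item of the next group is pending
            yield curr_value
        exhausted = True

    try:
        curr_value = next(iterator)
    except StopIteration:
        return                      # empty input, no groups
    curr_key = keyfunc(curr_value)

    while not exhausted:
        target_key = curr_key
        curr_group = _grouper(target_key)
        yield curr_key, curr_group
        if curr_key == target_key:
            # The caller skipped or only partly read this group:
            # discard the remainder so the next key can be found.
            for _ in curr_group:
                pass
```

No items are stored beyond the current one, which is why a subiterator that has been passed over has nothing left to yield.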



Raymond Hettinger


P.S.  I'm leaning toward Alex's suggested argument order.  Having a
default identity function is too attractive to pass up.  So the choice
is between a style like map(None, s) or something closer to
list.sorted(s, key=).  Though the latter is not consistent with other
itertools, it wins in the beauty department and its similarity with
key= is an accurate, helpful analogy.
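
A short sketch of how that argument order reads in practice, assuming the groupby(iterable, key=None) form that itertools later adopted, and using the builtin sorted() in place of the list.sorted spelling above.  The sample data is invented.

```python
from itertools import groupby

words = ["kiwi", "fig", "plum", "pear", "yam"]

# Default identity key: runs of equal items are grouped together.
runs = [(k, len(list(g))) for k, g in groupby("aaabcc")]

# The same key function serves both the sort and the grouping.
by_len = {k: list(g) for k, g in groupby(sorted(words, key=len), key=len)}
```

Passing the same key= to both calls is what makes the analogy helpful: the sort brings equal keys together and the grouping then splits on them.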



