[Python-Dev] "groupby" iterator

Raymond Hettinger python at rcn.com
Fri Nov 28 18:24:30 EST 2003

> > Here's yet another implementation for itertoolsmodule.c. (see
> > attachment) I wrote it after the shower (really!) :)
> Wow!  Thanks.  Let's all remember to take our showers and maybe Python
> will become the cleanest programming language. :)
> Raymond, what do you think?

Yes.  I recommend taking showers on a regular basis ;-)

I'll experiment with groupby() for a few more days and see how it feels.
The first impression is that it meets all the criteria for becoming an
itertool (iters in, iters out; no unexpected memory use; works well with
other tools; not readily constructed from existing tools).  

At first, the tool seems more special purpose than general purpose.
OTOH, it is an excellent solution to a specific class of problems and it
makes code much cleaner by avoiding the repeated code block in the
non-iterator version.

> I would make one change: after looking at another use case, I'd like
> to change the outer iterator to produce (key, grouper) tuples.  This
> way, you can write things like
>   totals = {}
>   for key, group in sequence:
>       totals[key] = sum(group)

This is a much stronger formulation than the original.  It is clear,
succinct, expressive, and less error prone.
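
To make the (key, group) formulation concrete, here is a small sketch
using the itertools.groupby found in current Python.  Two assumptions:
the sample data is invented for illustration, and the modern call order
is groupby(iterable, key), the reverse of the draft discussed in this
thread.

```python
from itertools import groupby

# Hypothetical sample data: (category, amount) pairs, already sorted by category.
sales = [("apples", 3), ("apples", 4), ("pears", 10), ("plums", 1), ("plums", 2)]

totals = {}
# Each step of the outer iterator hands back the key and a lazy sub-iterator.
for key, group in groupby(sales, key=lambda pair: pair[0]):
    totals[key] = sum(amount for _, amount in group)

print(totals)
```

The input has to be sorted (or at least clustered) by the key, since
only consecutive items with equal keys are grouped together.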

The implementation would be more complex than the original.  If the
group is ignored, the outer iterator needs to be smart enough to read
through the input iterator until the next group is encountered:

>>> names = ['Tim D', 'Jack D', 'Jack J', 'Barry W', 'Tim P']
>>> firstname = lambda n: n.split()[0]
>>> names.sort()
>>> unique_first_names = [first for first, _ in groupby(firstname, names)]
>>> unique_first_names
['Barry', 'Jack', 'Tim']
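
The skipping behaviour can be sketched in pure Python.  This is only an
illustration, not the C code from the attachment: groupby_sketch is a
hypothetical name, the argument order follows the (key, iterable) draft
used in this thread, and the key function may be called more than once
per item.

```python
def groupby_sketch(keyfunc, iterable):
    """Yield (key, group) pairs for runs of consecutive items sharing a key."""
    it = iter(iterable)
    sentinel = object()
    # One-item lookahead shared by the outer loop and the group iterators.
    state = {"item": next(it, sentinel)}

    def grouper(key):
        # Hand out items while the lookahead still belongs to this group.
        while state["item"] is not sentinel and keyfunc(state["item"]) == key:
            current = state["item"]
            state["item"] = next(it, sentinel)
            yield current

    while state["item"] is not sentinel:
        key = keyfunc(state["item"])
        group = grouper(key)
        yield key, group
        # If the caller ignored (or only partly read) the group, read
        # through the input until the next group starts.
        for _ in group:
            pass


names = sorted(['Tim D', 'Jack D', 'Jack J', 'Barry W', 'Tim P'])
firstname = lambda n: n.split()[0]
unique_first_names = [first for first, _ in groupby_sketch(firstname, names)]
```

Only one item of lookahead is held at any time, so memory use stays
flat no matter how long a group is.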

In experimenting with groupby(), I am starting to see a need for a high
speed data extractor function.  This need is common to several tools
that take function arguments (like list.sort(key=)).  While extractor
functions can be arbitrarily complex, many only fetch a specific
attribute or element number.  Alex's high-speed curry suggests that it
is possible to create a function maker for fast lookups:

students.sort(key=extract('grade'))  # key=lambda r: r.grade
students.sort(key=extract(2))        # key=lambda r: r[2]
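
A rough sketch of such a function maker, built on operator.attrgetter
and operator.itemgetter from the current standard library.  The name
extract() and the rule "a string means an attribute, anything else
means an index" are assumptions made for illustration; that rule would
be ambiguous for dictionaries with string keys.  The Student records
are invented sample data.

```python
from collections import namedtuple
from operator import attrgetter, itemgetter


def extract(field):
    """Hypothetical helper: return a fast getter for an attribute or an index."""
    if isinstance(field, str):
        return attrgetter(field)   # extract('grade') behaves like lambda r: r.grade
    return itemgetter(field)       # extract(2) behaves like lambda r: r[2]


# Hypothetical sample records with fields at positions 0, 1, 2.
Student = namedtuple("Student", "name age grade")
students = [Student("Ann", 20, 88), Student("Bob", 19, 95), Student("Cy", 21, 70)]

students.sort(key=extract("grade"))
by_grade = [s.name for s in students]

students.sort(key=extract(1))      # element 1 is the age field
by_age = [s.name for s in students]
```

Both getters are implemented in C, so they avoid the per-call overhead
of a Python-level lambda.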

