RE: [Python-Dev] "groupby" iterator

Nov. 28, 2003

      ...
...
Here's yet another implementation for itertoolsmodule.c. (see
attachment) I wrote it after the shower (really!) :)
Wow!  Thanks.  Let's all remember to take our showers and maybe Python
will become the cleanest programming language. :)
Raymond, what do you think?
Yes.  I recommend taking showers on a regular basis ;-)

I'll experiment with groupby() for a few more days and see how it feels.
The first impression is that it meets all the criteria for becoming an
itertool (iters in, iters out; no unexpected memory use; works well with
other tools; not readily constructed from existing tools).  

At first, the tool seems more special purpose than general purpose.
OTOH, it is an excellent solution to a specific class of problems and it
makes code much cleaner by avoiding the repeated code block in the
non-iterator version.
...
I would make one change: after looking at another use case, I'd like
to change the outer iterator to produce (key, grouper) tuples.  This
way, you can write things like
totals = {}
for key, group in sequence:
    totals[key] = sum(group)
This is a much stronger formulation than the original.  It is clear,
succinct, expressive, and less error prone.
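
For anyone who wants to try the (key, group) formulation, here is a small runnable sketch. It assumes the groupby() that later shipped in itertools, which takes the iterable first and the key function second (the reverse of the order used in this thread). The sample records are invented for illustration, and the input must already be sorted by the key.

```python
from itertools import groupby

# Hypothetical sample data: (department, amount) pairs, pre-sorted by department.
records = [("books", 10), ("books", 5), ("games", 7), ("music", 3), ("music", 4)]

totals = {}
# itertools.groupby yields (key, group) pairs; each group is a lazy iterator.
for key, group in groupby(records, key=lambda r: r[0]):
    totals[key] = sum(amount for _, amount in group)

print(totals)  # {'books': 15, 'games': 7, 'music': 7}
```

Each group has to be consumed before moving on to the next key, because the groups share the underlying input iterator.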

The implementation would be more complex than the original.  If the
group is ignored, the outer iterator needs to be smart enough to read
through the input iterator until the next group is encountered:
...
...
...
names = ['Tim D', 'Jack D', 'Jack J', 'Barry W', 'Tim P']
firstname = lambda n: n.split()[0]
names.sort()
unique_first_names = [first for first, _ in groupby(firstname, names)]
# unique_first_names == ['Barry', 'Jack', 'Tim']

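
The requirement that the outer iterator skip over an ignored group can be sketched in pure Python. This is only an illustration of the idea, not the attached C implementation: the class name GroupBy and its private attributes are made up here, and the (keyfunc, iterable) argument order follows the examples in this thread.

```python
class GroupBy:
    """Sketch of an iterator yielding (key, group) pairs for runs of equal keys."""

    def __init__(self, keyfunc, iterable):
        self._keyfunc = keyfunc
        self._it = iter(iterable)
        self._sentinel = object()  # marks "nothing read yet"
        self._target_key = self._cur_key = self._cur_value = self._sentinel

    def __iter__(self):
        return self

    def __next__(self):
        # Read ahead until the key changes. This is what skips any items
        # of the previous group that the caller never consumed.
        while self._cur_key is self._sentinel or self._cur_key == self._target_key:
            self._advance()  # StopIteration here ends the outer iteration
        self._target_key = self._cur_key
        return self._cur_key, self._grouper(self._target_key)

    def _advance(self):
        self._cur_value = next(self._it)
        self._cur_key = self._keyfunc(self._cur_value)

    def _grouper(self, target_key):
        # Yield items while the key matches; stop quietly at end of input.
        while self._cur_key == target_key:
            yield self._cur_value
            try:
                self._advance()
            except StopIteration:
                return


names = sorted(['Tim D', 'Jack D', 'Jack J', 'Barry W', 'Tim P'])
firstname = lambda n: n.split()[0]

# Groups ignored: the outer iterator skips ahead on its own.
print([first for first, _ in GroupBy(firstname, names)])
# Groups consumed: each one yields only its own run of items.
print([(first, list(group)) for first, group in GroupBy(firstname, names)])
```

The sketch does not guard against a stale group being resumed after the outer iterator has moved on; a real implementation would need to decide what happens in that case.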
In experimenting with groupby(), I am starting to see a need for a high
speed data extractor function.  This need is common to several tools
that take function arguments (like list.sort(key=)).  While extractor
functions can be arbitrarily complex, many only fetch a specific
attribute or element number.  Alex's high-speed curry suggests that it
is possible to create a function maker for fast lookups:

students.sort(key=extract('grade'))  # key=lambda r:r.grade
students.sort(key=extract(2))        # key=lambda r:r[2]
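
One way the extractor idea could look is sketched below. The name extract() is the hypothetical function from the example above, not an existing API; here it simply dispatches to operator.attrgetter and operator.itemgetter, which arrived in later Python releases and cover the two cases. The Student record type and its data are invented for illustration.

```python
from collections import namedtuple
from operator import attrgetter, itemgetter


def extract(field):
    """Hypothetical helper: attribute lookup for a str, item lookup otherwise."""
    if isinstance(field, str):
        return attrgetter(field)  # behaves like lambda r: r.<field>
    return itemgetter(field)      # behaves like lambda r: r[field]


# Hypothetical record type; a namedtuple supports both access styles.
Student = namedtuple("Student", ["name", "age", "grade"])
students = [Student("Ann", 20, 88), Student("Bob", 19, 95), Student("Cy", 21, 72)]

by_attr = sorted(students, key=extract("grade"))
by_index = sorted(students, key=extract(2))

print([s.name for s in by_attr])  # ['Cy', 'Ann', 'Bob']
```

Dispatching on the argument type keeps the single-name spelling from the example, at the cost of ruling out string keys for mapping lookups.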

Raymond

Raymond Hettinger