Re: [Python-Dev] "groupby" iterator

Nov. 29, 2003

On Saturday 29 November 2003 12:41 am, Guido van Rossum wrote:
...
...
...
...
    totals = {}
    for key, group in sequence:
        totals[key] = sum(group)
Oops, there's a mistake.  I meant to say:
    totals = {}
    for key, group in groupby(keyfunc, sequence):
        totals[key] = sum(group)
...
This is a much stronger formulation than the original.  It is clear,
succinct, expressive, and less error prone.
I'm not sure to what extent this praise was inspired by my mistake of
leaving out the groupby() call.
Can't answer for RH, but, to me, the groupby call looks just fine.

However, one cosmetic suggestion: for analogy with list.sorted, why
not let the call be spelled as
    groupby(sequence, key=keyfunc)
?
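
For illustration, here is roughly how the totals example reads with the
sequence-first, key= spelling.  This is only a sketch: it relies on
itertools.groupby as found in current Python, which takes the iterable first
and an optional key, and both the sample data and this particular keyfunc are
made up for the example.

```python
from itertools import groupby

# Made-up sample data: (category, amount) pairs, already ordered by category.
sequence = [("a", 1), ("a", 2), ("b", 5), ("c", 3), ("c", 4)]

def keyfunc(pair):
    # Hypothetical key extractor: group on the first field.
    return pair[0]

totals = {}
for key, group in groupby(sequence, key=keyfunc):
    # Each group is an iterator over the adjacent items sharing this key.
    totals[key] = sum(amount for _, amount in group)

print(totals)
```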

I realize most itertools take a callable _first_, while, to be able to
name the key-extractor this way, it would have to go second.  I still
think it would be nicer, partly because while sequence could not
possibly default, key _could_ -- and its one obvious default is to an
identity (lambda x: x).  This would let elimination and/or counting of
adjacent duplicates be expressed smoothly (for counting, it would
help to have an ilen that gives the length of a finite iterable argument,
but worst case one can substitute
    def ilen(it):
        i = -1    # so that an empty iterable gives 0
        for i, _ in enumerate(it): pass
        return i+1
or its inline equivalent).
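
A small sketch of what eliminating and counting adjacent duplicates could look
like when the key defaults to identity.  It assumes itertools.groupby as found
in current Python, where omitting the key groups on the items themselves; the
sample string is invented, and ilen is written here in just one of several
possible ways.

```python
from itertools import groupby

def ilen(it):
    # Length of a finite iterable; one possible spelling of the helper above.
    return sum(1 for _ in it)

data = "aaabccdddd"  # made-up input with runs of adjacent duplicates

# With no key given, groupby groups on the items themselves (identity key).
deduped = [k for k, _ in groupby(data)]
counts = [(k, ilen(g)) for k, g in groupby(data)]

print(deduped)
print(counts)
```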

Naming the function 'grouped' rather than 'groupby' would probably
be better if the callable was the second arg rather than the first.
...
...
...
...
...
names = ['Tim D', 'Jack D', 'Jack J', 'Barry W', 'Tim P']
firstname = lambda n: n.split()[0]
names.sort()
unique_first_names = [first for first, _ in groupby(firstname, names)]
['Barry', 'Jack', 'Tim']
I don't think those semantics should be implemented.  You should be
required to iterate through each group.  I was just thinking that
Right, so basically it would have to be nested like:

ufn = [ f for g in groupby(firstname, names) for f, _ in g ]
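
As a point of comparison, the unique-first-names example can be sketched with
the groupby found in today's itertools, where each step yields a (key, group)
pair and a group that is never consumed is simply skipped.  The argument order
below is that of the current library, not the one used in this thread.

```python
from itertools import groupby

names = ['Tim D', 'Jack D', 'Jack J', 'Barry W', 'Tim P']
firstname = lambda n: n.split()[0]

names.sort()  # groupby only merges adjacent items, so sort first
unique_first_names = [first for first, _ in groupby(names, key=firstname)]
print(unique_first_names)
```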
...
...
In experimenting with groupby(), I am starting to see a need for a high
speed data extractor function.  This need is common to several tools
that take function arguments (like list.sort(key=)).
Exactly: it was definitely inspired by list.sort(key=).
That's part of why I'd love to be able to spell key= for this iterator too.
...
...
While extractor
functions can be arbitrarily complex, many only fetch a specific
attribute or element number.  Alex's high-speed curry suggests that it
is possible to create a function maker for fast lookups:
students.sort(key=extract('grade'))  # key=lambda r:r.grade
students.sort(key=extract(2))        # key=lambda r:r[2]
Perhaps we could do this by changing list.sort() and groupby() to take
a string or int as first argument to mean exactly this.  For the
It seems to me that this would be special-casing things, while an extract
function might help in other contexts as well.  E.g., itertools has several
other iterators that take a callable and might use this.
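
To make the idea concrete, a minimal sketch of such a function maker.  The
name extract and its string-versus-int dispatch follow the suggestion quoted
above and are hypothetical; the sketch builds on operator.attrgetter and
operator.itemgetter from the standard library, and the Student record is
invented for the example.

```python
from collections import namedtuple
from itertools import groupby
from operator import attrgetter, itemgetter

def extract(spec):
    # Hypothetical helper: a string means attribute lookup, anything else
    # (e.g. an int) means item lookup.
    if isinstance(spec, str):
        return attrgetter(spec)
    return itemgetter(spec)

Student = namedtuple("Student", ["name", "year", "grade"])  # made-up record
students = [Student("Ann", 2, "B"), Student("Bob", 1, "A"), Student("Cy", 2, "A")]

students.sort(key=extract("grade"))   # like key=lambda r: r.grade
by_grade = {g: [s.name for s in grp]
            for g, grp in groupby(students, key=extract("grade"))}

students.sort(key=extract(1))         # like key=lambda r: r[1]
years = [s.year for s in students]
```

Because the same callable works for sort and for groupby, nothing in either
one needs to treat strings or ints specially.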
...
But I recommend holding off on this -- the "pure" groupby() has enough
merit without speed hacks, and I find the clarity it provides more
important than possible speed gains.  I expect that the original, ugly
I agree that the case for extract is separate from that for groupby (although
the latter does increase the attractiveness of the former).

Alex

Alex Martelli