
On Saturday 29 November 2003 12:41 am, Guido van Rossum wrote: ...
    totals = {}
    for key, group in sequence:
        totals[key] = sum(group)
Oops, there's a mistake. I meant to say:
    totals = {}
    for key, group in groupby(keyfunc, sequence):
        totals[key] = sum(group)
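A runnable sketch of the corrected idiom follows. It is written against the `itertools.groupby` that later shipped in the standard library, which takes the iterable first and the key function as `key=`, not the `groupby(keyfunc, sequence)` order used in the proposal above. The `sales` records and the `region` key function are made up for illustration. Because each record is a tuple, the sum runs over the amounts rather than over the items themselves.

```python
from itertools import groupby

# Hypothetical sample records: (region, amount) pairs, made up for illustration.
sales = [("east", 10), ("west", 5), ("east", 7), ("west", 3), ("north", 4)]


def region(record):
    """Key function: pull the grouping key out of one record."""
    return record[0]


# groupby() only merges *adjacent* items whose keys compare equal,
# so the input is sorted on the same key first.
totals = {}
for key, group in groupby(sorted(sales, key=region), key=region):
    # Each group is an iterator over the records sharing this key.
    totals[key] = sum(amount for _, amount in group)

print(totals)  # {'east': 17, 'north': 4, 'west': 8}
```

Leaving out the `sorted()` call would yield two separate `east` groups, and the second one would overwrite the first in `totals`.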
This is a much stronger formulation than the original. It is clear, succinct, expressive, and less error-prone.
I'm not sure to what extent this praise was inspired by my mistake of leaving out the groupby() call.
Can't answer for RH, but, to me, the groupby call looks just fine. However, one cosmetic suggestion: for analogy with list.sorted, why not let the call be spelled as groupby(sequence, key=keyfunc)? I realize most itertools take a callable _first_, while, to be able to name the key-extractor this way, it would have to go second. I still think it would be nicer, partly because while sequence could not possibly default, key _could_ -- and its one obvious default is to an identity (lambda x: x). This would let elimination and/or counting of adjacent duplicates be expressed smoothly. For counting, it would help to have an ilen that gives the length of a finite iterable argument, but in the worst case one can substitute this (or its inline equivalent):

    def ilen(it):
        i = -1  # so that an empty iterable gives 0
        for i, _ in enumerate(it):
            pass
        return i + 1

Naming the function 'grouped' rather than 'groupby' would probably be better if the callable were the second arg rather than the first.
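The standard library's eventual `itertools.groupby(iterable, key=None)` does treat a missing key as the identity. The sketch below uses that to drop and count adjacent duplicates. The `ilen` here is a hypothetical helper, not a library function, and it counts with a generator expression instead of `enumerate`. The sample string is arbitrary.

```python
from itertools import groupby


def ilen(it):
    """Hypothetical helper: count the items of a finite iterable, consuming it."""
    return sum(1 for _ in it)


data = "aaabccddddb"

# With no key, groupby() compares the items themselves (identity key),
# so each run of equal *adjacent* items becomes one group.
deduped = [k for k, _ in groupby(data)]
run_lengths = [(k, ilen(g)) for k, g in groupby(data)]

print(deduped)      # ['a', 'b', 'c', 'd', 'b']
print(run_lengths)  # [('a', 3), ('b', 1), ('c', 2), ('d', 4), ('b', 1)]
```

The trailing `'b'` shows up as its own group because only adjacent duplicates are merged.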
    names = ['Tim D', 'Jack D', 'Jack J', 'Barry W', 'Tim P']
    firstname = lambda n: n.split()[0]
    names.sort()
    unique_first_names = [first for first, _ in groupby(firstname, names)]
    # -> ['Barry', 'Jack', 'Tim']
I don't think those semantics should be implemented. You should be required to iterate through each group. I was just thinking that
Right, so basically it would have to be nested like:

    ufn = [f for g in groupby(firstname, names) for f, _ in g]
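For comparison, the `itertools.groupby` that was eventually added to the standard library does allow skipping a group. Advancing the iterator discards whatever is left of the previous group, so the unique keys come out without a nested loop. The sketch below uses the later `groupby(iterable, key=...)` argument order rather than the proposal's `groupby(firstname, names)`. `firstname` is written as a `def` here instead of a lambda.

```python
from itertools import groupby

names = ['Tim D', 'Jack D', 'Jack J', 'Barry W', 'Tim P']


def firstname(n):
    """Key function: the first whitespace-separated word of a name."""
    return n.split()[0]


names.sort()  # groupby() only groups *adjacent* equal keys

# Groups are never iterated here: advancing groupby() skips whatever
# is left of the previous group, so no nested loop is required.
unique_first_names = [first for first, _ in groupby(names, key=firstname)]
print(unique_first_names)  # ['Barry', 'Jack', 'Tim']

# When a group's contents *are* wanted, consume it before moving on.
counts = [(first, len(list(group))) for first, group in groupby(names, key=firstname)]
print(counts)  # [('Barry', 1), ('Jack', 2), ('Tim', 2)]
```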
In experimenting with groupby(), I am starting to see a need for a high-speed data extractor function. This need is common to several tools that take function arguments (like list.sort(key=)).
Exactly: it was definitely inspired by list.sort(key=).
That's part of why I'd love to be able to spell key= for this iterator too.
While extractor functions can be arbitrarily complex, many only fetch a specific attribute or element number. Alex's high-speed curry suggests that it is possible to create a function maker for fast lookups:
    students.sort(key=extract('grade'))  # key=lambda r: r.grade
    students.sort(key=extract(2))        # key=lambda r: r[2]
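`extract` is a proposal here, not an existing function. The sketch below shows one minimal version. It assumes `extract` simply dispatches on its argument type: a string means an attribute lookup and an int means an index lookup. It builds on `operator.attrgetter` and `operator.itemgetter`, which the standard library provides from Python 2.4 on. The `Student` type and the sample data are hypothetical.

```python
from collections import namedtuple
from operator import attrgetter, itemgetter


def extract(field):
    """Hypothetical key-function maker.

    A str means "fetch this attribute" (like lambda r: r.grade);
    an int means "fetch this element" (like lambda r: r[2]).
    """
    if isinstance(field, str):
        return attrgetter(field)
    if isinstance(field, int):
        return itemgetter(field)
    raise TypeError("field must be a str or an int")


# Hypothetical records, for illustration only.
Student = namedtuple("Student", ["name", "grade"])
students = [Student("Ann", 90), Student("Bob", 72), Student("Cy", 85)]
students.sort(key=extract("grade"))
print([s.name for s in students])  # ['Bob', 'Cy', 'Ann']

rows = [("Ann", "x", 3), ("Bob", "y", 1), ("Cy", "z", 2)]
rows.sort(key=extract(2))
print([r[0] for r in rows])  # ['Bob', 'Cy', 'Ann']
```

Delegating to the `operator` getters keeps the per-item lookup in C rather than in a Python-level lambda. That is the speed argument behind the proposal.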
Perhaps we could do this by changing list.sort() and groupby() to take a string or int as first argument to mean exactly this. For the
It seems to me that this would be special-casing things, while an extract function might help in other contexts as well. E.g., itertools has several other iterators that take a callable and might use this.
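The sketch below illustrates that reuse. A single extractor, built once, can serve as the key for sorting, for `max()`, and for `itertools.groupby`. It uses the real `operator.itemgetter` in place of the hypothetical `extract`, and the `(course, student, grade)` rows are made up for illustration.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical (course, student, grade) rows, for illustration only.
rows = [
    ("math", "Ann", 90),
    ("art", "Bob", 72),
    ("math", "Cy", 85),
    ("art", "Di", 95),
]

by_course = itemgetter(0)  # extractors built once...
by_grade = itemgetter(2)

rows.sort(key=by_course)  # ...reused as a sort key,
best = {
    course: max(group, key=by_grade)[1]  # as a max() key,
    for course, group in groupby(rows, key=by_course)  # and as a groupby() key
}
print(best)  # {'art': 'Di', 'math': 'Ann'}
```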
But I recommend holding off on this -- the "pure" groupby() has enough merit without speed hacks, and I find the clarity it provides more important than possible speed gains. I expect that the original, ugly
I agree that the case for extract is separate from that for groupby (although the latter does increase the attractiveness of the former).

Alex