
[Alex]
However, one cosmetic suggestion: for analogy with list.sorted, why not let the call be spelled as groupby(sequence, key=keyfunc) ?
I realize most itertools take a callable _first_, while, to be able to name the key-extractor this way, it would have to go second. I still think it would be nicer, partly because while sequence could not possibly default, key _could_ -- and its one obvious default is to an identity (lambda x: x). This would let elimination and/or counting of adjacent duplicates be expressed smoothly (for counting, it would help to have an ilen that gives the length of a finite iterable argument, but worst case one can substitute

    def ilen(it):
        for i, _ in enumerate(it): pass
        return i+1

or its inline equivalent).
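A minimal sketch of the idea, assuming the signature that itertools.groupby has in current Python (iterable first, key defaulting to the identity). The ilen below is a variant of the helper above that also copes with an empty iterable; the sample string is made up for illustration.

```python
from itertools import groupby

def ilen(iterable):
    """Count the items of a finite iterable (0 if it is empty)."""
    count = 0
    for count, _ in enumerate(iterable, 1):
        pass
    return count

data = "aaabccdd"  # illustrative input, not from the thread

# Eliminate adjacent duplicates: with no key given, items are grouped by value.
deduped = [k for k, g in groupby(data)]

# Count adjacent duplicates (run lengths).
runs = [(k, ilen(g)) for k, g in groupby(data)]
```

Starting enumerate at 1 and pre-setting count to 0 avoids the unbound variable that the one-line version would hit on an empty input.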
Though the argument order makes my stomach churn, the identity function default is quite nice:
    >>> s = 'abracadabra'

    >>> # sort s | uniq
    >>> [k for k, g in groupby(list.sorted(s))]
    ['a', 'b', 'c', 'd', 'r']

    >>> # sort s | uniq -d
    >>> [k for k, g in groupby(list.sorted('abracadabra')) if ilen(g)>1]
    ['a', 'b', 'r']

    >>> # sort s | uniq -c
    >>> [(ilen(g), k) for k, g in groupby(list.sorted(s))]
    [(5, 'a'), (2, 'b'), (1, 'c'), (1, 'd'), (2, 'r')]

    >>> # sort s | uniq -c | sort -rn | head -3
    >>> list.sorted([(ilen(g), k) for k, g in groupby(list.sorted(s))], reverse=True)[:3]
    [(5, 'a'), (2, 'r'), (2, 'b')]
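To try the same pipelines in a current interpreter, here is a sketch that assumes the builtin sorted() as a stand-in for list.sorted and uses sum(1 for _ in g) in place of ilen. The helper name count_runs is invented for this example.

```python
from itertools import groupby

s = "abracadabra"

def count_runs(iterable):
    """Return (count, key) pairs, one per run of equal adjacent items."""
    return [(sum(1 for _ in g), k) for k, g in groupby(iterable)]

uniq = [k for k, g in groupby(sorted(s))]   # sort s | uniq
uniq_c = count_runs(sorted(s))              # sort s | uniq -c
uniq_d = [k for n, k in uniq_c if n > 1]    # sort s | uniq -d
top3 = sorted(uniq_c, reverse=True)[:3]     # ... | sort -rn | head -3
```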
While extractor functions can be arbitrarily complex, many only fetch a specific attribute or element number. Alex's high-speed curry suggests that it is possible to create a function maker for fast lookups:
    students.sort(key=extract('grade'))  # key=lambda r: r.grade
    students.sort(key=extract(2))        # key=lambda r: r[2]
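What such a function maker might look like in pure Python is sketched below. It assumes, from the two calls above, that a string means attribute lookup and anything else means item lookup. The Student class and the sample records are hypothetical, and a pure-Python closure like this shows only the behavior, not the speed gain that motivates the proposal.

```python
def extract(field):
    """Return a function that fetches an attribute (str) or an item (other)."""
    if isinstance(field, str):
        return lambda record: getattr(record, field)
    return lambda record: record[field]

class Student:  # hypothetical record type for the demo
    def __init__(self, name, grade):
        self.name = name
        self.grade = grade

students = [Student("Ann", 90), Student("Bob", 72), Student("Cy", 85)]
students.sort(key=extract("grade"))      # same as key=lambda r: r.grade
names_by_grade = [st.name for st in students]

rows = [("Ann", "math", 90), ("Bob", "art", 72), ("Cy", "math", 85)]
rows.sort(key=extract(2))                # same as key=lambda r: r[2]
```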
Perhaps we could do this by changing list.sort() and groupby() to take a string or int as first argument to mean exactly this. For the
It seems to me that this would be special-casing things, while an extract function might help in other contexts as well. E.g., itertools has several other iterators that take a callable and might use this.
But I recommend holding off on this -- the "pure" groupby() has enough merit without speed hacks, and I find the clarity it provides more important than possible speed gains. I expect that the original, ugly
I agree that the case for extract is separate from that for groupby (although the latter does increase the attractiveness of the former).
Yes, it's clearly a separate issue (and icing on the cake). I was thinking extract() would be a nice addition to the operator module, where everything is basically a lambda-evading speed hack for accessing intrinsic operations:

    operator.add = lambda x,y: x+y

Raymond
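In current Python the operator module offers itemgetter and attrgetter, which cover the two lookups that extract() was meant to handle. A short sketch with made-up data:

```python
import operator
from itertools import groupby

# operator.add behaves like the lambda in the message above.
total = operator.add(2, 3)

# itemgetter(n) behaves like lambda r: r[n];
# attrgetter(name) behaves like lambda r: getattr(r, name).
rows = [("Ann", "art"), ("Bob", "art"), ("Cy", "math")]  # illustrative records
get_subject = operator.itemgetter(1)

# Group adjacent records by subject, using the getter as the key function.
grouped = [(k, [name for name, _ in g])
           for k, g in groupby(rows, key=get_subject)]
```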