[Python-Dev] "groupby" iterator
Raymond Hettinger
python at rcn.com
Sat Nov 29 03:26:38 EST 2003
[Alex]
> However, one cosmetic suggestion: for analogy with list.sorted, why
> not let the call be spelled as
> groupby(sequence, key=keyfunc)
> ?
>
> I realize most itertools take a callable _first_, while, to be able to
> name the key-extractor this way, it would have to go second. I still
> think it would be nicer, partly because while sequence could not
> possibly default, key _could_ -- and its one obvious default is to an
> identity (lambda x: x). This would let elimination and/or counting of
> adjacent duplicates be expressed smoothly (for counting, it would
> help to have an ilen that gives the length of a finite iterable argument,
> but worst case one can substitute
>     def ilen(it):
>         for i, _ in enumerate(it): pass
>         return i+1
> or its inline equivalent).
Though the argument order makes my stomach churn, the identity function
default is quite nice:
>>> s = 'abracadabra'
>>> # sort s | uniq
>>> [k for k, g in groupby(list.sorted(s))]
['a', 'b', 'c', 'd', 'r']
>>> # sort s | uniq -d
>>> [k for k, g in groupby(list.sorted('abracadabra')) if ilen(g)>1]
['a', 'b', 'r']
>>> # sort s | uniq -c
>>> [(ilen(g), k) for k, g in groupby(list.sorted(s))]
[(5, 'a'), (2, 'b'), (1, 'c'), (1, 'd'), (2, 'r')]
>>> # sort s | uniq -c | sort -rn | head -3
>>> list.sorted([(ilen(g), k) for k, g in groupby(list.sorted(s))],
...             reverse=True)[:3]
[(5, 'a'), (2, 'r'), (2, 'b')]
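
For readers trying the session above on a current interpreter, here is a minimal sketch of the same four pipelines. It assumes two things that are not in the message itself: the proposed list.sorted() classmethod is replaced by the builtin sorted(), and ilen() is written with sum() so that it returns 0 for an empty iterable (the enumerate-based version quoted above would fail there, since i is never bound). The variable names are illustrative only.

```python
from itertools import groupby

def ilen(iterable):
    """Count the items in a finite iterable (0 when it is empty)."""
    return sum(1 for _ in iterable)

s = 'abracadabra'

# sort s | uniq
# With no key function, groupby groups adjacent equal items.
unique = [k for k, g in groupby(sorted(s))]

# sort s | uniq -d
# Keep only the keys whose group has more than one member.
duplicated = [k for k, g in groupby(sorted(s)) if ilen(g) > 1]

# sort s | uniq -c
counts = [(ilen(g), k) for k, g in groupby(sorted(s))]

# sort s | uniq -c | sort -rn | head -3
top3 = sorted(counts, reverse=True)[:3]

print(unique)
print(duplicated)
print(counts)
print(top3)
```

Each group g is consumed inside the same comprehension step that produces it, which matters because groupby yields its groups lazily from the one underlying iterator.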
> > > While extractor
> > > functions can be arbitrarily complex, many only fetch a specific
> > > attribute or element number. Alex's high-speed curry suggests that it
> > > is possible to create a function maker for fast lookups:
> > >
> > > students.sort(key=extract('grade')) # key=lambda r:r.grade
> > > students.sort(key=extract(2)) # key=lambda r:r[2]
> >
> > Perhaps we could do this by changing list.sort() and groupby() to take
> > a string or int as first argument to mean exactly this. For the
>
> It seems to me that this would be special-casing things while an extract
> function might help in other contexts as well. E.g., itertools has several
> other iterators that take a callable and might use this.
>
> > But I recommend holding off on this -- the "pure" groupby() has enough
> > merit without speed hacks, and I find the clarity it provides more
> > important than possible speed gains. I expect that the original, ugly
>
> I agree that the case for extract is separate from that for groupby
> (although the latter does increase the attractiveness of the former).
Yes, it's clearly a separate issue (and icing on the cake). I was
thinking extract() would be a nice addition to the operator module, where
everything is basically a lambda-evading speed hack for accessing
intrinsic operations: operator.add = lambda x,y: x+y
Raymond
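
As an illustration of the extract() idea discussed above, here is a minimal sketch. The name extract and its dispatch on the argument type (string means attribute lookup, anything else means indexing) are assumptions made for this example, not an API from the thread. It is built on operator.attrgetter and operator.itemgetter, which exist in today's standard library. The Student record and its sample data are hypothetical.

```python
import operator
from collections import namedtuple

def extract(key):
    """Hypothetical function maker: return a fast key-extractor.

    A string is treated as an attribute name; anything else is
    treated as an index or mapping key. This dispatch rule is an
    assumption for illustration only.
    """
    if isinstance(key, str):
        return operator.attrgetter(key)   # behaves like lambda r: r.<key>
    return operator.itemgetter(key)       # behaves like lambda r: r[key]

# Hypothetical sample records: 'grade' is both an attribute and element 2.
Student = namedtuple('Student', 'name age grade')
students = [
    Student('Ann', 20, 'B'),
    Student('Bob', 19, 'A'),
    Student('Cy', 21, 'C'),
]

by_grade = sorted(students, key=extract('grade'))   # key=lambda r: r.grade
by_index = sorted(students, key=extract(2))         # key=lambda r: r[2]

print([st.name for st in by_grade])

# The operator module in the same spirit: add(x, y) is x + y.
print(operator.add(2, 3))
```

Because the returned extractors are ordinary callables, the same helper would work unchanged as the key argument to groupby().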