[Python-Dev] Re: "groupby" iterator

Robert Brewer fumanchu at amor.org
Fri Dec 5 00:18:08 EST 2003

How did this thread start again? ;)

Guido van Rossum, a long time ago, wrote:
> In the shower (really!) I was thinking about the old problem of going
> through a list of items that are supposed to be grouped by some key,
> and doing something extra at the end of each group...
> So I realized this is easy to do with a generator, assuming we can
> handle keeping a list of all items in a group.
> for group in groupby(key, sequence):

Greg Ewing's proposal of a "given" keyword (x.score given x) got me
thinking. I figured I would play around a bit and try to come up with
the most readable version of the original "groupby" idea (for which I
could imagine *some* implementation):

    for group in sequence.groups(using item.score - item.penalty):
        ...do stuff with group

Having written this down, it seems to me the most readable so far. The
keyword "using" creates a new scope, within which "item" is bound to the
arg (or *args?) passed in. I don't know about you all, but the thing I
like least about lambda is having to mention 'x' twice:

    lambda x: x.score

Why have the programmer bind a custom name to an object we're going to
then use 'anonymously' anyway? I understand its historical necessity,
but it's always struck me as more complex than the concept being
implemented. Ideally, we should be able to reference the passed-in
objects without having to invent names for them.

Now, consider multi-arg lambdas such as:

    sequence.sort(lambda x, y: cmp(x[0], y[0]))

In these cases, we wish to apply the same operation to each item (that
is, we calculate x[0] and y[0]). If we bind "item" to each argument *in
turn*, we save a lot of syntax. The above might then be written as:
    sequence.sort(using cmp(item[0])) # Hard to implement.

    sequence.sort(cmp(using item[0])) # Easier but ugly. Meh.

    sequence.sort(cmp using item[0])  # Oooh. Nice. :)

    # might we assume cmp(), since sort does...?
    sequence.sort(using item[0])

I like #3, since cmp is explicit but doesn't use cmp(), which looks too
much like a call. Given (cmp using item[0]), the "using block" would
look at the arguments supplied by sort(), call __getitem__[0] for each,
and pass those values in order into cmp, returning the result.

The "item" keyword functions similarly to Guido's Voodoo.foo() proposal,
now that I think about it. There's no reason it couldn't grow some early
binding, either, as suggested, although multiple operations would become
unwieldy. How would you early-bind this?

    sequence.groups(using divmod(item, 4)[1])

...except perhaps by using multiply-nested scopes to bind the "1" and
then the "4"?

Hmm. It would have to do some fancy dancing to get everything in the
right order. Too much like reinventing Python to think about at the
moment. :) The point is, passing the "item" instance through such a
scheme should be the easy part.

Robert Brewer
Amor Ministries
fumanchu at amor.org

More information about the Python-Dev mailing list