[Python-Dev] Re: "groupby" iterator

Guido van Rossum guido at python.org
Fri Dec 5 10:22:42 EST 2003


> Greg Ewing's proposal of a "given" keyword (x.score given x) got me
> thinking. I figured I would play around a bit and try to come up with
> the most readable version of the original "groupby" idea (for which I
> could imagine *some* implementation):
> 
>     for group in sequence.groups(using item.score - item.penalty):
>         ...do stuff with group
> 
> Having written this down, it seems to me the most readable so far. The
> keyword "using" creates a new scope, within which "item" is bound to the
> arg (or *args?) passed in. I don't know about you all, but the thing I
> like least about lambda is having to mention 'x' twice:
> 
>     lambda x: x.score
> 
> Why have the programmer bind a custom name to an object we're going to
> then use 'anonymously' anyway? I understand its historical necessity,
> but it's always struck me as more complex than the concept being
> implemented. Ideally, we should be able to reference the passed-in
> objects without having to invent names for them.

Huh?  How can you reference something without having a name for it?
Are you proposing to add pronouns to Python?
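For reference, the grouping the quoted proposal describes can be written with an ordinary named parameter and `itertools.groupby` from the standard library. This is a minimal sketch that assumes a hypothetical `Item` record with `score` and `penalty` attributes, since the post does not define one. Note that `groupby` only merges *consecutive* items with equal keys, so the input is sorted by the same key first.

```python
from itertools import groupby
from typing import NamedTuple


class Item(NamedTuple):
    # Hypothetical record type; the original post does not define one.
    name: str
    score: int
    penalty: int


def net(item: Item) -> int:
    # The key the proposal writes as "using item.score - item.penalty".
    return item.score - item.penalty


def groups(sequence):
    # groupby() only merges runs of adjacent equal keys, so sort first.
    ordered = sorted(sequence, key=net)
    return [(k, [i.name for i in g]) for k, g in groupby(ordered, key=net)]


items = [Item("a", 5, 1), Item("b", 3, 0), Item("c", 6, 2), Item("d", 4, 1)]
print(groups(items))  # [(3, ['b', 'd']), (4, ['a', 'c'])]
```

The name `item` still has to be introduced explicitly, here as the parameter of `net`, which is exactly the binding the proposal hoped to avoid.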

> Now, consider multi-arg lambdas such as:
> 
>     sequence.sort(lambda x, y: cmp(x[0], y[0]))
> 
> In these cases, we wish to apply the same operation to each item (that
> is, we calculate x[0] and y[0]). If we bind "item" to each argument *in
> turn*, we save a lot of syntax. The above might then be written as:
>     sequence.sort(using cmp(item[0])) # Hard to implement.
> 
> or:
>     sequence.sort(cmp(using item[0])) # Easier but ugly. Meh.
> 
> or:
>     sequence.sort(cmp using item[0])  # Oooh. Nice. :)
> 
> or:
>     # might we assume cmp(), since sort does...?
>     sequence.sort(using item[0])
> 
> I like #3, since cmp is explicit but doesn't use cmp(), which looks too
> much like a call. Given (cmp using item[0]), the "using block" would
> look at the arguments supplied by sort(), call __getitem__(0) for each,
> and pass those values in order into cmp, returning the result.
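The "cmp using item[0]" semantics described above can be approximated with an ordinary higher-order function: apply the same projection to each argument in turn, then pass the projected values in order to a combining function. This is a minimal sketch in modern Python. `using` is a hypothetical helper name. Because Python 3 has no built-in `cmp()`, one is defined here, and `functools.cmp_to_key` adapts it for `sort()`.

```python
from functools import cmp_to_key
from operator import itemgetter


def cmp(a, b):
    # Stand-in for Python 2's cmp(): negative, zero, or positive.
    return (a > b) - (a < b)


def using(combine, project):
    # Hypothetical helper: project each argument in turn, then
    # pass the projected values, in order, to combine().
    def wrapper(*args):
        return combine(*(project(arg) for arg in args))
    return wrapper


sequence = [(3, "c"), (1, "a"), (2, "b")]
# Rough equivalent of the proposed "sequence.sort(cmp using item[0])".
sequence.sort(key=cmp_to_key(using(cmp, itemgetter(0))))
print(sequence)  # [(1, 'a'), (2, 'b'), (3, 'c')]
```

In Python 3 the same ordering comes more directly from `sequence.sort(key=itemgetter(0))`. That form computes the projection once per element instead of once per comparison.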

There are lots of situations where the comparison lambda is just a bit
more complex than this, for example:

  lambda x, y: cmp(x[0], y[0]) or cmp(x[1], y[1])

And how would you spell lambda x, y: x+y?  "+ using item"???  That
becomes a syntactical nightmare.  (Or what about lambda x, y: 2*(x+y)?)

I also think you are cheating by using sort() as the example -- other
examples of multi-argument lambdas aren't necessarily so uniform in
the arguments.
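The two-field example above shows where a single per-argument projection stops being enough: the second comparison matters only when the first one ties. The following minimal Python 3 sketch contrasts the comparison-function form with a key function that returns a tuple, which Python compares lexicographically. `cmp()` is redefined because Python 3 dropped it, and the sample data is made up for illustration.

```python
from functools import cmp_to_key


def cmp(a, b):
    # Stand-in for Python 2's built-in cmp().
    return (a > b) - (a < b)


def two_field_cmp(x, y):
    # The lambda from the post: compare x[0] with y[0];
    # on a tie (0), "or" falls through to comparing x[1] with y[1].
    return cmp(x[0], y[0]) or cmp(x[1], y[1])


rows = [(2, "b"), (1, "z"), (2, "a"), (1, "c")]

by_cmp = sorted(rows, key=cmp_to_key(two_field_cmp))
# Tuples compare element by element, giving the same ordering with one key.
by_key = sorted(rows, key=lambda row: (row[0], row[1]))

print(by_cmp)  # [(1, 'c'), (1, 'z'), (2, 'a'), (2, 'b')]
```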

> The "item" keyword functions similarly to Guido's Voodoo.foo() proposal,
> now that I think about it. There's no reason it couldn't grow some early
> binding, either, as suggested, although multiple operations would become
> unwieldy. How would you early-bind this?
> 
>     sequence.groups(using divmod(item, 4)[1])
> 
> ...except perhaps by using multiply-nested scopes to bind the "1" and
> then the "4"?

I see all sorts of problems with this, but early-binding "1" and "4"
aren't amongst them -- early binding only applies to free variables,
not to constants.
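The distinction above can be seen with ordinary closures. A free variable in a lambda is looked up when the lambda is called. A literal such as the `4` in `divmod(item, 4)[1]` is part of the function's code and never needs binding. This is a small sketch: the variable name `n` is illustrative, and the default argument is used only as the conventional way to bind a free variable early.

```python
n = 4
late = lambda item: divmod(item, n)[1]        # n is free: looked up at call time
early = lambda item, n=n: divmod(item, n)[1]  # default arg captures n's value now
const = lambda item: divmod(item, 4)[1]       # literal 4: nothing to bind

n = 5  # rebind the global after the lambdas were created

print(late(13), early(13), const(13))  # 3 1 1
```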

> Hmm. It would have to do some fancy dancing to get everything in the
> right order. Too much like reinventing Python to think about at the
> moment. :) The point is, passing the "item" instance through such a
> scheme should be the easy part.

I've read this whole post twice, and I still don't understand what
you're really proposing (or how it could ever work given the rest of
Python), so I think it's probably not a good idea from a readability
perspective...

--Guido van Rossum (home page: http://www.python.org/~guido/)
