[Python-Dev] Re: "groupby" iterator

Guido van Rossum guido at python.org
Wed Dec 3 10:00:04 EST 2003


(This thread has nothing to do with the groupby iterator any more, but
I'm loath to change the subject since so many messages are already
under this one.)

(I've read quite a few messages posted after Greg's post, but Greg
still summarizes the issue best for me, *and* it has an alternative
idea that needs a response.)

[Greg Ewing]
> We seem to be talking about two different things in this thread,
> speed and readability. The nice thing about the "attrgetter.x"
> etc. idea is that it has the potential to provide both at once.
> 
> The nasty thing about it is that it smells a bit too much like a
> clever trick. It's really abusing the syntax to make it mean
> something different from what it usually means.

It is also somewhat weak in that it only addresses lambdas with one
argument, and only allows a single reference to that argument in the
resulting expression, and can't really be made to handle method calls
without more gross notational hacks -- even though it *can* be made to
handle arbitrary binary and unary operators.

Yet, it captures 90% of the use cases quite well.  I also wonder if
the simple trick of requiring a call to a "constructor" on each use
might not make it more palatable.  I.e., instead of writing

  map(Voodoo.address[0], database)

you'd write

  map(Voodoo().address[0], database)

where you can replace Voodoo with a name of your choice, perhaps
operator.extract -- although I think this is too different to belong
in the operator module.  Nick Coghlan showed that a pretty readable
brief explanation *can* be written.
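
For concreteness, here is a minimal sketch of how such an object might
work.  Nothing like it is specified in this thread: the class body, the
Person class and the sample data are all assumptions made up for
illustration.  The sketch records attribute and item lookups and
replays them when the resulting object is called with one argument.

```python
class Voodoo:
    """Hypothetical recorder: remembers lookups, replays them when called."""

    def __init__(self, steps=()):
        # Each step is a one-argument function applied in order.
        self._steps = tuple(steps)

    def __getattr__(self, name):
        # Only reached for names not found normally.  Names starting with
        # an underscore are refused so internal lookups never recurse.
        if name.startswith("_"):
            raise AttributeError(name)
        return Voodoo(self._steps + (lambda obj: getattr(obj, name),))

    def __getitem__(self, index):
        return Voodoo(self._steps + (lambda obj: obj[index],))

    def __call__(self, obj):
        # Replay the recorded lookups against the real object.
        for step in self._steps:
            obj = step(obj)
        return obj


class Person:
    """Illustrative record type (an assumption, not from the thread)."""

    def __init__(self, address):
        self.address = address


database = [Person(("1 Main St", "Springfield")),
            Person(("2 Elm St", "Shelbyville"))]
streets = list(map(Voodoo().address[0], database))
```

Because calling the object means "apply to this argument", the sketch
has no natural way to record a method call, which is the weakness
noted above.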

On the other hand...

> I think I like the idea of optimising lambda, but that doesn't do
> anything for the readability.

It's also been shown by now to be a bad idea -- the semantic
differences are too subtle (e.g. keyword args).

> So, how about a nicer syntax for lambda?  Maybe along the lines of
> 
>    x -> x.something
> 
> A bonus of introducing a new lambda syntax is that it would provide
> the opportunity to give it early-binding semantics for free
> variables, like generator expressions.

This is what everyone seems to expect and want of lambda anyway...

> The old lambda would have to be kept around for a while for programs
> relying on the old semantics, but it could be deprecated, and
> removed in 3.0.

I'm not sure that the -> notation is more understandable than lambda;
it would surely confuse C/C++ programmers who are new to Python.

Scary thought: how about simply introducing early-binding semantics
for lambda in 3.0?
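
To illustrate the difference being discussed, the sketch below
contrasts lambda's current late binding of free variables with an
emulation of early binding via a default argument.  The variable names
are arbitrary, and the default-argument idiom is only a stand-in for
what a language-level change might do.

```python
# Late binding: each lambda looks up i when it is called, so all of
# them see the value i had when the loop finished.
late = [lambda: i for i in range(3)]
late_results = [f() for f in late]

# Emulated early binding: the default value is evaluated when the
# lambda is created, so each one keeps its own copy of i.
early = [lambda i=i: i for i in range(3)]
early_results = [f() for f in early]
```

Here late_results is [2, 2, 2] while early_results is [0, 1, 2].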

Another radical idea would be to use an anonymous-block notation like
Smalltalk and Ruby.  We could use some kind of funky brackets like
[|...|].  A lambda would require an argument notation too.  I believe
Ruby uses [|x| x+1] where we would write lambda x: x+1, maybe we could
use [|x: x+1|].  (I like structure with an explicit close more than
open ones like lambda.)

Yet another far-out thought: I'd hoped to have gotten rid of most use
cases for lambda with list comprehensions, recently generalized into
generator expressions.  But we keep inventing things (like
list.sort(key=), and now groupby(key=)) that aren't expressible using
generator expressions.  Perhaps we should try harder to find a
generalization that covers these cases too, or to define APIs that
*can* be used with generator expressions?

For groupby, the best I can think of would be to change its API to
take an iterable of (key, value) pairs, so you could write:

  groupby((x.key, x) for x in sequence)

instead of

  groupby(sequence, lambda x: x.key)

but that doesn't work for list.sort(), where the sequence already
exists, and the whole point is to avoid having to make the explicit
decorate-sort-undecorate step.  (Well, the groupby version makes the
decorate part sort-of explicit and avoids the undecorate, so it gets
there at least halfway.)
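
As a rough sketch of the pair-based API floated above, the helper
below groups consecutive (key, value) pairs and is compared with the
key-function form.  The name groupby_pairs and the sample data are
assumptions for illustration; the sketch is built on itertools.groupby,
which takes a key function.

```python
from itertools import groupby
from operator import itemgetter


def groupby_pairs(pairs):
    """Hypothetical variant: group consecutive (key, value) pairs by key."""
    for key, group in groupby(pairs, key=itemgetter(0)):
        # Drop the key from each pair, so no undecorate step is left
        # for the caller.
        yield key, [value for _, value in group]


words = ["apple", "avocado", "banana", "blueberry", "cherry"]

# Pair-based form: the caller decorates with a generator expression.
by_pairs = list(groupby_pairs((w[0], w) for w in words))

# Key-function form, which needs a lambda (or an equivalent callable).
by_keyfunc = [(k, list(g)) for k, g in groupby(words, key=lambda w: w[0])]
```

Both forms produce the same groups for this input; the difference is
only in where the key computation is written.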

I guess the most radical idea would be to have the scope of a
generator expression extend to other arguments of the same call, so
you could write

  groupby(x for x in sequence, x.key)

but that looks too subtle, not to mention ambiguous, and perhaps
unimplementable -- what if it was instead

  groupby(x.value for x in sequence, x.key)

???

--Guido van Rossum (home page: http://www.python.org/~guido/)


