[Python-Dev] Possible resolution of generator expression variable capture dilemma

Wed Mar 24 00:37:56 EST 2004

Tim Peters:

> When scopes and lifetimes get intricate, it's easier to say what you
> mean in Scheme

Interesting you should mention that. Recently in c.l.py, there was a
discussion about the semantics of for-loops and nested scopes in which
someone claimed that Scheme's scoping works differently from Python's.

It doesn't, really. What's different is that Scheme's equivalent of a
for-loop creates a new binding for the loop variable on each
iteration. This tends to result in fewer late-binding surprises,
because you get the effect of variable capture -- but due to the
semantics of the for-loop, and other common constructs which introduce
new bindings, rather than anything that goes on inside them.
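
To make the difference concrete, here is a minimal sketch in Python. The function names are made up for illustration, and the second function only imitates Scheme's fresh-binding-per-iteration behaviour by calling a helper lambda; it is not how Scheme is implemented.

```python
def make_callbacks():
    # Python's for-loop: one binding of 'i', rebound on each iteration,
    # so every lambda sees whatever 'i' holds when it is finally called.
    callbacks = []
    for i in range(3):
        callbacks.append(lambda: i)
    return callbacks


def make_callbacks_fresh():
    # Imitation of a fresh binding per iteration: each call of the outer
    # lambda creates a new binding 'j' that the inner lambda closes over.
    callbacks = []
    for i in range(3):
        callbacks.append((lambda j: lambda: j)(i))
    return callbacks


late_bound = [f() for f in make_callbacks()]
fresh = [f() for f in make_callbacks_fresh()]
print(late_bound)  # [2, 2, 2]
print(fresh)       # [0, 1, 2]
```

The scoping rules are the same in both functions; only the number of bindings created differs.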

Let's consider Tim's famous example:

    pipe = source
    for p in predicates:
        # add a filter over the current pipe, and call that the new pipe
        pipe = (e for e in pipe if p(e))

Originally it was claimed that the values of both 'p' and 'pipe' need
to be captured for this to do what is intended. However, if the
outermost iterator is to be pre-evaluated, that takes care of 'pipe'.

Now, if Python's for-loop were to create a new binding for the loop
variable on each iteration, as in Scheme, then that would take care of
'p' as well.
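
As a rough check of that reasoning, here is a sketch that assumes the outermost iterable is evaluated when the generator expression is created (which is how generator expressions behave in released Python). The predicates and the add_filter helper are invented for the example; the helper merely imitates a per-iteration binding of 'p'.

```python
def is_even(n):
    return n % 2 == 0


def is_big(n):
    return n > 5


def build_pipe(source, predicates):
    pipe = source
    for p in predicates:
        # 'pipe' on the right is evaluated now (outermost iterable);
        # 'p' is only looked up when the generator actually runs.
        pipe = (e for e in pipe if p(e))
    return pipe


def add_filter(pipe, p):
    # Hypothetical helper: each call has its own binding of 'p'.
    return (e for e in pipe if p(e))


def build_pipe_fresh(source, predicates):
    pipe = source
    for p in predicates:
        pipe = add_filter(pipe, p)
    return pipe


shared = list(build_pipe(range(10), [is_even, is_big]))
separate = list(build_pipe_fresh(range(10), [is_even, is_big]))
print(shared)    # [6, 7, 8, 9]  (both stages used is_big)
print(separate)  # [6, 8]        (the intended result)
```

Note that 'pipe' chains correctly in both versions; only 'p' goes wrong in the first.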

So, I propose that the semantics of for-loop variables be changed to
do exactly that (and no more than that).

I'm well aware that making the loop variable local to the loop has
been suggested several times before, and (at least before Python 3.0)
rejected.

However, I'm suggesting something different. I'm *not* proposing to
make it *local* to the loop -- its name will still reside in the
enclosing namespace, and its value will still be available after the
loop finishes. In fact, if it's not referenced from any nested scope,
there will be no change in semantics at all.

What *will* change is perhaps best explained by means of the
implementation, which is very simple. If the loop variable is
referenced from a nested scope, it will be held in a cell. Now, on
each iteration, instead of replacing the contents of the cell as a
normal assignment would, we create a *new* cell and re-bind the name
to the new cell.

That's all.
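
The cell sharing can be observed from Python itself through a function's __closure__ attribute. The sketch below shows the existing behaviour, then approximates the proposed one by moving the loop body into a nested function so that each iteration gets its own cell. The function names are hypothetical, and the second version is only an emulation, not the proposed implementation.

```python
def shared_cells():
    # Existing semantics: the loop assigns into one cell for 'x'.
    fns = []
    for x in "ab":
        fns.append(lambda: x)
    return fns


def fresh_cells():
    # Emulation of the proposal: every call of body() creates a new
    # cell for its parameter 'x', so each lambda closes over its own.
    fns = []

    def body(x):
        fns.append(lambda: x)

    for x in "ab":
        body(x)
    return fns


f, g = shared_cells()
same_cell = f.__closure__[0] is g.__closure__[0]
print(same_cell, f(), g())  # True b b

f2, g2 = fresh_cells()
distinct_cells = f2.__closure__[0] is not g2.__closure__[0]
print(distinct_cells, f2(), g2())  # True a b
```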

An advantage of this approach is that *all* forms of nested scope
(lambda, def, etc.) would benefit, not just generator expressions. I
suspect it would eradicate most of the few remaining uses for the
default-argument hack, for instance (which nested scopes were supposed
to do, but didn't).
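
For anyone who hasn't met it, the default-argument hack looks roughly like this (make_adders_hack is an invented name for the example). It captures the current value by evaluating it as a default at definition time, at the cost of an extra parameter that a caller can override by accident.

```python
def make_adders_hack():
    adders = []
    for n in range(3):
        # 'n=n' is evaluated when the lambda is created, so each lambda
        # keeps the value 'n' had on that iteration.
        adders.append(lambda x, n=n: x + n)
    return adders


adders = make_adders_hack()
results = [add(10) for add in adders]
print(results)  # [10, 11, 12]

# The weakness: the captured value is just a parameter, so a stray
# second argument silently replaces it.
overridden = adders[0](10, 100)
print(overridden)  # 110
```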

Is there a list of Tim's wacky examples anywhere, so we can check how
many of them this would solve?

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,          | A citizen of NewZealandCorp, a       |
Christchurch, New Zealand          | wholly-owned subsidiary of USA Inc.  |
greg at cosc.canterbury.ac.nz      +--------------------------------------+