[Python-ideas] For-loop variable scope: simultaneous possession and ingestion of cake

Fri Oct 3 12:37:39 CEST 2008

There's been another discussion on c.l.py about
the problem of

   lst = []
   for i in range(10):
     lst.append(lambda: i)
   for f in lst:
     print(f())

printing 9 ten times instead of 0 to 9.

The usual response is to do

     lst.append(lambda i=i: i)

but this is not a very satisfying solution. For one
thing, it's still abusing default arguments, something
that lexical scoping was supposed to have removed the
need for. For another, it won't work in some situations,
such as when the function needs to take a variable
number of arguments.
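
A minimal sketch of that second failure mode (the names
callbacks, no_args and with_args are made up for this
illustration): once the function also accepts *args, the
caller's first positional argument silently replaces the
i=i default.

```python
callbacks = []
for i in range(3):
    # Hypothetical callback that should accept any number of arguments
    # while still remembering the value of i from its own iteration.
    callbacks.append(lambda i=i, *args: (i, args))

# Called with no arguments, the default-argument hack works as intended.
no_args = [f() for f in callbacks]

# Called with arguments, the first one lands in the i slot and the
# remembered loop value is lost.
with_args = [f("x", "y") for f in callbacks]
```

Here no_args holds the three loop values, while every
entry of with_args has "x" where the loop value should be.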

Also, most other languages which have lexical scoping
and first-class functions don't seem to suffer from
problems like this. To someone familiar with one of
those languages (e.g. Scheme, Haskell) it looks as
if there's something broken about the way scoping of
nested functions works in Python.

However, it's not lambda that's broken, it's the for
loop.

In Scheme, for example, the way you normally write the
equivalent of a Python for-loop results in a new
scope being created for each value of the loop variable.
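
As a rough Python transliteration of that Scheme idiom
(a sketch only; loop, stop and acc are names invented
here, and this is not how anyone would normally write
Python), the loop can be expressed as a recursive
function, so that each step is a fresh call with its own
binding of i:

```python
def loop(i, stop, acc):
    # Each call has its own local i, so each lambda captures a
    # different binding rather than one shared variable.
    if i == stop:
        return acc
    acc.append(lambda: i)
    return loop(i + 1, stop, acc)

lst = loop(0, 10, [])
results = [f() for f in lst]
```

Written this way, results contains 0 to 9 rather than
ten copies of 9.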

Previous proposals to make for-loop variables local to
the loop have stumbled on the problem of existing code
that relies on the loop variable keeping its value
after exiting the loop, and it seems that this is
regarded as a desirable feature.

So I'd like to propose something that would satisfy
both requirements:

0. There is no change if the loop variable is not
    referenced by a nested function defined in the loop
    body. The vast majority of loop code will therefore
    be completely unaffected.

1. If the loop variable is referenced by such a nested
    function, a new local scope is effectively created
    to hold each successive value of the loop variable.

2. Upon exiting the loop, the final value of the loop
    variable is copied into the surrounding scope, for
    use by code outside the loop body.

Rules 0 and 1 would also apply to list comprehensions
and generator expressions.
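
For the comprehension case, a small sketch of today's
behaviour next to a hand-written emulation of rule 1 (the
immediately-called outer lambda is only a stand-in for
the proposed per-iteration scope, not part of the
proposal itself):

```python
# Today: every lambda sees the same i, so all of them return the
# final value.
shared = [lambda: i for i in range(3)]
shared_values = [f() for f in shared]

# Emulating a new scope per value: the outer lambda is called at
# once, giving each inner lambda its own binding of i.
scoped = [(lambda i: (lambda: i))(i) for i in range(3)]
scoped_values = [f() for f in scoped]
```

shared_values comes out as three 2s, whereas
scoped_values is 0, 1, 2, which is what the first form
would produce under rules 0 and 1.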

There is a very simple and efficient way to implement
this in current CPython: If the loop variable is referenced
by a nested function, it will be in a cell. Instead of
rebinding the existing cell, each time around the loop
a new cell is created, replacing the previous cell.
Immediately before exiting the loop, one more new cell
is created and the final value of the loop variable
copied into it.

Implementations other than CPython that aren't using
cells may need to do something more traditional, such
as compiling the loop body as a separate function.
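
The cell sharing can be observed in current CPython via
the __closure__ attribute of function objects. The sketch
below (function names invented for illustration) compares
an ordinary loop with one whose body has been moved into
a separate function by hand, which is only an
approximation of the proposed behaviour:

```python
def one_cell_today():
    fs = []
    for i in range(3):
        fs.append(lambda: i)   # every lambda refers to the same cell
    return fs

def body_as_function():
    fs = []
    def body(i):               # each call creates a fresh cell for i
        fs.append(lambda: i)
    for i in range(3):
        body(i)
    return fs, i               # the outer i still holds the final value

shared = one_cell_today()
same_cell = shared[0].__closure__[0] is shared[1].__closure__[0]

separate, final_i = body_as_function()
distinct_cells = separate[0].__closure__[0] is not separate[1].__closure__[0]
separate_values = [f() for f in separate]
```

In the first function all the closures share one cell; in
the second each closure has its own, and the final value
of i remains available after the loop, as in rule 2.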

I think this arrangement would allow almost all existing
code to continue working, and new code to be written
that takes advantage of final values of loop variables.

There would be a few obscure situations where the
results would be different, for example if a nested
function modifies the loop variable and expects the
result to be reflected in the value seen from outside
the loop. But I can't imagine such cases being anything
other than extremely rare.
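
A contrived sketch of one such case, assuming Python 3's
nonlocal statement (not available in 2.x). The second
function only approximates the proposed semantics by
moving the loop body into a helper; both function names
are invented here.

```python
def shared_cell():
    bumps = []
    for i in range(3):
        def bump():
            nonlocal i
            i += 10
        bumps.append(bump)
    bumps[0]()    # today: modifies the one i that the loop also uses
    return i

def per_iteration_cell():
    bumps = []
    def body(i):  # stand-in for a new scope per loop value
        def bump():
            nonlocal i
            i += 10
        bumps.append(bump)
    for i in range(3):
        body(i)
    bumps[0]()    # modifies only the first iteration's own i
    return i

today = shared_cell()
emulated = per_iteration_cell()
```

Today the change made by the nested function is visible
outside the loop (today is 12); in the emulation it is
not (emulated is 2).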

The benefit would be that almost all code involving
loops and nested functions would behave intuitively,
Python would free itself from any remaining perception
of having broken scope rules, and we would finally be
able to consign the default-argument hack to the garbage
collector of history.

-- 
Greg