[Python-ideas] For-loop variable scope: simultaneous possession and ingestion of cake

Mon Oct 6 13:54:53 CEST 2008

> There is a very simple and efficient way to implement
> this in current CPython: If the loop variable is referenced
> by a nested function, it will be in a cell. Instead of
> rebinding the existing cell, each time around the loop
> a new cell is created, replacing the previous cell.
> Immediately before exiting the loop, one more new cell
> is created and the final value of the loop variable
> copied into it.
[skip]
> The benefit would be that almost all code involving
> loops and nested functions would behave intuitively,

+1 from me too.

Neil Toronto wrote:
>
> Spanking good point. To hack this "properly" all cell variables closed over within the loop would have to go into the per-iteration scope.

Agreed. And to preserve current semantics these values would need to
be copied to the new scope of every next iteration (if it's closed
over).

>> It seems an odd sort of scope that lets rebindings inside it fall through outwards.
>
> True, but a lot of Python programs would depend on this - even new ones because of social inertia.

Unfortunately, it'll have to wait until Python 4000 :).

I like finally fixing this problem, which I've also run into. But I
don't like the idea of introducing a new keyword to create new scopes.

I think all variables that are assigned to in a loop body and closed
over should be put into a cell as Greg proposes, not just the index
variable. (In both for and while loops.) At the end of each loop
iteration all such variables would need to be copied to the 'next'
scope, which could be the parent scope or the next iteration.
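
A rough emulation of that copying rule in plain Python, as a sketch
only: the Cell class and the run_loop and capture helpers are
hypothetical names made up for illustration, not CPython internals.
Every closed-over variable lives in a cell, and at the end of each
iteration its value is copied into a fresh cell for the 'next' scope.

```python
class Cell(object):
    """Hypothetical stand-in for a closure cell (not the real CPython type)."""
    def __init__(self, value):
        self.value = value


def capture(i_cell, total_cell):
    # Build a closure over the two cells that are current in this iteration.
    return lambda: (i_cell.value, total_cell.value)


def run_loop(values):
    getters = []
    total_cell = Cell(0)        # assigned in the loop body and closed over
    for v in values:
        i_cell = Cell(v)        # the loop variable gets a new cell each time
        total_cell.value += v   # rebinding inside the body updates the cell
        getters.append(capture(i_cell, total_cell))
        # End of iteration: copy the value into a fresh cell for the 'next'
        # scope (the next iteration, or the parent scope after the last one).
        total_cell = Cell(total_cell.value)
    return [g() for g in getters], total_cell.value


snapshots, final_total = run_loop([1, 2, 3])
print(snapshots)    # each closure kept the values of its own iteration
print(final_total)  # the final value is still visible after the loop
```

Each closure keeps the cells of the iteration that created it, while
code after the loop still sees the last value, which is the behaviour
the proposal is after.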

I'm trying really hard to think of cases that would break if this
new behaviour were introduced, but I can't think of any. The
only thing that would 'break' is if you would want the standard
example of

 lst = []
 for i in range(10):
   lst.append(lambda: i)
 for f in lst:
   print f()

to actually print 9 ten times. But if you want your bunch of lambda
functions that you create in the loop body to all refer to the last
value of i, why on earth would you even attempt to create a whole
bunch of lambdas in this way?? (except to show that for loops are
broken)
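
For reference, here is a small sketch of what the standard example does
today, next to the usual default-argument workaround. The function
names late_binding and default_arg_binding are only illustrative.

```python
def late_binding():
    # All lambdas share the single binding of i, so they all see its
    # final value once the loop has finished.
    funcs = []
    for i in range(10):
        funcs.append(lambda: i)
    return [f() for f in funcs]


def default_arg_binding():
    # The default value is evaluated when each lambda is created, so
    # every lambda keeps the i of its own iteration.
    funcs = []
    for i in range(10):
        funcs.append(lambda i=i: i)
    return [f() for f in funcs]


print(late_binding())         # the same value for every function
print(default_arg_binding())  # one distinct value per iteration
```

With per-iteration cells, the first function would behave like the
second without the i=i trick.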

Dillon Collins wrote:
>>
>> Upon exiting the loop, the final value of the loop variable is copied into
>> the surrounding scope
>
>
>
> So we are already dealing with at least one implicit copy.  This is why I
> think that the loop scoping suggestion should basically be dropped in favor
> of some sort of scope block.

Are you suggesting doing something with the scope of all block
constructs? If the variables of the block are copied into their outer
scope at the end, there really is almost no difference between, say,
an if block with its own scope and the if blocks we have now. The
only time it matters is if the block is executed multiple times and if
a variable is closed over in it. So that's only in loops that create
functions/lambdas in their bodies.
If you are suggesting to only introduce a new 'scope' keyword or
something like that and leave loops alone, I'd say I would prefer to
fix the loops without introducing new grammar.

Greg wrote:
>
> More generally, Python's inability to distinguish
> clearly between creating new bindings and changing
> existing bindings interacts badly with nested
> functions.
>
> I agree that the wider problem needs to be addressed
> somehow, and perhaps that should be the starting
> point. Depending on the solution adopted, we can
> then look at whether a change to the for-loop is
> still needed.

The (IMO) natural approach that many functional-minded languages take
is to have each block-like structure create a new scope. I think your
proposal with cells for loop variables would fix it for loops, if it
is applied to all variables closed over in a loop. But in other block
structures that aren't executed repeatedly, this problem doesn't come
up. Are there any other problem cases that aren't taken care of by
this proposal or the 'nonlocal' keyword?
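
To make the division of labour concrete, here is a sketch in Python 3
syntax (the helper names are made up). 'nonlocal' covers rebinding an
existing outer variable from a nested function, but it does not give
each loop iteration its own binding, which is the part the cell
proposal would handle.

```python
def make_counter():
    count = 0

    def increment():
        nonlocal count      # rebind the existing outer binding
        count += 1
        return count

    return increment


def loop_closures():
    # 'nonlocal' does not help here: there is still only one binding
    # of i, shared by every lambda created in the loop.
    funcs = []
    for i in range(3):
        funcs.append(lambda: i)
    return [f() for f in funcs]


counter = make_counter()
print(counter(), counter())   # the outer 'count' is updated in place
print(loop_closures())        # every closure sees the last value of i
```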

Andrew Clover wrote:
> You mean:
>
>    >>> i= 0
>    >>> geti= lambda: i
>
>    >>> for i in [1]:
>    ...     print i is geti()
>    True
>
>    >>> for i in [1]:
>    ...     dummy= lambda: i
>    ...     print i is geti()
>    False

This does point at an implementation gotcha. The global i and the i in
the 'current' loop scope have to be kept in sync. But actually having
two variables and keeping them synchronized so they appear to be one
is nearly impossible due to Python's dynamic nature and multithreading.
So this would require the variable i to be both a global variable and
a cell variable. The module namespace dict would need to point to the
cell containing i. This would require the Python interpreter to check
on every global lookup if it is looking at the value itself or at a
cell containing the value. Perhaps this could be done with an extra
tag bit on the value pointers in the module namespace dictionaries.

So for some annotated code:

 >>> i = 0               # globals()['i'] = 0
 >>> geti = lambda: i    # The lambda looks up its 'i' in the global namespace
 >>> for i in [1]:       # The global is put into a cell if it is not already:
                         # globals()['i'] = cell(globals()['i'])
                         # Each iteration updates the cell with the new value
                         # from the iterator:
                         # globals()['i'].cellvalue = next(_iterator)
 ...    dummy = lambda: i
 ...    i += 1           # globals()['i'].cellvalue += 1
 ...    print i, i == geti()
 ...                     # At the end of every loop iteration
                         # the cell is replaced with a fresh cell:
                         # globals()['i'] = cell( globals()['i'].cellvalue )
 2 True

Whether a global variable contains a value or a cell is something the
interpreter needs to check at runtime, but I think it would not have
to be very expensive.
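
The annotated session can be emulated in ordinary Python to check that
the bookkeeping is consistent. This is only a sketch: Cell, Namespace
and their methods are hypothetical names, an isinstance() test stands
in for the tag bit, and real module dicts do nothing of the kind.

```python
class Cell(object):
    """Hypothetical cell object; 'cellvalue' follows the annotation above."""
    def __init__(self, value):
        self.cellvalue = value


class Namespace(object):
    """Hypothetical module namespace whose slots hold a value or a Cell."""
    def __init__(self):
        self._slots = {}

    def load(self, name):
        slot = self._slots[name]
        # The runtime check: is this the value itself or a cell holding it?
        return slot.cellvalue if isinstance(slot, Cell) else slot

    def store(self, name, value):
        slot = self._slots.get(name)
        if isinstance(slot, Cell):
            slot.cellvalue = value      # update the current cell in place
        else:
            self._slots[name] = value

    def ensure_cell(self, name):
        slot = self._slots[name]
        if not isinstance(slot, Cell):  # put the global into a cell once
            slot = Cell(slot)
            self._slots[name] = slot
        return slot

    def refresh_cell(self, name):
        # End of an iteration: replace the cell with a fresh copy.
        self._slots[name] = Cell(self.load(name))


def simulate(values):
    ns = Namespace()
    ns.store('i', 0)
    geti = lambda: ns.load('i')         # looks 'i' up in the namespace
    closures, observed = [], []
    for value in values:
        cell = ns.ensure_cell('i')
        ns.store('i', value)            # i = next(_iterator)
        closures.append(lambda c=cell: c.cellvalue)  # dummy = lambda: i
        ns.store('i', ns.load('i') + 1)              # i += 1
        observed.append((ns.load('i'), ns.load('i') == geti()))
        ns.refresh_cell('i')
    return observed, [f() for f in closures], geti()


print(simulate([1, 5]))
```

Global lookups always agree with the loop's view of i, while each
closure keeps the cell of the iteration that created it.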