unintuitive for-loop behavior

Sat Oct 1 09:15:20 EDT 2016

On Sat, 1 Oct 2016 09:33 pm, Gregory Ewing wrote:

> Steve D'Aprano wrote:
> 
>> # create a new binding
>> x: address 1234 ----> [  box contains 999 ]
>> x: address 5678 ----> [  a different box, containing 888 ]
> 
> In the context of CPython and nested functions, replace
> "box" with "cell".

Cells in Python are an implementation detail. They only apply to CPython (as
far as I know), certainly not to IronPython, and the interpreter takes care
to ensure that the user-visible behaviour of local variables in cells is
identical to the behaviour of name bindings in dicts.
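For readers who haven't met them, this is roughly where CPython's cells
surface. The sketch relies on the `__code__.co_cellvars`,
`__code__.co_freevars` and `__closure__` attributes as CPython exposes them.
The names `outer` and `inner` are illustrative only, and other
implementations are not guaranteed to expose the same details.

```python
def outer():
    x = 999          # 'x' is read by inner(), so CPython stores it in a cell
    def inner():
        return x     # 'x' is a free variable of inner()
    return inner

f = outer()
print(outer.__code__.co_cellvars)       # ('x',)
print(f.__code__.co_freevars)           # ('x',)
print(f.__closure__[0].cell_contents)   # 999
```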

The only user-visible semantic differences between variables in a dict and
variables in cells are:

- writing to locals() won't necessarily affect the actual locals;
- in Python 3 only, import * and exec in the local namespace may be
prohibited.
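Both differences can be checked directly. What follows is a minimal sketch
for CPython 3; the function names are hypothetical. The exec case is left
out because its interaction with locals() varies between Python 3 releases.

```python
def write_via_locals():
    x = 1
    locals()["x"] = 2   # updates a dict, not the variable the function actually reads
    return x

def star_import_in_function_allowed():
    # In Python 3 a star import inside a function body is rejected at compile time.
    source = "def g():\n    from os import *\n"
    try:
        compile(source, "<sketch>", "exec")
    except SyntaxError:
        return False
    return True

print(write_via_locals())                  # 1
print(star_import_in_function_allowed())   # False
```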

Everything else (that I know of) is indistinguishable. Python takes care to
hide the differences. For example, even though local variables have a cell
pre-allocated when the function is called, trying to access that cell
before a value is bound to the local raises NameError (specifically
UnboundLocalError, a subclass of NameError) rather than reading
uninitialised memory, as a naive implementation might have done.
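Here is a quick way to observe this. Reading a local before it is assigned
raises UnboundLocalError, while a nested function that reads a
not-yet-bound variable of its enclosing function gets a NameError. Since
UnboundLocalError subclasses NameError, `except NameError` catches both.
The function names are illustrative.

```python
def read_local_too_early():
    try:
        value = x   # 'x' is local here because it is assigned below
    except UnboundLocalError as exc:
        return exc
    x = 1
    return value

def read_cell_too_early():
    def inner():
        return x    # free variable, looked up in the enclosing function's cell
    try:
        inner()
    except NameError as exc:
        return exc
    x = 1
    return None

print(issubclass(UnboundLocalError, NameError))   # True
print(type(read_local_too_early()).__name__)       # UnboundLocalError
print(type(read_cell_too_early()).__name__)
```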

In any case, I think this is going off on a rather wide tangent -- how is
this specifically relevant to the "unintuitive for-loop behavior" the OP
was surprised by?
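This excerpt doesn't quote the original question. Suppose it was the
familiar case of closures created inside a loop. That is an assumption, not
something stated here. A minimal reproduction would then look like this,
where `make_funcs` and `make_funcs_with_default` are hypothetical names:

```python
def make_funcs():
    funcs = []
    for i in range(3):
        funcs.append(lambda: i)       # looks up 'i' when called, not when created
    return funcs

def make_funcs_with_default():
    funcs = []
    for i in range(3):
        funcs.append(lambda i=i: i)   # default is evaluated at definition time
    return funcs

print([f() for f in make_funcs()])               # [2, 2, 2]
print([f() for f in make_funcs_with_default()])  # [0, 1, 2]

# CPython-specific introspection: every lambda holds the same cell for 'i'.
funcs = make_funcs()
print(all(f.__closure__[0] is funcs[0].__closure__[0] for f in funcs))  # True
```

The default-argument version is a common workaround. It sidesteps the
shared binding by copying the current value of `i` into each lambda.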

> When I said "creating a new binding" I meant that the
> name x refers to different cells at different times.
> When I said "updating an existing binding" I meant that
> the name x still refers to the same cell, but that cell
> refers to a different object.

I don't believe that there is any test you can write in Python that will
distinguish those two cases. (Excluding introspection of implementation
details, such as byte-code, undocumented C-level structures, etc.) As far
as ordinary Python operations go, I don't see that there's any difference
between the two.

When you say:

    x = 0
    x = 1

inside a function, and the interpreter does the name binding twice, there's
no way of telling whether it writes to the same cell each time or not.
Apart from possible performance and memory use, what difference would it
make? It is still the same 'x' name binding, regardless of the
implementation details.
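What ordinary Python code can observe is the value currently bound to the
name. A closure created before the second assignment sees the new value
after it. That follows Python's rule that a nested function reads the
current binding of the enclosing name. `rebind_twice` and `peek` are
illustrative names:

```python
def rebind_twice():
    x = 0
    def peek():
        return x      # reads whatever 'x' is bound to at call time
    before = peek()
    x = 1
    after = peek()
    return before, after

print(rebind_twice())   # (0, 1)
```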

> In a wider context, replace "box" with "slot in a
> stack frame" or "slot in a namespace dictionary".
> 
>> But Python doesn't work that way! Variables aren't modelled by boxes in
>> fixed locations, and there is no difference between "create a new
>> binding" and "update an existing one".
> 
> There is very much a distinction. Each time you invoke
> a function, a new set of bindings is created for all of
> its parameters and local names. Assigning to those names
> within the function, on the other hand, updates existing
> bindings.

Certainly when you call a function, the local bindings need to be created.
Obviously they didn't exist prior to calling the function! I didn't think
that was the difference you were referring to, and I fail to see how it
could be relevant to the question of for-loop behaviour.
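A counter closure shows the point about function calls. Each call creates a
fresh set of local bindings, and assignments inside the call update them.
`make_counter` is a hypothetical example. `nonlocal` makes the nested
function update the enclosing binding instead of creating its own local:

```python
def make_counter():
    count = 0             # a new binding for 'count' on every call of make_counter
    def bump():
        nonlocal count    # update the enclosing binding instead of creating a local one
        count += 1
        return count
    return bump

a = make_counter()
b = make_counter()
print(a(), a(), b())   # 1 2 1
```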

As I understood you, you were referring to assignments to *existing* names.
If a binding for x already exists, then and only then do you have a choice
between:

- update the existing binding for x;
- or create a new binding for x.

If there is no binding for x yet (such as before the function is called),
then you have no choice in the matter: you cannot possibly update what
doesn't exist.
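Python's `nonlocal` statement is an explicit request to update an existing
binding, and compilation fails if no enclosing function binds the name.
Here is a minimal check; `nonlocal_compiles` is a hypothetical helper:

```python
def nonlocal_compiles(source):
    try:
        compile(source, "<sketch>", "exec")
    except SyntaxError:
        return False
    return True

with_binding = (
    "def outer():\n"
    "    x = 0\n"
    "    def inner():\n"
    "        nonlocal x\n"
    "        x = 1\n"
    "    return inner\n"
)
without_binding = (
    "def outer():\n"
    "    def inner():\n"
    "        nonlocal x\n"   # no enclosing function binds 'x'
    "        x = 1\n"
    "    return inner\n"
)
print(nonlocal_compiles(with_binding))     # True
print(nonlocal_compiles(without_binding))  # False
```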

-- 
Steve
“Cheer up,” they said, “things could be worse.” So I cheered up, and sure
enough, things got worse.