[Python-Dev] Iteration variables and list comprehensions

Wed, 30 May 2001 03:47:47 -0400

[David Beazley]
> ...
> However, I've also been shooting myself in the foot a little more
> than usual
> ...
> Because of this, I have frequently found myself debugging the
> following programming error:

If "frequently" is "a little more than usual", then it sounds like your
problems in all areas are too common for us to really help you by fixing
this one <wink>.

OK, I'm afraid the behavior follows from taking seriously the idea that
listcomps are syntactic sugar for a specific pattern of nested loops and
"if" tests.  That was done to make it explainable, and the correspondence is
indeed exact.  The implementation already creates "invisible" names:

>>> [repr(name) for name in globals().keys()]
["'__builtins__'", "'__name__'", "'name'", "'__doc__'", "'_[1]'"]
>>>

Where did "_[1]" come from?  You guessed it.  Look for it after the listcomp
finishes and it's gone:

>>> globals().keys()
['__builtins__', '__name__', 'name', '__doc__']
>>>

It's invisible because it's a temp var you *wouldn't* see in the equivalent
loop nest.
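
To make "the equivalent loop nest" concrete, here's a sketch of the expansion
for a listcomp with one "for" and one "if".  The names data and result are
made up for illustration; result plays the part of the hidden temp (the
"_[1]" above), which the real implementation keeps out of sight.

```python
data = [3, 8, 1, 6]      # made-up sample input

# Hand expansion of:  [n * n for n in data if n > 2]
result = []              # stands in for the hidden temp
for n in data:
    if n > 2:
        result.append(n * n)

print(result)            # [9, 64, 36]
print(n)                 # 6: the loop leaves n bound to the last item
```

Only plain "for" and "if" are used, so nothing here is new behavior.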

> ...
> Therefore, I'm wondering if it would make any sense to make the
> iterator variables used inside of a list comprehension private in some
> manner

I'm not sure it's worth losing the exact correspondence with nested loops,
and I'm not sure it isn't, either.  Note that "the iterator variables"
needn't be bare names:

>>> class x:
...     pass
...
>>> [1 for x.i in range(3)]
[1, 1, 1]
>>> x.i
2
>>>

This complicates explaining exactly how you want to deviate from the
for-loop model.  So, I think, does this:

>>> [i for i in range(2) for i in range(2, 5)]
[2, 3, 4, 2, 3, 4]
>>>

That is, even in simple cases, is the desired scope attached to the "for" or
to the "[]"?  Python doesn't have a problem with reusing a name as a for
target in nested loops (or in listcomps today).

> ...
> Just as an aside, I have never intentionally used the iterator
> variable of a list comprehension after the operation has completed.

Not even in a debugger, when the operation has completed via unexpected
exception, and you're desperate to know what the control variable was bound
to at the time of death?  Or in an exception handler?

>>> import sys
>>> try:
...     [i*i for i in xrange(sys.maxint)]
... except OverflowError:
...     raise OverflowError("oops! blew up at %d" % i)
...
Traceback (most recent call last):
  File "<stdin>", line 4, in ?
OverflowError: oops! blew up at 46341
>>>

Or what about:

i = 12
def f():
    print i
    return [i for i in range(i)]
f()

1. Should "print i" print 12, or raise UnboundLocalError?

2. Does the "i" in "range(i)" refer to the global i, or is that just
   senseless?

So long as the for-loop model is followed faithfully, nothing is hard to
explain or predict, simply because there's nothing truly new.
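
Under the for-loop model the answers fall out of the ordinary local-variable
rules.  Here's a sketch of f with the listcomp expanded by hand (result and
outcome are made-up names, and the "print i" is left out, since the same rule
decides its fate):  "i" is assigned in f, so it's local throughout f, and
"range(i)" reads it before anything has bound it.

```python
i = 12

def f():
    # Hand expansion of:  return [i for i in range(i)]
    result = []
    for i in range(i):       # i is local to f, and still unbound here
        result.append(i)
    return result

try:
    f()
    outcome = "returned normally"
except UnboundLocalError:
    outcome = "UnboundLocalError"

print(outcome)               # UnboundLocalError
```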

> I was actually quite surprised with this behavior the first time I saw
> it.

Me too <wink>.

> I suspect most other programmers would not anticipate this side
> effect either.

I share the suspicion, but am not sure why:  "for" is a binding construct in
Python, so being surprised by "for" binding a name is itself surprising.

Another principled model is possible, where

    [f(i) for i in whatever]

is treated like

    (lambda: [f(i) for i in whatever])()

>>> i = 12
>>> (lambda: [i**2 for i in range(4)])()
[0, 1, 4, 9]
>>> i
12
>>>

That's more like how Haskell does it.  But the day we explain a Python construct
in terms of a lambda transformation is the day Guido kills all of us <wink>.
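
For the record, the same model can be spelled with a def instead of a lambda.
This is a sketch only:  squares_in_own_scope and result are made-up names, and
nothing here claims to be what an implementation would actually do.  The
point is just that the loop variable lives and dies inside the function.

```python
i = 12

def squares_in_own_scope():
    # Body of the imagined transformation of:  [i**2 for i in range(4)]
    result = []
    for i in range(4):       # this i is local to the function
        result.append(i ** 2)
    return result

squares = squares_in_own_scope()
print(squares)               # [0, 1, 4, 9]
print(i)                     # 12: the outer i was never touched
```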