[Python-Dev] Iteration variables and list comprehensions
Tim Peters
tim.one@home.com
Wed, 30 May 2001 03:47:47 -0400
[David Beazley]
> ...
> However, I've also been shooting myself in the foot a little more
> than usual
> ...
> Because of this, I have frequently found myself debugging the
> following programming error:
If "frequently" is "a little more than usual", then it sounds like your
problems in all areas are too common for us to really help you by fixing
this one <wink>.
OK, I'm afraid the behavior follows from taking seriously the idea that
listcomps are syntactic sugar for a specific pattern of nested loops and
"if" tests. That was done to make it explainable, and the correspondence is
indeed exact. The implementation already creates "invisible" names:
>>> [repr(name) for name in globals().keys()]
["'__builtins__'", "'__name__'", "'name'", "'__doc__'", "'_[1]'"]
>>>
Where did "_[1]" come from? You guessed it. Look for it after the listcomp
finishes and it's gone:
>>> globals().keys()
['__builtins__', '__name__', 'name', '__doc__']
>>>
It's invisible because it's a temp var you *wouldn't* see in the equivalent
loop nest.
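A minimal sketch of that correspondence, using made-up data ("rows" and "result" are illustrative names only, and "result" merely plays the part of the hidden temp):

```python
# Hypothetical input, chosen only for illustration.
rows = [[1, 2, 3], [4, 5, 6]]

# The listcomp spelling: two "for" clauses and one "if" test.
squares = [n * n for row in rows for n in row if n % 2 == 0]

# The loop nest it stands for, clause by clause and in the same order.
# "result" is the accumulator you would have to name yourself here.
result = []
for row in rows:
    for n in row:
        if n % 2 == 0:
            result.append(n * n)

print(squares == result)
```

Each "for" and "if" clause in the listcomp becomes one level of nesting, read left to right.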
> ...
> Therefore, I'm wondering if it would make any sense to make the
> iterator variables used inside of a list comprehension private in some
> manner
I'm not sure it's worth losing the exact correspondence with nested loops;
or that it's not worth it either. Note that "the iterator variables"
needn't be bare names:
>>> class x:
...     pass
...
>>> [1 for x.i in range(3)]
[1, 1, 1]
>>> x.i
2
>>>
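For comparison, a small sketch of the same kind of target in an ordinary "for" statement. "Holder" and "table" are hypothetical names invented for the example; the point is only that attribute and subscript targets are already legal there and stay bound afterwards:

```python
class Holder(object):  # hypothetical stand-in for the class "x" above
    pass

table = {}

# An attribute reference as the loop target.
for Holder.i in range(3):
    pass

# A subscription as the loop target.
for table["last"] in "abc":
    pass

print("%r %r" % (Holder.i, table["last"]))
```

After both loops, Holder.i is 2 and table["last"] is 'c', just as x.i is 2 in the session above.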
This complicates explaining exactly how you want to deviate from the
for-loop model. So, I think, does this:
>>> [i for i in range(2) for i in range(2, 5)]
[2, 3, 4, 2, 3, 4]
>>>
That is, even in simple cases, is the desired scope attached to the "for" or
to the "[]"? Python doesn't have a problem with reusing a name as a for
target in nested loops (or in listcomps today).
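Spelled as the loop nest, the example reads like this (a sketch; "out" is an illustrative name for the accumulator):

```python
out = []
for i in range(2):          # outer "for" binds i to 0, then 1
    for i in range(2, 5):   # inner "for" rebinds the same name
        out.append(i)

print(out)
print(i)
```

The inner loop's binding is the one the body sees, so the list is [2, 3, 4, 2, 3, 4], and i is left at 4 when the nest finishes.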
> ...
> Just as an aside, I have never intentionally used the iterator
> variable of a list comprehension after the operation has completed.
Not even in a debugger, when the operation has completed via unexpected
exception, and you're desperate to know what the control vrbl was bound to
at the time of death? Or in an exception handler?
>>> import sys
>>> try:
...     [i*i for i in xrange(sys.maxint)]
... except OverflowError:
...     raise OverflowError("oops! blew up at %d" % i)
...
Traceback (most recent call last):
  File "<stdin>", line 4, in ?
OverflowError: oops! blew up at 46341
>>>
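The same trick in plain-loop form, as a sketch with hypothetical data ("values", "quotients" and "message" are made up for the example). The handler can report the control variable only because "for" leaves it bound:

```python
values = [4, 2, 0, 5]   # hypothetical input; the 0 triggers the failure
quotients = []

try:
    for v in values:
        quotients.append(8 // v)
except ZeroDivisionError:
    # v is still bound to the item that was being processed at the time.
    message = "oops! blew up at v=%d after %d results" % (v, len(quotients))

print(message)
```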
Or what about:
i = 12
def f():
    print i
    return [i for i in range(i)]
f()
1. Should "print i" print 12, or raise UnboundLocalError?
2. Does the "i" in "range(i)" refer to the global i, or is that just
senseless?
So long as the for-loop model is followed faithfully, nothing is hard to
explain or predict, and simply because there's nothing truly new.
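Under the for-loop model both questions have the answers an explicit loop already gives. A sketch, with the listcomp written out by hand ("seen", "out" and "outcome" are illustrative names, and an assignment stands in for the print):

```python
i = 12

def f():
    seen = i                # reads i before the loop below has bound it
    out = []
    for i in range(i):      # this "for" makes i a local name of f
        out.append(i)
    return out

try:
    f()
    outcome = "returned"
except UnboundLocalError:
    outcome = "UnboundLocalError"

print(outcome)
```

Because the loop binds i somewhere in f, every use of i in f is local, so the first read fails before range(i) is ever reached, and the global i is never consulted.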
> I was actually quite surprised with this behavior the first time I saw
> it.
Me too <wink>.
> I suspect most other programmers would not anticipate this side
> effect either.
I share the suspicion, but am not sure why: "for" is a binding construct in
Python, so being surprised by "for" binding a name is itself surprising.
Another principled model is possible, where
[f(i) for i in whatever]
is treated like
(lambda: [f(i) for i in whatever])()
>>> i = 12
>>> (lambda: [i**2 for i in range(4)])()
[0, 1, 4, 9]
>>> i
12
>>>
That's more like Haskell does it. But the day we explain a Python construct
in terms of a lambda transformation is the day Guido kills all of us <wink>.