[Python-3000] Is this a bug with list comprehensions or not?
Nick Coghlan
ncoghlan at gmail.com
Sat Jul 12 02:43:03 CEST 2008
Raymond Hettinger wrote:
> From: "Nick Coghlan" <ncoghlan at gmail.com>
>> Georg and I tried doing it that way and had major problems trying to
>> get it to work - the hard part is that the body of the list comp
>> (which may include nested list comps, lambda functions and generator
>> expressions) needs to be able to see the iteration variables exactly
>> as they are, but the surrounding code shouldn't be able to see them at
>> all.
>
> Did you try just saving and restoring the variable so that hiding
> wouldn't be necessary.
>
> [for x in s]
> -->
> save(x) if it exists
> do the listcomp
> restore(x) if it was saved
>
>
>> As for whether or not this is worth fixing, I think getting rid of the
>> current subtle differences between the scoping rules for list
>> comprehensions and generator expressions is a very worthwhile outcome,
>> even at the possible performance cost.
>
> Difference of opinion here.
> All 2-to-3 conversions don't automatically work
> and when they don't, the reason is subtle, surprising, and weird.
2to3 should be able to detect and warn about the cases which don't work
though (specifically, list comprehensions at module or class scope which
refer to local variables somewhere other than in the outermost iterable
expression). A patch to add such a warning would be very welcome.
There's nothing it can do about code which deletes the no longer leaked
local variable, but that failure isn't subtle (and is also fairly easily
fixed by changing the code to "var = None; del var").
> With genexps, you expect function scoping rules to apply
> (we explain genexps in terms of an equivalent generator).
> With set/list/dict comps, that is unnecessary, slow, and unexpected
I should probably add in to this discussion what the Py3k expansions of
list, set and dict comprehensions actually *are* (they don't implicitly
build genexps because that would be stupid and incredibly slow, but they
do each build a single implicit function in order to get the scoping
rules right):
>>> import dis
>>> def f():
... return [x for x in range(10)]
...
>>> dis.dis(f)
2 0 LOAD_CONST 1 (<code object <listcomp> at
0xb7c41260, file "<stdin>", line 2>)
3 MAKE_FUNCTION 0
6 LOAD_GLOBAL 0 (range)
9 LOAD_CONST 2 (10)
12 CALL_FUNCTION 1
15 GET_ITER
16 CALL_FUNCTION 1
19 RETURN_VALUE
>>> dis.dis(f.__code__.co_consts[1])
2 0 BUILD_LIST 0
3 DUP_TOP
4 STORE_FAST 1 (_[1])
7 LOAD_FAST 0 (.0)
>> 10 FOR_ITER 13 (to 26)
13 STORE_FAST 2 (x)
16 LOAD_FAST 1 (_[1])
19 LOAD_FAST 2 (x)
22 LIST_APPEND
23 JUMP_ABSOLUTE 10
>> 26 RETURN_VALUE
So [x for x in range(10)] is shorthand for:
def _f(itr):
_[1] = []
for x in itr:
_[1].append(x)
return _[1]
expr_result = _f(range(10))
{x for x in range(10)} is shorthand for:
def _f(itr):
_[1] = set()
for x in itr:
_[1].add(x)
return _[1]
expr_result = _f(range(10))
And {x:x*x for x in range(10)} is shorthand for:
def _f(itr):
_[1] = {}
for x in itr:
_[1][x] = x*x
return _[1]
expr_result = _f(range(10))
While (x for x in range(10)) continues to be shorthand for:
def _g(itr):
for x in itr:
yield x
expr_result = _g(range(10))
It all looks simple, consistent and pretty straightforward to me. If pdb
doesn't understand it, then it may be a good idea for someone to teach
pdb what a disassembled comprehension looks like in Py3k.
Those expansions should also make it clear why this change actually
results in a speed increase of module and class level list
comprehensions above a certain size - the iteration variables become
function locals, so they benefit from function local speedups. The exact
point where the speedup from that is greater than the cost of a single
function call will vary based on the complexity of the body of the list
comprehension (most notable the number of times the iteration variables
are referred to).
$ ./python -m timeit "exec('[x*x for x in range(100)]')"
10000 loops, best of 3: 125 usec per loop
$ python -m timeit "exec '[x*x for x in range(100)]'"
10000 loops, best of 3: 121 usec per loop
$ ./python -m timeit "exec('[x*x for x in range(1000)]')"
1000 loops, best of 3: 418 usec per loop
$ python -m timeit "exec '[x*x for x in range(1000)]'"
1000 loops, best of 3: 723 usec per loop
(exec is used in the above examples because the timeit module otherwise
runs all supplied code in a function scope, so the speedup in Py3k list
comprehensions isn't visible. The version of Python in the local
directory is a fairly up to date build from the Py3k branch, the system
Python is 2.5.1)
> (we should be able to explain those in terms of an equivalent
> in-lined for-loop with the loop variable restricted to the scope of
> the loop).
That only sounds reasonable until you try to get the following two
pieces of code to produce the same answer (example lifted from
test_listcomps.py):
def test_func_listcomp():
items = [(lambda: i) for i in range(5)]
i = 20
return [x() for x in items]
def test_func_genexp():
items = list((lambda: i) for i in range(5))
i = 20
return [x() for x in items]
In current Py3k, they do produce the same answer ([4]*5) and in current
2.x they produce different, but sensible answers ([20]*5 for the list
comprehension, [4]*5 for the generator expression).
With the "just hide the iteration variable" approach it is highly
unclear what the list comprehension version should produce.
Cheers,
Nick.
--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
---------------------------------------------------------------
http://www.boredomandlaziness.org
More information about the Python-3000
mailing list