[Python-3000] Is this a bug with list comprehensions or not?

Nick Coghlan ncoghlan at gmail.com
Sat Jul 12 02:43:03 CEST 2008


Raymond Hettinger wrote:
> From: "Nick Coghlan" <ncoghlan at gmail.com>
>> Georg and I tried doing it that way and had major problems trying to 
>> get it to work - the hard part is that the body of the list comp 
>> (which may include nested list comps, lambda functions and generator 
>> expressions) needs to be able to see the iteration variables exactly 
>> as they are, but the surrounding code shouldn't be able to see them at 
>> all.
> 
> Did you try just saving and restoring the variable so that hiding 
> wouldn't be necessary.
> 
> [for x in s]
>   -->
>   save(x) if it exists
>   do the listcomp
>   restore(x) if it was saved
> 
> 
>> As for whether or not this is worth fixing, I think getting rid of the 
>> current subtle differences between the scoping rules for list 
>> comprehensions and generator expressions is a very worthwhile outcome, 
>> even at the possible performance cost.
> 
> Difference of opinion here.
> All 2-to-3 conversions don't automatically work
> and when they don't, the reason is subtle, surprising, and weird.

2to3 should be able to detect and warn about the cases which don't work 
though (specifically, list comprehensions at module or class scope which 
refer to local variables somewhere other than in the outermost iterable 
expression). A patch to add such a warning would be very welcome.

There's nothing it can do about code which deletes the no longer leaked 
local variable, but that failure isn't subtle (and is also fairly easily 
fixed by changing the code to "var = None; del var").

> With genexps, you expect function scoping rules to apply
> (we explain genexps in terms of an equivalent generator).
> With set/list/dict comps, that is unnecessary, slow, and unexpected

I should probably add in to this discussion what the Py3k expansions of 
list, set and dict comprehensions actually *are* (they don't implicitly 
build genexps because that would be stupid and incredibly slow, but they 
do each build a single implicit function in order to get the scoping 
rules right):

 >>> import dis
 >>> def f():
...     return [x for x in range(10)]
...
 >>> dis.dis(f)
   2           0 LOAD_CONST               1 (<code object <listcomp> at 
0xb7c41260, file "<stdin>", line 2>)
               3 MAKE_FUNCTION            0
               6 LOAD_GLOBAL              0 (range)
               9 LOAD_CONST               2 (10)
              12 CALL_FUNCTION            1
              15 GET_ITER
              16 CALL_FUNCTION            1
              19 RETURN_VALUE
 >>> dis.dis(f.__code__.co_consts[1])
   2           0 BUILD_LIST               0
               3 DUP_TOP
               4 STORE_FAST               1 (_[1])
               7 LOAD_FAST                0 (.0)
         >>   10 FOR_ITER                13 (to 26)
              13 STORE_FAST               2 (x)
              16 LOAD_FAST                1 (_[1])
              19 LOAD_FAST                2 (x)
              22 LIST_APPEND
              23 JUMP_ABSOLUTE           10
         >>   26 RETURN_VALUE

So [x for x in range(10)] is shorthand for:

   def _f(itr):
       _[1] = []
       for x in itr:
           _[1].append(x)
       return _[1]
   expr_result = _f(range(10))

{x for x in range(10)} is shorthand for:

   def _f(itr):
       _[1] = set()
       for x in itr:
           _[1].add(x)
       return _[1]
   expr_result = _f(range(10))

And {x:x*x for x in range(10)} is shorthand for:

   def _f(itr):
       _[1] = {}
       for x in itr:
           _[1][x] = x*x
       return _[1]
   expr_result = _f(range(10))

While (x for x in range(10)) continues to be shorthand for:

   def _g(itr):
       for x in itr:
         yield x
   expr_result = _g(range(10))

It all looks simple, consistent and pretty straightforward to me. If pdb 
doesn't understand it, then it may be a good idea for someone to teach 
pdb what a disassembled comprehension looks like in Py3k.

Those expansions should also make it clear why this change actually 
results in a speed increase of module and class level list 
comprehensions above a certain size - the iteration variables become 
function locals, so they benefit from function local speedups. The exact 
point where the speedup from that is greater than the cost of a single 
function call will vary based on the complexity of the body of the list 
comprehension (most notable the number of times the iteration variables 
are referred to).

$ ./python -m timeit "exec('[x*x for x in range(100)]')"
10000 loops, best of 3: 125 usec per loop
$ python -m timeit "exec '[x*x for x in range(100)]'"
10000 loops, best of 3: 121 usec per loop
$ ./python -m timeit "exec('[x*x for x in range(1000)]')"
1000 loops, best of 3: 418 usec per loop
$ python -m timeit "exec '[x*x for x in range(1000)]'"
1000 loops, best of 3: 723 usec per loop

(exec is used in the above examples because the timeit module otherwise 
runs all supplied code in a function scope, so the speedup in Py3k list 
comprehensions isn't visible. The version of Python in the local 
directory is a fairly up to date build from the Py3k branch, the system 
Python is 2.5.1)

 > (we should be able to explain those in terms of an equivalent
 > in-lined for-loop with the loop variable restricted to the scope of
 > the loop).

That only sounds reasonable until you try to get the following two 
pieces of code to produce the same answer (example lifted from 
test_listcomps.py):

def test_func_listcomp():
     items = [(lambda: i) for i in range(5)]
     i = 20
     return [x() for x in items]

def test_func_genexp():
     items = list((lambda: i) for i in range(5))
     i = 20
     return [x() for x in items]

In current Py3k, they do produce the same answer ([4]*5) and in current 
2.x they produce different, but sensible answers ([20]*5 for the list 
comprehension, [4]*5 for the generator expression).

With the "just hide the iteration variable" approach it is highly 
unclear what the list comprehension version should produce.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org


More information about the Python-3000 mailing list