[Python-3000] Is this a bug with list comprehensions or not?

Nick Coghlan ncoghlan at gmail.com
Sat Jul 12 11:06:41 CEST 2008


Stefan Behnel wrote:
> Raymond Hettinger wrote:
>> I know this group doesn't care about Psyco, but it was
>> nice that psyco could handle listcomps just like it could with
>> regular for-loops.  Turning it into a genexp stops psyco in its tracks.

List/set/dict comprehensions in Py3k now have the same lexical scoping 
behaviour as generator expressions, but they do NOT create or execute 
any generators. They create a nested function to do the accumulation, 
call it, then use the return value from the function call as the result 
of the expression. While I've never used psyco myself, I'd be fairly 
surprised if some simple nested functions caused it any serious problems.

I apologise for not correcting that particular misapprehension earlier 
in the thread, but I haven't really thought about any of this for over a 
year so it has taken me a while to recall the details of the implementation.

>> Likewise, Cython won't be able to handle the semantics.
> 
> Regarding Cython, I expect that we will be able to implement this pretty soon,
> by translating the generator expression into an iterable extension class with
> local variables.
> 
> However, such an approach will obviously be a lot slower than a plain embedded
> C loop for literal list/tuple/set comprehension (as we currently generate for
> list comprehensions).

Comprehensions in Py3k are still just a simple accumulation loop at the 
Python level - they merely execute inside their own function scope now, 
rather than executing in the containing scope.

> So a better approach might be to actually apply a
> separate scoping rule to the iteration variable, such as renaming it into
> something that just can't be retrieved from the outside world. That way, it
> would still be available to everything inside the comprehension, but it can't
> leak anymore.

Until someone comes up with a mechanism, other than creating an implicit 
function, that lets closures referencing the iteration variables keep 
working, people are just retreading ground that we covered more than a 
year ago when this change was first implemented.

People are welcome to try of course, but please don't make the mistake 
of thinking that it will be easy or that we added the implicit function 
scopes just because we felt like it - they were added because they were 
the simplest and cleanest way we could find to allow closures in the 
body of the list comprehension to continue to work correctly while still 
hiding the iteration variables from the containing scope.
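A small illustration of the two requirements described above: closures created in the comprehension body must still see the iteration variable, yet that variable must not leak into the surrounding scope. Simply renaming the variable, as suggested earlier in the thread, would have to satisfy both of these at once. The names `funcs` and `leaked` are just for the example.

```python
# Each lambda closes over `x`, which lives in the comprehension's own scope.
funcs = [lambda: x for x in range(3)]

# All closures share that scope, so they see the final value of `x`.
values = [f() for f in funcs]
print(values)  # [2, 2, 2]

# ...but `x` is not visible in the containing scope.
try:
    x
    leaked = True
except NameError:
    leaked = False
print(leaked)  # False
```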

The additional overhead imposed by the Py3k approach relative to the 2.x 
approach is one function call plus the cost of constructing the function 
object. A list comprehension is still far faster than the equivalent 
generator expression:

$ ./python -m timeit "list; [x for x in [1]]"
1000000 loops, best of 3: 1.81 usec per loop
$ ./python -m timeit "list(x for x in [1])"
100000 loops, best of 3: 4.85 usec per loop
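The same comparison can be reproduced from inside Python with the standard `timeit` module. This is a sketch: the helper `best_per_loop` is hypothetical, it mimics the command line's "best of 3" by taking the minimum of three runs, and the absolute numbers will depend on the interpreter version and machine. The leading `list;` in the list comprehension statement is kept from the commands above, presumably so that both statements pay for the same global lookup of `list`.

```python
import timeit

LISTCOMP = "list; [x for x in [1]]"
GENEXP = "list(x for x in [1])"

def best_per_loop(stmt, number=50_000, repeat=3):
    # Time `stmt` `repeat` times, keep the fastest run (like "best of 3"),
    # and convert to microseconds per loop.
    times = timeit.repeat(stmt, number=number, repeat=repeat)
    return min(times) / number * 1e6

lc = best_per_loop(LISTCOMP)
ge = best_per_loop(GENEXP)
print(f"listcomp: {lc:.2f} usec per loop")
print(f"genexp:   {ge:.2f} usec per loop")
```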

In addition, as I've noted elsewhere in this thread, the function 
creation and call overhead can be counterbalanced in module and class 
level code by the fact that the body of the comprehension will now 
benefit from the optimisations that are used for function local variables.
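The optimisation referred to here is that module-level code stores variables by name in a dictionary (`STORE_NAME`), whereas code inside a function uses indexed fast-local slots (`STORE_FAST` and related opcodes). The sketch below shows that difference for a plain loop using the standard `dis` module; it deliberately avoids inspecting comprehension bytecode, which has changed across CPython versions. The helper `opnames` is hypothetical, and the prefix check allows for newer releases that fuse `STORE_FAST` into superinstructions.

```python
import dis

def opnames(code):
    # Set of opcode names in a code object (nested code objects not included).
    return {ins.opname for ins in dis.get_instructions(code)}

# A loop compiled as module-level code stores its variables by name...
module_code = compile("total = 0\nfor x in range(3):\n    total += x\n",
                      "<module>", "exec")

# ...while the same loop inside a function uses fast local slots.
def loop():
    total = 0
    for x in range(3):
        total += x
    return total

print("STORE_NAME" in opnames(module_code))                            # True
print(any(n.startswith("STORE_FAST") for n in opnames(loop.__code__)))  # True
```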

Cheers,
Nick.

P.S. For interest, I ran the above examples a few times using the other 
versions of Python I have on this machine. The results below were fairly 
typical of what I saw:

Python 2.5.1 (system python):
$ python -m timeit "list; [x for x in [1]]"
1000000 loops, best of 3: 1.11 usec per loop
$ python -m timeit "list(x for x in [1])"
100000 loops, best of 3: 5.91 usec per loop

Python 2.6b1+ (local build):
$ ./python -m timeit "list; [x for x in [1]]"
1000000 loops, best of 3: 1.1 usec per loop
$ ./python -m timeit "list(x for x in [1])"
100000 loops, best of 3: 4.32 usec per loop

(Interestingly, the Py3k genexps are currently coming up as consistently 
slower than their 2.6 counterparts for me. I'm not sure what could be 
causing that.)

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

