[Python-3000] List & set comprehensions patch

Nick Coghlan ncoghlan at gmail.com
Tue Mar 6 16:46:28 CET 2007


Georg and I have been working on the implementation of list 
comprehensions which don't leak their iteration variables, along with 
the implementation of set comprehensions. The latest patch can be found 
as SF patch #1660500 [1]. The file new-set-comps.diff is the combined 
patch which implements both features, and unifies handling of the 
different kinds of comprehension. In an effort to improve readability, 
the patch also converts the sets in symtable.c to be actual PySet 
objects, rather than PyDict objects that map each name to None, and 
tries to reduce the number of different meanings assigned to the term 
'scope'.

One of the comments made on Georg's initial attempt at implementing 
these features was that it would be nice to avoid the function call 
overhead in the listcomp & setcomp case (as it appears at first glance 
that the internal scope can be temporary). I tried to do that and 
essentially failed outright - working through symtable.c and compile.c, 
I found that dealing with the scoping issues created by the possibility 
of nested genexps, lambdas and list or set comprehensions would pretty 
much require reimplementing all of the scoping rules that functions 
already provide.

Here's an example of the scoping issues from the new test_listcomps.py 
that forms part of the patch:

     >>> def test_func():
     ...     items = [(lambda: i) for i in range(5)]
     ...     i = 20
     ...     return [x() for x in items]
     >>> test_func()
     [4, 4, 4, 4, 4]

Without creating an actual function object for the body of the list 
comprehension, it becomes rather difficult to get the lambda expression 
closure to resolve to the correct value.
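To make the point about function objects concrete, here is a hand 
translation of test_func into ordinary Python. It is a sketch under 
stated assumptions, not the code the patched compiler emits: 
`_listcomp` is a hypothetical stand-in for the hidden function, and 
`test_func_leaky` roughly approximates the old leaking semantics by 
running the loop directly in the enclosing function.

```python
def test_func_expanded():
    # Hypothetical hand translation of:
    #     items = [(lambda: i) for i in range(5)]
    # The comprehension body becomes a nested function, so every lambda
    # closes over the nested function's 'i', not test_func_expanded's 'i'.
    def _listcomp(iterator):
        result = []
        for i in iterator:
            result.append(lambda: i)
        return result

    # Only the outermost iterable is evaluated in the enclosing scope.
    items = _listcomp(iter(range(5)))
    i = 20  # rebinding this 'i' cannot affect the closures above
    return [x() for x in items]


def test_func_leaky():
    # Roughly the old, leaky semantics: the loop runs directly in the
    # function, so the lambdas close over the same 'i' later set to 20.
    items = []
    for i in range(5):
        items.append(lambda: i)
    i = 20
    return [x() for x in items]
```

With the nested function, every lambda shares the inner `i`, which 
finishes at 4; in the leaky version the lambdas share the outer `i`, 
which is later rebound to 20.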

For list comprehensions at module or class scope, the introduction of 
the function object can actually lead to a speed increase, as the 
iteration variables and accumulation variable become fast function 
locals instead of entries in the module or class namespace dictionary. 
Inside a function, however, the additional function call overhead 
slows things down.
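A rough way to see the locals-versus-namespace effect is to time the 
same accumulate-in-a-loop code run as top-level code (via exec() on a 
plain dict, standing in for module globals) and inside a function 
(fast locals). This is a sketch, not the benchmark behind the patch: 
`N`, the helper names and the exec-based stand-in for module scope are 
assumptions, and timings depend heavily on interpreter version and 
machine, so no figures are given.

```python
import time

N = 200_000

# Module-style: top-level code run by exec() against a plain dict, so
# 'acc' and 'i' are stored and looked up by name in that dict.
MODULE_STYLE = compile(
    "acc = []\nfor i in range(N):\n    acc.append(i)\n",
    "<module-style>",
    "exec",
)


def function_style(n):
    # Inside a function, 'acc' and 'i' are fast locals.
    acc = []
    for i in range(n):
        acc.append(i)
    return acc


def time_module_style(n=N):
    namespace = {"N": n}
    start = time.perf_counter()
    exec(MODULE_STYLE, namespace)
    return time.perf_counter() - start, namespace["acc"]


def time_function_style(n=N):
    # Includes the cost of the function call itself.
    start = time.perf_counter()
    acc = function_style(n)
    return time.perf_counter() - start, acc


if __name__ == "__main__":
    module_time, _ = time_module_style()
    function_time, _ = time_function_style()
    print(f"module-style loop:   {module_time:.4f} s")
    print(f"function-style loop: {function_time:.4f} s")
```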

Some specific questions related to the current patch:

In implementing it, I discovered that list comprehensions don't do 
SETUP_LOOP/POP_BLOCK around their for loop. I'd like someone who knows 
their way around the ceval loop better than I do to confirm that 
omitting those is actually legitimate (I *think* the restriction to a 
single expression in the body of the comprehension makes it OK, but 
I'm not sure).

There are also a couple of tests we had to disable - one in test_dis, 
one in test_grammar. Suggestions on how to reinstate those (or agreement 
that it is OK to get rid of them) would be appreciated.

The PySet update code in symtable.c currently uses PyNumber_InPlaceOr, 
followed by a Py_DECREF to release the new reference that call 
returns. Should this be changed to use PyObject_CallMethod to invoke 
the Python-level update method?
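At the Python level, the two options correspond roughly to `s |= other` 
(PyNumber_InPlaceOr is the C equivalent of the `|=` statement) and 
`s.update(other)`. The sketch below is an analogy, not the symtable.c 
code, and the function names are illustrative. It shows the two 
differences that bear on the question: the in-place operator hands back 
a reference to the same set (hence the extra Py_DECREF in C) and only 
accepts set operands, whereas update() takes any iterable and returns 
None.

```python
def inplace_or_returns_same_object():
    names = {"x"}
    # Same operation as 'names |= {"y"}': mutates and returns the set.
    result = names.__ior__({"y"})
    return result is names, names


def update_accepts_any_iterable():
    names = {"x"}
    names.update(["y", "z"])  # a list is fine for update()
    try:
        names |= ["w"]  # |= insists on a set-like right operand
    except TypeError:
        return names, "TypeError"
    return names, "ok"
```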

There are also two backwards compatibility problems which came up:

   - code which explicitly deleted the listcomp variable started 
throwing NameErrors. Several tweaks were needed in the standard library 
to fix this.

   - only the outermost iterator expression is evaluated in the scope 
containing the comprehension (just like generator expressions). This 
means that the inner expressions can no longer see class variables and 
values in explicit locals() dictionaries provided to exec & friends. 
This didn't actually cause any problems in the standard library - I only 
note it because my initial implementation mistakenly evaluated the 
outermost iterator in the new scope, which *did* cause severe problems 
along these lines.
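Both compatibility problems are easy to reproduce with a Python 3 
interpreter. The sketch below uses illustrative names: 
`cleanup_after_listcomp` mimics code that deleted the leaked loop 
variable, and `Scaling` shows a class body whose comprehension can see 
the class-level list only because that list is the outermost iterable.

```python
def cleanup_after_listcomp():
    squares = [n * n for n in range(3)]
    try:
        # Old code relied on 'n' having leaked out of the comprehension.
        # Now 'n' is never bound here; inside a function the error is
        # UnboundLocalError, a subclass of NameError.
        del n
    except NameError:
        return squares, "NameError"
    return squares, "deleted"


class Scaling:
    values = [1, 2, 3]
    factor = 10
    # The outermost iterable ('values') is evaluated in the class body,
    # so the class-level name is visible.
    doubled = [v * 2 for v in values]
    try:
        # 'factor' is looked up from inside the comprehension's own scope,
        # which skips the class namespace (no module-level 'factor' exists).
        scaled = [v * factor for v in values]
    except NameError:
        scaled = None
```

Under the old Python 2.x semantics, `del n` would succeed and `scaled` 
would be `[10, 20, 30]`.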

Regards,
Nick

[1] http://www.python.org/sf/1660500

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

