[Python-3000] List & set comprehensions patch

Nick Coghlan ncoghlan at gmail.com
Wed Mar 7 15:11:05 CET 2007


Guido van Rossum wrote:
> Quick responses from just reading the email (I'll try to review the 
> code later today, I'm trying to do Py3k work all day):
> 
> On 3/6/07, Nick Coghlan <ncoghlan at gmail.com> wrote:
>> Georg and I have been working on the implementation of list 
>> comprehensions which don't leak their iteration variables, along
>> with the implementation of set comprehensions. The latest patch can
>> be found as SF patch #1660500 [1]. The file new-set-comps.diff is
>> the combined patch which implements both features, and unifies
>> handling of the different kinds of comprehension. In an effort to
>> improve readability, the patch also converts the sets in symtable.c
>> to be actual PySet objects, rather than PyDict objects with None
>> keys and tries to reduce the number of different meanings assigned
>> to the term 'scope'.
> 
> Maybe you could separate that out into a separate patch so it won't 
> old up review or work on the main patch? Or is there a stronger 
> connection?

I think they can be separated - I didn't do it before posting the patch
on SF as I was well and truly over symtable.c by that point :)

I'll try to have a look at splitting them this weekend.

> How about determining if it's a *simple* case or not, and doing the 
> variable renaming in the simple case and Georg's original version in 
> non-simple cases? You can define "simple" as whatever makes the 
> determination easy and still treats most common cases as simple. E.g.
>  a lambda would be a non-simple case, and so would using a nonlocal
> or global variable (note though that nonlocal and global should reach
>  inside the list/set comp!) etc.

It sounds feasible, but I don't think the lack of it should prevent the
change from going in. The implementation in the patch *is* a regression
from a speed point of view, but it's also an obvious target for later
optimisation - if we find a function scope that cannot outlive it's
parent scope (such as a simple comprehension), then we can inline that
code in the parent scope by using renamed variables.

However, there's then further weirdness with what effect this has on the
contents of locals()...

It looks like even nailing down what 'simple' means in this situation
will be somewhat tricky :P

>> In implementing it, I discovered that list comprehensions don't do 
>> SETUP_LOOP/POP_BLOCK around their for loop - I'd like to get 
>> confirmation from someone who knows their way around the ceval loop
>>  better than I do that omitting those is actually legitimate (I
>> *think* the restriction to a single expression in the body of the
>> comprehension makes it OK, but I'm not sure).
> 
> They exist to handle break/continue. Since those don't apply to 
> list/set comps, it's safe.

Excellent - that's what I thought they were about. Does yield need them?
If it doesn't, we can probably also remove them from the bytecode
emitted for generator expressions.

>> There are also a couple of tests we had to disable - one in
>> test_dis, one in test_grammar. Suggestions on how to reinstate
>> those (or agreement that it is OK to get rid of them) would be
>> appreciated.
> 
> I'll have to look later.

According to Georg, it's only the test_dis one that needs a closer look 
- the removed grammar test was declared invalid some time ago.

>> The PySet update code in symtable.c currently uses
>> PyNumber_InplaceOr with a subsequent call to Py_DECREF to counter
>> the implicit call to Py_INCREF. Should this be changed to use
>> PyObject_CallMethod to invoke the Python level update method?
> 
> What's wrong with the inplace or? I seem to recall that s |= x and 
> s.update(x) aren't equivalent if x is not a set.

A while back when Barry wanted to add PySet_Update, Raymond recommended
using PyObject_CallMethod instead. In this case, I know I'm dealing with
real set objects, so the inplace-or works fine (it's annoyingly easy to
forget the decref and leak a reference though... all the more reason to
leave that part of the patch out for the moment, I guess)

>> - only the outermost iterator expression is evaluated in the scope 
>> containing the comprehension (just like generator expressions).
>> This means that the inner expressions can no longer see class
>> variables and values in explicit locals() dictionaries provided to
>> exec & friends. This didn't actually cause any problems in the
>> standard library - I only note it because my initial implementation
>> mistakenly evaluated the outermost iterator in the new scope, which
>> *did* cause severe problems along these lines.
> 
> This smells fishy. Do you have an example?

The following example uses a generator expression to show the effect of
the semantic change:

.>>> class Dummy:
...     x = 10
...     print [x for i in range(5)]
...     print list(x for i in range(5))
...
[10, 10, 10, 10, 10]
Traceback (most recent call last):
   File "<stdin>", line 1, in ?
   File "<stdin>", line 4, in Dummy
   File "<stdin>", line 4, in <generator expression>
NameError: global name 'x' is not defined

With list comprehensions also switching to private iteration variables,
the first print statement will throw a NameError instead of printing the
list.

This was brought to my attention by the fact that my original buggy 
implementation blew up on really common things like "[i for i in seq if 
cond(i)]" when the code was executed with execfile() rather than being 
imported.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org


More information about the Python-3000 mailing list