[Python-ideas] With clauses for generator expressions

Andrew Barnert abarnert at yahoo.com
Thu Nov 15 16:25:14 CET 2012


From: Nick Coghlan <ncoghlan at gmail.com>
Sent: Thu, November 15, 2012 4:39:42 AM


>On Thu, Nov 15, 2012 at 9:11 PM, Andrew Barnert <abarnert at yahoo.com> wrote:


> One, and only one, clause in a comprehension or generator expression is written 
>
> out of sequence: the innermost clause is lifted out and written first.

Given that there are only three clauses, "flatten in order, then move expression 
to front" and "flatten in reverse order, then move if clause to back" are 
identical. I suppose you're right that, given that the rule for nested 
expressions is to preserve the order of nesting, the first description is more 
natural.

But at any rate, I don't think any such rule is what most Python programmers 
have internalized. People obviously know how to nest clauses in general (we 
couldn't speak human languages otherwise), but they do not know how to write, or 
even read, nested comprehensions. What they know is that there are three 
clauses, and they go expression-for-if, period. And those who do learn about 
nesting seem to guess the order wrong at least half the time (hence all the 
StackOverflow posts on "why does [x for x in range(y) for y in range(5)] give me 
a NameError?").

> So *if* a context management clause was added to comprehension syntax, it 
> couldn't reasonably be added using the same design as was used to determine the 
>
> placement of the current iteration and filtering clauses.

Sure it could. If you want to flatten in order, then lift the innermost 
expression to the left, exactly my "option 3". And any of the three options nest 
just as well as current generator expressions. This follows exactly the same 
rules you described:

(line with open(path, 'r') as file for line in file if line 
 with open('filelist', 'r') as filelistfile for path in filelistfile)

I personally find option 1 more readable than option 3, but as I said, that's 
just a bare intuition, and I'm not married to it at all.

> If the determination is "place it at the end, and affect the whole 
> comprehension/generator expression regardless of the number of clauses"

I didn't even consider that. For the nested case, each for clause can have 0 or 
1 with clauses, just as it can have 0 or 1 if clauses. I can't see how anything 
else is reasonable—how else could you handle the example above without keeping 
hundreds of files open unnecessarily? Of course for the non-nested case, there 
is no real distinction between "at the end" and "at the end of each nesting 
level"…

> you're now very close to the point of it making more sense to propose 
allowing 
> context managers on arbitrary expressions

No, not at all. If you read the blog post I linked, I explain this. But I'll try 
to answer your points below.

> as it would be impossible to explain why this was allowed:
> 
>     lines = list(line for line in f with open(name) as f)
>
> But this was illegal:
> 
>     lines = (f.readlines() with open(name) as f)

Those aren't at all the same. Or, if they are the same, the first one doesn't 
work.

In the second one, your with expression clearly modifies the expression 
f.readlines(), so the file is open until f.readlines() finishes. Great. 

But in the first, it clearly modifies f, so the file is open until f 
finishes—that is, it gets closed immediately. There may be other things you 
could attach it to, but I can't think of any way either a human or a computer 
could parse it as being attached to the iteration inside the implicit generator 
created by the larger expression that this expression is a part of. And you're 
going to have the exact same problem with, e.g., "lines = (f with open(name) as 
f)"—the only useful thing this could do is attach to something *inside the file 
object*, which is ridiculous. 

In fact, any statement that cannot be trivially rewritten as "with open(name) as 
f: lines = f.readlines()" also cannot possibly work right using a general with 
expression. Which means it's useless.

The only way you could possibly make this work is to make a context manager mean 
different things in different kinds of expressions. That's a horribly bad idea. 
It means you're building the with clause that I wanted, and a variety of other 
with clauses, all of which look similar enough to confuse both parsers and human 
beings, despite doing different things. It's like suggesting that we don't need 
if clauses in generator expressions because the more general ternary if 
expression already takes care of it.

So, in short, adding general with expressions not only doesn't solve my problem, 
it makes my problem harder to solve. And it's a bad idea in general, because it 
only works in cases where it's not needed. So, I'm very strongly against it.

> Generator expressions, like lambda expressions, are deliberately limited. If 
>you 
> want to avoid those limits, it's time to upgrade to a named generator or 
> function. If you feel that puts things in the wrong order in your code then 
> please, send me your use cases so I can considering adding them as examples in 

> PEP 403 and PEP 3150.

My use case was at the top of my first email:

    upperlines = (line.upper() for line in file with open(path, 'r') as file)

This is the kind of thing beginners need to write and don't know how, and 
experienced developers do all the time in quick-n-dirty scripts and don't do 
properly even if they do know how, because it's difficult to write today.

And I'm not sure how PEP 403 or PEP 3150 would help.

> If you really want to enhance the capabilities of 
> expressions, then the more general proposal is the only one with even a remote 

> chance

I don't want to do things like turn assignments into expressions, add general 
with expressions, add bind expressions, etc. I want to make iterating over a 
context manager easy, and the with clause is the best way I could come up with 
to do it.

> and that's contingent on proving that the compiler can be set up to give 
> generator expressions the semantics you propose (Off the top of my head, I  
> suspect it should be possible, since the compiler will know it's in a  
> comprehension or generator expression by the time it hits the with  token, but 

> there may be other technical limitations that ultimately rule  it out). 


I plan to experiment with implementing it in PyPy over the weekend, and if that 
works out I'll take a look at CPython. But I don't see any reason that options 1 
or 2 should be any problem; option 3 might be, but I'll see when I get there.



More information about the Python-ideas mailing list