[Python-ideas] With clauses for generator expressions
Andrew Barnert
abarnert at yahoo.com
Thu Nov 15 16:25:14 CET 2012
From: Nick Coghlan <ncoghlan at gmail.com>
Sent: Thu, November 15, 2012 4:39:42 AM
>On Thu, Nov 15, 2012 at 9:11 PM, Andrew Barnert <abarnert at yahoo.com> wrote:
> One, and only one, clause in a comprehension or generator expression is written
> out of sequence: the innermost clause is lifted out and written first.
Given that there are only three clauses, "flatten in order, then move expression
to front" and "flatten in reverse order, then move if clause to back" are
identical. I suppose you're right that, given that the rule for nested
expressions is to preserve the order of nesting, the first description is more
natural.
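Here is a small illustration in current Python that tests the "flatten in order, then move the expression to the front" description against real code. The function names and data are made up for the example. Both functions should build the same list, and only the result expression changes position.

```python
def pairs_comprehension(rows):
    # for / if / for clauses appear in the same order as the nested
    # statements below; only the result expression (r, c) is moved to the front.
    return [(r, c) for r in rows if r % 2 == 0 for c in range(r)]


def pairs_loops(rows):
    # The same logic written out as explicit nested statements.
    result = []
    for r in rows:
        if r % 2 == 0:
            for c in range(r):
                result.append((r, c))
    return result


print(pairs_comprehension(range(5)))  # [(2, 0), (2, 1), (4, 0), (4, 1), (4, 2), (4, 3)]
print(pairs_comprehension(range(5)) == pairs_loops(range(5)))  # True
```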
But at any rate, I don't think any such rule is what most Python programmers
have internalized. People obviously know how to nest clauses in general (we
couldn't speak human languages otherwise), but they do not know how to write, or
even read, nested comprehensions. What they know is that there are three
clauses, and they go expression-for-if, period. And those who do learn about
nesting seem to guess the order wrong at least half the time (hence all the
StackOverflow posts on "why does [x for x in range(y) for y in range(5)] give me
a NameError?").
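The following sketch reproduces that StackOverflow case in current Python. The function names are hypothetical. Writing the inner loop's clause first fails because y is not yet bound when range(y) is evaluated. Putting the clauses in the same order as nested for statements works.

```python
def wrong_order():
    # The inner loop is written first, so range(y) is evaluated before
    # the later "for y in range(5)" clause has bound y.
    return [x for x in range(y) for y in range(5)]


def right_order():
    # Outer loop first, inner loop second, just like nested for statements.
    return [x for y in range(5) for x in range(y)]


try:
    wrong_order()
except NameError as exc:  # may be UnboundLocalError, a NameError subclass
    print("NameError:", exc)

print(right_order())  # [0, 0, 1, 0, 1, 2, 0, 1, 2, 3]
```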
> So *if* a context management clause was added to comprehension syntax, it
> couldn't reasonably be added using the same design as was used to determine the
> placement of the current iteration and filtering clauses.
Sure it could. If you want to flatten in order, then lift the innermost
expression to the left; that is exactly my "option 3". And any of the three
options nests just as well as current generator expressions do. This follows
exactly the same rules you described:
(line with open(path, 'r') as file for line in file if line
with open('filelist', 'r') as filelistfile for path in filelistfile)
I personally find option 1 more readable than option 3, but as I said, that's
just a bare intuition, and I'm not married to it at all.
> If the determination is "place it at the end, and affect the whole
> comprehension/generator expression regardless of the number of clauses"
I didn't even consider that. For the nested case, each for clause can have 0 or
1 with clauses, just as it can have 0 or 1 if clauses. I can't see how anything
else is reasonable—how else could you handle the example above without keeping
hundreds of files open unnecessarily? Of course for the non-nested case, there
is no real distinction between "at the end" and "at the end of each nesting
level"…
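The sketch below shows what "one with clause per for clause" would mean in practice. It is a rough present-day equivalent of the nested filelist example, written as a generator function. It rests on a few assumptions. The name nonblank_lines is invented. Each path read from the file list has its trailing newline stripped before opening. The filter uses line.strip() because a line yielded by iterating over a file is never empty, so a bare "if line" is always true.

```python
import os
import tempfile


def nonblank_lines(filelist_path):
    # Hypothetical present-day equivalent of the nested with-clause example.
    # Outer level: the file list stays open while its paths are being read.
    with open(filelist_path, 'r') as filelistfile:
        for path in filelistfile:
            # Inner level: each data file is opened for its own for clause
            # and closed again before the next path is read.
            with open(path.rstrip('\n'), 'r') as file:
                for line in file:
                    # Skip blank lines (a bare "if line" would never be false).
                    if line.strip():
                        yield line


# Usage with throwaway files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for i, text in enumerate(['a\n\nb\n', 'c\n']):
        data_path = os.path.join(tmp, f'data{i}.txt')
        with open(data_path, 'w') as f:
            f.write(text)
        paths.append(data_path)
    filelist = os.path.join(tmp, 'filelist')
    with open(filelist, 'w') as f:
        f.write('\n'.join(paths) + '\n')
    result = list(nonblank_lines(filelist))

print(result)  # ['a\n', 'b\n', 'c\n']
```

Each with statement is nested inside the for loop it belongs to, so at most one data file is open at any moment. If the data files were closed only after the whole expression finished, every one of them would stay open until then.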
> you're now very close to the point of it making more sense to propose allowing
> context managers on arbitrary expressions
No, not at all. If you read the blog post I linked, I explain this. But I'll try
to answer your points below.
> as it would be impossible to explain why this was allowed:
>
> lines = list(line for line in f with open(name) as f)
>
> But this was illegal:
>
> lines = (f.readlines() with open(name) as f)
Those aren't at all the same. Or, if they are the same, the first one doesn't
work.
In the second one, your with expression clearly modifies the expression
f.readlines(), so the file is open until f.readlines() finishes. Great.
But in the first, it clearly modifies f, so the file is open until f
finishes—that is, it gets closed immediately. There may be other things you
could attach it to, but I can't think of any way either a human or a computer
could parse it as being attached to the iteration inside the implicit generator
created by the larger expression that this expression is a part of. And you're
going to have the exact same problem with, e.g., "lines = (f with open(name) as
f)"—the only useful thing this could do is attach to something *inside the file
object*, which is ridiculous.
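The same problem shows up in current Python without any new syntax. In this sketch, the function names and sample file are hypothetical. lazy_lines returns a generator from inside a with block, which roughly matches attaching the context manager to f alone. The file is closed as soon as the with block exits, before anything iterates over it. eager_lines is the "trivially rewritten" with-statement form, and it works.

```python
import os
import tempfile


def lazy_lines(path):
    # The generator expression is created while the file is open, but it is
    # only iterated after the with block has exited and closed the file.
    with open(path, 'r') as f:
        return (line for line in f)


def eager_lines(path):
    # readlines() finishes inside the with block, so this is fine.
    with open(path, 'r') as f:
        return f.readlines()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'example.txt')
    with open(path, 'w') as f:
        f.write('x\ny\n')
    eager = eager_lines(path)
    try:
        list(lazy_lines(path))
        lazy_failed = False
    except ValueError:  # "I/O operation on closed file."
        lazy_failed = True

print(eager)        # ['x\n', 'y\n']
print(lazy_failed)  # True
```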
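In fact, any statement that cannot be trivially rewritten in the form "with
open(name) as f: lines = f.readlines()" also cannot work correctly using a
general with expression. In other words, a general with expression only works
where a plain with statement would already do the job, which makes it useless.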
The only way you could possibly make this work is to make a context manager mean
different things in different kinds of expressions. That's a horribly bad idea.
It means you're building the with clause that I wanted, and a variety of other
with clauses, all of which look similar enough to confuse both parsers and human
beings, despite doing different things. It's like suggesting that we don't need
if clauses in generator expressions because the more general ternary if
expression already takes care of it.
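Here is the analogy made concrete in current Python, using arbitrary sample values. An if clause in a comprehension filters items out. A ternary if expression can only pick a value for each item, so neither can stand in for the other.

```python
values = [3, 0, 5, 0, 7]

# An if clause filters: items that fail the test are dropped entirely.
filtered = [v for v in values if v]

# A ternary if expression only chooses a value for each item; it cannot
# drop anything, so the result has the same length as the input.
mapped = [v if v else None for v in values]

print(filtered)  # [3, 5, 7]
print(mapped)    # [3, None, 5, None, 7]
```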
So, in short, adding general with expressions not only doesn't solve my problem,
it makes my problem harder to solve. And it's a bad idea in general, because it
only works in cases where it's not needed. So, I'm very strongly against it.
> Generator expressions, like lambda expressions, are deliberately limited. If you
> want to avoid those limits, it's time to upgrade to a named generator or
> function. If you feel that puts things in the wrong order in your code then
> please, send me your use cases so I can consider adding them as examples in
> PEP 403 and PEP 3150.
My use case was at the top of my first email:
upperlines = (line.upper() for line in file with open(path, 'r') as file)
This is the kind of thing beginners need to write but don't know how to, and
that experienced developers write all the time in quick-n-dirty scripts but
don't do properly even when they do know how, because it's difficult to write
today.
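For comparison, this sketch shows roughly how that use case has to be written today, using the named generator function suggested in the quoted message. The function name upper_lines and the sample file contents are made up for illustration. Treat it as a sketch, not a recommended idiom.

```python
import os
import tempfile


def upper_lines(path):
    # Named-generator spelling of the proposed
    #   (line.upper() for line in file with open(path, 'r') as file)
    # The with block stays open while the caller iterates, and the file is
    # closed once the loop ends (or the generator is closed or collected).
    with open(path, 'r') as file:
        for line in file:
            yield line.upper()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'example.txt')
    with open(path, 'w') as f:
        f.write('hello\nworld\n')
    upperlines = list(upper_lines(path))

print(upperlines)  # ['HELLO\n', 'WORLD\n']
```

The file's lifetime is tied to the generator. If a caller stops iterating early and keeps the generator object alive, the file stays open until the generator is closed or collected.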
And I'm not sure how PEP 403 or PEP 3150 would help.
> If you really want to enhance the capabilities of
> expressions, then the more general proposal is the only one with even a remote
> chance
I don't want to do things like turn assignments into expressions, add general
with expressions, add bind expressions, etc. I want to make iterating over a
context manager easy, and the with clause is the best way I could come up with
to do it.
> and that's contingent on proving that the compiler can be set up to give
> generator expressions the semantics you propose (Off the top of my head, I
> suspect it should be possible, since the compiler will know it's in a
> comprehension or generator expression by the time it hits the with token, but
> there may be other technical limitations that ultimately rule it out).
I plan to experiment with implementing it in PyPy over the weekend, and if that
works out I'll take a look at CPython. But I don't see any reason that options 1
or 2 should be any problem; option 3 might be, but I'll see when I get there.