With clauses for generator expressions

First, I realize that people regularly propose with expressions. This is not the same thing. The problem with the with statement is not that it can't be postfixed perl-style, or used in expressions. The problem is that it can't be used with generator expressions. Here's the suggestion:

    upperlines = (line.upper() for line in file with open('foo', 'r') as file)

This would be equivalent to:

    def foo():
        with open('foo', 'r') as file:
            for line in file:
                yield line.upper()
    upperlines = foo()

The motivation is that there is no way to write this properly using a with statement and a generator expression; in fact, the only way to get this right is with the generator function above. And almost nobody ever gets it right, even when you push them in the right direction (although occasionally they write a complex class that has the same effect). That's why we still have tons of code like this lying around:

    upperlines = (line.upper() for line in open('foo', 'r'))

Everyone knows that this only works with CPython, and isn't even quite right there, and yet people write it anyway, because there's no good alternative.

The with clause is inherently part of the generator expression, because the scope has to be dynamic. The file has to be closed when iteration finishes, not when creating the generator finishes (or when the generator is cleaned up, which is closer, but still wrong). That's why a general-purpose "with expression" wouldn't actually help here; in fact, it would just make generator expressions with with clauses harder to parse. A with expression would have to be statically scoped to be general.

For more details, see this: http://stupidpythonideas.blogspot.com/2012/11/with-clauses-for-generator-exp...
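For what it's worth, the closest you can get today without new syntax is a small helper generator. This is only a sketch; `with_iter` is a hypothetical name, not a stdlib function:

```python
def with_iter(cm):
    # Hypothetical helper (not in the stdlib): enter the context manager,
    # yield each item from the managed object, and guarantee __exit__
    # runs when iteration finishes or the generator is closed early.
    with cm as obj:
        for item in obj:
            yield item

# The motivating example, written with the helper instead of new syntax:
# upperlines = (line.upper() for line in with_iter(open('foo', 'r')))
```

Because the with statement lives inside the helper's frame, the file is closed when iteration ends, which is exactly the dynamic scoping the proposal asks for.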

On Wed, Nov 14, 2012 at 07:44:44PM -0800, Andrew Barnert wrote:
While this looks very clean, how do you propose the following should be written as a generator expression?

    def foo():
        with open('foo') as f:
            for line in f:
                if 'bar' in line:
                    yield line

An obvious suggestion is as follows, but I'm not totally convinced about the out-of-order with, for and if clauses (compared with the equivalent generator):

    (line for line in f if 'bar' in line with open('foo') as f)

Cheers, Phil

Exactly as you suggest (quoting you out of order to make the answer clearer):
The clauses have *always* been out of order. In the function, the "if" comes between the "for" and the yield expression. In the expression, the "for" comes in between. If the clause order implies the statement order (I would have put it in terms of the clause structure implying the scoping, but they're effectively the same idea), then our syntax has been wrong since list comprehensions were added in 2.0. So, I think (and hope!) that implication was never intended.

Which means the only question is, which one looks more readable:

    1. (foo(line) for line in baz(f) if 'bar' in line with open('foo') as f)
    2. (foo(line) for line in baz(f) with open('foo') as f if 'bar' in line)
    3. (foo(line) with open('foo') as f for line in baz(f) if 'bar' in line)

Or, in the trivial case (where versions 1 and 2 are indistinguishable):

    1. (line for line in f with open('foo') as f)
    2. (line for line in f with open('foo') as f)
    3. (line with open('foo') as f for line in f)

My own intuition is that 1 is the clearest, and 3 by far the worst. So, that's why I proposed order 1. But I'm not at all married to it.

On Thu, Nov 15, 2012 at 03:11:07AM -0800, Andrew Barnert wrote:
I was mostly playing devil's advocate :) In my experience, the ordering of comprehension clauses is already a source of confusion for those new to the language. So, if it's not obvious where the "if" should come, it may well make matters worse in this regard (but I wouldn't say that this is enough to kill the proposal).
To me, 1 feels like it captures the semantics the best - the "with" clause is tacked onto the generator expression "(foo(line) ... for ... if)" and applies to the whole of that expression. Cheers, Phil

Phil Connell wrote:
But that would break the existing rule that binding clauses in a comprehension have to precede the expressions that use the bound variable. If you're allowed to write

    (foo(line) for line in baz(f) with open('foo') as f)

then it's not obvious why you can't write

    (foo(line) if 'bar' in line for line in lines)

Are you suggesting that the latter should be allowed as well? -- Greg

From: Serhiy Storchaka <storchaka@gmail.com> Sent: Thu, November 15, 2012 4:17:42 AM
Mechanically transforming that is easy. You just insert each with along with its corresponding for and if. There are no ambiguities for any of the three potential rules:

    1. (y for x in f if p(x) with a() as f for y in g if q(y) with b(x) as g)
    2. (y for x in f with a() as f if p(x) for y in g with b(x) as g if q(y))
    3. (y with a() as f for x in f if p(x) with b(x) as g for y in g if q(y))

I suppose you could also argue for a "super-1" where we stick all the withs at the end, or a "super-3" where we stick them all at the beginning… but it's hard to see a compelling argument for that. In fact, despite everything I said about clause structure not implying nesting, either one of those would look to me as if all the withs were at the outermost scope.

At any rate, unlike the simpler cases, here I have no opinion on which of those is clearest. They're all impossible to read at a glance (although breaking them up into multiple lines helps, I still don't have any clue what even the original function means; all those one-letter variables and functions, with easily-confused letters to boot…). But they're all quite easy to work out, or to construct, if you understand nested generator expressions and know the rule for where each clause goes.

If you can only have one with per for, this doesn't have a direct translation. However, if you want to extend it to have any number of withs per for, that seems to rule out option 2, and maybe option 1, but seems fine with option 3:

    (x with a() as f for x in f with b() as g if p(x) with c() as h)

The fact that option 3 can obviously do something which seems impossible in option 2, and which I can't work out in a few seconds off the top of my head with option 1, may be a more compelling argument than the fact that option 1 instinctively looked cleaner to me (and the one other person who commented on the three choices).

On 15.11.12 17:32, Andrew Barnert wrote:
If you can only have one with per for, this doesn't have a direct translation.
Even with one "with" per "for", ambiguity is possible for some options: with/for/if/yield, for/with/if/yield, for/if/with/yield, for/if/with/if/yield, ... should each have a different transcription.
Yes, that is what I wanted to show. If even for you, the author of the proposal, the most consistent option is the least obvious one, then for others it will always lead to confusion.

On Thu, Nov 15, 2012 at 9:11 PM, Andrew Barnert <abarnert@yahoo.com> wrote:
One, and only one, clause in a comprehension or generator expression is written out of sequence: the innermost clause is lifted out and written first. The rest of the expression is just written in the normal statement order with the colons and newlines omitted. This is why you can chain comprehensions to arbitrary depths without any ambiguity from the compiler's point of view:
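The rule described above can be sketched concretely (a hypothetical example, not from the original mail): flatten the statement form in order, then lift the innermost expression to the front.

```python
# Statement form: clauses in their normal order, expression innermost.
result = []
for x in range(3):
    if x:
        for y in range(x):
            result.append((x, y))

# Comprehension form: the innermost expression is lifted to the front;
# the for/if clauses keep their statement order.
chained = [(x, y) for x in range(3) if x for y in range(x)]
print(chained == result)  # True
```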
(Note: even though you *can* chain the clauses like this, please don't, as it's thoroughly unreadable for humans, even though it makes sense to the compiler.)

So *if* a context management clause was added to comprehension syntax, it couldn't reasonably be added using the same design as was used to determine the placement of the current iteration and filtering clauses. If the determination is "place it at the end, and affect the whole comprehension/generator expression regardless of the number of clauses", then you're now very close to the point of it making more sense to propose allowing context managers on arbitrary expressions, as it would be impossible to explain why this was allowed:

    lines = list(line for line in f with open(name) as f)

But this was illegal:

    lines = (f.readlines() with open(name) as f)

And if arbitrary subexpressions are allowed, *then* you're immediately going to have people wanting a "bind" builtin:

    class bind:
        def __init__(self, value):
            self.value = value
        def __enter__(self):
            return self.value
        def __exit__(self, *args):
            pass

    if (m is None with bind(pattern.match(data)) as m):
        raise ValueError("{} does not match {}".format(data, pattern))
    # Do things with m...

This is not an endorsement of the above concepts, just making it clear that I don't believe that attempting to restrict this to generator expressions is a viable proposal, as the restriction is far too arbitrary (from a user perspective) to form part of a coherent language design.

Generator expressions, like lambda expressions, are deliberately limited. If you want to avoid those limits, it's time to upgrade to a named generator or function. If you feel that puts things in the wrong order in your code then please, send me your use cases so I can consider adding them as examples in PEP 403 and PEP 3150.
If you really want to enhance the capabilities of expressions, then the more general proposal is the only one with even a remote chance, and that's contingent on proving that the compiler can be set up to give generator expressions the semantics you propose (Off the top of my head, I suspect it should be possible, since the compiler will know it's in a comprehension or generator expression by the time it hits the with token, but there may be other technical limitations that ultimately rule it out). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

From: Nick Coghlan <ncoghlan@gmail.com> Sent: Thu, November 15, 2012 4:39:42 AM
On Thu, Nov 15, 2012 at 9:11 PM, Andrew Barnert <abarnert@yahoo.com> wrote:
One, and only one, clause in a comprehension or generator expression is written
out of sequence: the innermost clause is lifted out and written first.
Given that there are only three clauses, "flatten in order, then move expression to front" and "flatten in reverse order, then move if clause to back" are identical. I suppose you're right that, given that the rule for nested expressions is to preserve the order of nesting, the first description is more natural. But at any rate, I don't think any such rule is what most Python programmers have internalized. People obviously know how to nest clauses in general (we couldn't speak human languages otherwise), but they do not know how to write, or even read, nested comprehensions. What they know is that there are three clauses, and they go expression-for-if, period. And those who do learn about nesting seem to guess the order wrong at least half the time (hence all the StackOverflow posts on "why does [x for x in range(y) for y in range(5)] give me a NameError?").
Sure it could. If you want to flatten in order, then lift the innermost expression to the left, exactly my "option 3". And any of the three options nest just as well as current generator expressions. This follows exactly the same rules you described:

    (line with open(path, 'r') as file for line in file if line
     with open('filelist', 'r') as filelistfile for path in filelistfile)

I personally find option 1 more readable than option 3, but as I said, that's just a bare intuition, and I'm not married to it at all.
If the determination is "place it at the end, and affect the whole comprehension/generator expression regardless of the number of clauses"
I didn't even consider that. For the nested case, each for clause can have 0 or 1 with clauses, just as it can have 0 or 1 if clauses. I can't see how anything else is reasonable—how else could you handle the example above without keeping hundreds of files open unnecessarily? Of course for the non-nested case, there is no real distinction between "at the end" and "at the end of each nesting level"…
No, not at all. If you read the blog post I linked, I explain this. But I'll try to answer your points below.
Those aren't at all the same. Or, if they are the same, the first one doesn't work.

In the second one, your with expression clearly modifies the expression f.readlines(), so the file is open until f.readlines() finishes. Great. But in the first, it clearly modifies f, so the file is open until f finishes; that is, it gets closed immediately. There may be other things you could attach it to, but I can't think of any way either a human or a computer could parse it as being attached to the iteration inside the implicit generator created by the larger expression that this expression is a part of.

And you're going to have the exact same problem with, e.g., "lines = (f with open(name) as f)": the only useful thing this could do is attach to something *inside the file object*, which is ridiculous. In fact, any statement that cannot be trivially rewritten as "with open(name) as f: lines = f.readlines()" also cannot possibly work right using a general with expression. Which means it's useless.

The only way you could possibly make this work is to make a context manager mean different things in different kinds of expressions. That's a horribly bad idea. It means you're building the with clause that I wanted, and a variety of other with clauses, all of which look similar enough to confuse both parsers and human beings, despite doing different things. It's like suggesting that we don't need if clauses in generator expressions because the more general ternary if expression already takes care of it.

So, in short, adding general with expressions not only doesn't solve my problem, it makes my problem harder to solve. And it's a bad idea in general, because it only works in cases where it's not needed. So, I'm very strongly against it.
PEP 403 and PEP 3150.
My use case was at the top of my first email:

    upperlines = (line.upper() for line in file with open(path, 'r') as file)

This is the kind of thing beginners need to write and don't know how, and experienced developers do all the time in quick-n-dirty scripts and don't do properly even if they do know how, because it's difficult to write today. And I'm not sure how PEP 403 or PEP 3150 would help.
If you really want to enhance the capabilities of expressions, then the more general proposal is the only one with even a remote
chance
I don't want to do things like turn assignments into expressions, add general with expressions, add bind expressions, etc. I want to make iterating over a context manager easy, and the with clause is the best way I could come up with to do it.
there may be other technical limitations that ultimately rule it out).
I plan to experiment with implementing it in PyPy over the weekend, and if that works out I'll take a look at CPython. But I don't see any reason that options 1 or 2 should be any problem; option 3 might be, but I'll see when I get there.

On 11/15/2012 10:25 AM, Andrew Barnert wrote:
This is how list comps were designed and initially defined.
Given that there are only three clauses, "flatten in order, then move expression to front"
This is the simple and correct rule.
and "flatten in reverse order, then move if clause to back"
This is more complicated and wrong.
are identical.
Until one adds more clauses.
It *is* the rule, and a very simple one. The reference manual gives it, though it could perhaps be clearer. The tutorial List Comprehension section does give a clear example:

    ''' For example, this listcomp combines the elements of two lists if they are not equal:

    >>> [(x, y) for x in [1, 2, 3] for y in [3, 1, 4] if x != y]
    [(1, 3), (1, 4), (2, 3), (2, 1), (2, 4), (3, 1), (3, 4)]
    '''
Anyone who read and understood that snippet in the tutorial, which took me a minute to find, would not ask such a question. There are people who program Python without ever reading the manuals, guessing as they go, and who, when they stumble, prefer to post questions on forums and wait for a customized answer rather than dig it out themselves. -- Terry Jan Reedy

On Fri, Nov 16, 2012 at 1:25 AM, Andrew Barnert <abarnert@yahoo.com> wrote:
And it's *awful*. If the only thing on the RHS of a simple assignment statement is a lambda or generator expression, that code should almost always be rewritten with def as a matter of style, regardless of other considerations. However, I realised there's a more serious problem with your idea: the outermost clause in a list comprehension or generator expression is evaluated immediately and passed as an argument to the inner scope that implements the loop, so you have an unresolved sequencing problem between the evaluation of that argument and the evaluation of the context manager. If you want the context manager inside the generator, you *can't* reference the name bound in the as clause in the outermost iterable. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Fri, Nov 16, 2012 at 10:46 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
(Andrew's reply here dropped the list from the cc, but I figure my subsequent clarification is worth sharing more widely)

When you write a genexp like this:

    gen = (x for x in get_seq())

The expansion is *NOT* this:

    def _g():
        for x in get_seq():
            yield x
    gen = _g()

Instead, it is actually:

    def _g(iterable):
        for x in iterable:
            yield x
    gen = _g(get_seq())

That is, the outermost iterable is evaluated in the *current* scope, not inside the generator. Thus, the entire proposal is rendered incoherent, as there is no way for the context manager expression to be executed both *before* the outermost iterable expression and *inside* the generator function, since the generator doesn't get called until *after* the outermost iterable expression has already been evaluated.

(And, to stave off the obvious question, no, this order of evaluation is *not* negotiable, as changing it would be a huge backwards compatibility breach, as well as leading to a lot more obscure errors with generator expressions.)

The reason PEP 403 is potentially relevant is because it lets you write a one-shot generator function using the long form and still make it clear that it *is* a one-shot operation that creates the generator-iterator directly, without exposing the generator function itself:

    @in gen = g()
    def g():
        for x in get_seq():
            yield x

Or, going back to the use case in the original post:

    @in upperlines = f()
    def f():
        with open('foo', 'r') as file:
            for line in file:
                yield line.upper()

Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
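The expansion rule above is easy to verify: the outermost iterable really is evaluated at genexp creation time, while inner iterables aren't touched until iteration. A small demo (the `log` list and `source` helper are just for observation, not part of any API):

```python
log = []

def source(name, items):
    # Record when each iterable expression is actually evaluated.
    log.append(name)
    return items

# Creating the genexp evaluates only the *outermost* iterable...
gen = ((x, y) for x in source("outer", [1, 2]) for y in source("inner", [3]))
print(log)        # ['outer']

# ...the inner iterable is evaluated once per outer item, during iteration.
print(list(gen))  # [(1, 3), (2, 3)]
print(log)        # ['outer', 'inner', 'inner']
```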

Ah, you're right. The only way this would work is if the with clause were second or later, which would be very uncommon. And the fact that it doesn't work in the most common case means that, even if it were occasionally useful, it would cause a lot more confusion than benefit. So, never mind... Sent from my iPhone On Nov 16, 2012, at 7:53, Nick Coghlan <ncoghlan@gmail.com> wrote:

Nick Coghlan wrote:
That is, the outermost iterable is evaluated in the *current* scope, not inside the generator.
I've always felt it was a bad idea to bake this kludge into the language. It sweeps a certain class of problems under the rug, but only in *some* cases. For example, in

    ((x, y) for x in foo for y in blarg)

rebinding of foo is guarded against, but not blarg. And if that's not arbitrary enough, in the otherwise completely equivalent

    ((x, y) for y in blarg for x in foo)

it's the other way around.

Anyhow, it wouldn't be *impossible* to incorporate a with-clause into this scheme. Given

    (upper(line) with open(name) as f for line in f)

you either pick open(name) to be the pre-evaluated expression, or not do any pre-evaluation at all in that case. Either way, it can't break any *existing* code, because nobody is writing genexps containing with-clauses yet. -- Greg

On Fri, Nov 16, 2012 at 3:28 PM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
I wouldn't call it arbitrary -- the second and following clauses *must* be re-evaluated because they may reference the loop variable of the first. And the two versions you show aren't equivalent unless iterating over blarg and foo is completely side-effect-free.
And nobody ever will. It's too ugly. -- --Guido van Rossum (python.org/~guido)

On 11/15/2012 6:11 AM, Andrew Barnert wrote:
From: Phil Connell <pconnell@gmail.com>
The simple rule for comprehensions is that the append (list comprehension) or yield (generator expression) is moved from last to first, and the other statements/clauses are left in the same order.
Which means that 3 is the proper one. In particular, if with clauses were added, f must be defined in the with clause before used in the for clause, just as line must be defined in the for clause before used in the if clause. -- Terry Jan Reedy

So far, nearly everyone is discussing things which are tangential, or arguing that one of the optional variants is bad. So let me strip down the proposal, without any options in it, and expand on a use case.

The syntax is:

    (foo(line) with open('bar') as f for line in baz(f))

This translates to calling this function:

    def gen():
        with open('bar') as f:
            for line in baz(f):
                yield foo(line)

The translation for with clauses is identical to for and if clauses, and nesting works in the obvious way.

So, why do I want to create a generator that wraps a file or other generator inside a with clause? There are a wide range of modules that have functions that can take a generator of strings in place of a file. Some examples off the top of my head include numpy.loadtxt, poster.multipart_encode, and line_protocol.connection.send. Many of these are asynchronous, so I can't just wrap the call in a with statement; I have to send a generator that will close the wrapped file (or other generator) when it's exhausted or closed, instead of when the function returns.

So, imagine a simple "get" command in a mail server, a method in the Connection class:

    def handle_get(self, message_id):
        path = os.path.join(mailbox_path, message_id)
        self.send_async(open(path, 'r'))

Now, let's say I want to do some kind of processing on the file as I send it (e.g., remove excessive curse words, or add new ones in if there aren't enough in any line):

    def handle_get(self, message_id):
        path = os.path.join(mailbox_path, message_id)
        def censored_file():
            with open(path, 'r') as file:
                for line in file:
                    yield self.censor(line)
        self.send_async(censored_file())

With my suggested idea, the last 5 lines could be replaced by this:

    self.send_async(self.censor(line) with open(path, 'r') as file for line in file)

Of course this async_chat-style model isn't the only way to write a server, but it is a common way to write a server, and I don't think it should be complicated.

On 16.11.12 11:09, Andrew Barnert wrote:
    self.send_async(self.censor(line) for line in open(path, 'r'))

or

    self.send_async(map(self.censor, open(path, 'r')))

This is *not worse* than your first example:

    self.send_async(open(path, 'r'))

How do you write a managed uncensored variant? You can use the wrapper suggested by Mathias Panzenböck:

    self.send_async(managed(open(path, 'r')))
    self.send_async(self.censor(line) for line in managed(open(path, 'r')))

It is easy, clear, universal and requires no changes to syntax.
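The `managed` wrapper isn't spelled out in this message; presumably it is along these lines (a sketch, with the name and exact behavior assumed), and the important property is that `__exit__` runs when the wrapper is exhausted *or explicitly closed*, not merely when the GC gets around to it:

```python
def managed(cm):
    # Assumed shape of the suggested wrapper: enter the context manager,
    # yield the managed object's items, and run __exit__ when the wrapper
    # is exhausted or closed.
    with cm as it:
        for item in it:
            yield item

class Tracked:
    # A toy context manager for demonstration.
    def __init__(self):
        self.exited = False
    def __enter__(self):
        return iter(range(5))
    def __exit__(self, *args):
        self.exited = True

t = Tracked()
g = managed(t)
next(g)          # partially consume the wrapper
g.close()        # GeneratorExit is thrown into managed(), running __exit__
print(t.exited)  # True
```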

On 11/16/2012 4:09 AM, Andrew Barnert wrote:
OK, that's helpful. Now let me strip down my objection to this: your proposal is conceptually wrong because it mixes two distinct and different ideas -- collection definition and context management. It conflicts with a well-defined notion of long standing.

To explain: in math, one can define a set explicitly by displaying the members, or implicitly as a subset based on one or more base sets. Using one version of the notation:

    {0, 2, 4} == {2*i | i in N; i < 3}

The latter is 'set-builder notation' or a 'set comprehension' (and would usually use the epsilon-like membership symbol instead of 'in'). The idea goes back at least a century. https://en.wikipedia.org/wiki/Set-builder_notation

In Python, the latter directly translates to

    {2*i for i in itertools.count() if i < 3} == {i for i in range(0, 5, 2)}

(Python does not require the base collection to match the result class.) Another pair of examples:

    {(i,j) | i in N, j in N; i+j <= 5}
    {(i,j) for i in count() for j in count() if i+j <= 5}

Similar usages in programming go back over half a century. https://en.wikipedia.org/wiki/List_comprehension

While notation in both math and CS varies, the components are always input source collection variables, conditions or predicates, and an output expression. The Python reference manual documents comprehensions as an alternate atomic display form. In Chapter 6, Expressions, Section 2, Atoms: "For constructing a list, a set or a dictionary Python provides special syntax called "displays", each of them in two flavors: either the container contents are listed explicitly, or they are computed via a set of looping and filtering instructions, called a comprehension. ... list_display ::= "[" [expression_list | comprehension] "]" <etc>"

A generator expression similarly represents an untyped abstract sequence, rather than a concrete class.

In summary: a context manager, as an object with __enter__ and __exit__ methods, is not a proper component of a comprehension.
For instance, replace "open('xxx')" in your proposal with a lock creation function. On the other hand, an iterable managed resource, as suggested by Mathias Panzenböck, works fine as a source. So it does work (as you noticed also). -- Terry Jan Reedy

On 17 November 2012 00:00, Terry Reedy <tjreedy@udel.edu> wrote:
I don't follow how you made these two leaps: * It doesn't apply to set comprehensions in *math* -> it doesn't apply to set comprehensions in *Python* * it doesn't apply to *set* comprehensions in Python -> it doesn't apply to *any* comprehensions in Python

On 11/17/2012 3:11 PM, Joshua Landau wrote:
On 17 November 2012 00:00, Terry Reedy
Since the OP withdrew his suggestion, it's a moot point. However, I talked about the general, coherent concept of comprehensions, as used in both math and CS, as an alternative to explicit listing. Do look at the references, including the Python manual. It presents the general idea and implementation first and then the four specific versions. I only used sets for an example. -- Terry Jan Reedy

From: Terry Reedy <tjreedy@udel.edu> Sent: Sun, November 18, 2012 11:56:04 AM
I agree that it is a moot point. The idea would require a larger semantic change than I initially anticipated, and I disagree with Greg Ewing that the immediate evaluation of the outer source is a kluge that should be abandoned, so I've withdrawn it. (Of course if Greg Ewing or Joshua Landau or anyone else wants to pick up the idea, I apologize for presuming, but I no longer think it's a good idea.) That's why I ignored the point about set builder notation. But if you want to continue to argue it:
Nested comprehensions already break the analogy with set-builder notation. For one thing, nobody would define the rationals as

    {i/j | j in Z: j != 0 | i in Z}

People would probably figure out what you meant, but you wouldn't write it that way. Nested comprehensions (even more so when one is dependent on the other) make it blatant that a comprehension is actually an iterative sequence builder, not a declarative set builder. The analogy is a loose one, and it already leaks. It really only holds when you've got a single, well-ordered, finite source. It's obvious that

    (i/j for j in itertools.count(2) for i in range(1, j))

generates the rationals in (0, 1), in a specific order (with repeats), but you wouldn't write anything remotely similar in set-builder notation. In fact, you'd probably define that set just as {q | i, j in N+: qj = i, q < 1}, and you can't translate that to Python at all.
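The ordered, repeating behavior of that genexp is easy to see by taking a few items. A small demo using `Fraction` for exact values (the expression in the post would produce floats on Python 3):

```python
from fractions import Fraction
from itertools import count, islice

# The nested genexp from the post, with exact rational arithmetic.
rationals = (Fraction(i, j) for j in count(2) for i in range(1, j))

print(list(islice(rationals, 6)))
# [Fraction(1, 2), Fraction(1, 3), Fraction(2, 3),
#  Fraction(1, 4), Fraction(1, 2), Fraction(3, 4)]
```

Note that 1/2 appears twice (once as 2/4), which is exactly the "specific order (with repeats)" the declarative set-builder notation cannot express.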

On 15/11/12 22:11, Andrew Barnert wrote:
Is that a trick question? Answer: none of them. In my opinion, they are all too busy for a generator expression and should be re-written as a generator function.

As far as the given use-case is concerned:

    upperlines = (line.upper() for line in open('foo'))

I don't see what the concern is. The file will remain open so long as the generator is not exhausted, but that has to be the case no matter what you do. If the generator is thrown away before being exhausted, the file will eventually be closed by the garbage collector, if only when the application or script exits. For short-lived scripts, the temporary leakage of a file handle or two is hardly likely to be a serious problem.

Presumably if you have a long-lived application with many such opened files, you might risk running out of file handles when running under Jython or IronPython. But I think that's a sufficiently unusual and advanced use-case that I'm not worried that this is a problem that needs solving with syntax instead of education. -- Steven

From: Steven D'Aprano <steve@pearwood.info> Sent: Thu, November 15, 2012 10:05:36 PM
This seems to be an argument against with statements, or any other kind of resource management at all besides "trust the GC". I'm pretty sure PEP 310, PEP 340, PEP 343, and the discussion around them already had plenty of counter-arguments, but here's a couple quick ones: If you've opened a file for exclusive access (the default on Windows), you can't safely open it again if you can't predict when it will be closed. If the context in question is a mutex lock rather than a file open, you can't safely lock it again if you can't predict when it will be released (and, even if you never want to lock it again, you could end up deadlocked against another thread that does).

On 16/11/12 20:26, Andrew Barnert wrote:
Certainly not. I'm saying that for many applications, explicit resource management is not critical -- letting the GC close the file (or whatever resource you're working with) is a perfectly adequate strategy. The mere existence of "faulty" gen expressions like the above example is not necessarily a problem.

Think of it this way: you can optimize code for speed, for memory, and for resource usage. (Memory of course being a special case of resource usage.) You're worried about making it easy to micro-optimize generator expressions for resource usage. I'm saying that's usually premature optimization. It's not worth new syntax complicating generator expressions to optimize the closing of a few files.

If your application is not one of those applications where a laissez-faire approach to resource management is acceptable, that's fine. I'm not saying that nobody needs to care about resource management! If you need to care about your resources with more attention than benign neglect, then do so. The only limitation here is that you can't use a context manager in a list comprehension or generator expression. I don't care about that.

Not every problem that requires a function needs to be solvable with lambda, and not every problem that requires a generator needs to be solvable with a generator expression. The beauty of generator expressions is that they are deliberately lean. The bar to fatten them up with more syntax is quite high, and I don't think you have come even close to getting over it. -- Steven

From: Steven D'Aprano <steve@pearwood.info> Sent: Fri, November 16, 2012 1:53:42 AM
It's not a micro-optimization, or an optimization at all. It has nothing to do with performance, and everything to do with making your code work at all. (Or, in some cases, making it robust—your code may work 99% of the time, or work with CPython or POSIX but not PyPy or Windows.) For example, see Google's Python Style Guide at http://google-styleguide.googlecode.com/svn/trunk/pyguide.html#Files_and_Soc... for why they recommend always closing files.
The only limitation here is that you can't use a context manager in a list comprehension or generator expression.
Yes, that's exactly the limitation (but only in generator expressions—in list comprehensions, it can't ever matter).
This is one of those cases where it won't hurt you when you don't use it. You don't have to put if clauses into generator expressions, or nest multiple loops—and very often you don't, in which case they don't get in the way, and your expression is concise and simple. Similarly, you won't have to put with clauses into generator expressions, and very often you won't, in which case they won't get in the way. And I don't think anyone would have trouble learning or understanding it. The expression still maps to a generator function that's just a simple tree of one-line nested statements with a yield statement at the bottom, the only difference is that instead of the two most common kinds of statements in such functions, you can now use the three most common.

But a piece of code that everyone needs on a regular basis should be writable, and readable, by a novice Python user. I don't care whether it's one line or four, but I do care that a task that seems to require nothing that you don't learn in your first week with the language is beyond the ability of not just novices, but people who post modules on PyPI, write answers on StackOverflow, etc.
Use a generator function.
Of course the right answer is obvious to you and me, because we understand the difference between static and dynamic scopes, and that a generator defines a dynamic scope, and what context managers actually do, and how to translate a generator expression into a generator function. It's not that the generator function is hard to write; it's that people who don't understand how all this stuff works won't even think of the idea that an explicit generator function would help them here.
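The "correct" pattern the thread keeps referring to can be sketched concretely. `Tracker` below is a hypothetical stand-in for a file (or any context manager), defined here only so that the cleanup is observable; the point is that the with block inside the generator function gives the resource a dynamic scope:

```python
# A sketch of the pattern under discussion: the context manager lives
# inside a generator function, so __exit__ runs when iteration finishes
# (or the generator is closed), not when the generator object is created.
class Tracker:
    """Hypothetical stand-in for a file: iterable + context manager."""
    def __init__(self, items):
        self.items = items
        self.closed = False
    def __enter__(self):
        return self
    def __exit__(self, *args):
        self.closed = True
    def __iter__(self):
        return iter(self.items)

def upper_items(source):
    with source:
        for item in source:
            yield item.upper()

src = Tracker(["a", "b"])
gen = upper_items(src)
assert not src.closed      # creating the generator enters nothing yet
result = list(gen)         # exhausting it runs the body to completion...
assert result == ["A", "B"]
assert src.closed          # ...and __exit__ has fired
```

The same shape with `open(path)` in place of `Tracker` is exactly the four-line generator function from the original post.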

On 2012-11-15, at 04:44 , Andrew Barnert wrote:
Actually, it's extremely debatable that the generator function is correct: if the generator is not fully consumed (terminating iteration on the file) I'm pretty sure the file will *not* get closed save by the GC doing a pass on all dead objects maybe. This means this function is *not safe* as a lazy source to an arbitrary client, as that client may very well use itertools.islice or itertools.takewhile and only partially consume the generator. Here's an example:

--
import itertools

class Manager(object):
    def __enter__(self):
        return self

    def __exit__(self, *args):
        print("Exited")

    def __iter__(self):
        for i in range(5):
            yield i

def foo():
    with Manager() as ms:
        for m in ms:
            yield m

def bar():
    print("1")
    f = foo()
    print("2")
    # Only consume part of the iterable
    list(itertools.islice(f, None, 2))
    print("3")

bar()
print("4")
--

CPython output, I'm impressed that the refcounting GC actually bothers unwinding the stack and running the __exit__ handler *once bar has finished executing*:
But here's the (just as correct, as far as I can tell) output from pypy:
If the program was long-running, it is possible that PyPy would run __exit__ when the containing generator is released (though by no means certain; I don't know if this is specified at all). This is in fact one of the huge issues with faking dynamic scopes via threadlocals and context managers (as e.g. Flask might do; I'm not sure what actual strategy it uses): they interact rather weirdly with generators. It's also why I think Python should support actually dynamically scoped variables; it would also fix the thread-broken behavior of e.g. warnings.catch_warnings.

From: Masklinn <masklinn@masklinn.net> Sent: Thu, November 15, 2012 1:29:46 AM
Well, yes, *no possible object* is safe as a lazy source to an arbitrary client that might not fully consume, close, or destroy it. By definition, the object must stay alive as long as an arbitrary client might use it, so a client that never finishes using it means the object must stay alive forever. And, similarly, in the case of a client that does finish using it, but the only way to detect that is by GCing the client, the object must stay alive until the GC collects the client. So, the correct thing for the generator function to do in that case is… exactly what it does.

Of course in that case, it would arguably be just as correct to just do "ms = Manager()" or "file = open('foo', 'r')" instead of "with Manager() as ms:" or "with open('foo', 'r') as file:". The difference is that, in cases where the client does fully consume, close, or destroy the iterator deterministically, the with version will still do the right thing, while the leaky version will not. You can test this very easily by adding an "f.close()" to the end of bar, or changing "f = foo()" to "with closing(foo()) as f:", and comparing the two versions of the generator function.

Put another way, if your point is an argument against with clauses, it's also an argument against with statements, and manual resource cleanup, and in fact anything but a magical GC.
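A minimal sketch of the point being made here, reusing the shape of the Manager example above (the class and function names are illustrative, not from any library): an explicit close() on a partially-consumed generator unwinds its suspended stack and runs __exit__ deterministically, which is exactly what the leaky `ms = Manager()` version cannot do.

```python
class Manager:
    def __init__(self):
        self.exited = False
    def __enter__(self):
        return self
    def __exit__(self, *args):
        self.exited = True
    def __iter__(self):
        return iter(range(5))

def consume_with(manager):
    # The with block lives inside the generator, so its cleanup is tied
    # to the generator's own lifetime.
    with manager as m:
        for x in m:
            yield x

mgr = Manager()
f = consume_with(mgr)
assert next(f) == 0     # partially consume the generator
assert not mgr.exited   # still suspended inside the with block
f.close()               # GeneratorExit is thrown in at the yield...
assert mgr.exited       # ...so __exit__ runs, deterministically
```

Wrapping `f` in `contextlib.closing(...)` in a with statement achieves the same effect as the explicit close().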
This is an almost-unrelated side issue. A generator used in a single thread defines a fully deterministic dynamic scope, one that can and often should be used for cleanup. The fact that sometimes it's not the right scope for some cleanups, or that you can use them in multithreaded programs in a way that makes them indeterministic, isn't an argument that it should be hard to use them for cleanup when appropriate, is it?

On 2012-11-15, at 11:08 , Andrew Barnert wrote:
This is an almost-unrelated side issue. A generator used in a single thread defines a fully deterministic dynamic scope
I think you meant "a context manager" not "a generator", and my example quite clearly demonstrates that the interaction between context managers and generators completely breaks context managers as dynamic scopes.
Using context managers on threadlocals means the context manager itself is in a single-threaded environment; the multithreading is not the issue, the interaction between context managers and generators is.
isn't an argument that it should be hard to use them for cleanup when appropriate, is it?
I never wrote that, I only noted that your assertion about the function you posted (namely that it is "properly written") is dubious and risky.

defines a fully deterministic dynamic scope
I think you meant "a context manager" not "a generator"
No, I meant a generator. "As long as the generator has values to generate, and has not been closed or destroyed" is a dynamic scope. "Until the end of this with statement block" is a static scope. The only reason the context managers in both your example and mine have dynamic scope is because they're embedded in generators.
No it doesn't. It demonstrates that it's possible to create indeterminate scopes, and context managers cannot help you if you do so. "Until the client exhausts the iterator, given that the client is not going to exhaust the iterator" effectively means "Until the client goes away". Which means you need a context manager around the client. The fact that you don't have one means that your client is inherently broken. You'll have the exact same problems with a trivial local object (e.g., its __del__ method won't get called by PyPy).

However, if the client *did* have a context manager (or exhausted, closed, or explicitly deleted the generator), a properly-written generator would clean itself up, while a naively-written one would not. That's what I meant by "properly-written". Not that it's guaranteed to clean up even when used by a broken client, because that is completely impossible for any object (generator or otherwise), but that it is guaranteed to clean up when used by a properly-written client.

On 2012-11-15, at 12:37 , Andrew Barnert wrote:
It isn't a dynamic scope in the sense of "dynamic scoping", which is the sense I used it in, and the one usually understood when talking about dynamic scopes: a function of the stack context in which the code executes, not the lifecycle of an object.
"Until the end of this with statement block" is a static scope.
Not from the POV of callees within the stack of which the with block is part, which again is the standard interpretation for "dynamic scopes".
There is nothing indeterminate about the scopes in a classical and usual sense, neither the dynamic scope nor the lexical scope. And languages with proper dynamic scoping support have no issue with this kind of constructs. Neither does Python when walking through the whole stack, naturally.

On Thu, Nov 15, 2012 at 8:12 PM, Mathias Panzenböck <grosser.meister.morti@gmx.net> wrote:
I think this syntax would still make sense for list comprehensions:
upperlines = [line.upper() for line in file with open('foo', 'r') as file]
-1000. There is no discernible advantage over:

    with open(...) as file:
        upperlines = [line.upper() for line in file]

Also you've got the order backwards -- when there's a sequence of 'for' and 'if' clauses in a comprehension, they are to be read from left to right, but here you're tacking something onto the end that's supposed to go first. Please don't destroy my beautiful language. --Guido
-- --Guido van Rossum (python.org/~guido)

I'm pretty sure both my original message and the blog post linked from there explained why this is not particularly useful for list comprehensions. (If you're guaranteed to exhaust the iteration in the current block—which you obviously always are for comprehensions—just make the with a statement with its own block.) The only reason I suggested it for comprehensions as well as generator expressions is that someone convinced me that it would be slightly easier to implement, and to teach to users, than if it were only available for generator expressions. From: Mathias Panzenböck <grosser.meister.morti@gmx.net> Sent: Thu, November 15, 2012 8:39:34 PM

Just throwing random syntax variations on the wall to see what/if anything sticks (because I think the "as file" assignment serves no purpose here):

    upperlines = (line.upper() for line in with open('foo', 'r'))
    upperlines = (line.upper() for line with open('foo', 'r'))
    upperlines = (line.upper() with for line in open('foo', 'r'))

Or should the for loop check if there are __enter__ and __exit__ methods and call them? Guess not, but I thought I'd just mention it as an alternative. For now one can do this, which is functionally equivalent but adds the overhead of another generator:

    def managed(sequence):
        with sequence:
            for item in sequence:
                yield item

    upperlines = (line.upper() for line in managed(open('foo', 'r')))

You could even call this helper function "with_", if you like. Or write a helper like this:

    def iterlines(filename, *args, **kwargs):
        with open(filename, *args, **kwargs) as f:
            for line in f:
                yield line

    upperlines = (line.upper() for line in iterlines('foo', 'r'))

Maybe there should be a way to let a file be automatically closed when EOF is encountered? Maybe an "autoclose" wrapper object that passes through every method call to the file object, but when EOF is encountered during a read it closes the file object? Then one could write:

    upperlines = (line.upper() for line in autoclose(open('foo', 'r')))

On 11/15/2012 04:44 AM, Andrew Barnert wrote:
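For what it's worth, the managed() helper suggested above is easy to exercise. `FakeFile` here is a hypothetical stand-in for a file, defined only so the closing is observable without touching the filesystem: exhausting the outer generator expression exhausts managed(), which in turn exits the with block and closes the underlying object.

```python
class FakeFile:
    """Hypothetical stand-in for a file: iterable + context manager."""
    def __init__(self, lines):
        self.lines = lines
        self.closed = False
    def __enter__(self):
        return self
    def __exit__(self, *args):
        self.closed = True
    def __iter__(self):
        return iter(self.lines)

def managed(sequence):
    # The helper from the message above: ties cleanup of `sequence`
    # to the exhaustion (or closing) of this generator.
    with sequence:
        for item in sequence:
            yield item

f = FakeFile(["spam\n", "eggs\n"])
upper = (line.upper() for line in managed(f))
assert not f.closed                 # nothing entered yet
out = list(upper)                   # exhaust the whole pipeline
assert out == ["SPAM\n", "EGGS\n"]
assert f.closed                     # managed() ran __exit__ on the way out
```

Note that managed() requires its argument to be its own context manager (as files are); for objects with only a close() method, `contextlib.closing` fills the gap.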

I missed this the first time through among all the other alternative suggestions: Sent from my iPhone On Nov 15, 2012, at 20:33, Mathias Panzenböck <grosser.meister.morti@gmx.net>
I think this ought to be in itertools in the standard library. I don't think the extra overhead will be a problem most of the time. It solves at least the simplest cases for when a with clause would be useful, and it's even a better solution for some cases where you'd write a with statement today. In some cases you'd have to write things like managed(closing(foo)), but in those cases you probably wouldn't have wanted the with clause, either.

On Wed, Nov 14, 2012 at 07:44:44PM -0800, Andrew Barnert wrote:
While this looks very clean, how do you propose the following should be written as a generator expression?

    def foo():
        with open('foo') as f:
            for line in f:
                if 'bar' in line:
                    yield line

An obvious suggestion is as follows, but I'm not totally convinced about the out-of-order with, for and if clauses (compared with the equivalent generator):

    (line for line in f if 'bar' in line with open('foo') as f)

Cheers, Phil

Exactly as you suggest (quoting you out of order to make the answer clearer):
The clauses have *always* been out of order. In the function, the "if" comes between the "for" and the yield expression. In the expression, the "for" comes in between. If the clause order implies the statement order (I would have put it in terms of the clause structure implying the scoping, but they're effectively the same idea), then our syntax has been wrong since list comprehensions were added in 2.0. So, I think (and hope!) that implication was never intended. Which means the only question is, which one looks more readable:

    1. (foo(line) for line in baz(f) if 'bar' in line with open('foo') as f)
    2. (foo(line) for line in baz(f) with open('foo') as f if 'bar' in line)
    3. (foo(line) with open('foo') as f for line in baz(f) if 'bar' in line)

Or, in the trivial case (where versions 1 and 2 are indistinguishable):

    1. (line for line in f with open('foo') as f)
    2. (line for line in f with open('foo') as f)
    3. (line with open('foo') as f for line in f)

My own intuition is that 1 is the clearest, and 3 by far the worst. So, that's why I proposed order 1. But I'm not at all married to it.

On Thu, Nov 15, 2012 at 03:11:07AM -0800, Andrew Barnert wrote:
I was mostly playing devil's advocate :) In my experience, the ordering of comprehension clauses is already a source of confusion for those new to the language. So, if it's not obvious where the "if" should come it may well make matters worse in this regard (but I wouldn't say that this is enough to kill the proposal).
To me, 1 feels like it captures the semantics the best - the "with" clause is tacked onto the generator expression "(foo(line) ... for ... if)" and applies to the whole of that expression. Cheers, Phil

Phil Connell wrote:
But that would break the existing rule that binding clauses in a comprehension have to precede the expressions that use the bound variable. If you're allowed to write

    (foo(line) for line in baz(f) with open('foo') as f)

then it's not obvious why you can't write

    (foo(line) if 'bar' in line for line in lines)

Are you suggesting that the latter should be allowed as well? -- Greg

From: Serhiy Storchaka <storchaka@gmail.com> Sent: Thu, November 15, 2012 4:17:42 AM
Mechanically transforming that is easy. You just insert each with along with its corresponding for and if. There are no ambiguities for any of the three potential rules:

    1. (y for x in f if p(x) with a() as f for y in g if q(y) with b(x) as g)
    2. (y for x in f with a() as f if p(x) for y in g with b(x) as g if q(y))
    3. (y with a() as f for x in f if p(x) with b(x) as g for y in g if q(y))

I suppose you could also argue for a "super-1" where we stick all the withs at the end, or a "super-3" where we stick them all at the beginning… but it's hard to see a compelling argument for that. In fact, despite everything I said about clause structure not implying nesting, either one of those would look to me as if all the withs were at the outermost scope. At any rate, unlike the simpler cases, here I have no opinion on which of those is clearest. They're all impossible to read at a glance (although breaking them up into multiple lines helps, I still don't have any clue what even the original function means -- all those one-letter variables and functions, with easily-confused letters to boot…). But they're all quite easy to work out, or to construct, if you understand nested generator expressions and know the rule for where each clause goes.

If you can only have one with per for, this doesn't have a direct translation. However, if you want to extend it to have any number of withs per for, that seems to rule out option 2, and maybe option 1, but seems fine with option 3: (x with a() as f for x in f with b() as g if p(x) with c() as h) The fact that option 3 can obviously do something which seems impossible in option 2, and which I can't work out in a few seconds off the top of my head with option 1, may be a more compelling argument than the fact that option 1 instinctively looked cleaner to me (and the one other person who commented on the three choices).

On 15.11.12 17:32, Andrew Barnert wrote:
If you can only have one with per for, this doesn't have a direct translation.
Even with one "with" per "for", ambiguity is possible for some options. with/for/if/yield, for/with/if/yield, for/if/with/yield, for/if/with/if/yield, ... would each need a different transcription.
Yes, that is what I wanted to show. Even if for you, the author of the proposal, the most consistent option is the least obvious, then for others it will always lead to confusion.

On Thu, Nov 15, 2012 at 9:11 PM, Andrew Barnert <abarnert@yahoo.com> wrote:
One, and only one, clause in a comprehension or generator expression is written out of sequence: the innermost clause is lifted out and written first. The rest of the expression is just written in the normal statement order with the colons and newlines omitted. This is why you can chain comprehensions to arbitrary depths without any ambiguity from the compiler's point of view:
(Note: even though you *can* chain the clauses like this, please don't, as it's thoroughly unreadable for humans, even though it makes sense to the compiler.)

So *if* a context management clause was added to comprehension syntax, it couldn't reasonably be added using the same design as was used to determine the placement of the current iteration and filtering clauses. If the determination is "place it at the end, and affect the whole comprehension/generator expression regardless of the number of clauses", then you're now very close to the point of it making more sense to propose allowing context managers on arbitrary expressions, as it would be impossible to explain why this was allowed:

    lines = list(line for line in f with open(name) as f)

But this was illegal:

    lines = (f.readlines() with open(name) as f)

And if arbitrary subexpressions are allowed, *then* you're immediately going to have people wanting a "bind" builtin:

    class bind:
        def __init__(self, value):
            self.value = value
        def __enter__(self):
            return self.value
        def __exit__(self, *args):
            pass

    if (m is None with bind(pattern.match(data)) as m):
        raise ValueError("{} does not match {}".format(data, pattern))
    # Do things with m...

This is not an endorsement of the above concepts, just making it clear that I don't believe that attempting to restrict this to generator expressions is a viable proposal, as the restriction is far too arbitrary (from a user perspective) to form part of a coherent language design. Generator expressions, like lambda expressions, are deliberately limited. If you want to avoid those limits, it's time to upgrade to a named generator or function. If you feel that puts things in the wrong order in your code then please, send me your use cases so I can consider adding them as examples in PEP 403 and PEP 3150.
If you really want to enhance the capabilities of expressions, then the more general proposal is the only one with even a remote chance, and that's contingent on proving that the compiler can be set up to give generator expressions the semantics you propose (Off the top of my head, I suspect it should be possible, since the compiler will know it's in a comprehension or generator expression by the time it hits the with token, but there may be other technical limitations that ultimately rule it out). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

From: Nick Coghlan <ncoghlan@gmail.com> Sent: Thu, November 15, 2012 4:39:42 AM
On Thu, Nov 15, 2012 at 9:11 PM, Andrew Barnert <abarnert@yahoo.com> wrote:
One, and only one, clause in a comprehension or generator expression is written
out of sequence: the innermost clause is lifted out and written first.
Given that there are only three clauses, "flatten in order, then move expression to front" and "flatten in reverse order, then move if clause to back" are identical. I suppose you're right that, given that the rule for nested expressions is to preserve the order of nesting, the first description is more natural. But at any rate, I don't think any such rule is what most Python programmers have internalized. People obviously know how to nest clauses in general (we couldn't speak human languages otherwise), but they do not know how to write, or even read, nested comprehensions. What they know is that there are three clauses, and they go expression-for-if, period. And those who do learn about nesting seem to guess the order wrong at least half the time (hence all the StackOverflow posts on "why does [x for x in range(y) for y in range(5)] give me a NameError?").
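The ordering rule, and the NameError produced when it is violated (the StackOverflow example quoted above), can both be checked directly. This sketch uses fresh variable names so that bindings leaked by earlier code can't mask the error:

```python
# The "for" clauses in a comprehension nest left to right, exactly as
# the equivalent statements would:
pairs = [(b, a) for b in range(3) for a in range(b)]

nested = []
for b in range(3):
    for a in range(b):
        nested.append((b, a))

assert pairs == nested == [(1, 0), (2, 0), (2, 1)]

# Reversing the clauses raises a NameError, because y2 is not bound yet
# when range(y2) is evaluated:
try:
    [x2 for x2 in range(y2) for y2 in range(5)]
except NameError:
    pass
else:
    raise AssertionError("expected NameError")
```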
Sure it could. If you want to flatten in order, then lift the innermost expression to the left, exactly my "option 3". And any of the three options nest just as well as current generator expressions. This follows exactly the same rules you described:

    (line with open(path, 'r') as file for line in file if line
     with open('filelist', 'r') as filelistfile for path in filelistfile)

I personally find option 1 more readable than option 3, but as I said, that's just a bare intuition, and I'm not married to it at all.
If the determination is "place it at the end, and affect the whole comprehension/generator expression regardless of the number of clauses"
I didn't even consider that. For the nested case, each for clause can have 0 or 1 with clauses, just as it can have 0 or 1 if clauses. I can't see how anything else is reasonable—how else could you handle the example above without keeping hundreds of files open unnecessarily? Of course for the non-nested case, there is no real distinction between "at the end" and "at the end of each nesting level"…
No, not at all. If you read the blog post I linked, I explain this. But I'll try to answer your points below.
Those aren't at all the same. Or, if they are the same, the first one doesn't work. In the second one, your with expression clearly modifies the expression f.readlines(), so the file is open until f.readlines() finishes. Great. But in the first, it clearly modifies f, so the file is open until f finishes -- that is, it gets closed immediately. There may be other things you could attach it to, but I can't think of any way either a human or a computer could parse it as being attached to the iteration inside the implicit generator created by the larger expression that this expression is a part of.

And you're going to have the exact same problem with, e.g., "lines = (f with open(name) as f)" -- the only useful thing this could do is attach to something *inside the file object*, which is ridiculous. In fact, any statement that cannot be trivially rewritten as "with open(name) as f: lines = f.readlines()" also cannot possibly work right using a general with expression. Which means it's useless.

The only way you could possibly make this work is to make a context manager mean different things in different kinds of expressions. That's a horribly bad idea. It means you're building the with clause that I wanted, and a variety of other with clauses, all of which look similar enough to confuse both parsers and human beings, despite doing different things. It's like suggesting that we don't need if clauses in generator expressions because the more general ternary if expression already takes care of it.

So, in short, adding general with expressions not only doesn't solve my problem, it makes my problem harder to solve. And it's a bad idea in general, because it only works in cases where it's not needed. So, I'm very strongly against it.
PEP 403 and PEP 3150.
My use case was at the top of my first email: upperlines = (line.upper() for line in file with open(path, 'r') as file) This is the kind of thing beginners need to write and don't know how, and experienced developers do all the time in quick-n-dirty scripts and don't do properly even if they do know how, because it's difficult to write today. And I'm not sure how PEP 403 or PEP 3150 would help.
If you really want to enhance the capabilities of expressions, then the more general proposal is the only one with even a remote
chance
I don't want to do things like turn assignments into expressions, add general with expressions, add bind expressions, etc. I want to make iterating over a context manager easy, and the with clause is the best way I could come up with to do it.
there may be other technical limitations that ultimately rule it out).
I plan to experiment with implementing it in PyPy over the weekend, and if that works out I'll take a look at CPython. But I don't see any reason that options 1 or 2 should be any problem; option 3 might be, but I'll see when I get there.

On 11/15/2012 10:25 AM, Andrew Barnert wrote:
This is how list comps were designed and initially defined.
Given that there are only three clauses, "flatten in order, then move expression to front"
This is the simple and correct rule.
and "flatten in reverse order, then move if clause to back"
This is more complicated and wrong.
are identical.
Until one adds more clauses.
It *is* the rule, and a very simple one. The reference manual gives it, though it could perhaps be clearer. The tutorial List Comprehension section does give a clear example: ''' For example, this listcomp combines the elements of two lists if they are not equal:

    >>> [(x, y) for x in [1,2,3] for y in [3,1,4] if x != y]
    [(1, 3), (1, 4), (2, 3), (2, 1), (2, 4), (3, 1), (3, 4)]
Anyone who read and understood that snippet in the tutorial, which took me a minute to find, would not ask such a question. There are people who program Python without ever reading the manuals and guess as they go and, when they stumble, prefer to post questions on forums and wait for a customized answer rather than dig it out themselves. -- Terry Jan Reedy

On Fri, Nov 16, 2012 at 1:25 AM, Andrew Barnert <abarnert@yahoo.com> wrote:
And it's *awful*. If the only thing on the RHS of a simple assignment statement is a lambda or generator expression, that code should almost always be rewritten with def as a matter of style, regardless of other considerations.

However, I realised there's a more serious problem with your idea: the outermost clause in a list comprehension or generator expression is evaluated immediately and passed as an argument to the inner scope that implements the loop, so you have an unresolved sequencing problem between the evaluation of that argument and the evaluation of the context manager. If you want the context manager inside the generator, you *can't* reference the name bound in the as clause in the outermost iterable. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Fri, Nov 16, 2012 at 10:46 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
(Andrew's reply here dropped the list from the cc, but I figure my subsequent clarification is worth sharing more widely)

When you write a genexp like this:

    gen = (x for x in get_seq())

the expansion is *NOT* this:

    def _g():
        for x in get_seq():
            yield x
    gen = _g()

Instead, it is actually:

    def _g(iterable):
        for x in iterable:
            yield x
    gen = _g(get_seq())

That is, the outermost iterable is evaluated in the *current* scope, not inside the generator. Thus, the entire proposal is rendered incoherent, as there is no way for the context manager expression to be executed both *before* the outermost iterable expression and *inside* the generator function, since the generator doesn't get called until *after* the outermost iterable expression has already been evaluated. (And, to stave off the obvious question, no, this order of evaluation is *not* negotiable, as changing it would be a huge backwards compatibility breach, as well as leading to a lot more obscure errors with generator expressions.)

The reason PEP 403 is potentially relevant is because it lets you write a one-shot generator function using the long form and still make it clear that it *is* a one-shot operation that creates the generator-iterator directly, without exposing the generator function itself:

    @in gen = g()
    def g():
        for x in get_seq():
            yield x

Or, going back to the use case in the original post:

    @in upperlines = f()
    def f():
        with open('foo', 'r') as file:
            for line in file:
                yield line.upper()

Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
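Nick's expansion can be verified empirically: the outermost iterable expression runs at genexp creation time, while the generator body runs only on consumption. A small sketch (the `calls` list and `get_seq` are illustrative names, not from the thread's code):

```python
calls = []

def get_seq():
    # Record when this is actually invoked.
    calls.append("called")
    return [1, 2, 3]

gen = (x * 2 for x in get_seq())
# The outermost iterable was evaluated immediately, in the current scope:
assert calls == ["called"]

# The generator body itself only runs when the genexp is consumed:
doubled = list(gen)
assert doubled == [2, 4, 6]
```

This is exactly why a with clause preceding the first for clause could not wrap the evaluation of that first iterable: by the time the generator's body starts executing, the iterable has already been computed.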

Ah, you're right. The only way this would work is if the with clause were second or later, which would be very uncommon. And the fact that it doesn't work in the most common case means that, even if it were occasionally useful, it would cause a lot more confusion than benefit. So, never mind... Sent from my iPhone On Nov 16, 2012, at 7:53, Nick Coghlan <ncoghlan@gmail.com> wrote:

Nick Coghlan wrote:
That is, the outermost iterable is evaluated in the *current* scope, not inside the generator.
I've always felt it was a bad idea to bake this kludge into the language. It sweeps a certain class of problems under the rug, but only in *some* cases. For example, in

    ((x, y) for x in foo for y in blarg)

rebinding of foo is guarded against, but not blarg. And if that's not arbitrary enough, in the otherwise completely equivalent

    ((x, y) for y in blarg for x in foo)

it's the other way around. Anyhow, it wouldn't be *impossible* to incorporate a with-clause into this scheme. Given

    (upper(line) with open(name) as f for line in f)

you either pick open(name) to be the pre-evaluated expression, or not do any pre-evaluation at all in that case. Either way, it can't break any *existing* code, because nobody is writing genexps containing with-clauses yet. -- Greg
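Greg's asymmetry is easy to demonstrate at module scope: rebinding foo after creating the genexp has no effect, because foo was evaluated eagerly and passed into the generator, while rebinding blarg does take effect, because blarg is looked up lazily each time the inner loop starts.

```python
foo = [1, 2]
blarg = [10]
g = ((x, y) for x in foo for y in blarg)

foo = ["ignored"]   # no effect: foo was already evaluated when g was created
blarg = [99]        # takes effect: blarg is resolved inside the generator

pairs2 = list(g)
assert pairs2 == [(1, 99), (2, 99)]
```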

On Fri, Nov 16, 2012 at 3:28 PM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
I wouldn't call it arbitrary -- the second and following clauses *must* be re-evaluated because they may reference the loop variable of the first. And the two versions you show aren't equivalent unless iterating over blarg and foo is completely side-effect-free.
And nobody ever will. It's too ugly. -- --Guido van Rossum (python.org/~guido)

On 11/15/2012 6:11 AM, Andrew Barnert wrote:
From: Phil Connell <pconnell@gmail.com>
The simple rule for comprehensions is that the append (for a list comprehension) or yield (for a generator expression) is moved from last to first, and the other statements/clauses are left in the same order.
Which means that 3 is the proper one. In particular, if with clauses were added, f must be defined in the with clause before used in the for clause, just as line must be defined in the for clause before used in the if clause. -- Terry Jan Reedy

So far, nearly everyone is discussing things which are tangential, or arguing that one of the optional variants is bad. So let me strip down the proposal, without any options in it, and expand on a use case. The syntax is:

    (foo(line) with open('bar') as f for line in baz(f))

This translates to calling this function:

    def gen():
        with open('bar') as f:
            for line in baz(f):
                yield foo(line)

The translation for with clauses is identical to for and if clauses, and nesting works in the obvious way.

So, why do I want to create a generator that wraps a file or other generator inside a with clause? There are a wide range of modules that have functions that can take a generator of strings in place of a file. Some examples off the top of my head include numpy.loadtxt, poster.multipart_encode, and line_protocol.connection.send. Many of these are asynchronous, so I can't just wrap the call in a with statement; I have to send a generator that will close the wrapped file (or other generator) when it's exhausted or closed, instead of when the function returns.

So, imagine a simple "get" command in a mail server, a method in the Connection class:

    def handle_get(self, message_id):
        path = os.path.join(mailbox_path, message_id)
        self.send_async(open(path, 'r'))

Now, let's say I want to do some kind of processing on the file as I send it (e.g., remove excessive curse words, or add new ones in if there aren't enough in any line):

    def handle_get(self, message_id):
        path = os.path.join(mailbox_path, message_id)
        def censored_file():
            with open(path, 'r') as file:
                for line in file:
                    yield self.censor(line)
        self.send_async(censored_file())

With my suggested idea, the last 5 lines could be replaced by this:

    self.send_async(self.censor(line)
                    with open(path, 'r') as file
                    for line in file)

Of course this async_chat-style model isn't the only way to write a server, but it is a common way to write a server, and I don't think it should be complicated.

On 16.11.12 11:09, Andrew Barnert wrote:
    self.send_async(self.censor(line) for line in open(path, 'r'))

or

    self.send_async(map(self.censor, open(path, 'r')))

This is *not worse* than your first example:

    self.send_async(open(path, 'r'))

How do you write a managed uncensored variant? You can use the wrapper suggested by Mathias Panzenböck:

    self.send_async(managed(open(path, 'r')))
    self.send_async(self.censor(line) for line in managed(open(path, 'r')))

It is easy, clear, universal and requires no changes to syntax.

On 11/16/2012 4:09 AM, Andrew Barnert wrote:
OK, that's helpful. Now let me strip down my objection to this: your proposal is conceptually wrong because it mixes two distinct and different ideas -- collection definition and context management. It conflicts with a well-defined notion of long standing.

To explain: in math, one can define a set explicitly by displaying the members, or implicitly as a subset based on one or more base sets. Using one version of the notation:

    {0, 2, 4} == {2*i | i in N; i < 3}

The latter is 'set-builder notation' or a 'set comprehension' (and would usually use the epsilon-like member symbol instead of 'in'). The idea goes back at least a century. https://en.wikipedia.org/wiki/Set-builder_notation

In Python, the latter directly translates to

    {2*i for i in itertools.count() if i < 3} == {i for i in range(0, 5, 2)}

(Python does not require the base collection to match the result class.) Another pair of examples:

    {(i,j) | i in N, j in N; i+j <= 5}
    {(i,j) for i in count() for j in count() if i+j <= 5}

Similar usage in programming goes back over half a century. https://en.wikipedia.org/wiki/List_comprehension

While notation in both math and CS varies, the components are always input source collection variables, conditions or predicates, and an output expression. The Python reference manual documents comprehensions as an alternate atomic display form. In Chapter 6, Expressions, Section 2, Atoms: "For constructing a list, a set or a dictionary Python provides special syntax called “displays”, each of them in two flavors: either the container contents are listed explicitly, or they are computed via a set of looping and filtering instructions, called a comprehension. ... list_display ::= "[" [expression_list | comprehension] "]" <etc>"

A generator expression similarly represents an untyped abstract sequence, rather than a concrete class.

---

In summary: A context-manager, as an object with __enter__ and __exit__ methods, is not a proper component of a comprehension.
For instance, replace "open('xxx')" in your proposal with a lock creation function. On the other hand, an iterable managed resource, as suggested by Mathias Panzenböck, works fine as a source. So it does work (as you noticed also). -- Terry Jan Reedy

On 17 November 2012 00:00, Terry Reedy <tjreedy@udel.edu> wrote:
I don't follow how you made these two leaps:

* It doesn't apply to set comprehensions in *math* -> it doesn't apply to set comprehensions in *Python*
* It doesn't apply to *set* comprehensions in Python -> it doesn't apply to *any* comprehensions in Python

On 11/17/2012 3:11 PM, Joshua Landau wrote:
On 17 November 2012 00:00, Terry Reedy
Since the OP withdrew his suggestion, it's a moot point. However, I talked about the general, coherent concept of comprehensions, as used in both math and CS, as an alternative to explicit listing. Do look at the references, including the Python manual. It presents the general idea and implementation first and then the four specific versions. I only used sets for an example. -- Terry Jan Reedy

From: Terry Reedy <tjreedy@udel.edu> Sent: Sun, November 18, 2012 11:56:04 AM
I agree that it is a moot point. The idea would require a larger semantic change than I initially anticipated, and I disagree with Greg Ewing that the immediate evaluation of the outer source is a kluge that should be abandoned, so I've withdrawn it. (Of course if Greg Ewing or Joshua Landau or anyone else wants to pick up the idea, I apologize for presuming, but I no longer think it's a good idea.) That's why I ignored the point about set builder notation. But if you want to continue to argue it:
Nested comprehensions already break the analogy with set builder notation. For one thing, nobody would define the rationals as {i/j | j in Z: j != 0 | i in Z}. People would probably figure out what you meant, but you wouldn't write it that way. Nested comprehensions (even more so when one is dependent on the other) make it blatant that a comprehension is actually an iterative sequence builder, not a declarative set builder. The analogy is a loose one, and it already leaks. It really only holds when you've got a single, well-ordered, finite source. It's obvious that (i/j for j in itertools.count(2) for i in range(1, j)) generates the rationals in (0, 1), in a specific order (with repeats), but you wouldn't write anything remotely similar in set builder notation. In fact, you'd probably define that set just as {q | i, j in N+: qj=i, q<1}, and you can't translate that to Python at all.
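For the record, the nested expression mentioned above really does enumerate those rationals in a definite order; a quick sketch using itertools:

```python
import itertools

# Rationals in (0, 1), with repeats, in the order the nested generator
# expression imposes: denominators j = 2, 3, 4, ..., and for each j,
# numerators i = 1 .. j-1.
rationals = (i / j for j in itertools.count(2) for i in range(1, j))

first_six = list(itertools.islice(rationals, 6))
# j=2 gives 1/2; j=3 gives 1/3, 2/3; j=4 gives 1/4, 2/4, 3/4
```

The order (and the repeats, e.g. 1/2 and 2/4) is exactly what makes the point: this is an iterative sequence builder, not a declarative set.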

On 15/11/12 22:11, Andrew Barnert wrote:
Is that a trick question? Answer: None of them. In my opinion, they are all too busy for a generator expression and should be re-written as a generator function.

As far as the given use-case is concerned:

    upperlines = (line.upper() for line in open('foo'))

I don't see what the concern is. The file will remain open so long as the generator is not exhausted, but that has to be the case no matter what you do. If the generator is thrown away before being exhausted, the file will eventually be closed by the garbage collector, if only when the application or script exits. For short-lived scripts, the temporary leakage of a file handle or two is hardly likely to be a serious problem.

Presumably if you have a long-lived application with many such opened files, you might risk running out of file handles when running under Jython or IronPython. But I think that's a sufficiently unusual and advanced use-case that I'm not worried that this is a problem that needs solving with syntax instead of education.

-- Steven

From: Steven D'Aprano <steve@pearwood.info> Sent: Thu, November 15, 2012 10:05:36 PM
This seems to be an argument against with statements, or any other kind of resource management at all besides "trust the GC". I'm pretty sure PEP 310, PEP 340, PEP 343, and the discussion around them already had plenty of counter-arguments, but here's a couple quick ones: If you've opened a file for exclusive access (the default on Windows), you can't safely open it again if you can't predict when it will be closed. If the context in question is a mutex lock rather than a file open, you can't safely lock it again if you can't predict when it will be released (and, even if you never want to lock it again, you could end up deadlocked against another thread that does).

On 16/11/12 20:26, Andrew Barnert wrote:
Certainly not. I'm saying that for many applications, explicit resource management is not critical -- letting the GC close the file (or whatever resource you're working with) is a perfectly adequate strategy. The mere existence of "faulty" gen expressions like the above example is not necessarily a problem.

Think of it this way: you can optimize code for speed, for memory, and for resource usage. (Memory of course being a special case of resource usage.) You're worried about making it easy to micro-optimize generator expressions for resource usage. I'm saying that's usually premature optimization. It's not worth new syntax complicating generator expressions to optimize the closing of a few files.

If your application is not one of those applications where a laissez-faire approach to resource management is acceptable, that's fine. I'm not saying that nobody needs to care about resource management! If you need to care about your resources with more attention than benign neglect, then do so. The only limitation here is that you can't use a context manager in a list comprehension or generator expression. I don't care about that. Not every problem that requires a function needs to be solvable with lambda, and not every problem that requires a generator needs to be solvable with a generator expression.

The beauty of generator expressions is that they are deliberately lean. The bar to fatten them up with more syntax is quite high, and I don't think you have come even close to getting over it.

-- Steven

From: Steven D'Aprano <steve@pearwood.info> Sent: Fri, November 16, 2012 1:53:42 AM
It's not a micro-optimization, or an optimization at all. It has nothing to do with performance, and everything to do with making your code work at all. (Or, in some cases, making it robust—your code may work 99% of the time, or work with CPython or POSIX but not PyPy or Windows.) For example, see Google's Python Style Guide at http://google-styleguide.googlecode.com/svn/trunk/pyguide.html#Files_and_Soc... for why they recommend always closing files.
The only limitation here is that you can't use a context manager in a list comprehension or generator expression.
Yes, that's exactly the limitation (but only in generator expressions—in list comprehensions, it can't ever matter).
This is one of those cases where it won't hurt you when you don't use it. You don't have to put if clauses into generator expressions, or nest multiple loops—and very often you don't, in which case they don't get in the way, and your expression is concise and simple. Similarly, you won't have to put with clauses into generator expressions, and very often you won't, in which case they won't get in the way. And I don't think anyone would have trouble learning or understanding it. The expression still maps to a generator function that's just a simple tree of one-line nested statements with a yield statement at the bottom, the only difference is that instead of the two most common kinds of statements in such functions, you can now use the three most common.
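To make the claimed mapping concrete, here is a sketch of that "tree of one-line nested statements", parametrized over placeholder names (`cm`, `transform`, `source`) since the proposed syntax itself is not valid Python today:

```python
# Proposed (hypothetical syntax):
#     result = (transform(x) with cm as f for x in source(f))
#
# would translate to the same nesting that for and if clauses already
# produce, with the with clause as one more level:
def desugared(cm, transform, source):
    with cm as f:
        for x in source(f):
            yield transform(x)
```

Each clause becomes one level of nesting, read left to right, with the yield at the bottom -- exactly the rule for/if clauses follow now.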

But a piece of code that everyone needs on a regular basis should be writable, and readable, by a novice Python user. I don't care whether it's one line or four, but I do care that a task that seems to require nothing that you don't learn in your first week with the language is beyond the ability of not just novices, but people who post modules on PyPI, write answers on StackOverflow, etc.
Use a generator function.
Of course the right answer is obvious to you and me, because we understand the difference between static and dynamic scopes, and that a generator defines a dynamic scope, and what context managers actually do, and how to translate a generator expression into a generator function. It's not that the generator function is hard to write; it's that people who don't understand how all this stuff works won't even think of the idea that an explicit generator function would help them here.

On 2012-11-15, at 04:44 , Andrew Barnert wrote:
Actually, it's extremely debatable that the generator function is correct: if the generator is not fully consumed (terminating iteration on the file) I'm pretty sure the file will *not* get closed save by the GC doing a pass on all dead objects maybe. This means this function is *not safe* as a lazy source to an arbitrary client, as that client may very well use itertools.islice or itertools.takewhile and only partially consume the generator. Here's an example:

--
    import itertools

    class Manager(object):
        def __enter__(self):
            return self
        def __exit__(self, *args):
            print("Exited")
        def __iter__(self):
            for i in range(5):
                yield i

    def foo():
        with Manager() as ms:
            for m in ms:
                yield m

    def bar():
        print("1")
        f = foo()
        print("2")
        # Only consume part of the iterable
        list(itertools.islice(f, None, 2))
        print("3")

    bar()
    print("4")
--

CPython output, I'm impressed that the refcounting GC actually bothers unwinding the stack and running the __exit__ handler *once bar has finished executing*:
But here's the (just as correct, as far as I can tell) output from pypy:
If the program was long running, it is possible that pypy would run __exit__ when the containing generator is released (though by no means certain; I don't know if this is specified at all). This is in fact one of the huge issues with faking dynamic scopes via threadlocals and context managers (as e.g. Flask might do, I'm not sure what actual strategy it uses): they interact rather weirdly with generators. (It's also why I think Python should support actually dynamically scoped variables; it would also fix the thread-broken behavior of e.g. warnings.catch_warnings.)

From: Masklinn <masklinn@masklinn.net> Sent: Thu, November 15, 2012 1:29:46 AM
Well, yes, *no possible object* is safe as a lazy source to an arbitrary client that might not fully consume, close, or destroy it. By definition, the object must stay alive as long as an arbitrary client might use it, so a client that never finishes using it means the object must stay alive forever. And, similarly, in the case of a client that does finish using it, but the only way to detect that is by GCing the client, the object must stay alive until the GC collects the client. So, the correct thing for the generator function to do in that case is… exactly what it does.

Of course in that case, it would arguably be just as correct to just do "ms = Manager()" or "file = open('foo', 'r')" instead of "with Manager() as ms:" or "with open('foo', 'r') as file:". The difference is that, in cases where the client does fully consume, close, or destroy the iterator deterministically, the with version will still do the right thing, while the leaky version will not. You can test this very easily by adding an "f.close()" to the end of bar, or changing "f = foo()" to "with closing(foo()) as f:", and compare the two versions of the generator function.

Put another way, if your point is an argument against with clauses, it's also an argument against with statements, and manual resource cleanup, and in fact anything but a magical GC.
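The `closing(foo())` experiment described here can be sketched as follows (reusing the `Manager` class from Masklinn's example, with the print replaced by an event list); with an explicit close, `__exit__` runs deterministically on any implementation:

```python
import itertools
from contextlib import closing

events = []

class Manager(object):
    def __enter__(self):
        return self
    def __exit__(self, *args):
        events.append('exited')
    def __iter__(self):
        return iter(range(5))

def foo():
    # "Properly written": the with clause ties cleanup to the
    # generator's own lifetime.
    with Manager() as ms:
        for m in ms:
            yield m

# Partially consume, but close the generator explicitly: closing() calls
# f.close(), which raises GeneratorExit inside foo, so the with block's
# __exit__ runs immediately -- on CPython and PyPy alike.
with closing(foo()) as f:
    head = list(itertools.islice(f, 2))
# head == [0, 1]; events == ['exited']
```

A properly-written client (one that closes or exhausts the generator) thus gets deterministic cleanup from the properly-written generator, which is the whole point of the with clause.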
This is an almost-unrelated side issue. A generator used in a single thread defines a fully deterministic dynamic scope, one that can and often should be used for cleanup. The fact that sometimes it's not the right scope for some cleanups, or that you can use them in multithreaded programs in a way that makes them indeterministic, isn't an argument that it should be hard to use them for cleanup when appropriate, is it?

On 2012-11-15, at 11:08 , Andrew Barnert wrote:
This is an almost-unrelated side issue. A generator used in a single thread defines a fully deterministic dynamic scope
I think you meant "a context manager" not "a generator", and my example quite clearly demonstrates that the interaction between context managers and generators completely break context managers as dynamic scopes.
Using context managers on threadlocals means the context manager itself is in a single-threaded environment, the multithreading is not the issue, the interaction between context managers and generators is.
isn't an argument that it should be hard to use them for cleanup when appropriate, is it?
I never wrote that, I only noted that your assertion about the function you posted (namely that it is "properly written") is dubious and risky.

defines a fully deterministic dynamic scope
I think you meant "a context manager" not "a generator"
No, I meant a generator. "As long as the generator has values to generate, and has not been closed or destroyed" is a dynamic scope. "Until the end of this with statement block" is a static scope. The only reason the context managers in both your example and mine have dynamic scope is because they're embedded in generators.
No it doesn't. It demonstrates that it's possible to create indeterminate scopes, and context managers cannot help you if you do so. "Until the client exhausts the iterator, given that the client is not going to exhaust the iterator" effectively means "Until the client goes away". Which means you need a context manager around the client. The fact that you don't have one means that your client is inherently broken. You'll have the exact same problems with a trivial local object (e.g., its __del__ method won't get called by PyPy). However, if the client *did* have a context manager (or exhausted, closed, or explicitly deleted the generator), a properly-written generator would clean itself up, while a naively-written one would not. That's what I meant by "properly-written". Not that it's guaranteed to clean up even when used by a broken client, because that is completely impossible for any object (generator or otherwise), but that it is guaranteed to clean up when used by a properly-written client.

On 2012-11-15, at 12:37 , Andrew Barnert wrote:
It isn't a dynamic scope in the sense of "dynamic scoping" which is the one I used it in, and the one usually understood when talking about dynamic scopes, which is a function of the stack context in which the code executes not the lifecycle of an object.
"Until the end of this with statement block" is a static scope.
Not from the POV of callees within the stack of which the with block is part, which again is the standard interpretation for "dynamic scopes".
There is nothing indeterminate about the scopes in a classical and usual sense, neither the dynamic scope nor the lexical scope. And languages with proper dynamic scoping support have no issue with this kind of constructs. Neither does Python when walking through the whole stack, naturally.

On Thu, Nov 15, 2012 at 8:12 PM, Mathias Panzenböck <grosser.meister.morti@gmx.net> wrote:
I think this syntax would still make sense for list comprehensions:
upperlines = [line.upper() for line in file with open('foo', 'r') as file]
-1000. There is no discernible advantage over

    with open(...) as file:
        upperlines = [line.upper() for line in file]

Also you've got the order backwards -- when there's a sequence of 'for' and 'if' clauses in a comprehension, they are to be read from left to right, but here you're tacking something onto the end that's supposed to go first. Please don't destroy my beautiful language. --Guido
-- --Guido van Rossum (python.org/~guido)

I'm pretty sure both my original message and the blog post linked from there explained why this is not particularly useful for list comprehensions. (If you're guaranteed to exhaust the iteration in the current block—which you obviously always are for comprehensions—just make the with a statement with its own block.) The only reason I suggested it for comprehensions as well as generator expressions is that someone convinced me that it would be slightly easier to implement, and to teach to users, than if it were only available for generator expressions. From: Mathias Panzenböck <grosser.meister.morti@gmx.net> Sent: Thu, November 15, 2012 8:39:34 PM

Just throwing random syntax variations on the wall to see what/if anything sticks (because I think the "as file"-assignment serves no purpose here):

    upperlines = (line.upper() for line in with open('foo', 'r'))
    upperlines = (line.upper() for line with open('foo', 'r'))
    upperlines = (line.upper() with for line in open('foo', 'r'))

Or should the for loop check if there are __enter__ and __exit__ methods and call them? Guess not, but I thought I'd just mention it as an alternative. For now one can do this, which is functionally equivalent but adds the overhead of another generator:

    def managed(sequence):
        with sequence:
            for item in sequence:
                yield item

    upperlines = (line.upper() for line in managed(open('foo', 'r')))

You could even call this helper function "with_", if you like. Or write a helper like this:

    def iterlines(filename, *args, **kwargs):
        with open(filename, *args, **kwargs) as f:
            for line in f:
                yield line

    upperlines = (line.upper() for line in iterlines('foo', 'r'))

Maybe there should be a way to let a file be automatically closed when EOF is encountered? Maybe an "autoclose" wrapper object that passes through every method call to the file object but when EOF is encountered during a read it closes the file object? Then one could write:

    upperlines = (line.upper() for line in autoclose(open('foo', 'r')))

On 11/15/2012 04:44 AM, Andrew Barnert wrote:
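A minimal sketch of the proposed "autoclose" wrapper (the name and behavior are hypothetical; this version closes on exhaustion of iteration, or when the wrapping generator is closed, rather than intercepting every read call):

```python
class autoclose:
    """Wrap an iterable with a close() method (e.g. a file) so that it
    is closed as soon as iteration finishes, instead of waiting for the
    garbage collector."""
    def __init__(self, f):
        self._f = f

    def __iter__(self):
        try:
            for item in self._f:
                yield item
        finally:
            # Runs on exhaustion, on GeneratorExit, and on error alike.
            self._f.close()
```

Note this is essentially `managed` with `close()` in place of the context-manager protocol, so it also works for objects that are iterable and closeable but not context managers.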

I missed this the first time through among all the other alternative suggestions: Sent from my iPhone On Nov 15, 2012, at 20:33, Mathias Panzenböck <grosser.meister.morti@gmx.net>
I think this ought to be in itertools in the standard library. I don't think the extra overhead will be a problem most of the time. It solves at least the simplest cases for when a with clause would be useful, and it's even a better solution for some cases where you'd write a with statement today. In some cases you'd have to write things like managed(closing(foo)), but in those cases you probably wouldn't have wanted the with clause, either.
participants (11)
- Andrew Barnert
- Greg Ewing
- Guido van Rossum
- Joshua Landau
- Masklinn
- Mathias Panzenböck
- Nick Coghlan
- Phil Connell
- Serhiy Storchaka
- Steven D'Aprano
- Terry Reedy