Make "yield" inside a with statement a SyntaxError

This mostly springs off of a comment I saw in some thread.

The point of a with statement is that it ensures that some resource will be disposed of, yes? For example, this:

    with open(filename) as f:
        contents = f.read()

is better than this:

    contents = open(filename).read()

because the former definitely closes the file while the latter relies on garbage collection.

The point of a yield expression is to suspend execution. This is nice for efficient looping because instead of having to hold all results in memory, each result can be consumed immediately, yes? Therefore this:

    def five_to_one():
        for i in range(5):
            yield 5 - i

is better than this:

    def five_to_one():
        result = []
        for i in range(5):
            result.append(5 - i)
        return result

because the former suspends execution of "five_to_one" while the latter holds all five results in memory.

Now, let's take a look at the following scenario:

    def read_multiple(*filenames):
        for filename in filenames:
            with open(filename) as f:
                yield f.read()

Can you spot the problem? The "with open(filename)" statement is supposed to ensure that the file object is disposed of properly. However, the "yield f.read()" statement suspends execution within the with block, so if this happened:

    for contents in read_multiple('chunk1', 'chunk2', 'chunk3'):
        if contents == 'hello':
            break

and the contents of "chunk2" were "hello", then the loop would exit and "chunk2" would never be closed! Yielding inside a with block, therefore, doesn't make sense and can only lead to obscure bugs.

The proper way to define the "read_multiple" function would be like so:

    def read_multiple(*filenames):
        for filename in filenames:
            with open(filename) as f:
                contents = f.read()
            yield contents

Save the contents in a variable somewhere, then yield the variable, instead of suspending execution within a context manager. I believe all possible cases where one would yield inside a context manager can be covered by saving anything required from the context manager and then yielding the results outside.

Therefore, I propose making a "yield" inside a with block a SyntaxError. This means the first "read_multiple" definition I presented would become illegal and fail *at compile time*. However, it would still be legal to define a generator inside a with block:

    def pass_file_chars(oldfunc):
        with open('secretfile') as f:
            contents = f.read()
            @functools.wraps(oldfunc)
            def newfunc(*args, **kwargs):
                for char in contents:
                    yield oldfunc(char, *args, **kwargs)
            return newfunc

This is probably a bad example, but I hope it still explains why it should be legal to define generators in context managers - as long as the with block serves its purpose correctly, everything else should still work normally.

For those concerned about backwards compatibility: I believe that those who attempt to yield inside a context manager will already discover that results are undefined when doing so; this would simply make it more obvious that suspending execution in a with block is not meant to happen, and convert undefined behavior into a straight-up SyntaxError.

What are your thoughts?

Sharing,
Ken Hilton
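A minimal sketch of the leak being described, assuming hypothetical files 'chunk1', 'chunk2' and 'chunk3' exist and 'chunk2' contains 'hello'; it peeks at the suspended generator's frame purely for illustration:

    def read_multiple(*filenames):
        for filename in filenames:
            with open(filename) as f:
                yield f.read()

    gen = read_multiple('chunk1', 'chunk2', 'chunk3')
    for contents in gen:
        if contents == 'hello':
            break

    # The generator is now suspended inside the with block, so __exit__ has
    # not run and the file object held in its frame is still open.
    leaked = gen.gi_frame.f_locals['f']
    print(leaked.closed)   # False
    gen.close()            # raises GeneratorExit at the yield; __exit__ runs
    print(leaked.closed)   # True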

I think a SyntaxError wouldn't be appropriate, as the syntax itself is valid - the lexer finds nothing wrong with it. It falls more into the category of index errors and the like; a ContextManagerError, perhaps? Also, a request to add headings like

EXAMPLES
=========

in your discussion.

Yours,
-- Abdur-Rahmaan Janhangeer
https://github.com/abdur-rahmaanj
Mauritius

On Tue, Aug 07, 2018 at 11:31:35PM -0700, Elazar wrote:
Instead of disallowing code, this is a case for allowing with expressions:
Only if you accept Ken's conclusion, that yielding inside a with statement is harmful (I don't). And if you do accept Ken's conclusion, then with-expressions don't fix the problem of yielding from inside a with statement. (Just because with-expressions are permitted doesn't mean that all the thousands of with-statements will be instantly re-written to use them.) -- Steve

On Wed, Aug 8, 2018 at 4:14 PM, Ken Hilton <kenlhilton@gmail.com> wrote:
Technically, the 'with' statement ensures that the file will be closed before the line of code following it is run. So in this example:

    with open(filename, "w") as f:
        f.write(...)
    os.rename(filename, target_filename)

you have a guarantee that the file is closed prior to the rename call.
This is only a problem if you consider it to be one. The value of the 'with' statement is not destroyed; for example, you're capped at _one_ open file (whereas without the context manager, it's entirely possible for file-open in a loop to leak a large number of handles).
What about this:

    def read_words(*filenames):
        for filename in filenames:
            with open(filename) as f:
                for line in f:
                    yield from line.split()

It'll read from a series of files and yield individual words (ignoring annoying little things like punctuation and hyphenation for the sake of simplicity). You are assuming that every yield-inside-with is a *single* yield.
I have no idea what this is supposed to be doing, nor why you're defining the function this way. Perhaps a better example will illustrate your point more clearly?
The current results are most certainly not undefined, and you're attempting to fix something which is not a problem. If you really find yourself running into situations like this, program your linter to warn you when you do it. ChrisA
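A rough sketch of the kind of check a linter could perform - this assumes nothing about any particular linter's plugin API, it simply walks the AST and reports yields that are lexically inside a with block of the same function:

    import ast
    import sys

    class YieldInWith(ast.NodeVisitor):
        def __init__(self):
            self.with_depth = 0
            self.findings = []

        def visit_With(self, node):
            self.with_depth += 1
            self.generic_visit(node)
            self.with_depth -= 1
        visit_AsyncWith = visit_With

        def visit_FunctionDef(self, node):
            # A nested def starts a new function; its yields don't count
            # against the enclosing with block.
            saved, self.with_depth = self.with_depth, 0
            self.generic_visit(node)
            self.with_depth = saved
        visit_AsyncFunctionDef = visit_FunctionDef

        def _yield(self, node):
            if self.with_depth:
                self.findings.append("line %d: yield inside 'with' block" % node.lineno)
            self.generic_visit(node)
        visit_Yield = visit_YieldFrom = _yield

    checker = YieldInWith()
    checker.visit(ast.parse(open(sys.argv[1]).read()))
    print("\n".join(checker.findings))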

I don't think this is a major problem. In this case, the file will be closed when the generator is garbage collected, so you'd also have to leak the generator to actually get this problem. And if leaking generators won't harm your application, neither will leaking the occasional file handle.

Also, I think "Yielding inside a with block, therefore, doesn't make sense and can only lead to obscure bugs." is somewhat of an overstatement. And the problem isn't the with block: the cause of the "issue" you describe is that a with statement is a try...finally in disguise, with the finally clause being guaranteed to be executed. IIRC (though I can't remember where) there was a statement from Guido referencing this and calling it an issue for which any fix would cause far more pain than the issue ever will - or something to that point.

2018-08-08 8:32 GMT+02:00 Chris Angelico <rosuav@gmail.com>:
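A simplified sketch of that try...finally equivalence (the full expansion in PEP 343 also passes the exception details to __exit__ and lets it suppress them; the filename here is hypothetical):

    filename = "example.txt"   # hypothetical file
    mgr = open(filename)
    f = mgr.__enter__()
    try:
        contents = f.read()
    finally:
        mgr.__exit__(None, None, None)   # simplified: the real expansion passes exc info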

On 8 August 2018 at 07:32, Chris Angelico <rosuav@gmail.com> wrote:
Without the context manager you could write:

    def read_multiple(*filenames):
        for filename in filenames:
            f = open(filename)
            yield f.read()
            f.close()

which also only leaks one file descriptor. The point of the with statement is that this was considered unsatisfactory, but when you yield from a with statement the advantage of with is lost. In fact, if you're happy depending on garbage collection for cleanup then you can write this more conveniently:

    def read_multiple(*filenames):
        for filename in filenames:
            yield open(filename).read()

Or even:

    for text in (open(filename).read() for filename in filenames):
        # stuff

In any case, saying that you only leak one file descriptor misses the point. The with statement can do many different things, and its purpose is to guarantee *without depending on garbage collection* that the __exit__ method will be called. Yielding from the with statement circumvents that guarantee.
Actually the general way to solve this problem is to move all context managers out of iterators/generators. Instead of a helper generator, what you really want in this situation is a helper context manager. I think the reason this isn't always used is just because it's a bit harder to write a context manager than a generator, e.g.:

    from contextlib import contextmanager

    @contextmanager
    def open_cat_split(*filenames):
        f = None
        def get_words():
            nonlocal f
            for filename in filenames:
                with open(filename) as f:
                    for line in f:
                        yield from line.split()
            else:
                f = None
        try:
            yield get_words()
        finally:
            if f is not None:
                f.close()

Then you can do:

    with open_cat_split('file1.txt', 'file2.txt') as words:
        for word in words:
            print(word)

I am not sure exactly what the best primitive would be to make this sort of thing easier. I most often see examples of this when someone wants to process multiple files concatenated. The stdlib fileinput module has a context manager that *almost* works:

https://docs.python.org/3.7/library/fileinput.html#fileinput.input

The corner case that isn't handled correctly by fileinput.input is that when the list of filenames is empty it uses sys.argv or stdin, which is unlikely to be what someone wants in most of these situations. You could fix that with:

    from contextlib import contextmanager
    import fileinput

    def open_cat(*filenames):
        if not filenames:
            @contextmanager
            def dummy():
                yield ()
            return dummy()
        else:
            return fileinput.input(filenames)

And with that you could do:

    def map_split(strings):
        for string in strings:
            yield from string.split()

    with open_cat('file1.txt', 'file2.txt') as lines:
        for word in map_split(lines):
            print(word)

Perhaps actually what is wanted is a more generic contextlib helper. I guess ExitStack is intended for this sort of thing but it's a bit unwieldy to use:

https://docs.python.org/3.7/library/contextlib.html#contextlib.ExitStack

We can use ExitStack to make something more convenient though:

    from contextlib import ExitStack, contextmanager

    @contextmanager
    def map_with(func, iterable):
        stack = ExitStack()
        def g():
            for arg in iterable:
                stack.close()
                iterable_cm = func(arg)
                stack.push(iterable_cm)
                yield iterable_cm
        with stack:
            yield g()

Then you can do something like:

    def cat_split(files):
        for f in files:
            for line in f:
                yield from line.split()

    with map_with(open, ['file1.txt', 'file2.txt']) as files:
        for word in cat_split(files):
            print(word)

-- Oscar

On Wed, Aug 8, 2018 at 10:32 PM, Oscar Benjamin <oscar.j.benjamin@gmail.com> wrote:
What if you remove yield from the picture?

    def read_multiple(callback, *filenames):
        for filename in filenames:
            f = open(filename)
            callback(f.read())
            f.close()

This is a classic approach as done in many languages that lack generators. The problem isn't that one file handle is being leaked until the next iteration; it's that an exception could permanently leak that file. Also of note is the "write-then-read" approach:

    def writer(fn):
        f = open(fn, "w")
        f.write(...)

    def reader(fn):
        f = open(fn)
        return f.read()

    writer("foo"); reader("foo")

Without a guarantee that the file is closed *immediately* on losing its last reference, this is risky, because the write handle might still be open when the read handle is opened. The context manager guarantees both of these situations, and that doesn't change when a yield point is inserted, same as it doesn't change when a callback is inserted. The generator mechanism inverts the call stack a bit, but it isn't fundamentally different; at some point in read_multiple, it waits for some other code to do its stuff, then it continues.
And that's the whole point: you cannot be happy with garbage collection cleanup, because a non-refcounting Python implementation is not guaranteed to clean these up promptly.
Technically, it guarantees that __exit__ will be called before the next line of code after the 'with' block. It cannot actually guarantee that it will be called:

    with open(filename) as f:
        while True:
            pass
        print("Won't happen")
    print("Won't happen either")

All that's guaranteed is that the second print call will not happen *before* the file is closed.
So... you still have a yield (in this case a "yield from") inside your 'with' block, but you've wrapped it up in an extra layer and done a manual try/finally. How is this different? Looking only at the lower half of the function, you have this:

    try:
        yield    # <== the function stops here
    finally:
        ...      # <== the function cleans up here

This is exactly the same as the context manager situation. The finally block cannot be executed until the function resumes. What you have accomplished here is a level of nesting, and you can achieve that far more simply by just closing the context manager itself. Going right back to the original "problem code":

    def read_multiple(*filenames):
        for filename in filenames:
            with open(filename) as f:
                yield f.read()

    for contents in read_multiple('chunk1', 'chunk2', 'chunk3'):
        if contents == 'hello':
            break

We can keep the generator completely unchanged, and add a context manager to the loop, since we're not going to pump it completely:

    with contextlib.closing(read_multiple('chunk1', 'chunk2', 'chunk3')) as stuff:
        for contents in stuff:
            if contents == 'hello':
                break

Now, when you exit the loop, the generator will be closed, which uses normal exception handling to clean up everything inside that function. The files are guaranteed to be closed as part of the closing of the generator.
Yeah, I think what you have here is close to what you want, but contextlib.closing() can cover it here. The key thing to remember is that generators can be closed. Use that functionality. It'll save you a lot of hassle :) ChrisA
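A small illustration of that closing behaviour; Resource here is just a stand-in context manager for demonstration, not anything from the thread:

    class Resource:
        def __enter__(self):
            print("acquire")
            return self
        def __exit__(self, *exc):
            print("release")

    def gen():
        with Resource():
            yield 1
            yield 2

    g = gen()
    next(g)      # prints "acquire" and suspends at "yield 1"
    g.close()    # raises GeneratorExit inside gen(); __exit__ prints "release"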

On Wednesday, 8 August 2018 07:14:47 BST Ken Hilton wrote:
In this particular case you can fix the code:

    def read_multiple(*filenames):
        for filename in filenames:
            with open(filename) as f:
                result = f.read()
            yield result

Of course if you used readline() and not read() the problem returns. But so long as you do not leak the generator, the file will be closed immediately after the loop as the ref count of the generator hits 0.

Barry
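For the record, a sketch of the readline()-style variant Barry mentions, where the yield can't simply be hoisted out of the with block without first reading the whole file into memory:

    def read_lines(*filenames):
        for filename in filenames:
            with open(filename) as f:
                for line in f:
                    # Each line is yielded while the file is still open, so the
                    # original problem (suspension inside the with block) returns.
                    yield line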

On Wed, Aug 8, 2018 at 5:05 PM, Barry Scott <barry@barrys-emacs.org> wrote:
But so long as you do not leak the generator the file will be closed immediately after the loop as the ref count of the generator hits 0.
Technically that's not guaranteed (since refcounts aren't a language feature), but if you're using this generator somewhere and want to be able to force it to close its files, all you have to do is close the generator. ChrisA

On Tue, Aug 7, 2018 at 11:14 PM, Ken Hilton <kenlhilton@gmail.com> wrote:
This is a real problem. (Well, technically the 'with' block's __exit__ function *will* eventually close the file, when the generator is garbage-collected - see PEP 342 for details - but this is not exactly satisfying, because the whole purpose of the 'with' block is to close the file *without* relying on the garbage collector.)

Unfortunately, your proposal for solving it is a non-starter -- there are lots of cases where 'yield' inside a 'with' block is not only used, but is clearly the right thing to do. A notable one is when defining a new contextmanager in terms of a pre-existing contextmanager:

    @contextmanager
    def tempfile():
        # This is an insecure way of making a temp file but good enough for an example
        tempname = pick_random_filename()
        with open(tempname, "w") as f:
            yield f

Here are some links for previous discussions around these kinds of issues, none of which have really gone anywhere but might help you get a sense of the landscape of options:

https://www.python.org/dev/peps/pep-0533/
https://www.python.org/dev/peps/pep-0521/
https://www.python.org/dev/peps/pep-0568/
https://github.com/python-trio/trio/issues/264

One small step that might be doable would be to start issuing ResourceWarning whenever a generator that was suspended inside a 'with' or 'try' block is GC'ed.

-n
-- Nathaniel J. Smith -- https://vorpus.org
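A minimal sketch of why the yield-inside-with pattern is safe under @contextmanager: the outer with statement always resumes (or closes) the generator, so the inner block's cleanup is not left dangling. Here tracked() is a made-up helper purely for the demonstration:

    from contextlib import contextmanager

    @contextmanager
    def tracked(name):
        print("enter", name)
        try:
            yield name
        finally:
            print("exit", name)

    @contextmanager
    def outer():
        with tracked("inner") as value:   # yield inside a with block
            yield value

    with outer() as v:
        print("using", v)
    # prints: enter inner / using inner / exit inner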

On 8 August 2018 at 08:48, Nathaniel Smith <njs@pobox.com> wrote:
PEP 342 guarantees that *if* a generator is garbage collected its .close() method will be called, which would in turn trigger __exit__() for any active context managers in the generator. However, PEP 342 does not guarantee that the generator will be garbage collected. As you noted in PEP 533:

    In Python implementations that do not use reference counting (e.g. PyPy,
    Jython), calls to __del__ may be arbitrarily delayed...

However this statement does not go far enough. It is not guaranteed that __del__ will be called *at all* under other implementations. An example to test this is:

    $ cat gencm.py
    class CM:
        def __enter__(self):
            print("Entering")
            return self
        def __exit__(self, *args):
            print("Exiting")

    def generator():
        with CM():
            yield 1
            yield 2
            yield 3

    g = generator()

    def f():
        for x in g:
            break  # Or return

    f()
    print("End of program")

    $ python3 gencm.py
    Entering
    End of program
    Exiting

    $ pypy --version
    Python 2.7.2 (1.8+dfsg-2, Feb 19 2012, 19:18:08)
    [PyPy 1.8.0 with GCC 4.6.2]
    $ pypy gencm.py
    Entering
    End of program

(I don't actually have pypy to hand right now so I'm copying this from here: https://www.mail-archive.com/tutor@python.org/msg70961.html)

What the above shows is that for this example:
1) Under CPython, __exit__ is called by __del__ at process exit, after every line of Python code has finished.
2) Under PyPy, __exit__ was not called at any point.

That's not a bug in PyPy: Python the language makes very few guarantees about garbage collection.
This is the only example I can think of where yielding from a with statement is the right thing to do. Really this comes from the way that the contextmanager is abusing syntax though. It takes a bunch of strange machinery to make this work as it does: https://github.com/python/cpython/blob/3.7/Lib/contextlib.py#L116
This seems like a great idea to me. -- Oscar

Ken Hilton writes:
You have one example of a bad outcome. That hardly shows that *no* yield in a with block makes sense. On the contrary, Chris A's suggestion that a function might yield multiple times within one context manager seems valid, although it might be bad practice.

There's another possibility. Context managers have general initialize-operate-finalize semantics. True, they *often* have the simple form "return the value of *expr* bound to *var* at entry, clean up the contents of *var* at exit". But I find it easy to imagine that there are situations where the __enter__ method is complex and the __exit__ method is a no-op. In that case the problem you identify can't happen, and yielding inside such a context manager would be harmless.

A third possibility is that the __exit__ method has semantics like the "else" of "while ... else", where the "normal" use case involves a break, but occasionally the input is exhausted and some "default" action needs to be taken.

I grant that I don't have examples of the latter (after all, writing context managers is a relatively rare activity, and the resource acquisition-and-release paradigm is prominent). But Chris's point about line-oriented generators seems both plausible and common. His analysis of why the risk is small also seems plausible, so I would say this problematic case (and it is problematic) is covered by the "consenting adults" principle.

Steve
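A hypothetical example of the second case above - a context manager whose __enter__ does the interesting work and whose __exit__ is a no-op - where a yield inside the block can't leak anything:

    class Announce:
        """Prints a banner on entry; holds nothing that needs cleanup."""
        def __init__(self, label):
            self.label = label
        def __enter__(self):
            print("starting", self.label)   # the "complex" setup lives here
            return self
        def __exit__(self, *exc):
            pass    # no resource held, so suspending inside the block is harmless

    def numbered_lines(lines, label):
        with Announce(label):
            for i, line in enumerate(lines, 1):
                yield "%d: %s" % (i, line)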

On 08/08/18 07:14, Ken Hilton wrote:
An incomplete analysis and therefore an incorrect conclusion. Until the garbage collector comes out to play, read_multiple() will keep the file open, keep the tuple of filenames in memory and of course keep the environment of the generator around. That's all leakage caused by the _break_, and following your logic the obvious solution would be to ban breaks in loops that are reading from generators. But that wouldn't be helpful, obviously. -- Rhodri James *-* Kynesim Ltd

It is also possible to fix the particular issue by using another with statement, that is use:

    with contextlib.closing(read_multiple(…)) as chunks:
        for contents in chunks:
            …

Automatically closing the generator at the end of the for loop would be nice, but getting the semantics right without breaking existing valid code is not trivial.

Ronald

On 8 August 2018 at 15:22, Ronald Oussoren via Python-ideas <python-ideas@python.org> wrote:
That's a very good point Ronald. Having seen this a few times, I can't think of any cases that wouldn't be solved by this. If a language change was wanted for this then perhaps it should be to do like file objects and add an __exit__ method (which calls close()) to generator objects, so that you can omit the closing and just do:

    with read_multiple(…) as chunks:
        for contents in chunks:
            ...

-- Oscar
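A sketch of the suggested behaviour using a small wrapper, since generator objects have no __enter__/__exit__ today; this essentially re-creates what contextlib.closing() already provides, and uses the read_multiple generator from earlier in the thread with hypothetical file names:

    class managed:
        def __init__(self, gen):
            self.gen = gen
        def __enter__(self):
            return self.gen
        def __exit__(self, *exc):
            self.gen.close()   # run the generator's pending finally/__exit__ code

    with managed(read_multiple('chunk1', 'chunk2')) as chunks:
        for contents in chunks:
            ...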

On 8 August 2018 at 15:37, Oscar Benjamin <oscar.j.benjamin@gmail.com> wrote:
Thinking about this some more: closing() can ensure finalisation, but it is still generally bad to yield from a with block. Some context managers are designed to temporarily alter global state between __enter__ and __exit__ - this can be very confusing if you use them around a yield. As an example, the localcontext context manager from the decimal module does this. So if you do something like:

    def decimal_generator():
        ...
        with decimal.localcontext() as ctx:
            ctx.prec = 4  # Use 4 digits for calculations temporarily
            yield x + y
            yield y + z

    with decimal.localcontext() as ctx:
        ctx.prec = 10  # Use 10 digits for calculations temporarily
        total = decimal.Decimal(0)
        for item in decimal_generator():
            total += item

Here the line "total += item" will end up using 4 digit arithmetic rather than 10. That's because the yield allows the "local" context to leak out. The fix for that is something like:

    def decimal_generator():
        ...
        ctx = decimal.localcontext()
        ctx.prec = 4  # Use 4 digits for calculations temporarily
        with ctx:
            val = x + y
        yield val
        with ctx:
            val = y + z
        yield val

I still think that this is a good idea though:
-- Oscar
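A minimal runnable demonstration of the leak Oscar describes - no files or real arithmetic needed, just a check of which precision is active in the caller while the generator is suspended:

    import decimal

    def gen():
        with decimal.localcontext() as ctx:
            ctx.prec = 4          # temporarily use 4 digits
            yield 1
            yield 2

    with decimal.localcontext() as ctx:
        ctx.prec = 10             # the caller wants 10 digits
        for item in gen():
            # The generator is suspended inside its with block, so its
            # prec=4 context is still the active one here.
            print(decimal.getcontext().prec)   # prints 4, not 10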

On Wed, Aug 8, 2018 at 12:28 PM Paul Moore <p.f.moore@gmail.com> wrote:
To be even more precise, PEP 550 was designed to solve state handling problem for both "await" (coroutines) and "yield" (generators). It was rejected in favour of a more simplistic PEP 567 that solves it only for coroutines. PEP 568 was later drafted by Nathaniel. It applies PEP 550's ideas to 567 API/design to allow generators to have execution context too in Python 3.8 or 3.9. The fate of PEP 568 is still undecided. Yury
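For context, a tiny sketch of the PEP 567 status quo Yury describes: a ContextVar set inside a generator is visible to the caller, because generators do not (yet) get their own execution context - that is what PEP 550/568 would add. The variable name here is made up for the example:

    import contextvars

    precision = contextvars.ContextVar("precision", default=10)

    def gen():
        precision.set(4)    # intended to be "local" to the generator
        yield 1

    for _ in gen():
        # The change leaks to the caller: the generator shares the caller's context.
        print(precision.get())   # prints 4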

Ronald Oussoren via Python-ideas wrote:
How about providing a decorator that turns the generator into a context manager?

    def close_gen(f):
        @contextmanager
        @wraps(f)
        def g(*args, **kw):
            with closing(f(*args, **kw)) as h:
                yield h
        return g

    @close_gen
    def read_multiple(...):
        ...

A more radical approach would be to add __enter__ and __exit__ methods so that for every generator function f

    with f() as g:
        pass

would call g.close()
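A hypothetical complete usage of Peter's decorator, with the imports his sketch assumes, the read_multiple generator from the start of the thread, and made-up file names:

    from contextlib import closing, contextmanager
    from functools import wraps

    def close_gen(f):
        @contextmanager
        @wraps(f)
        def g(*args, **kw):
            with closing(f(*args, **kw)) as h:
                yield h
        return g

    @close_gen
    def read_multiple(*filenames):
        for filename in filenames:
            with open(filename) as f:
                yield f.read()

    with read_multiple('chunk1', 'chunk2', 'chunk3') as chunks:
        for contents in chunks:
            if contents == 'hello':
                break
    # Leaving the with block closes the generator, so the open file is closed too.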

participants (15)
- Abdur-Rahmaan Janhangeer
- Barry Scott
- Chris Angelico
- Elazar
- Jacco van Dorp
- Ken Hilton
- Nathaniel Smith
- Oscar Benjamin
- Paul Moore
- Peter Otten
- Rhodri James
- Ronald Oussoren
- Stephen J. Turnbull
- Steven D'Aprano
- Yury Selivanov