[Python-ideas] Make "yield" inside a with statement a SyntaxError

Chris Angelico rosuav at gmail.com
Wed Aug 8 10:27:29 EDT 2018


On Wed, Aug 8, 2018 at 10:32 PM, Oscar Benjamin
<oscar.j.benjamin at gmail.com> wrote:
> Without the context manager you could write:
>
>     def read_multiple(*filenames):
>         for filename in filenames:
>             f = open(filename)
>             yield f.read()
>             f.close()
>
> Which also only leaks one file descriptor. The point of the with
> statement is that this was considered unsatisfactory but when you
> yield from a with statement the advantage of with is lost.

What if you remove yield from the picture?

     def read_multiple(callback, *filenames):
         for filename in filenames:
             f = open(filename)
             callback(f.read())
             f.close()

This is the classic approach in languages that lack generators. The
problem isn't that one file handle is leaked until the next iteration;
it's that an exception could permanently leak that file. Also of note
is the "write-then-read" pattern:

    def writer(fn):
        f = open(fn, "w")
        f.write(...)  # f is never explicitly closed

    def reader(fn):
        f = open(fn)
        return f.read()

    writer("foo")
    reader("foo")

Without a guarantee that the file is closed *immediately* on losing
its last reference, this is risky, because the write handle might
still be open when the read handle is opened.
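For comparison, here's a sketch of the with-based version (using a
throwaway temp file rather than a literal "foo" path): each handle is
closed deterministically when its block exits, so the read can never
observe a still-open write handle, regardless of how the garbage
collector behaves.

```python
import os
import tempfile

def writer(fn):
    with open(fn, "w") as f:   # handle closed as soon as the block exits
        f.write("data")

def reader(fn):
    with open(fn) as f:
        return f.read()

path = os.path.join(tempfile.mkdtemp(), "foo.txt")
writer(path)                   # write handle is closed before reader runs
assert reader(path) == "data"
```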

The context manager provides both of these guarantees, and neither is
changed by inserting a yield point, just as neither is changed by
inserting a callback. The generator mechanism inverts the call stack a
bit, but it isn't fundamentally different: at some point in
read_multiple, it waits for some other code to do its stuff, then it
continues.

> In fact if you're happy depending on garbage collection for cleanup
> then you can write this more conveniently:
>
> def read_multiple(*filenames):
>     for filename in filenames:
>         yield open(filename).read()

And that's the whole point: you can't be happy depending on garbage
collection for cleanup, because a non-refcounting Python
implementation (PyPy, for instance) is not guaranteed to clean these
up promptly.

> In any case saying that you only leak one file descriptor misses the
> point. The with statement can do many different things and it's
> purpose is to guarantee *without depending on garbage collection* that
> the __exit__ method will be called. Yielding from the with statement
> circumvents that guarantee.

Technically, it guarantees that __exit__ will be called before the
next line of code after the 'with' block. It cannot actually guarantee
that it will be called:

with open(filename) as f:
    while True: pass
    print("Won't happen")
print("Won't happen either")

All that's guaranteed is that the second print call will not happen
*before* the file is closed.
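That ordering guarantee is easy to check with a small recording
context manager (this is just an illustrative sketch, not anything
from the original mail):

```python
events = []

class Recorder:
    def __enter__(self):
        events.append("enter")
        return self

    def __exit__(self, *exc):
        events.append("exit")
        return False  # don't swallow exceptions

with Recorder():
    events.append("body")
events.append("after")

# __exit__ always runs before the code following the with block
assert events == ["enter", "body", "exit", "after"]
```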

> Actually the general way to solve this problem is to move all context
> managers out of iterators/generators. Instead of a helper generator
> what you really want in this situation is a helper context manager. I
> think the reason this isn't always used is just because it's a bit
> harder to write a context manager than a generator e.g.:
>
> from contextlib import contextmanager
>
> @contextmanager
> def open_cat_split(*filenames):
>     f = None
>     def get_words():
>         nonlocal f
>         for filename in filenames:
>             with open(filename) as f:
>                 for line in f:
>                     yield from line.split()
>         else:
>             f = None
>     try:
>         yield get_words()
>     finally:
>         if f is not None:
>             f.close()
>
> Then you can do:
>
> with open_cat_split('file1.txt', 'file2.txt') as words:
>     for word in words:
>         print(word)
>

So... you still have a yield (in this case a "yield from") inside your
'with' block, but you've wrapped it up in an extra layer, and done a
manual try/finally. How is this different? Looking only at the lower
half of the function, you have this:

try:
    yield # <== the function stops here
finally:
    # <== the function cleans up here

This is exactly the same as the context manager situation. The finally
block cannot be executed until the function resumes.
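You can watch this happen directly. In this sketch, the generator
pauses inside the try block; the finally clause runs only when the
generator is resumed or closed:

```python
events = []

def suspended():
    try:
        yield "suspended here"
    finally:
        events.append("cleanup")

g = suspended()
next(g)               # generator is now paused at the yield, inside the try
assert events == []   # the finally block has NOT run yet
g.close()             # raises GeneratorExit at the yield point
assert events == ["cleanup"]
```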

What you have accomplished here is a level of nesting, and you can
achieve that far more simply by just closing the context manager
itself. Going right back to the original "problem code":

    def read_multiple(*filenames):
        for filename in filenames:
            with open(filename) as f:
                yield f.read()

    for contents in read_multiple('chunk1', 'chunk2', 'chunk3'):
        if contents == 'hello':
            break

We can keep the generator completely unchanged, and add a context
manager to the loop, since we're not going to pump it completely:

    with contextlib.closing(read_multiple('chunk1', 'chunk2', 'chunk3')) as stuff:
        for contents in stuff:
            if contents == 'hello':
                break

Now, when you exit the loop, the generator will be closed, which uses
normal exception handling to clean up everything inside that function.
The files are guaranteed to be closed as part of the closing of the
generator.
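A quick sketch that verifies this (using throwaway temp files, and
stashing each file object so we can inspect it afterwards — the
`opened` list is just test scaffolding, not part of the pattern):

```python
import contextlib
import os
import tempfile

opened = []  # keep references so we can check .closed afterwards

def read_multiple(*filenames):
    for filename in filenames:
        with open(filename) as f:
            opened.append(f)
            yield f.read()

tmpdir = tempfile.mkdtemp()
paths = []
for i, text in enumerate(["hello", "world"]):
    p = os.path.join(tmpdir, "chunk%d" % i)
    with open(p, "w") as f:
        f.write(text)
    paths.append(p)

with contextlib.closing(read_multiple(*paths)) as stuff:
    for contents in stuff:
        if contents == "hello":
            break  # abandon the generator early

# closing() called stuff.close(), which raised GeneratorExit at the
# yield and ran the generator's own 'with' cleanup on the way out.
assert opened[0].closed
```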

> Perhaps actually what is wanted is a more generic contextlib helper. I
> guess ExitStack is intended for this sort of thing but it's a bit
> unwieldy to use:
> https://docs.python.org/3.7/library/contextlib.html#contextlib.ExitStack
>
> We can use ExitStack to make something more convenient though:
>
> from contextlib import ExitStack, contextmanager
>
> @contextmanager
> def map_with(func, iterable):
>     stack = ExitStack()
>     def g():
>         for arg in iterable:
>             stack.close()
>             iterable_cm = func(arg)
>             stack.push(iterable_cm)
>             yield iterable_cm
>     with stack:
>         yield g()
>
> Then you can do something like:
>
> def cat_split(files):
>     for f in files:
>         for line in f:
>             yield from line.split()
>
> with map_with(open, ['file1.txt', 'file2.txt']) as files:
>     for word in cat_split(files):
>         print(word)

Yeah, I think what you have here is close to what you want, but
contextlib.closing() already covers that case.

The key thing to remember is that generators can be closed. Use that
functionality. It'll save you a lot of hassle :)

ChrisA
