[Python-ideas] Make "yield" inside a with statement a SyntaxError

Wed Aug 8 08:32:10 EDT 2018

On 8 August 2018 at 07:32, Chris Angelico <rosuav at gmail.com> wrote:
> On Wed, Aug 8, 2018 at 4:14 PM, Ken Hilton <kenlhilton at gmail.com> wrote:
>>
>> Now, let's take a look at the following scenario:
>>
>>     def read_multiple(*filenames):
>>         for filename in filenames:
>>             with open(filename) as f:
>>                 yield f.read()
>>
>> Can you spot the problem? The "with open(filename)" statement is supposed to
>> ensure that the file object is disposed of properly. However, the "yield
>> f.read()" statement suspends execution within the with block, so if this
>> happened:
>>
>>     for contents in read_multiple('chunk1', 'chunk2', 'chunk3'):
>>         if contents == 'hello':
>>             break
>>
>> and the contents of "chunk2" were "hello" then the loop would exit, and
>> "chunk2" would never be closed! Yielding inside a with block, therefore,
>> doesn't make sense and can only lead to obscure bugs.
>
> This is only a problem if you consider it to be one. The value of the
> 'with' statement is not destroyed; for example, you're capped at _one_
> open file (whereas without the context manager, it's entirely possible
> for file-open in a loop to leak a large number of handles).

Without the context manager you could write:

    def read_multiple(*filenames):
        for filename in filenames:
            f = open(filename)
            yield f.read()
            f.close()

Which also only leaks one file descriptor. The point of the with
statement is that this was considered unsatisfactory but when you
yield from a with statement the advantage of with is lost.

In fact if you're happy depending on garbage collection for cleanup
then you can write this more conveniently:

def read_multiple(*filenames):
    for filename in filenames:
        yield open(filename).read()

Or even:

for text in (open(filename).read() for filename in filenames):
    # stuff

In any case saying that you only leak one file descriptor misses the
point. The with statement can do many different things and it's
purpose is to guarantee *without depending on garbage collection* that
the __exit__ method will be called. Yielding from the with statement
circumvents that guarantee.

>> I believe all possible cases where one would yield inside a context manager
>> can be covered by saving anything required from the context manager and then
>> yielding the results outside. Therefore, I propose making a "yield" inside a
>> with block become a SyntaxError.
>
> What about this:
>
>     def read_words(*filenames):
>          for filename in filenames:
>              with open(filename) as f:
>                  for line in f:
>                      yield from line.split()
>
> It'll read from a series of files and yield individual words (ignoring
> annoying little things like punctuation and hyphenation for the sake
> of simplicity). You are assuming that every yield-inside-with is a
> *single* yield.

Actually the general way to solve this problem is to move all context
managers out of iterators/generators. Instead of a helper generator
what you really want in this situation is a helper context manager. I
think the reason this isn't always used is just because it's a bit
harder to write a context manager than a generator e.g.:

from contextlib import contextmanager

@contextmanager
def open_cat_split(*filenames):
    f = None
    def get_words():
        nonlocal f
        for filename in filenames:
            with open(filename) as f:
                for line in f:
                    yield from line.split()
        else:
            f = None
    try:
        yield get_words()
    finally:
        if f is not None:
            f.close()

Then you can do:

with open_cat_split('file1.txt', 'file2.txt') as words:
    for word in words:
        print(word)

I am not sure exactly what the best primitive would be to make this
sort of thing easier. I most often see examples of this when someone
wants to process multiple files concatenated. The stdlib fileinput
module has a context manager that *almost* works:
https://docs.python.org/3.7/library/fileinput.html#fileinput.input

The corner case that isn't handled correctly by fileinput.input is
that when the list of filenames is empty it uses sys.argv or stdin
which is unlikely to be what someone wants in most of these
situations. You could fix that with

from contextlib import contextmanager
import fileinput

def open_cat(*filenames):
    if not filenames:
        @contextmanager
        def dummy():
            yield ()
        return dummy()
    else:
        return fileinput.input(filenames)

And with that you could do:

def map_split(strings):
    for string in strings:
        yield from string.split()

with open_cat('file1.txt', 'file2.txt') as lines:
    for word in map_split(lines):
        print(word)

Perhaps actually what is wanted is a more generic contextlib helper. I
guess ExitStack is intended for this sort of thing but it's a bit
unwieldy to use:
https://docs.python.org/3.7/library/contextlib.html#contextlib.ExitStack

We can use ExitStack to make something more convenient though:

from contextlib import ExitStack, contextmanager

@contextmanager
def map_with(func, iterable):
    stack = ExitStack()
    def g():
        for arg in iterable:
            stack.close()
            iterable_cm = func(arg)
            stack.push(iterable_cm)
            yield iterable_cm
    with stack:
        yield g()

The you can do something like:

def cat_split(files):
    for f in files:
        for line in f:
            yield from line.split()

with map_with(open, ['file1.txt', 'file2.txt']) as files:
    for word in cat_split(files):
        print(word)

--
Oscar