[pypy-dev] RFC: draft idea for making for loops automatically close iterators

Oscar Benjamin oscar.j.benjamin at gmail.com
Mon Oct 17 08:04:46 EDT 2016


On 17 October 2016 at 09:08, Nathaniel Smith <njs at pobox.com> wrote:
> Hi all,
>
> I've been poking at an idea for changing how 'for' loops work to
> hopefully make them work better for pypy and async/await code. I
> haven't taken it to python-ideas yet -- this is its first public
> outing, actually -- but since it directly addresses pypy GC issues I
> thought I'd send around a draft to see what you think. (E.g., would
> this be something that makes your life easier?)

To be clear, I'm not a PyPy dev so I'm just answering from a general
Python perspective here.

> Always inject resources, and do all cleanup at the top level
> ------------------------------------------------------------
>
> It was suggested on python-dev (XX find link) that a pattern to avoid
> these problems is to always pass resources in from above, e.g.
> ``read_newline_separated_json`` should take a file object rather than
> a path, with cleanup handled at the top level::

I suggested this and I still think that it is the best idea.

>   def read_newline_separated_json(file_handle):
>       for line in file_handle:
>           yield json.loads(line)
>
>   def read_users(file_handle):
>       for document in read_newline_separated_json(file_handle):
>           yield User.from_json(document)
>
>   with open(path) as file_handle:
>       for user in read_users(file_handle):
>           ...
>
> This works well in simple cases; here it lets us avoid the "N+1
> problem". But unfortunately, it breaks down quickly when things get
> more complex. Consider if instead of reading from a file, our
> generator was processing the body returned by an HTTP GET request --
> while handling redirects and authentication via OAUTH. Then we'd
> really want the sockets to be managed down inside our HTTP client
> library, not at the top level. Plus there are other cases where
> ``finally`` blocks embedded inside generators are important in their
> own right: db transaction management, emitting logging information
> during cleanup (one of the major motivating use cases for WSGI
> ``close``), and so forth.

I haven't written the kind of code that you're describing, so I can't
say exactly how I would do it. I imagine, though, that helpers could be
used to solve some of the problems you're referring to. Here's a case I
do know of where the above suggestion is awkward:

def concat(filenames):
    for filename in filenames:
        with open(filename) as inputfile:
            yield from inputfile

for line in concat(filenames):
    ...
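    # If we break out of this loop (or it raises), the file currently
    # open inside concat() is only closed when the suspended generator
    # is garbage collected: promptly under CPython's refcounting, but
    # arbitrarily late under PyPy.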

It's still possible to handle this use case safely by creating a
helper, though. fileinput.input almost does what you want:

with fileinput.input(filenames) as lines:
    for line in lines:
        ...

Unfortunately, if filenames is empty this will default to sys.stdin,
so it's not perfect, but I really do think that introducing useful
helpers for common cases (rather than core language changes) should be
considered the obvious solution here. Generally, it would have been
better if the discussion for PEP 525 had focused more on helping
people to debug/fix dependence on __del__ rather than on trying to
magically fix broken code.
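
Failing a dedicated helper, the stdlib's contextlib.closing already
covers the concat example above, because generators have a close()
method: closing the generator raises GeneratorExit at its current
yield point, which runs the cleanup of the "with open(...)" block
inside it. A minimal sketch, reusing concat and filenames from above:

from contextlib import closing

with closing(concat(filenames)) as lines:
    for line in lines:
        if not line.startswith('#'):
            # Even on this early exit, leaving the with block calls
            # lines.close(), which closes the currently open file.
            break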

> New convenience functions
> -------------------------
>
> The ``itertools`` module gains a new iterator wrapper that can be used
> to selectively disable the new ``__iterclose__`` behavior::
>
>   # XX FIXME: I feel like there might be a better name for this one?
>   class protect:
>       def __init__(self, iterable):
>           self._it = iter(iterable)
>
>       def __iter__(self):
>           return self
>
>       def __next__(self):
>           return next(self._it)
>
>       def __iterclose__(self):
>           # Swallow __iterclose__ without passing it on
>           pass
>
> Example usage (assuming that file objects implement ``__iterclose__``)::
>
>   with open(...) as handle:
>       # Iterate through the same file twice:
>       for line in itertools.protect(handle):
>           ...
>       handle.seek(0)
>       for line in itertools.protect(handle):
>           ...

It would be much simpler to reverse this suggestion and instead
introduce a helper that selectively *enables* the new behaviour you're
proposing, i.e.:

for line in itertools.closeafter(open(...)):
    ...
    if not line.startswith('#'):
        break  # <--------------- file gets closed here

Then we can leave (async) for loops as they are and there are no
backward compatibility problems, etc.
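
A minimal sketch of such a closeafter wrapper (mirroring your protect
class, but inverted) might opt in by defining __iterclose__ and
forwarding cleanup to the wrapped object's close() method:

class closeafter:
    def __init__(self, iterable):
        self._it = iter(iterable)

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._it)

    def __iterclose__(self):
        # Opt in to the proposed protocol: pass cleanup through
        # to the underlying iterator/file, if it supports it.
        close = getattr(self._it, 'close', None)
        if close is not None:
            close()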

-- 
Oscar

