[pypy-dev] RFC: draft idea for making for loops automatically close iterators
Nathaniel Smith
njs at pobox.com
Tue Oct 18 19:24:36 EDT 2016
Hi Oscar,
Thanks for the comments! Can I ask that you hold onto them until I
post to python-ideas, though? (Should be later today.) It's a
discussion worth having, but if we have it here then we'll just end up
having to repeat it there anyway :-).
-n
On Mon, Oct 17, 2016 at 5:04 AM, Oscar Benjamin
<oscar.j.benjamin at gmail.com> wrote:
> On 17 October 2016 at 09:08, Nathaniel Smith <njs at pobox.com> wrote:
>> Hi all,
>>
>> I've been poking at an idea for changing how 'for' loops work to
>> hopefully make them work better for pypy and async/await code. I
>> haven't taken it to python-ideas yet -- this is its first public
>> outing, actually -- but since it directly addresses pypy GC issues I
>> thought I'd send around a draft to see what you think. (E.g., would
>> this be something that makes your life easier?)
>
> To be clear, I'm not a PyPy dev so I'm just answering from a general
> Python perspective here.
>
>> Always inject resources, and do all cleanup at the top level
>> ------------------------------------------------------------
>>
>> It was suggested on python-dev (XX find link) that a pattern to avoid
>> these problems is to always pass resources in from above, e.g.
>> ``read_newline_separated_json`` should take a file object rather than
>> a path, with cleanup handled at the top level::
>
> I suggested this and I still think that it is the best idea.
>
>>     def read_newline_separated_json(file_handle):
>>         for line in file_handle:
>>             yield json.loads(line)
>>
>>     def read_users(file_handle):
>>         for document in read_newline_separated_json(file_handle):
>>             yield User.from_json(document)
>>
>>     with open(path) as file_handle:
>>         for user in read_users(file_handle):
>>             ...
>>
>> This works well in simple cases; here it lets us avoid the "N+1
>> problem". But unfortunately, it breaks down quickly when things get
>> more complex. Consider if instead of reading from a file, our
>> generator was processing the body returned by an HTTP GET request --
>> while handling redirects and authentication via OAuth. Then we'd
>> really want the sockets to be managed down inside our HTTP client
>> library, not at the top level. Plus there are other cases where
>> ``finally`` blocks embedded inside generators are important in their
>> own right: db transaction management, emitting logging information
>> during cleanup (one of the major motivating use cases for WSGI
>> ``close``), and so forth.
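>>
>> For example, here's a rough sketch of the kind of generator I have in
>> mind -- the ``db_connection`` API here is entirely made up, but the
>> point is that the ``finally`` block does real work::
>>
>>     def run_query(db_connection, query):
>>         transaction = db_connection.begin()  # hypothetical API
>>         try:
>>             for row in transaction.execute(query):
>>                 yield row
>>             transaction.commit()
>>         finally:
>>             # Runs on normal exit, on error, and when the consumer
>>             # abandons the generator early (GeneratorExit) -- but
>>             # only if someone actually calls close() on us.
>>             transaction.close()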
>
> I haven't written the kind of code that you're describing so I can't
> say exactly how I would do it. I imagine, though, that helpers could
> be used to solve some of the problems that you're referring to.
> Here's a case I do know of where the above suggestion is awkward:
>
> def concat(filenames):
>     for filename in filenames:
>         with open(filename) as inputfile:
>             yield from inputfile
>
> for line in concat(filenames):
>     ...
>
> It's still possible to safely handle this use case by creating a
> helper though. fileinput.input almost does what you want:
>
> with fileinput.input(filenames) as lines:
>     for line in lines:
>         ...
>
> Unfortunately, if filenames is empty this will default to sys.stdin,
> so it's not perfect, but I think introducing useful helpers for
> common cases (rather than core language changes) is the obvious
> solution here. Generally it would have been better if the discussion
> for PEP 525 had focused more on helping people to debug/fix
> dependence on __del__ rather than trying to magically fix broken
> code.
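>
> As a rough sketch (untested, just to illustrate the shape such a
> helper could take), the concat case above can be wrapped like this:
>
> from contextlib import contextmanager
>
> @contextmanager
> def concat_files(filenames):
>     def lines():
>         for filename in filenames:
>             with open(filename) as inputfile:
>                 yield from inputfile
>     gen = lines()
>     try:
>         yield gen
>     finally:
>         # close() raises GeneratorExit inside the generator, which
>         # runs the pending with-block and so closes the open file.
>         gen.close()
>
> with concat_files(filenames) as lines:
>     for line in lines:
>         ...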
>
>> New convenience functions
>> -------------------------
>>
>> The ``itertools`` module gains a new iterator wrapper that can be used
>> to selectively disable the new ``__iterclose__`` behavior::
>>
>>     # XX FIXME: I feel like there might be a better name for this one?
>>     class protect:
>>         def __init__(self, iterable):
>>             self._it = iter(iterable)
>>
>>         def __iter__(self):
>>             return self
>>
>>         def __next__(self):
>>             return next(self._it)
>>
>>         def __iterclose__(self):
>>             # Swallow __iterclose__ without passing it on
>>             pass
>>
>> Example usage (assuming that file objects implement ``__iterclose__``)::
>>
>>     with open(...) as handle:
>>         # Iterate through the same file twice:
>>         for line in itertools.protect(handle):
>>             ...
>>         handle.seek(0)
>>         for line in itertools.protect(handle):
>>             ...
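>>
>> Or, to consume just part of an iterator and then pick up where we
>> left off in a later loop -- under the proposed semantics the first
>> ``for`` loop would otherwise close the generator when we ``break``::
>>
>>     def numbers():
>>         yield from range(10)
>>
>>     it = numbers()
>>     for x in itertools.protect(it):
>>         if x == 3:
>>             break
>>     # Without protect, the loop above would have called
>>     # it.__iterclose__() here, closing the generator.
>>     for x in it:
>>         ...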
>
> It would be much simpler to reverse this suggestion and say let's
> introduce a helper that selectively *enables* the new behaviour you're
> proposing i.e.:
>
> for line in itertools.closeafter(open(...)):
>     ...
>     if not line.startswith('#'):
>         break  # <--------------- file gets closed here
>
> Then we can leave (async) for loops as they are and there are no
> backward compatibility problems etc.
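>
> For comparison, the close-on-break behaviour can already be spelled
> explicitly today with contextlib.closing (which works for any object
> with a close() method, including generators):
>
> from contextlib import closing
>
> with closing(open(...)) as f:
>     for line in f:
>         ...
>         if not line.startswith('#'):
>             break  # <------------ with block closes the file on exit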
>
> --
> Oscar
> _______________________________________________
> pypy-dev mailing list
> pypy-dev at python.org
> https://mail.python.org/mailman/listinfo/pypy-dev
--
Nathaniel J. Smith -- https://vorpus.org