[pypy-dev] RFC: draft idea for making for loops automatically close iterators

Nathaniel Smith njs at pobox.com
Tue Oct 18 19:24:36 EDT 2016


Hi Oscar,

Thanks for the comments! Can I ask that you hold onto them until I
post to python-ideas, though? (Should be later today.) It's a
discussion worth having, but if we have it here then we'll just end up
having to repeat it there anyway :-).

-n

On Mon, Oct 17, 2016 at 5:04 AM, Oscar Benjamin
<oscar.j.benjamin at gmail.com> wrote:
> On 17 October 2016 at 09:08, Nathaniel Smith <njs at pobox.com> wrote:
>> Hi all,
>>
>> I've been poking at an idea for changing how 'for' loops work to
>> hopefully make them work better for pypy and async/await code. I
>> haven't taken it to python-ideas yet -- this is its first public
>> outing, actually -- but since it directly addresses pypy GC issues I
>> thought I'd send around a draft to see what you think. (E.g., would
>> this be something that makes your life easier?)
>
> To be clear, I'm not a PyPy dev so I'm just answering from a general
> Python perspective here.
>
>> Always inject resources, and do all cleanup at the top level
>> ------------------------------------------------------------
>>
>> It was suggested on python-dev (XX find link) that a pattern to avoid
>> these problems is to always pass resources in from above, e.g.
>> ``read_newline_separated_json`` should take a file object rather than
>> a path, with cleanup handled at the top level::
>
> I suggested this and I still think that it is the best idea.
>
>>   def read_newline_separated_json(file_handle):
>>       for line in file_handle:
>>           yield json.loads(line)
>>
>>   def read_users(file_handle):
>>       for document in read_newline_separated_json(file_handle):
>>           yield User.from_json(document)
>>
>>   with open(path) as file_handle:
>>       for user in read_users(file_handle):
>>           ...
>>
>> This works well in simple cases; here it lets us avoid the "N+1
>> problem". But unfortunately, it breaks down quickly when things get
>> more complex. Consider what happens if, instead of reading from a
>> file, our generator is processing the body returned by an HTTP GET
>> request -- while handling redirects and authentication via OAuth.
>> Then we'd
>> really want the sockets to be managed down inside our HTTP client
>> library, not at the top level. Plus there are other cases where
>> ``finally`` blocks embedded inside generators are important in their
>> own right: db transaction management, emitting logging information
>> during cleanup (one of the major motivating use cases for WSGI
>> ``close``), and so forth.
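>>
>> For example, here's a rough sketch of a generator whose ``finally``
>> block matters in its own right (the connection object is
>> illustrative, in the style of DB-API 2.0, not any particular
>> library)::
>>
>>   def query_rows(conn, sql):
>>       # The cursor must be released promptly even if the caller
>>       # abandons the loop early via ``break`` or an exception.
>>       cursor = conn.cursor()
>>       try:
>>           cursor.execute(sql)
>>           for row in cursor:
>>               yield row
>>       finally:
>>           cursor.close()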
>
> I haven't written the kind of code that you're describing so I can't
> say exactly how I would do it. I imagine, though, that helpers could
> be used to solve some of the problems that you're referring to.
> Here's a case I do know of where the above suggestion is awkward:
>
> def concat(filenames):
>     for filename in filenames:
>         with open(filename) as inputfile:
>             yield from inputfile
>
> for line in concat(filenames):
>     ...
>
> It's still possible to handle this use case safely by creating a
> helper, though. fileinput.input almost does what you want:
>
> with fileinput.input(filenames) as lines:
>     for line in lines:
>         ...
>
> Unfortunately, if filenames is empty this will default to sys.stdin,
> so it's not perfect. But really I think introducing useful helpers
> for common cases (rather than core language changes) should be
> considered the obvious solution here. Generally it would have been
> better if the discussion for PEP 525 had focused more on helping
> people to debug/fix dependence on __del__ rather than trying to
> magically fix broken code.
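>
> As a rough sketch of the kind of helper I mean (the name
> closing_concat is made up, nothing like it exists today), a context
> manager can guarantee prompt cleanup without any language change:
>
> from contextlib import contextmanager
>
> @contextmanager
> def closing_concat(filenames):
>     def lines():
>         for filename in filenames:
>             with open(filename) as inputfile:
>                 yield from inputfile
>     gen = lines()
>     try:
>         yield gen
>     finally:
>         # Explicitly close the generator so the inner with-block
>         # runs even if the caller broke out of the loop early.
>         gen.close()
>
> with closing_concat(filenames) as lines:
>     for line in lines:
>         ...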
>
>> New convenience functions
>> -------------------------
>>
>> The ``itertools`` module gains a new iterator wrapper that can be used
>> to selectively disable the new ``__iterclose__`` behavior::
>>
>>   # XX FIXME: I feel like there might be a better name for this one?
>>   class protect:
>>       def __init__(self, iterable):
>>           self._it = iter(iterable)
>>
>>       def __iter__(self):
>>           return self
>>
>>       def __next__(self):
>>           return next(self._it)
>>
>>       def __iterclose__(self):
>>           # Swallow __iterclose__ without passing it on
>>           pass
>>
>> Example usage (assuming that file objects implement ``__iterclose__``)::
>>
>>   with open(...) as handle:
>>       # Iterate through the same file twice:
>>       for line in itertools.protect(handle):
>>           ...
>>       handle.seek(0)
>>       for line in itertools.protect(handle):
>>           ...
>
> It would be much simpler to reverse this suggestion and instead
> introduce a helper that selectively *enables* the new behaviour
> you're proposing, i.e.:
>
> for line in itertools.closeafter(open(...)):
>     ...
>     if not line.startswith('#'):
>         break  # <--------------- file gets closed here
>
> Then we can leave (async) for loops as they are and there are no
> backward compatibility problems, etc.
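>
> A rough best-effort version of such a helper can even be written
> today (the name closeafter is just a sketch, not a real itertools
> function):
>
> def closeafter(iterable):
>     it = iter(iterable)
>     try:
>         yield from it
>     finally:
>         # Close the underlying iterable if it supports it. After a
>         # break, this relies on the wrapper generator being finalized
>         # promptly, which CPython's refcounting provides but PyPy's
>         # GC does not -- the very problem this thread is about.
>         close = getattr(it, 'close', None)
>         if close is not None:
>             close()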
>
> --
> Oscar
> _______________________________________________
> pypy-dev mailing list
> pypy-dev at python.org
> https://mail.python.org/mailman/listinfo/pypy-dev



-- 
Nathaniel J. Smith -- https://vorpus.org

