[Python-ideas] Deterministic iterator cleanup

Fri Oct 21 02:03:11 EDT 2016

On Wed, Oct 19, 2016 at 3:07 PM, Paul Moore <p.f.moore at gmail.com> wrote:
> On 19 October 2016 at 20:21, Nathaniel Smith <njs at pobox.com> wrote:
>> On Wed, Oct 19, 2016 at 11:38 AM, Paul Moore <p.f.moore at gmail.com> wrote:
>>> On 19 October 2016 at 19:13, Chris Angelico <rosuav at gmail.com> wrote:
>>>> Now it *won't* correctly call the end-of-iteration function, because
>>>> there's no 'for' loop. This is going to either (a) require that EVERY
>>>> consumer of an iterator follow this new protocol, or (b) introduce a
>>>> ton of edge cases.
>>>
>>> Also, unless I'm misunderstanding the proposal, there's a fairly major
>>> compatibility break. At present we have:
>>>
>>>>>> lst = [1,2,3,4]
>>>>>> it = iter(lst)
>>>>>> for i in it:
>>> ...   if i == 2: break
>>>
>>>>>> for i in it:
>>> ...   print(i)
>>> 3
>>> 4
>>>>>>
>>>
>>> With the proposed behaviour, if I understand it, "it" would be closed
>>> after the first loop, so resuming "it" for the second loop wouldn't
>>> work. Am I right in that? I know there's a proposed itertools function
>>> to bring back the old behaviour, but it's still a compatibility break.
>>> And code like this, that partially consumes an iterator, is not
>>> uncommon.
>>
>> Right -- did you reach the "transition plan" section? (I know it's
>> wayyy down there.) The proposal is to hide this behind a __future__ at
>> first + a mechanism during the transition period to catch code that
>> depends on the old behavior and issue deprecation warnings. But it is
>> a compatibility break, yes.
>
> I missed that you propose phasing this in, but it doesn't really alter
> much, I think the current behaviour is valuable and common, and I'm -1
> on breaking it. It's just too much of a fundamental change to how
> loops and iterators interact for me to be comfortable with it -
> particularly as it's only needed for a very specific use case (none of
> my programs ever use async - why should I have to rewrite my loops
> with a clumsy extra call just to cater for a problem that only occurs
> in async code?)
>
> IMO, and I'm sorry if this is controversial, there's a *lot* of new
> language complexity that's been introduced for the async use case, and
> it's only the fact that it can be pretty much ignored by people who
> don't need or use async features that makes it acceptable (the "you
> don't pay for what you don't use" principle). The problem with this
> proposal is that it doesn't conform to that principle - it has a
> direct, negative impact on users who have no interest in async.

Oh, goodness, no -- like Yury said, the use cases here are not
specific to async at all. I mean, none of the examples are async even
:-).

The motivation here is that prompt (non-GC-dependent) cleanup is a
good thing for a variety of reasons: determinism, portability across
Python implementations, proper exception propagation, etc. async does
add yet another entry to this list, but I don't think the basic
principle is controversial. 'with' blocks are a whole chunk of extra
syntax that was added to the language just for this use case. In fact
'with' blocks weren't even needed for the functionality -- we already
had 'try/finally', it just wasn't ergonomic enough. This use case is so
important that it's had multiple rounds of syntax directed at it
before async/await was even a glimmer in C#'s eye :-).

BUT, currently, 'with' and 'try/finally' have a gap: if you use them
inside a generator (async or not, doesn't matter), then they often
fail at accomplishing their core purpose. Sure, they'll execute their
cleanup code whenever the generator is cleaned up, but there's no
ergonomic way to clean up the generator. Oops. I mean, you *could*
respond by saying "you should never use 'with' or 'try/finally' inside
a generator" and maybe add that as a rule to your style manual and
linter -- and some people in this thread have suggested more-or-less
that -- but that seems like a step backwards. This proposal instead
tries to solve the problem of making 'with'/'try/finally' work and be
ergonomic in general, and it should be evaluated on that basis, not on
the async/await stuff.
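
To make the gap concrete, here is a minimal sketch of what happens
today. The generator name and the 'events' list are made up for
illustration; the generator stands in for anything that holds a
resource inside a 'try/finally' or 'with'.

```python
from contextlib import closing

events = []

def read_records():
    # Hypothetical generator that "acquires" a resource and relies on
    # try/finally to release it.
    events.append("acquired")
    try:
        for n in range(5):
            yield n
    finally:
        events.append("released")

# Break out of the loop early while keeping a reference to the generator.
gen = read_records()
for n in gen:
    if n == 1:
        break

# The loop is over, but the generator is only suspended, so its
# finally block has not run yet.
released_after_break = "released" in events

# Today's workaround: wrap the generator so it is closed explicitly
# when the block exits.
with closing(read_records()) as gen2:
    for n in gen2:
        if n == 1:
            break

# Only the explicitly closed generator has released its resource so far.
released_after_closing = events.count("released") == 1
```

In the first loop the cleanup only happens when 'gen' is closed or
garbage collected; in the second it happens as soon as the 'with'
block exits, at the cost of an extra wrapper at every call site.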

The reason I'm emphasizing async generators is that they affect the
timeline, not the motivation:

- PEP 525 actually does add async-only complexity to the language (the
new GC hooks). It doesn't affect non-async users, but it is still
complexity. And it's possible that if we have iterclose, then we don't
need the new GC hooks (though this is still an open discussion :-)).
If this is true, then now is the time to act, while reverting the GC
hooks change is still a possibility; otherwise, we risk the situation
where we add iterclose later, decide that the GC hooks no longer
provide enough additional value to justify their complexity... but
we're stuck with them anyway.

- For synchronous iteration, the need for a transition period means
that the iterclose proposal will take a few years to provide benefits.
For asynchronous iteration, it could potentially start providing
benefits much sooner -- but there's a very narrow window for that,
before people start using async generators and backwards compatibility
constraints kick in. If we delay a few months then we'll probably have
to delay a few years.

...that said, I guess there is one way that async/await directly
affected my motivation here, though it's not what you think :-).
async/await have gotten me experimenting with writing network servers,
and let me tell you, there is nothing that focuses the mind on
correctness and simplicity like trying to write a public-facing
asynchronous network server. You might think "oh well if you're trying
to do some fancy rocket science and this is a feature for rocket
scientists then that's irrelevant to me", but that's actually not what
I mean at all. The rocket science part is like, trying to run through
all possible execution orders of the different callbacks in your head,
or to mentally simulate what happens if a client shows up that writes
at 1 byte/second. When I'm trying to do that, then the last thing I
want is to be distracted by also trying to figure out boring mechanical
stuff like whether or not the language is actually going to execute my
'finally' block -- yet right now that's a question that actually
cannot be answered without auditing my whole source code! And that
boring mechanical stuff is still boring mechanical stuff when writing
less terrifying code -- it's just that I'm so used to wasting a
trickle of cognitive energy on this kind of thing that normally I
don't notice it so much.

And, also, regarding the "clumsy extra call": the preserve() call
isn't just arbitrary clumsiness -- it's a signal that hey, you're
turning off a safety feature. Now the language won't take care of this
cleanup for you, so it's your responsibility. Maybe you should think
about how you want to handle that. Of course your decision could be
"whatever, this is a one-off script, the GC is good enough". But it's
probably worth the ~0.5 seconds of thought to make that an active,
conscious decision, because they aren't all one-off scripts.
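
A rough emulation of the idea, runnable on current Python, might look
like the sketch below. Everything here is an assumption made for
illustration: 'for_each_until' is a made-up helper that imitates a loop
which closes its iterator on exit, the 'preserve' class is a simplified
stand-in for the proposed wrapper, and the generator's existing close()
method is used in place of the proposed iterclose hook.

```python
class preserve:
    """Hypothetical stand-in for the proposed wrapper: iterates normally
    but does not pass the close request on to the wrapped iterator."""

    def __init__(self, iterable):
        self._it = iter(iterable)

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._it)

    def close(self):
        # Deliberately a no-op: the caller has opted out of automatic
        # cleanup and is now responsible for the underlying iterator.
        pass


def for_each_until(iterable, stop_at):
    """Imitate a loop that closes its iterator when it exits.

    Collects items until one equals stop_at, then closes the iterator
    (if it has a close() method) on the way out.
    """
    it = iter(iterable)
    seen = []
    try:
        for item in it:
            seen.append(item)
            if item == stop_at:
                break
    finally:
        close = getattr(it, "close", None)
        if close is not None:
            close()
    return seen


def numbers():
    yield from [1, 2, 3, 4]


# Without preserve(): the generator is closed when the loop exits, so
# there is nothing left to resume.
it = numbers()
first = for_each_until(it, 2)
rest = list(it)

# With preserve(): the close request is swallowed, so the generator can
# be resumed, and cleaning it up is now the caller's job.
it2 = numbers()
first2 = for_each_until(preserve(it2), 2)
rest2 = list(it2)
```

Under these assumptions 'rest' comes out empty while 'rest2' picks up
where the first loop stopped, which is the partial-consumption pattern
from Paul's example made explicit at the call site.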

-n

-- 
Nathaniel J. Smith -- https://vorpus.org