[Python-ideas] Deterministic iterator cleanup
Steven D'Aprano
steve at pearwood.info
Fri Oct 21 07:13:45 EDT 2016
On Fri, Oct 21, 2016 at 11:07:46AM +0100, Paul Moore wrote:
> On 21 October 2016 at 10:53, Steven D'Aprano <steve at pearwood.info> wrote:
> > On Wed, Oct 19, 2016 at 12:33:57PM -0700, Nathaniel Smith wrote:
> >
> >> I should also say, regarding your specific example, I guess it's an
> >> open question whether we would want list_iterator.__iterclose__ to
> >> actually do anything. It could flip the iterator to a state where it
> >> always raises StopIteration,
> >
> > That seems like the most obvious.
I've changed my mind -- I think maybe it should do nothing, and preserve
the current behaviour of lists.
I'm now more concerned with keeping the current behaviour as much as
possible than with creating some sort of consistent error condition for
all iterators. Consistency is over-rated, and we already have inconsistency
here: file iterators behave differently from list iterators, because
they can be closed:
py> f = open('/proc/mdstat', 'r')
py> a = list(f)
py> b = list(f)
py> len(a), len(b)
(20, 0)
py> f.close()
py> c = list(f)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: I/O operation on closed file.
We don't need to add a close() to list iterators just so they are
consistent with files. Just let __iterclose__ be a no-op.
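To spell out the difference between the two options (a pure-Python
sketch only -- __iterclose__ doesn't exist anywhere yet, and the real
method would live on the C-level list_iterator): this is roughly what
the "flip to exhausted" behaviour would look like:

class closeable_list_iterator:
    # Sketch of list_iterator *if* __iterclose__ flipped it to a
    # permanently-exhausted state.
    def __init__(self, data):
        self._it = iter(data)
        self._closed = False
    def __iter__(self):
        return self
    def __next__(self):
        if self._closed:
            raise StopIteration
        return next(self._it)
    def __iterclose__(self):
        self._closed = True

it = closeable_list_iterator([1, 2, 3])
next(it)             # 1
it.__iterclose__()   # what a for-loop would do on exit
list(it)             # [] -- permanently done

Make the __iterclose__ body a bare pass instead, and that last line
gives [2, 3], exactly as list iterators behave today. That's the no-op
version I'm arguing for.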
> So - does this mean "unless you understand what preserve() does,
> you're OK to not use it and your code will continue to work as
> before"? If so, then I'd be happy with this.
Almost.
Code like this will behave exactly the same as it currently does:
for x in it:
    process(x)

y = list(it)
If `it` is a file object, the second call to list() will raise ValueError;
if `it` is a list_iterator or generator, etc., y will be an empty list.
That part (I think) shouldn't change.
What *will* change is code that partially processes the iterator in two
different places. A simple example:
py> it = iter([1, 2, 3, 4, 5, 6])
py> for x in it:
...     if x == 4: break
...
py> for x in it:
...     print(x)
...
5
6
This *may* change. With this proposal, the first loop will "close" the
iterator when you exit from the loop. For a list, there's no finaliser,
no __del__ to call, so we can keep the current behaviour and nobody will
notice any difference.
But if `it` is a file iterator instead of a list iterator, the file will
be closed when you exit the first for-loop, and the second loop will
raise ValueError. That will be different.
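To make that concrete (a hypothetical example: nothing calls
__iterclose__ today, so the manual close() below stands in for what
the for-loop would do under the proposal):

f = open('/proc/mdstat', 'r')
for line in f:
    if line.startswith('md'):   # some arbitrary early exit
        break
f.close()   # under the proposal, exiting the loop does this for you
for line in f:   # and so this second loop raises ValueError
    print(line)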
The fix here is simple: protect the iterator from being closed by the
first loop:

for x in itertools.preserve(it):   # preserve, protect, whatever
    ...
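preserve() doesn't exist, of course, but a pure-Python sketch suggests
how small it could be: a wrapper whose own __iterclose__ deliberately
swallows the close:

class preserve:
    # Sketch only: wrap an iterator so that for-loops can't close it.
    def __init__(self, it):
        self._it = it
    def __iter__(self):
        return self
    def __next__(self):
        return next(self._it)
    def __iterclose__(self):
        pass   # do NOT close the underlying iterator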
Or, if `it` is your own class, give it an __iterclose__ method that does
nothing.
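For instance (again just a sketch against the proposed protocol --
none of this has any effect in current Python):

class squares:
    # An iterator that survives partial for-loops by defining a
    # do-nothing __iterclose__.
    def __init__(self, n):
        self._i = 0
        self._n = n
    def __iter__(self):
        return self
    def __next__(self):
        if self._i >= self._n:
            raise StopIteration
        self._i += 1
        return self._i ** 2
    def __iterclose__(self):
        pass   # nothing to clean up

it = squares(6)
for x in it:
    if x >= 16:
        break
list(it)   # [25, 36] -- the break didn't kill the iterator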
This is a backwards-incompatible change, so I think we would need to do
this:
(1) In Python 3.7, we introduce a __future__ directive:
from __future__ import iterclose
to enable the new behaviour. (Remember, future directives apply on a
module-by-module basis.)
(2) Without the directive, we keep the old behaviour, except that
warnings are raised where the behaviour would change.
(3) Then in 3.8 iterclose becomes the default, the warnings go away, and
the new behaviour just happens.
If that's too fast for people, we could slow it down:
(1) Add the future directive to Python 3.7;
(2) but no warnings by default (you have to opt in to the
warnings with an environment variable or command-line switch).
(3) Then in 3.8 the warnings are on by default;
(4) And the iterclose behaviour doesn't become standard until 3.9.
That means if this change worries you, you can ignore it until you
migrate to 3.8 (which won't be production-ready until about 2020 or so),
and don't have to migrate your code until 3.9, which will be a year or
two later. But early adopters can start targeting the new functionality
from 3.7 if they like.
I don't think there's any need for a __future__ directive for
aiterclose, since there's not enough backwards-incompatibility to care
about. (I think, but don't mind if people disagree.) That can happen
starting in 3.7, and when people complain that their synchronous
generators don't have deterministic garbage collection like their
asynchronous ones do, we can point them at the future directive.
Bottom line is: at first I thought this was a scary change that would
break too much code. But now I think it won't break much, and we can
ease into it really slowly over two or three releases. So I think that
the cost is probably low. I'm still not sure on how great the benefit
will be, but I'm leaning towards a +1 on this.
--
Steve