[Python-ideas] Deterministic iterator cleanup
Steven D'Aprano
steve at pearwood.info
Fri Oct 21 07:13:45 EDT 2016
On Fri, Oct 21, 2016 at 11:07:46AM +0100, Paul Moore wrote:
> On 21 October 2016 at 10:53, Steven D'Aprano <steve at pearwood.info> wrote:
> > On Wed, Oct 19, 2016 at 12:33:57PM -0700, Nathaniel Smith wrote:
> >
> >> I should also say, regarding your specific example, I guess it's an
> >> open question whether we would want list_iterator.__iterclose__ to
> >> actually do anything. It could flip the iterator to a state where it
> >> always raises StopIteration,
> >
> > That seems like the most obvious.
I've changed my mind -- I think maybe it should do nothing, and preserve
the current behaviour of lists.
I'm now more concerned with keeping the current behaviour as much as
possible than with creating some sort of consistent error condition for
all iterators. Consistency is over-rated, and we already have inconsistency
here: file iterators behave differently from list iterators, because
they can be closed:
py> f = open('/proc/mdstat', 'r')
py> a = list(f)
py> b = list(f)
py> len(a), len(b)
(20, 0)
py> f.close()
py> c = list(f)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: I/O operation on closed file.
We don't need to add a close() to list iterators just so they are
consistent with files. Just let __iterclose__ be a no-op.
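To spell out the difference between the two options (a pure-Python
sketch only -- __iterclose__ doesn't exist anywhere yet, and the real
method would live on the C-level list_iterator): this is roughly what
the "flip to exhausted" behaviour would look like:

class closeable_list_iterator:
    # Sketch of list_iterator *if* __iterclose__ flipped it to a
    # permanently-exhausted state.
    def __init__(self, data):
        self._it = iter(data)
        self._closed = False
    def __iter__(self):
        return self
    def __next__(self):
        if self._closed:
            raise StopIteration
        return next(self._it)
    def __iterclose__(self):
        self._closed = True

it = closeable_list_iterator([1, 2, 3])
next(it)             # 1
it.__iterclose__()   # what a for-loop would do on exit
list(it)             # [] -- permanently done

Make the __iterclose__ body a bare pass instead, and that last line
gives [2, 3], exactly as list iterators behave today. That's the no-op
version I'm arguing for.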
> So - does this mean "unless you understand what preserve() does,
> you're OK to not use it and your code will continue to work as
> before"? If so, then I'd be happy with this.
Almost.
Code like this will behave exactly the same as it currently does:
for x in it:
    process(x)

y = list(it)
If `it` is a file object, the second call to list() will raise ValueError;
if `it` is a list_iterator or generator, etc., y will be an empty list.
That part (I think) shouldn't change.
What *will* change is code that partially processes the iterator in two
different places. A simple example:
py> it = iter([1, 2, 3, 4, 5, 6])
py> for x in it:
...     if x == 4: break
...
py> for x in it:
...     print(x)
...
5
6
This *may* change. With this proposal, the first loop will "close" the
iterator when you exit from the loop. For a list, there's no finaliser,
no __del__ to call, so we can keep the current behaviour and nobody will
notice any difference.
But if `it` is a file iterator instead of a list iterator, the file will
be closed when you exit the first for-loop, and the second loop will
raise ValueError. That will be different.
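To make that concrete (a hypothetical example: nothing calls
__iterclose__ today, so the manual close() below stands in for what
the for-loop would do under the proposal):

f = open('/proc/mdstat', 'r')
for line in f:
    if line.startswith('md'):   # some arbitrary early exit
        break
f.close()   # under the proposal, exiting the loop does this for you
for line in f:   # and so this second loop raises ValueError
    print(line)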
The fix here is simple: protect the iterator from being closed by the
first loop:

for x in itertools.preserve(it):   # preserve, protect, whatever
    ...
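preserve() doesn't exist, of course, but a pure-Python sketch suggests
how small it could be: a wrapper whose own __iterclose__ deliberately
swallows the close:

class preserve:
    # Sketch only: wrap an iterator so that for-loops can't close it.
    def __init__(self, it):
        self._it = it
    def __iter__(self):
        return self
    def __next__(self):
        return next(self._it)
    def __iterclose__(self):
        pass   # do NOT close the underlying iterator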
Or, if `it` is your own class, give it an __iterclose__ method that does
nothing.
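For instance (again just a sketch against the proposed protocol --
none of this has any effect in current Python):

class squares:
    # An iterator that survives partial for-loops by defining a
    # do-nothing __iterclose__.
    def __init__(self, n):
        self._i = 0
        self._n = n
    def __iter__(self):
        return self
    def __next__(self):
        if self._i >= self._n:
            raise StopIteration
        self._i += 1
        return self._i ** 2
    def __iterclose__(self):
        pass   # nothing to clean up

it = squares(6)
for x in it:
    if x >= 16:
        break
list(it)   # [25, 36] -- the break didn't kill the iterator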
This is a backwards-incompatible change, so I think we would need to do
this:
(1) In Python 3.7, we introduce a __future__ directive:
from __future__ import iterclose
to enable the new behaviour. (Remember, future directives apply on a
module-by-module basis.)
(2) Without the directive, we keep the old behaviour, except that
warnings are raised where the behaviour would change.
(3) Then in 3.8 iterclose becomes the default, the warnings go away, and
the new behaviour just happens.
If that's too fast for people, we could slow it down:
(1) Add the future directive to Python 3.7;
(2) but no warnings by default (you have to opt in to the
warnings with an environment variable or command-line switch).
(3) Then in 3.8 the warnings are on by default;
(4) And the iterclose behaviour doesn't become standard until 3.9.
That means if this change worries you, you can ignore it until you
migrate to 3.8 (which won't be production-ready until about 2020 or so),
and don't have to migrate your code until 3.9, which will be a year or
two later. But early adopters can start targeting the new functionality
from 3.7 if they like.
I don't think there's any need for a __future__ directive for
aiterclose, since there's not enough backwards-incompatibility to care
about. (I think, but don't mind if people disagree.) That can happen
starting in 3.7, and when people complain that their synchronous
generators don't have deterministic garbage collection like their
asynchronous ones do, we can point them at the future directive.
Bottom line is: at first I thought this was a scary change that would
break too much code. But now I think it won't break much, and we can
ease into it really slowly over two or three releases. So I think that
the cost is probably low. I'm still not sure on how great the benefit
will be, but I'm leaning towards a +1 on this.
--
Steve