[Python-ideas] Deterministic iterator cleanup
Paul Moore
p.f.moore at gmail.com
Fri Oct 21 06:03:51 EDT 2016
On 21 October 2016 at 07:03, Nathaniel Smith <njs at pobox.com> wrote:
> Oh, goodness, no -- like Yury said, the use cases here are not
> specific to async at all. I mean, none of the examples are async even
> :-).
[...]
Ah, I follow now. Sorry for the misunderstanding - I'd skimmed a bit
more than I realised.
However, it still feels to me that the code I currently write doesn't
need this feature, and I'm therefore unclear as to why it's
sufficiently important to warrant a backward compatibility break.
It's quite possible that I've never analysed my code well enough to
*notice* that there's a problem. Or that I rely on CPython's GC
behaviour without realising it. Also, it's honestly very rare that I
need deterministic cleanup, as opposed to guaranteed cleanup - running
out of file handles, for example, isn't really a problem I encounter.
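To make that distinction concrete, here's a minimal sketch (the file
name is made up):

# Deterministic cleanup: the file is closed at a known point,
# as soon as the with block exits.
with open("data.txt") as f:
    first = f.readline()
# f is guaranteed to be closed here.

# Guaranteed (but not deterministic) cleanup: the file is closed
# whenever the object is collected - immediately under CPython's
# refcounting, at some arbitrary later point on other implementations.
f = open("data.txt")
first = f.readline()
del f  # CPython closes the file now; PyPy may not until a GC cycle runs.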
But it's also possible that it's a code design difference. You use the
example (from memory, sorry if this is slightly different to what you
wrote):
def filegen(filename):
    with open(filename) as f:
        for line in f:
            yield line

# caller
for line in filegen(name):
    ...
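The problem case for that pattern, as I understand it, is when the
caller breaks out of the loop early - the file opened inside the
generator then stays open until the suspended generator happens to be
collected. A sketch, with a hypothetical is_interesting() predicate:

for line in filegen(name):
    if is_interesting(line):
        break
# The file inside filegen is still open here; it's only closed when
# the suspended generator is garbage-collected.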
I wouldn't normally write a function like that - I'd factor it
differently, with the generator taking an open file (or file-like
object) and the caller opening the file:
def filegen(fd):
    for line in fd:
        yield line

# caller
with open(filename) as fd:
    for line in filegen(fd):
        ...
With that pattern, there's no issue. And the filegen function is more
generic, as it can be used with *any* file-like object (a StringIO,
for testing, for example).
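For instance, a quick sketch of a test (the data is obviously made up):

import io

fake_file = io.StringIO("line 1\nline 2\n")
assert list(filegen(fake_file)) == ["line 1\n", "line 2\n"]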
> And, also, regarding the "clumsy extra call": the preserve() call
> isn't just arbitrary clumsiness -- it's a signal that hey, you're
> turning off a safety feature. Now the language won't take care of this
> cleanup for you, so it's your responsibility. Maybe you should think
> about how you want to handle that. Of course your decision could be
> "whatever, this is a one-off script, the GC is good enough". But it's
> probably worth the ~0.5 seconds of thought to make that an active,
> conscious decision, because they aren't all one-off scripts.
Well, if preserve() did mean just that, then that would be OK. I'd
never use it, as I don't care about deterministic cleanup, so it makes
no difference to me if it's on or off.
But that's not the case - in fact, preserve() means "give me the old
Python 3.5 behaviour", and (because deterministic cleanup isn't
important to me) that distinction is vague and unclear to me. So I
don't know whether my code is affected by the behaviour change, and I
have to guess at whether I need preserve().
What I think is needed here is a clear explanation of how this
proposal affects existing code that *doesn't* need or care about
cleanup. The example that's been mentioned is
with open(filename) as f:
    for line in f:
        if is_end_of_header(line):
            break
        process_header(line)
    for line in f:
        process_body(line)
and similar code that relies on being able to part-process an iterator
in a for loop, and then have a later loop pick up where the first left
off.
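If I've understood the proposal correctly, that code would have to
become something like the following, with preserve() wrapping the
first loop's iterator so that breaking out of the loop doesn't close f:

with open(filename) as f:
    for line in preserve(f):
        if is_end_of_header(line):
            break
        process_header(line)
    for line in f:
        process_body(line)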
Most users of iterators and generators probably have little
understanding of GeneratorExit, closing generators, etc. And that's a
good thing - it's why iterators in Python are so useful. So the
proposal needs to explain how it impacts that sort of user, in terms
that they understand. It's a real pity that the explanation isn't "you
can ignore all of this, as you aren't affected by the problem it's
trying to solve" - that's what I was getting at.
At the moment, the take-home message for such users feels like it's
"you might need to scatter preserve() around your code, to avoid the
behaviour change described above, which you glossed over because it
talked about all that coroutiney stuff you don't understand" :-)
Paul