[pypy-dev] RFC: draft idea for making for loops automatically close iterators

hubo hubo at jiedaibao.com
Fri Oct 21 10:13:45 EDT 2016


Well, I'm really shocked to find out that what I thought was an "automatic close" is really just the ref-counting GC of CPython, which means that a lot of my code breaks on PyPy...
This becomes a big problem now that iterators are used so heavily in Python. Builtin functions like zip, map and filter return iterators in Python 3 instead of the lists they returned in Python 2, which means invisible bugs in code ported from Python 2: for example, zip(my_generator(), my_other_generator()) may leave both generators open after the for loop over it exits. Even in Python 2, the functions in itertools can create the same bugs.
In CPython this kind of code works because the ref-counting GC finalizes the generators as soon as their reference count drops to zero, so the problem is not visible there, but the same code breaks on PyPy.
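A minimal sketch of the zip() case above. The generator numbers() is hypothetical; its finally block stands in for releasing a file or socket. The output depends on the implementation: on CPython, reference counting should finalize both generators as soon as first_pair_summing_to() returns, while on PyPy the finally blocks only run whenever the GC gets around to it.

```python
closed = []

def numbers(name, n):
    """Hypothetical generator; its finally block stands in for releasing a resource."""
    try:
        for i in range(n):
            yield i
    finally:
        closed.append(name)

def first_pair_summing_to(target):
    # In Python 3, zip() returns a lazy iterator wrapping two suspended generators.
    for a, b in zip(numbers("left", 10), numbers("right", 10)):
        if a + b == target:
            return a, b  # leaving the loop early: neither generator has finished
    return None

result = first_pair_summing_to(4)
print(result)          # (2, 2)
print(sorted(closed))  # CPython: ['left', 'right']; PyPy: likely [] until a GC runs
```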

Since a ref-counting GC implementation is not possible for PyPy, I'm wondering whether it would be possible to hack the for loop so that it "tries to" collect the generator. That could really save a lot of lives. If the generator is still referenced after the for loop, it may be the programmer's fault for not calling close(), but looping over a returned value is something different: sometimes you do not even know that it is a generator.
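To make the idea concrete, here is a rough user-level approximation of a for loop that "tries to" close its iterator. auto_closing() and resource() are hypothetical names, and this is only a sketch of the behavior being asked for, not how PyPy or CPython would implement it. It also shows a limitation relevant to the zip() example: closing the outer iterator only helps if that iterator has a close() method, and zip objects do not, so the generators inside them are left untouched.

```python
from contextlib import contextmanager

log = []

def resource(name):
    """Hypothetical generator that 'holds' something until it finishes or is closed."""
    try:
        for i in range(100):
            yield i
    finally:
        log.append(name)

@contextmanager
def auto_closing(iterable):
    """Hypothetical helper: hand out an iterator and close it when the block exits.

    Approximates a for loop that calls close() on its iterator however the
    loop is left (exhaustion, break, return or an exception).
    """
    iterator = iter(iterable)
    try:
        yield iterator
    finally:
        close = getattr(iterator, "close", None)  # not every iterator has close()
        if close is not None:
            close()

# Case 1: a bare generator is closed as soon as the with block ends.
with auto_closing(resource("gen")) as it:
    for x in it:
        if x == 3:
            break
print(log)  # ['gen']

# Case 2: a zip object has no close() method, so the generators it wraps
# stay suspended; only their own close() (or the GC) will run their finally blocks.
with auto_closing(zip(resource("a"), resource("b"))) as it:
    for pair in it:
        break
print(log)  # still ['gen']
```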

2016-10-21 

hubo 



From: Armin Rigo <armin.rigo at gmail.com>
Sent: 2016-10-18 16:01
Subject: Re: [pypy-dev] RFC: draft idea for making for loops automatically close iterators
To: "Nathaniel Smith"<njs at pobox.com>
Cc: "PyPy Developer Mailing List"<pypy-dev at python.org>

Hi, 

On 17 October 2016 at 10:08, Nathaniel Smith <njs at pobox.com> wrote: 
> thought I'd send around a draft to see what you think. (E.g., would 
> this be something that makes your life easier?) 

As a general rule, PyPy's GC behavior is similar to CPython's if we 
tweak the program to start a chain of references at a self-referential 
object.  So for example, consider that the outermost loop of a program 
takes the objects like the async generators, and stores them inside 
such an object: 

    class A:
        def __init__(self, ref):
            self.ref = ref
            self.myself = self  # reference cycle: the instance refers to itself

and then immediately forget that A instance.  Then both this A 
instance and everything it refers to is kept alive until the next 
cyclic GC occurs.  PyPy just always exhibits that behavior instead of 
only when you start with reference cycles. 
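The behavior described here can be reproduced on CPython with a short sketch. tracked() and events are hypothetical names; the cyclic collector is disabled so that the only collection is the explicit gc.collect() call. On CPython this should print [] ['closed']: the generator's finally block only runs once the cycle through the A instance is collected, which is what PyPy does for every object.

```python
import gc

events = []

def tracked():
    """Hypothetical generator that records when its cleanup code runs."""
    try:
        yield 1
        yield 2
    finally:
        events.append("closed")

class A:
    def __init__(self, ref):
        self.ref = ref
        self.myself = self  # reference cycle: the instance refers to itself

def demo():
    gc.disable()  # keep the cyclic collector from running on its own
    try:
        gen = tracked()
        next(gen)   # suspend the generator inside its try block
        A(gen)      # store it in a self-referential object and forget that object
        del gen     # drop the last direct reference
        before = list(events)  # ref counting alone cannot free the cycle
        gc.collect()           # an explicit cyclic collection finds it
        after = list(events)
    finally:
        gc.enable()
    return before, after

before, after = demo()
print(before, after)
```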

So the real issue should not be "how to do something that will make 
PyPy happy", or not only---it should be "how to do something that will 
make CPython happy even in case of reference cycles".  If you don't, 
then arguably CPython is slightly broken. 
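One way to get deterministic cleanup on both interpreters, reference cycles or not, is to stop relying on the GC and close the generator explicitly. A minimal sketch using the standard library's contextlib.closing(), which calls close() when the with block exits; tracked() and events are hypothetical names, and class A has the same self-referential shape as in the message.

```python
import gc
from contextlib import closing

events = []

def tracked():
    """Hypothetical generator that records when its cleanup code runs."""
    try:
        yield 1
        yield 2
    finally:
        events.append("closed")

class A:
    """Same self-referential shape as above: the instance points to itself."""
    def __init__(self, ref):
        self.ref = ref
        self.myself = self

gc.disable()  # rule out the cyclic collector as the reason cleanup happens
try:
    with closing(tracked()) as gen:
        holder = A(gen)  # the generator is now also reachable from a reference cycle
        next(gen)
    # closing() called gen.close() when the with block ended
    closed_without_gc = list(events)
finally:
    gc.enable()

print(closed_without_gc)  # ['closed']
```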

Yes, anything that can reduce file descriptor leaks in Python sounds good to me. 


A bientôt, 

Armin. 
_______________________________________________ 
pypy-dev mailing list 
pypy-dev at python.org 
https://mail.python.org/mailman/listinfo/pypy-dev 