[Python-Dev] PEP 525, third round, better finalization

Oscar Benjamin oscar.j.benjamin at gmail.com
Sat Sep 3 14:38:15 EDT 2016


On 3 September 2016 at 16:42, Nick Coghlan <ncoghlan at gmail.com> wrote:
> On 2 September 2016 at 19:13, Nathaniel Smith <njs at pobox.com> wrote:
>> This works OK on CPython because the reference-counting gc will call
>> handle.__del__() at the end of the scope (so on CPython it's at level
>> 2), but it famously causes huge problems when porting to PyPy with
>> its much faster and more sophisticated gc that only runs when
>> triggered by memory pressure. (Or for "PyPy" you can substitute
>> "Jython", "IronPython", whatever.) Technically this code doesn't
>> actually "leak" file descriptors on PyPy, because handle.__del__()
>> will get called *eventually* (this code is at level 1, not level 0),
>> but by the time "eventually" arrives your server process has probably
>> run out of file descriptors and crashed. Level 1 isn't good enough. So
>> now we have all learned to instead write
...
>> BUT, with the current PEP 525 proposal, trying to use this generator
>> in this way is exactly analogous to the open(path).read() case: on
>> CPython it will work fine -- the generator object will leave scope at
>> the end of the 'async for' loop, cleanup methods will be called, etc.
>> But on PyPy, the weakref callback will not be triggered until some
>> arbitrary time later, you will "leak" file descriptors, and your
>> server will crash.
>
> That suggests the PyPy GC should probably be tracking pressure on more
> resources than just memory when deciding whether or not to trigger a
> GC run.

PyPy's GC conforms to the language spec AFAICT:
https://docs.python.org/3/reference/datamodel.html#object.__del__

"""
object.__del__(self)

Called when the instance is about to be destroyed. This is also called
a destructor. If a base class has a __del__() method, the derived
class’s __del__() method, if any, must explicitly call it to ensure
proper deletion of the base class part of the instance. Note that it
is possible (though not recommended!) for the __del__() method to
postpone destruction of the instance by creating a new reference to
it. It may then be called at a later time when this new reference is
deleted. It is not guaranteed that __del__() methods are called for
objects that still exist when the interpreter exits.
"""

Note the last sentence. It is also not guaranteed (across different
Python implementations, and regardless of the CPython-specific notes
in the docs) that any particular object will cease to exist before
the interpreter exits. Taken together, these two facts imply that it
is not guaranteed that *any* __del__ method will ever be called.
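
Here is a minimal sketch of what that means in practice (the class
is made up for illustration):

class Resource:
    def __del__(self):
        # The spec does not guarantee this ever runs: not at any
        # particular point, and not at interpreter exit.
        print('releasing resource')

res = Resource()
# Whether 'releasing resource' is printed when the interpreter
# exits is entirely implementation-dependent.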

Antoine's excellent work in PEP 442 has improved the situation in
CPython, but the language spec (covering all implementations) remains
the same, and changing it requires a new PEP and coordination with
other implementations. Without that change it is a mistake to base a
new core language feature (async finalisation) on CPython-specific
implementation details. Already, using with (or try/finally etc.)
inside a generator function behaves differently under PyPy:

$ cat gentest.py

def generator_needs_finalisation():
    try:
        for n in range(10):
            yield n
    finally:
        print('Doing important cleanup')

for obj in generator_needs_finalisation():
    if obj == 5:
        break

print('Process exit')

$ python gentest.py
Doing important cleanup
Process exit

So here the cleanup is triggered by the reference count of the
generator falling to zero at the break statement. Under CPython this
corresponds to Nathaniel's "level 2" cleanup. If we keep another
reference around, the cleanup is deferred until process exit:

$ cat gentest2.py

def generator_needs_finalisation():
    try:
        for n in range(10):
            yield n
    finally:
        print('Doing important cleanup')

gen = generator_needs_finalisation()
for obj in gen:
    if obj == 5:
        break

print('Process exit')

$ python gentest2.py
Process exit
Doing important cleanup

So that's Nathaniel's "level 1" cleanup. However, if you run either
of these scripts under PyPy, the cleanup simply won't occur (i.e.
"level 0" cleanup):

$ pypy gentest.py
Process exit
$ pypy gentest2.py
Process exit

I don't think PyPy is in breach of the language spec here. Python
made a decision a long time ago to shun RAII-style implicit cleanup
in favour of with-style explicit cleanup.
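
In that spirit, the way to get deterministic cleanup of a generator
on any implementation is to close it explicitly. Generators have a
close() method that runs the pending finally block, and
contextlib.closing wraps that up nicely, e.g. for gentest2.py:

from contextlib import closing

with closing(generator_needs_finalisation()) as gen:
    for obj in gen:
        if obj == 5:
            break
# close() has been called on exiting the with block, so 'Doing
# important cleanup' is printed here on CPython and PyPy alike.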

The solution to this problem is to move resource management outside
of the generator function. This is true for ordinary generators as
well, with no event loop involved. The example in the PEP is:

async def square_series(con, to):
    async with con.transaction():
        cursor = con.cursor(
            'SELECT generate_series(0, $1) AS i', to)
        async for row in cursor:
            yield row['i'] ** 2

async for i in square_series(con, 1000):
    if i == 100:
        break

The normal generator equivalent of this is:

def square_series(con, to):
    with con.transaction():
        cursor = con.cursor(
            'SELECT generate_series(0, $1) AS i', to)
        for row in cursor:
            yield row['i'] ** 2

This code is already broken: if the loop over the generator is
abandoned early, exiting the with block depends on the generator
being finalised promptly. The fix is to move the with statement out
to the caller of the generator function.
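
A sketch of that fix (assuming the same con.transaction()/con.cursor()
API as in the PEP example):

def square_series(cursor):
    for row in cursor:
        yield row['i'] ** 2

with con.transaction():
    cursor = con.cursor(
        'SELECT generate_series(0, $1) AS i', 1000)
    # The transaction is now cleaned up by the with statement,
    # however the loop below terminates.
    for i in square_series(cursor):
        if i == 100:
            break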

Going back to Nathaniel's example:

def get_file_contents(path):
    with open(path) as handle:
        return handle.read()

Nick wants it to be a generator function so that we don't have to
load the whole file into memory, i.e.:

def get_file_lines(path):
    with open(path) as handle:
        yield from handle

However, this is now broken if the iterator is not fully consumed:

for line in get_file_lines(path):
    if line.startswith('#'):
        break

The answer is to move the with statement outside and pass the handle
into your generator function:

def get_file_lines(handle):
    yield from handle

with open(path) as handle:
    for line in get_file_lines(handle):
        if line.startswith('#'):
            break

Of course in this case get_file_lines is trivial and can be omitted,
but the fix works more generally when get_file_lines actually does
some processing on the lines of the file: move the with statement
outside and turn the generator function into an iterator-style
filter.
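
For example (strip_comments here is made up to illustrate the shape
of such a filter):

def strip_comments(lines):
    # A pure filter: it owns no resources, so abandoning it
    # part-way through is safe on any implementation.
    for line in lines:
        line = line.rstrip('\n')
        if line and not line.startswith('#'):
            yield line

with open(path) as handle:
    for line in strip_comments(handle):
        print(line)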

--
Oscar

