[Python-Dev] PEP 525, third round, better finalization
Nick Coghlan
ncoghlan at gmail.com
Sat Sep 3 15:15:01 EDT 2016
On 4 September 2016 at 04:38, Oscar Benjamin <oscar.j.benjamin at gmail.com> wrote:
> On 3 September 2016 at 16:42, Nick Coghlan <ncoghlan at gmail.com> wrote:
>> On 2 September 2016 at 19:13, Nathaniel Smith <njs at pobox.com> wrote:
>>> This works OK on CPython because the reference-counting gc will call
>>> handle.__del__() at the end of the scope (so on CPython it's at level
>>> 2), but it famously causes huge problems when porting to PyPy with
>>> its much faster and more sophisticated gc that only runs when
>>> triggered by memory pressure. (Or for "PyPy" you can substitute
>>> "Jython", "IronPython", whatever.) Technically this code doesn't
>>> actually "leak" file descriptors on PyPy, because handle.__del__()
>>> will get called *eventually* (this code is at level 1, not level 0),
>>> but by the time "eventually" arrives your server process has probably
>>> run out of file descriptors and crashed. Level 1 isn't good enough. So
>>> now we have all learned to instead write
> ...
>>> BUT, with the current PEP 525 proposal, trying to use this generator
>>> in this way is exactly analogous to the open(path).read() case: on
>>> CPython it will work fine -- the generator object will leave scope at
>>> the end of the 'async for' loop, cleanup methods will be called, etc.
>>> But on PyPy, the weakref callback will not be triggered until some
>>> arbitrary time later, you will "leak" file descriptors, and your
>>> server will crash.
>>
>> That suggests the PyPy GC should probably be tracking pressure on more
>> resources than just memory when deciding whether or not to trigger a
>> GC run.
>
> PyPy's GC is conformant to the language spec
The language spec doesn't say anything about what triggers GC cycles -
that's purely a decision for runtime implementors based on the
programming experience they want to provide their users.
CPython runs GC pretty eagerly, with it being immediate when the
automatic reference counting is sufficient and the cyclic GC doesn't
have to get involved at all.
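As a quick illustration of that CPython-specific eagerness (this sketch is mine, not from the earlier messages):

```python
# On CPython, reference counting runs a finalizer the moment the last
# reference disappears; on PyPy/Jython/IronPython it runs "eventually".
class Tracked:
    def __init__(self, log):
        self.log = log

    def __del__(self):
        # On CPython this runs as soon as the refcount hits zero.
        self.log.append("finalized")

def use(log):
    obj = Tracked(log)
    # 'obj' becomes unreachable when the function returns...

events = []
use(events)
# ...so on CPython the finalizer has already run by this point.
print(events)  # On CPython: ['finalized']; on PyPy it may still be [].
```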
If I understand correctly, PyPy currently decides whether or not to
trigger a GC cycle based primarily on memory pressure, even though the
uncollected garbage may also be holding on to system resources other
than memory (like file descriptors).
For synchronous code, that's a relatively easy burden to push back
onto the programmer - assuming fair thread scheduling, a with
statement can reliably ensure prompt resource cleanup.
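That is, in straight-line synchronous code, the guarantee is implementation-independent:

```python
import os
import tempfile

# A with statement closes the file at block exit on *every* Python
# implementation, regardless of when (or whether) the GC ever runs.
path = os.path.join(tempfile.mkdtemp(), "example.txt")
with open(path, "w") as f:
    f.write("data")
# The file is guaranteed to be closed here, even if the body raised.
assert f.closed
```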
That assurance goes out the window as soon as you explicitly pause
code execution inside the body of the with statement - it doesn't
matter whether it's via yield, yield from, or await, you've completely
lost that assurance of immediacy.
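To make that concrete (an illustrative sketch, not code from the thread): once a generator yields inside a with block, the cleanup only happens when someone closes or finalizes the generator, not at any predictable point in the caller's code.

```python
from contextlib import contextmanager

events = []

@contextmanager
def resource():
    events.append("acquired")
    try:
        yield
    finally:
        events.append("released")

def gen():
    with resource():
        yield 1
        yield 2

g = gen()
next(g)                                   # now paused *inside* the with block
assert events == ["acquired"]             # cleanup hasn't happened yet
g.close()                                 # GeneratorExit unwinds the with block
assert events == ["acquired", "released"]
```

Without the explicit `close()`, the `finally` block waits on the GC, which is exactly where CPython and PyPy diverge.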
At that point, even CPython doesn't ensure prompt release of resources
- it just promises to try to clean things up as soon as it can and as
best it can (which is usually pretty soon and pretty well, with recent
iterations of 3.x, but event loops will still happily keep things
alive indefinitely if they're waiting for events that never happen).
For synchronous generators, you can make your API a bit more
complicated, and ask your caller to handle the manual resource
management, but you may not want to do that.
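One common shape for that more complicated API (my sketch of the pattern, with illustrative names): hand the caller the generator and ask them to guarantee `close()` runs, e.g. via `contextlib.closing`.

```python
import os
import tempfile
from contextlib import closing

def lines(path):
    # The generator holds the file open across yields, so the caller is
    # asked to close it explicitly rather than relying on the GC.
    with open(path) as f:
        for line in f:
            yield line.rstrip("\n")

path = os.path.join(tempfile.mkdtemp(), "data.txt")
with open(path, "w") as f:
    f.write("a\nb\n")

# Caller-side manual management: closing() guarantees gen.close() runs,
# which promptly unwinds the with statement inside the generator on any
# implementation.
with closing(lines(path)) as gen:
    first = next(gen)
# gen.close() has run here, so the underlying file is already closed.
```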
The asynchronous case is even worse, though: there, you often simply
can't readily push the burden back onto the programmer, because the
code is *meant* to be waiting for events and reacting to them, rather
than proceeding deterministically from beginning to end.
So while it's good that PEP 492 and 525 attempt to adapt synchronous
resource management models to the asynchronous world, it's also
important to remember that there's a fundamental mismatch of
underlying concepts when it comes to trying to pair up deterministic
resource management with asynchronous code - you're often going to
want to tip the model on its side and set up a dedicated resource
manager that other components can interact with, and then have the
resource manager take care of promptly releasing the resources when
the other components go away (perhaps with notions of leases and lease
renewals if you simply cannot afford unexpected delays in resources
being released).
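A minimal sketch of what such a lease-based resource manager might look like (all names here are illustrative, not from any existing library): clients acquire a lease and must renew it within the TTL, and the manager reclaims anything whose holder has gone away.

```python
import time

class LeaseManager:
    """Hypothetical dedicated resource manager with lease-style ownership.

    Clients must renew within 'ttl' seconds or the manager closes the
    resource on their behalf, bounding how long cleanup can be delayed.
    """

    def __init__(self, ttl):
        self.ttl = ttl
        self._leases = {}   # lease id -> (resource, expiry time)
        self._next_id = 0

    def acquire(self, resource):
        lease_id = self._next_id
        self._next_id += 1
        self._leases[lease_id] = (resource, time.monotonic() + self.ttl)
        return lease_id

    def renew(self, lease_id):
        resource, _ = self._leases[lease_id]
        self._leases[lease_id] = (resource, time.monotonic() + self.ttl)

    def release(self, lease_id):
        resource, _ = self._leases.pop(lease_id)
        resource.close()

    def reap_expired(self):
        # Run periodically by the manager; closes resources whose
        # holders went away without releasing them.
        now = time.monotonic()
        for lease_id in [k for k, (_, exp) in self._leases.items() if exp < now]:
            self.release(lease_id)
```

The point of tipping the model on its side like this is that cleanup no longer depends on when some other component's garbage gets collected.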
Cheers,
Nick.
--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia