PEP 442: Safe object finalization
Hello,

I would like to submit the following PEP for discussion and evaluation.

Regards

Antoine.


PEP: 442
Title: Safe object finalization
Version: $Revision$
Last-Modified: $Date$
Author: Antoine Pitrou <solipsis@pitrou.net>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 2013-05-18
Python-Version: 3.4
Post-History:
Resolution: TBD


Abstract
========

This PEP proposes to deal with the current limitations of object
finalization.  The goal is to be able to define and run finalizers
for any object, regardless of their position in the object graph.

This PEP doesn't call for any change in Python code.  Objects
with existing finalizers will benefit automatically.


Definitions
===========

Reference
    A directional link from an object to another.  The target of the
    reference is kept alive by the reference, as long as the source is
    itself alive and the reference isn't cleared.

Weak reference
    A directional link from an object to another, which doesn't keep
    alive its target.  This PEP focusses on non-weak references.

Reference cycle
    A cyclic subgraph of directional links between objects, which keeps
    those objects from being collected in a pure reference-counting
    scheme.

Cyclic isolate (CI)
    A reference cycle in which no object is referenced from outside the
    cycle *and* whose objects are still in a usable, non-broken state:
    they can access each other from their respective finalizers.

Cyclic garbage collector (GC)
    A device able to detect cyclic isolates and turn them into cyclic
    trash.  Objects in cyclic trash are eventually disposed of by
    the natural effect of the references being cleared and their
    reference counts dropping to zero.

Cyclic trash (CT)
    A reference cycle, or former reference cycle, in which no object
    is referenced from outside the cycle *and* whose objects have
    started being cleared by the GC.  Objects in cyclic trash are
    potential zombies; if they are accessed by Python code, the
    symptoms can vary from weird AttributeErrors to crashes.
Zombie / broken object
    An object part of cyclic trash.  The term stresses that the object
    is not safe: its outgoing references may have been cleared, or one
    of the objects it references may be zombie.  Therefore, it should
    not be accessed by arbitrary code (such as finalizers).

Finalizer
    A function or method called when an object is intended to be
    disposed of.  The finalizer can access the object and release any
    resource held by the object (for example mutexes or file
    descriptors).  An example is a ``__del__`` method.

Resurrection
    The process by which a finalizer creates a new reference to an
    object in a CI.  This can happen as a quirky but supported
    side-effect of ``__del__`` methods.


Impact
======

While this PEP discusses CPython-specific implementation details, the
change in finalization semantics is expected to affect the Python
ecosystem as a whole.  In particular, this PEP obsoletes the current
guideline that "objects with a ``__del__`` method should not be part of
a reference cycle".


Benefits
========

The primary benefits of this PEP regard objects with finalizers, such
as objects with a ``__del__`` method and generators with a ``finally``
block.  Those objects can now be reclaimed when they are part of a
reference cycle.

The PEP also paves the way for further benefits:

* The module shutdown procedure may not need to set global variables
  to None anymore.  This could solve a well-known class of irritating
  issues.

The PEP doesn't change the semantics of:

* Weak references caught in reference cycles.

* C extension types with a custom ``tp_dealloc`` function.


Description
===========

Reference-counted disposal
--------------------------

In normal reference-counted disposal, an object's finalizer is called
just before the object is deallocated.  If the finalizer resurrects
the object, deallocation is aborted.

*However*, if the object was already finalized, then the finalizer
isn't called.  This prevents us from finalizing zombies (see below).

Disposal of cyclic isolates
---------------------------

Cyclic isolates are first detected by the garbage collector, and then
disposed of.  The detection phase doesn't change and won't be described
here.  Disposal of a CI traditionally works in the following order:

1. Weakrefs to CI objects are cleared, and their callbacks called.  At
   this point, the objects are still safe to use.

2. The CI becomes a CT as the GC systematically breaks all
   known references inside it (using the ``tp_clear`` function).

3. Nothing.  All CT objects should have been disposed of in step 2
   (as a side-effect of clearing references); this collection is
   finished.

This PEP proposes to turn CI disposal into the following sequence (new
steps are in bold):

1. Weakrefs to CI objects are cleared, and their callbacks called.  At
   this point, the objects are still safe to use.

2. **The finalizers of all CI objects are called.**

3. **The CI is traversed again to determine if it is still isolated.
   If it is determined that at least one object in CI is now reachable
   from outside the CI, this collection is aborted and the whole CI
   is resurrected.  Otherwise, proceed.**

4. The CI becomes a CT as the GC systematically breaks all
   known references inside it (using the ``tp_clear`` function).

5. Nothing.  All CT objects should have been disposed of in step 4
   (as a side-effect of clearing references); this collection is
   finished.


C-level changes
===============

Type objects get a new ``tp_finalize`` slot to which ``__del__`` methods
are bound.  Generators are also modified to use this slot, rather than
``tp_del``.  At the C level, a ``tp_finalize`` function is a normal
function which will be called with a regular, alive object as its only
argument.  It should not attempt to revive or collect the object.

For compatibility, ``tp_del`` is kept in the type structure.  Handling
of objects with a non-NULL ``tp_del`` is unchanged: when part of a CI,
they are not finalized and end up in ``gc.garbage``.
However, a non-NULL ``tp_del`` is not encountered anymore in the CPython
source tree (except for testing purposes).

On the internal side, a bit is reserved in the GC header for GC-managed
objects to signal that they were finalized.  This helps avoid finalizing
an object twice (and, especially, finalizing a CT object after it was
broken by the GC).


Discussion
==========

Predictability
--------------

Following this scheme, an object's finalizer is always called exactly
once.  The only exception is if an object is resurrected: the finalizer
will be called again later.

For CI objects, the order in which finalizers are called (step 2 above)
is undefined.

Safety
------

It is important to explain why the proposed change is safe.  There
are two aspects to be discussed:

* Can a finalizer access zombie objects (including the object being
  finalized)?

* What happens if a finalizer mutates the object graph so as to impact
  the CI?

Let's discuss the first issue.  We will divide possible cases in two
categories:

* If the object being finalized is part of the CI: by construction, no
  objects in CI are zombies yet, since CI finalizers are called before
  any reference breaking is done.  Therefore, the finalizer cannot
  access zombie objects, which don't exist.

* If the object being finalized is not part of the CI/CT: by definition,
  objects in the CI/CT don't have any references pointing to them from
  outside the CI/CT.  Therefore, the finalizer cannot reach any zombie
  object (that is, even if the object being finalized was itself
  referenced from a zombie object).

Now for the second issue.  There are three potential cases:

* The finalizer clears an existing reference to a CI object.  The CI
  object may be disposed of before the GC tries to break it, which
  is fine (the GC simply has to be aware of this possibility).

* The finalizer creates a new reference to a CI object.  This can only
  happen from a CI object's finalizer (see above why).
  Therefore, the new reference will be detected by the GC after all CI
  finalizers are called (step 3 above), and collection will be aborted
  without any objects being broken.

* The finalizer clears or creates a reference to a non-CI object.  By
  construction, this is not a problem.


Implementation
==============

An implementation is available in branch ``finalize`` of the repository
at http://hg.python.org/features/finalize/.


Validation
==========

Besides running the normal Python test suite, the implementation adds
test cases for various finalization possibilities including reference
cycles, object resurrection and legacy ``tp_del`` slots.

The implementation has also been checked to not produce any regressions
on the following test suites:

* `Tulip <http://code.google.com/p/tulip/>`_, which makes an extensive
  use of generators

* `Tornado <http://www.tornadoweb.org>`_

* `SQLAlchemy <http://www.sqlalchemy.org/>`_

* `Django <https://www.djangoproject.com/>`_

* `zope.interface <http://pypi.python.org/pypi/zope.interface>`_


References
==========

Notes about reference cycle collection and weak reference callbacks:
http://hg.python.org/cpython/file/4e687d53b645/Modules/gc_weakref.txt

Generator memory leak: http://bugs.python.org/issue17468

Allow objects to decide if they can be collected by GC:
http://bugs.python.org/issue9141

Module shutdown procedure based on GC
http://bugs.python.org/issue812369


Copyright
=========

This document has been placed in the public domain.


..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End:
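
To make the headline benefit concrete, here is a minimal sketch (not part of the PEP text) of a reference cycle whose objects both define ``__del__``. The class name ``Node`` and the ``finalized`` list are made up for illustration. Assuming an interpreter that implements this PEP (CPython 3.4 or later), both finalizers should run during the collection and ``gc.garbage`` should stay empty; on older CPython versions the same cycle would be left uncollected in ``gc.garbage``.

```python
import gc

finalized = []  # names of objects whose __del__ has run (illustrative only)


class Node:
    def __init__(self, name):
        self.name = name
        self.other = None

    def __del__(self):
        finalized.append(self.name)


def make_cycle():
    a, b = Node("a"), Node("b")
    a.other, b.other = b, a  # a -> b -> a, with no reference from outside


gc.collect()   # start from a clean state
make_cycle()   # after returning, the two nodes form a cyclic isolate
gc.collect()   # detect the isolate, call finalizers, then break references

print(sorted(finalized))
print(gc.garbage)
```

The order in which the two finalizers run is undefined, which is why the sketch sorts the names before printing them.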
On Sat, May 18, 2013 at 6:59 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
Resurrection The process by which a finalizer creates a new reference to an object in a CI. This can happen as a quirky but supported side-effect of ``__del__`` methods.
I really like the PEP overall, but could we at least get the option to have cases of object resurrection spit out a warning? And a clear rationale for not turning on such a warning by default? Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Sat, 18 May 2013 21:05:48 +1000 Nick Coghlan <ncoghlan@gmail.com> wrote:
I really like the PEP overall, but could we at least get the option to have cases of object resurrection spit out a warning? And a clear rationale for not turning on such a warning by default?
Where would you put the option? As for the rationale, it's simply compatibility: resurrection works without warnings right now :) Regards Antoine.
On Sat, May 18, 2013 at 9:46 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
Where would you put the option? As for the rationale, it's simply compatibility: resurrection works without warnings right now :)
Command line, probably. However, you're right that's something we can consider later - for the PEP it's enough that it still works, and we just avoid calling the __del__ method a second time. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On Sat, 18 May 2013 22:51:35 +1000 Nick Coghlan <ncoghlan@gmail.com> wrote:
Command line, probably. However, you're right that's something we can consider later - for the PEP it's enough that it still works, and we just avoid calling the __del__ method a second time.
Actually, the __del__ method is called again on the next destruction attempt - as mentioned in the PEP:

« Following this scheme, an object's finalizer is always called exactly once. The only exception is if an object is resurrected: the finalizer will be called again later. »

I could change it to only ever call __del__ once; it just sounded more logical to call it each time destruction is attempted.

(This is in contrast to weakrefs, though, which are cleared once and for all.)

Regards

Antoine.
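
The two policies under discussion (call ``__del__`` on every destruction attempt, or only once per object) can be told apart with a small experiment. The following is a sketch only: the class ``Phoenix`` and the lists ``calls`` and ``survivors`` are invented names, and the object here is resurrected from plain reference-counted disposal rather than from a cyclic isolate. The second printed count shows which policy the running interpreter implements: 2 would match the draft wording quoted above, 1 would match the "only once" alternative.

```python
import gc

calls = []       # one entry per __del__ invocation (illustrative only)
survivors = []   # module-level list used to resurrect the object


class Phoenix:
    def __del__(self):
        calls.append("finalized")
        survivors.append(self)  # resurrection: a new outside reference


obj = Phoenix()
del obj          # first destruction attempt: __del__ runs, object survives
gc.collect()
first_round = len(calls)

survivors.clear()  # drop the last reference: second destruction attempt
gc.collect()

print("finalizer calls after first attempt:", first_round)
print("finalizer calls after second attempt:", len(calls))
```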
Hi Antoine, On Sat, May 18, 2013 at 10:59 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:
Cyclic isolate (CI) A reference cycle in which no object is referenced from outside the cycle *and* whose objects are still in a usable, non-broken state: they can access each other from their respective finalizers.
Does this definition include more complicated cases? For example: A -> B -> A and A -> C -> A Neither cycle is isolated. If there is no reference from outside, then the set of all three objects is isolated, but isn't strictly a cycle. I think the term is "strongly connected component".
1. Weakrefs to CI objects are cleared, and their callbacks called. At this point, the objects are still safe to use.
2. **The finalizers of all CI objects are called.**
You need to be very careful about what each call to a finalizer can do to the object graph. It may already be what you're doing, but the most careful solution is to collect in "1." the complete list of objects with finalizers that are in cycles; then incref them all; then call the finalizer of each of them; then decref them all. Such a solution gives new cases to think about, which are slightly unexpected for CPython's model: for example, if you have a cycle A -> B -> A, let's say the GC calls A.__del__ first; it might cause it to store a reference to B somewhere else, e.g. in some global; but then the GC calls B.__del__ anyway. This is probably fine but should be considered.
3. **The CI is traversed again to determine if it is still isolated.
How is this done? I don't see a clear way to determine it by looking only at the objects in the CI, given that arbitrary modifications of the object graph may have occurred. The solution I can think of doesn't seem robust against minor changes done by the finalizer. Take the example "A -> lst -> B -> A", where the reference from A to B is via a list (e.g. there is an attribute "A.attr = [B]"). If A.__del__ does the seemingly innocent change of replacing the list with a copy of itself, e.g. "A.attr = A.attr[:]", then after the finalizers are called, "lst" is gone and we're left with "A -> lst2 -> B -> A". Checking that this cycle is still isolated requires a possibly large number of checks, as far as I can tell. This can lead to O(n**2) behavior if there are n objects in total and O(n) cycles. The solution seems to be to simply wait for the next GC execution. Assuming that a finalizer is only called once, this only delays a bit freeing objects with finalizers in cycles (but your PEP still works to call finalizers and eventually collect the objects). Alternatively, this might be done immediately: in the point "3." above we can forget everything we found so far, and redo the tracking on all objects (this time ignoring finalizers that were already called). In fact, it may be necessary anyway: anything found before might be invalid after the finalizers are called, so forgetting it all and redoing the tracking from scratch seems to be the only way.
Type objects get a new ``tp_finalize`` slot to which ``__del__`` methods are bound. Generators are also modified to use this slot, rather than ``tp_del``. At the C level, a ``tp_finalize`` function is a normal function which will be called with a regular, alive object as its only argument. It should not attempt to revive or collect the object.
Do you mean the opposite in the last sentence? ``tp_finalize`` can do anything...

A bientôt,

Armin.
Hi Armin, On Sat, 18 May 2013 15:24:08 +0200 Armin Rigo <arigo@tunes.org> wrote:
Hi Antoine,
On Sat, May 18, 2013 at 10:59 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:
Cyclic isolate (CI) A reference cycle in which no object is referenced from outside the cycle *and* whose objects are still in a usable, non-broken state: they can access each other from their respective finalizers.
Does this definition include more complicated cases? For example:
A -> B -> A and A -> C -> A
Neither cycle is isolated. If there is no reference from outside, then the set of all three objects is isolated, but isn't strictly a cycle. I think the term is "strongly connected component".
Yes, I should fix this definition to be more exact.
1. Weakrefs to CI objects are cleared, and their callbacks called. At this point, the objects are still safe to use.
2. **The finalizers of all CI objects are called.**
You need to be very careful about what each call to a finalizer can do to the object graph. It may already be what you're doing, but the most careful solution is to collect in "1." the complete list of objects with finalizers that are in cycles; then incref them all; then call the finalizer of each of them; then decref them all. Such a solution gives new cases to think about, which are slightly unexpected for CPython's model: for example, if you have a cycle A -> B -> A, let's say the GC calls A.__del__ first; it might cause it to store a reference to B somewhere else, e.g. in some global; but then the GC calls B.__del__ anyway. This is probably fine but should be considered.
Yes, I know this is possible. My opinion is that it is fine to call B's finalizer anyway. Calling all finalizers regardless of interim changes in the object graph also makes things a bit more deterministic: otherwise, which finalizers are called would depend on the call order, which is undefined.
3. **The CI is traversed again to determine if it is still isolated.
How is this done? I don't see a clear way to determine it by looking only at the objects in the CI, given that arbitrary modifications of the object graph may have occurred.
The same way a generation is traversed, but restricted to the CI. First the gc_refs field of each CI object is initialized to its ob_refcnt (again). Then, tp_traverse is called on each CI object, and each visited CI object has its gc_refs decremented. This subtracts CI-internal references from the gc_refs fields. At the end of the traversal, if all CI objects have their gc_refs equal to 0, then the CI has no external reference to it and can be cleared. If at least one CI object has non-zero gc_refs, the CI cannot be cleared.
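
The check described above can be modelled in a few lines of pure Python. This is a toy model, not the C implementation: the function name ``still_isolated`` and its arguments are assumptions made for illustration, with a ``refcount`` dict standing in for ob_refcnt and a ``referents`` dict standing in for what tp_traverse would visit.

```python
def still_isolated(ci, refcount, referents):
    """Toy model of the re-check performed after the finalizers ran.

    ci        -- set of object names forming the original isolate
    refcount  -- dict: name -> total reference count (stands in for ob_refcnt)
    referents -- dict: name -> list of names it references (tp_traverse)
    """
    # Initialise gc_refs to the full reference count of each CI object.
    gc_refs = {name: refcount[name] for name in ci}
    # Subtract every reference that originates from inside the CI.
    for name in ci:
        for target in referents.get(name, []):
            if target in ci:
                gc_refs[target] -= 1
    # Whatever is left over must come from outside the CI.
    return all(count == 0 for count in gc_refs.values())


ci = {"A", "B"}

# A -> B -> A with no outside references: still isolated.
print(still_isolated(ci, {"A": 1, "B": 1}, {"A": ["B"], "B": ["A"]}))  # True

# A finalizer stored B in a global, so B has one extra reference.
print(still_isolated(ci, {"A": 1, "B": 2}, {"A": ["B"], "B": ["A"]}))  # False

# A finalizer created C with B -> C and C -> A; C is not in the original CI,
# so its reference to A is not subtracted and A keeps a non-zero gc_refs.
print(still_isolated(
    ci,
    {"A": 2, "B": 1},
    {"A": ["B"], "B": ["A", "C"], "C": ["A"]},
))  # False
```

Only the objects of the original CI are visited, which is what keeps the cost proportional to the size of the CI rather than to the size of a whole generation.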
Alternatively, this might be done immediately: in the point "3." above we can forget everything we found so far, and redo the tracking on all objects (this time ignoring finalizers that were already called).
This would also be more costly, performance-wise. A CI should generally be quite small, but a whole generation can be arbitrarily big.
Type objects get a new ``tp_finalize`` slot to which ``__del__`` methods are bound. Generators are also modified to use this slot, rather than ``tp_del``. At the C level, a ``tp_finalize`` function is a normal function which will be called with a regular, alive object as its only argument. It should not attempt to revive or collect the object.
Do you mean the opposite in the last sentence? ``tp_finalize`` can do anything...
Not exactly, but I worded it poorly. What I meant is that the C code in tp_finalize shouldn't *manually* revive the object, since it is called with an object with a strictly positive refcount. Regards Antoine.
Hi Antoine, On Sat, May 18, 2013 at 3:45 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
How is this done? I don't see a clear way to determine it by looking only at the objects in the CI, given that arbitrary modifications of the object graph may have occurred.
The same way a generation is traversed, but restricted to the CI.
First the gc_refs field of each CI object is initialized to its ob_refcnt (again).
Then, tp_traverse is called on each CI object, and each visited CI object has its gc_refs decremented. This subtracts CI-internal references from the gc_refs fields.
At the end of the traversal, if all CI objects have their gc_refs equal to 0, then the CI has no external reference to it and can be cleared. If at least one CI object has non-zero gc_refs, the CI cannot be cleared.
Ok, indeed. Then you really should call finalizers only once: in case one of the finalizers in a cycle did a trivial change like I described, the algorithm above will conservatively assume the cycle should be kept alive. At the next GC collection we must not call the finalizer again, because it's likely to just do a similar trivial change.

(There are other open questions about calling finalizers multiple times; e.g. an instance of this class has its finalizer called ad infinitum and leaks, even though X() is never part of any cycle:

    class X(object):
        def __del__(self):
            print("tick")
            lst = [self]
            lst.append(lst)

Try interactively: every gc.collect() prints "tick", even if you make only one instance.)

A bientôt,

Armin.
On Sat, 18 May 2013 16:22:55 +0200 Armin Rigo <arigo@tunes.org> wrote:
Ok, indeed. Then you really should call finalizers only once: in case one of the finalizers in a cycle did a trivial change like I described, the algorithm above will conservatively assume the cycle should be kept alive. At the next GC collection we must not call the finalizer again, because it's likely to just do a similar trivial change.
Well, the finalizer will only be called if the resurrected object is dereferenced again; otherwise the object won't be considered by the GC. So, this will only happen if someone keeps trying to destroy a resurrected object. Calling finalizers only once is fine with me, but it would be a change in behaviour; I don't know if it may break existing code. (for example, say someone is using __del__ to manage a freelist) Regards Antoine.
2013/5/18 Antoine Pitrou <solipsis@pitrou.net>:
Calling finalizers only once is fine with me, but it would be a change in behaviour; I don't know if it may break existing code.
I agree with Armin that this is better behavior. (Most significantly, it is consistent with weakrefs.)
(for example, say someone is using __del__ to manage a freelist)
Do you know if it breaks any of the projects you tested it with? -- Regards, Benjamin
On Sun, 2 Jun 2013 19:16:17 -0700 Benjamin Peterson <benjamin@python.org> wrote:
2013/5/18 Antoine Pitrou <solipsis@pitrou.net>:
Calling finalizers only once is fine with me, but it would be a change in behaviour; I don't know if it may break existing code.
I agree with Armin that this is better behavior. (Most significantly, it is consistent with weakrefs.)
Keep in mind that it is a limitation of weakrefs, not a feature: you can't "unclear" a weakref.
(for example, say someone is using __del__ to manage a freelist)
Do you know if it breaks any of the projects you tested it with?
I haven't tested it (yet). Regards Antoine.
On Sat, May 18, 2013 at 10:33 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
Well, the finalizer will only be called if the resurrected object is dereferenced again; otherwise the object won't be considered by the GC. So, this will only happen if someone keeps trying to destroy a resurrected object.
Calling finalizers only once is fine with me, but it would be a change in behaviour; I don't know if it may break existing code.
(for example, say someone is using __del__ to manage a freelist)
PyPy already only ever calls finalizers once. If you resurrect an object, it'll be alive, but its finalizer will not be called again. We discussed a few changes a while ago and we decided (I think even debated on python-dev) that even such behavior is correct:

* you have a reference cycle A <-> B, C references A. C references itself.
* you collect the stuff. We do topological order, so C finalizer is called first (they're only undefined inside a cycle)
* then A and B finalizers are called in undefined order, even if C finalizer would resurrect C.
* no more finalizers for those objects are called

I'm not sure if it's cool for CPython or not to do such changes

Cheers,
fijal
Great PEP, I would really like to see this happen as it defines much saner semantics for finalization than what we currently have. One small question below:

This PEP proposes to turn CI disposal into the following sequence (new steps are in bold):
1. Weakrefs to CI objects are cleared, and their callbacks called. At this point, the objects are still safe to use.
2. **The finalizers of all CI objects are called.**
3. **The CI is traversed again to determine if it is still isolated. If it is determined that at least one object in CI is now reachable from outside the CI, this collection is aborted and the whole CI is resurrected. Otherwise, proceed.**
Not sure if my question is the same as Armin's here, but worth a try: by saying "the CI is traversed again" do you mean the original objects from the CI as discovered earlier, or is a new scan being done? What about a new object entering the CI during step (2)? I.e. the original CI was A->B->A but now one of the finalizers created some C such that B->C and C->A adding it to the connected component? Reading your description in (3) strictly it says: in this case the collection is aborted. This CI will be disposed next time collection is run. Is this correct? Eli
4. The CI becomes a CT as the GC systematically breaks all known references inside it (using the ``tp_clear`` function).
5. Nothing. All CT objects should have been disposed of in step 4 (as a side-effect of clearing references); this collection is finished.
Eli
On Sat, 18 May 2013 06:37:54 -0700 Eli Bendersky <eliben@gmail.com> wrote:
Great PEP, I would really like to see this happen as it defines much saner semantics for finalization than what we currently have. One small question below:
This PEP proposes to turn CI disposal into the following sequence (new steps are in bold):
1. Weakrefs to CI objects are cleared, and their callbacks called. At this point, the objects are still safe to use.
2. **The finalizers of all CI objects are called.**
3. **The CI is traversed again to determine if it is still isolated. If it is determined that at least one object in CI is now reachable from outside the CI, this collection is aborted and the whole CI is resurrected. Otherwise, proceed.**
Not sure if my question is the same as Armin's here, but worth a try: by saying "the CI is traversed again" do you mean the original objects from the CI as discovered earlier, or is a new scan being done? What about a new object entering the CI during step (2)? I.e. the original CI was A->B->A but now one of the finalizers created some C such that B->C and C->A adding it to the connected component?
It is the original CI which is traversed. If a new reference is introduced into the reference chain, the traversal in step 3 will decide to resurrect the CI. This is not necessarily a problem, since the next GC collection will try collecting again.
Reading your description in (3) strictly it says: in this case the collection is aborted. This CI will be disposed next time collection is run. Is this correct?
Yup. Regards Antoine.
On Sat, May 18, 2013 at 6:47 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:
It is the original CI which is traversed. If a new reference is introduced into the reference chain, the traversal in step 3 will decide to resurrect the CI. This is not necessarily a problem, since the next GC collection will try collecting again.
Reading your description in (3) strictly it says: in this case the collection is aborted. This CI will be disposed next time collection is run. Is this correct?
Yup.
Thanks, this actually makes a lot of sense. It's strictly better than the current situation where objects with __del__ are never collected. In the proposed scheme, the weird ones will be delayed and some really weird ones may never be collected, but the vast majority of __del__ methods do no resurrection so usually it will just work. This is a great proposal - killer new feature for 3.4 ;-) Eli
On 18/05/2013 9:59am, Antoine Pitrou wrote:
This PEP proposes to turn CI disposal into the following sequence (new steps are in bold):
1. Weakrefs to CI objects are cleared, and their callbacks called. At this point, the objects are still safe to use.
2. **The finalizers of all CI objects are called.**
How do you know that one of the finalizers will not do something which causes another to fail?

Presumably the following would cause an AttributeError to be printed:

    import gc

    class Node:
        def __init__(self):
            self.next = None
        def __del__(self):
            print(self, self.next)
            del self.next   # break Node object

    a = Node()
    b = Node()
    a.next = b
    b.next = a
    del a, b
    gc.collect()

Are there less contrived examples which will cause errors where currently there are none?

--
Richard
On Sat, 18 May 2013 14:56:38 +0100 Richard Oudkerk <shibturn@gmail.com> wrote:
On 18/05/2013 9:59am, Antoine Pitrou wrote:
This PEP proposes to turn CI disposal into the following sequence (new steps are in bold):
1. Weakrefs to CI objects are cleared, and their callbacks called. At this point, the objects are still safe to use.
2. **The finalizers of all CI objects are called.**
How do you know that one of the finalizers will not do something which causes another to fail?
Presumably the following would cause an AttributeError to be printed:
    import gc

    class Node:
        def __init__(self):
            self.next = None
        def __del__(self):
            print(self, self.next)
            del self.next   # break Node object

    a = Node()
    b = Node()
    a.next = b
    b.next = a
    del a, b
    gc.collect()
It works fine:

    $ ./python sbt.py
    <__main__.Node object at 0x7f3acbf8f400> <__main__.Node object at 0x7f3acbf8f878>
    <__main__.Node object at 0x7f3acbf8f878> <__main__.Node object at 0x7f3acbf8f400>

The reason is that, when you execute "del self.next", this removes the last reference to self.next and destroys it immediately. In essence, you were expecting to see:

- enter a.__del__, destroy b
- leave a.__del__
- enter b.__del__ oops?

But what happens is:

- enter a.__del__, destroy b
- enter b.__del__
- leave b.__del__
- leave a.__del__

Regards

Antoine.
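The nesting described above can be observed directly by recording when each finalizer is entered and left. This is a small sketch under the assumption of a default CPython build (3.4 or later); the `events` list and the `label` attribute are hypothetical additions to the example from this thread. Which of the two objects is finalized first is up to the collector, so the sketch does not rely on it.

```python
import gc

events = []  # (phase, label) tuples, in the order they happen


class Node:
    def __init__(self, label):
        self.label = label
        self.next = None

    def __del__(self):
        events.append(("enter", self.label))
        # Dropping the last reference to the peer destroys it right here,
        # so the peer's finalizer runs before this one returns.
        del self.next
        events.append(("leave", self.label))


a = Node("a")
b = Node("b")
a.next = b
b.next = a
del a, b
gc.collect()

for phase, label in events:
    print(phase, label)
```

The recorded events should show the second finalizer entirely enclosed by the first one, rather than the two running one after the other.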
On 18/05/2013 3:18pm, Antoine Pitrou wrote:
It works fine:
    $ ./python sbt.py
    <__main__.Node object at 0x7f3acbf8f400> <__main__.Node object at 0x7f3acbf8f878>
    <__main__.Node object at 0x7f3acbf8f878> <__main__.Node object at 0x7f3acbf8f400>
The reason is that, when you execute "del self.next", this removes the last reference to self.next and destroys it immediately.
So even more contrived:

    import gc

    class Node:
        def __init__(self, x):
            self.x = x
            self.next = None
        def __del__(self):
            print(self.x, self.next.x)
            del self.x

    a = Node(1)
    b = Node(2)
    a.next = b
    b.next = a
    del a, b
    gc.collect()

--
Richard
On Sat, 18 May 2013 15:52:56 +0100 Richard Oudkerk <shibturn@gmail.com> wrote:
On 18/05/2013 3:18pm, Antoine Pitrou wrote:
It works fine:
    $ ./python sbt.py
    <__main__.Node object at 0x7f3acbf8f400> <__main__.Node object at 0x7f3acbf8f878>
    <__main__.Node object at 0x7f3acbf8f878> <__main__.Node object at 0x7f3acbf8f400>
The reason is that, when you execute "del self.next", this removes the last reference to self.next and destroys it immediately.
So even more contrived:
    import gc

    class Node:
        def __init__(self, x):
            self.x = x
            self.next = None
        def __del__(self):
            print(self.x, self.next.x)
            del self.x

    a = Node(1)
    b = Node(2)
    a.next = b
    b.next = a
    del a, b
    gc.collect()
Indeed, there is an exception during destruction (which is ignored, like any exception raised from __del__):

    $ ./python sbt.py
    1 2
    Exception ignored in: <bound method Node.__del__ of <__main__.Node object at 0x7f543cf0bb50>>
    Traceback (most recent call last):
      File "sbt.py", line 17, in __del__
        print(self.x, self.next.x)
    AttributeError: 'Node' object has no attribute 'x'

The only reason this currently succeeds is that the objects end up in gc.garbage, of course.

Regards

Antoine.
On 5/18/2013 11:22 AM, Antoine Pitrou wrote:
On Sat, 18 May 2013 15:52:56 +0100 Richard Oudkerk <shibturn@gmail.com> wrote:
So even more contrived:
    import gc

    class Node:
        def __init__(self, x):
            self.x = x
            self.next = None
        def __del__(self):
            print(self.x, self.next.x)
            del self.x

An attribute reference that can fail should be wrapped with try-except.

    a = Node(1)
    b = Node(2)
    a.next = b
    b.next = a
    del a, b
    gc.collect()
Indeed, there is an exception during destruction (which is ignored, like any exception raised from __del__):

    $ ./python sbt.py
    1 2
    Exception ignored in: <bound method Node.__del__ of <__main__.Node object at 0x7f543cf0bb50>>
    Traceback (most recent call last):
      File "sbt.py", line 17, in __del__
        print(self.x, self.next.x)
    AttributeError: 'Node' object has no attribute 'x'
Though ignored, the bug is reported, hinting that you should fix it ;-).
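One way to follow the try-except advice is sketched below. It is an illustration only, not code from the thread: the `seen` and `unraisable` lists are hypothetical, and `sys.unraisablehook` is used merely to confirm that nothing escapes the finalizers (that hook exists in Python 3.8 and later, so the sketch assumes at least that version).

```python
import gc
import sys

seen = []        # (own x, peer's x or None) as observed by each finalizer
unraisable = []  # exception types that escaped a finalizer, if any


class Node:
    def __init__(self, x):
        self.x = x
        self.next = None

    def __del__(self):
        try:
            peer_x = self.next.x
        except AttributeError:
            # The peer's finalizer already ran and removed its attribute.
            peer_x = None
        seen.append((self.x, peer_x))
        del self.x


old_hook = sys.unraisablehook
sys.unraisablehook = lambda info: unraisable.append(info.exc_type)
try:
    a = Node(1)
    b = Node(2)
    a.next = b
    b.next = a
    del a, b
    gc.collect()
finally:
    sys.unraisablehook = old_hook  # restore the default reporting
```

The finalizer that runs second finds its peer already stripped of `x`, handles the AttributeError itself, and no "Exception ignored" report is produced.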
2013/5/18 Antoine Pitrou <solipsis@pitrou.net>:
Hello,
I would like to submit the following PEP for discussion and evaluation.
Will the API of the gc module be at all affected? I assume DEBUG_UNCOLLECTABLE will simply print nothing. Maybe there should be a way to discover when a cycle is resurrected?

--
Regards,
Benjamin
On Sun, 2 Jun 2013 19:27:49 -0700 Benjamin Peterson <benjamin@python.org> wrote:
2013/5/18 Antoine Pitrou <solipsis@pitrou.net>:
Hello,
I would like to submit the following PEP for discussion and evaluation.
Will the API of the gc module be at all affected? I assume DEBUG_UNCOLLECTABLE will simply print nothing.
Objects with tp_del may still exist (third-party extensions perhaps).
Maybe there should be a way to discover when a cycle is resurrected?
Is it more important than discovering when a non-cycle is resurrected? Regards Antoine.
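For the gc module question, a quick check from Python is to look at gc.garbage after collecting a cycle whose objects define __del__. The sketch below assumes an interpreter implementing this PEP (CPython 3.4 or later); `Node` and `leftover` are hypothetical names used only for illustration.

```python
import gc


class Node:
    def __init__(self):
        self.peer = None

    def __del__(self):
        pass  # any Python-level finalizer will do


a = Node()
b = Node()
a.peer = b
b.peer = a
del a, b
gc.collect()

# Objects the collector gave up on are listed in gc.garbage.
leftover = [obj for obj in gc.garbage if isinstance(obj, Node)]
print(leftover)
```

With Python-level finalizers the list should stay empty. As noted in the reply above, extension types that still define tp_del may continue to end up in gc.garbage, so that list and DEBUG_UNCOLLECTABLE keep their meaning for them.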
participants (8)
- Antoine Pitrou
- Armin Rigo
- Benjamin Peterson
- Eli Bendersky
- Maciej Fijalkowski
- Nick Coghlan
- Richard Oudkerk
- Terry Jan Reedy