breaking cycles that include __del__

Right now, the presence of a __del__ method prevents cycles from being collected. It is possible for an application to rummage through gc.garbage and break links between objects. However, for a large application this may be inefficient or inelegant for several reasons:

- It may be most natural to break the links on objects of type Foo, even though Foo is a minority of items (e.g., if Foo is a parent node with many children).
- It's difficult to know how frequently to sift through the garbage.
- Third-party packages may create noncollectable garbage that we need to sift through, even though we don't know how to safely break links on their objects.
- If every package sifts through the garbage independently, there's a lot of extra work going on.

I propose the following simple solution. When the garbage collector is about to add an object to gc.garbage, it checks to see if the object has a __clear__ method. If so, the garbage collector calls it, giving the object an opportunity to safely break any cycles it may be in. If it breaks the cycles, great! The object can now safely be collected and __del__ will eventually be called. If not, then the object gets added to gc.garbage after all and no harm is done.

It would be simple to implement and efficient, and the cognitive load for the programmer using __del__ and __clear__ is small.

Thoughts?

--
Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprises, LLC <http://stutzbachenterprises.com>
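Until something like this exists in the collector, the idea can be approximated at the application level with a periodic sweep over gc.garbage. This is a minimal sketch, not an implementation of the proposal: `__clear__` is the hypothetical method name from the proposal (Python defines no such protocol), and `sift_garbage` is a hypothetical helper. Note also that since Python 3.4 (PEP 442), cycles containing __del__ methods are normally collected, so gc.garbage usually stays empty on modern interpreters.

```python
import gc


def sift_garbage():
    """Approximate the proposed __clear__ hook at the application level.

    For every object in gc.garbage whose class defines the hypothetical
    __clear__ method, call it so the object can break its own cycles, and
    drop the object from gc.garbage. Objects without __clear__ stay put.
    Returns the number of objects that were given a chance to clear.
    """
    cleared = 0
    remaining = []
    for obj in gc.garbage:
        clear = getattr(type(obj), '__clear__', None)
        if clear is None:
            remaining.append(obj)   # we don't know how to break its links
        else:
            clear(obj)              # the object breaks its own cycles
            cleared += 1
    # Mutate in place: the gc module keeps a reference to this exact list.
    gc.garbage[:] = remaining
    return cleared
```

Once __clear__ has broken an object's cycles and gc.garbage no longer references it, ordinary reference counting can free it and run its __del__; if it did not break the cycle, a later collection will find the cycle again.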

On 17 Oct 2009, at 15:34, Daniel Stutzbach wrote:
I remember a thread about this on comp.lang.python in March: http://mail.python.org/pipermail/python-list/2009-March/174419.html -- Arnaud

Daniel Stutzbach wrote:
It would be simple to implement, efficient, and the cognitive load for the programmer using __del__ and __clear__ is small.
I believe it would be better to find ways to streamline the existing "collectible-yet-finalised" mechanism using weak references rather than adding another way to do it. Although even before that, just documenting how weakrefs can be used instead of __del__ in the weakref module docs and crosslinking to that from the __del__ method documentation would be an improvement on the status quo.

The gist is that you split your object into a "core" which requires finalisation (but can never itself be caught up in a cycle) and the main object which may participate in cycles. A weak reference to the main object is placed in a global container. When the weak reference callback is invoked, the object core is still in a valid state so it can be finalised safely, even though the main object may be in the midst of cycle collection.

This recipe gives a pretty good idea of how that approach works: http://code.activestate.com/recipes/519610/

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Sat, Oct 17, 2009 at 11:17 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
That idiom (and that recipe in particular! yuck!) is pretty convoluted, IMO. -- Daniel Stutzbach, Ph.D. President, Stutzbach Enterprises, LLC <http://stutzbachenterprises.com>

On Mon, Oct 19, 2009 at 08:25, Daniel Stutzbach <daniel@stutzbachenterprises.com> wrote:
The critical idea is to list attributes that cannot be cleared, leaving everything else to be cleared by default. This is MUCH less precarious than calling arbitrary code from within the GC, and accomplishes nearly the same thing. The use of a core further benefits the GC implementation, but doesn't do much for the user. Obviously a recipe like that cannot show any real benefit from it, and I admit the recipe is pretty ugly.

--
Adam Olsen, aka Rhamphoryncus
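To make the semantics of that idea concrete, here is a user-level simulation under stated assumptions: `__preserve__`, `clear_except_preserved` and `Handle` are hypothetical names, not part of any real GC API, and the sketch calls a close() method in place of __del__ so it stays deterministic. Clearing every attribute that is not listed breaks any cycle running through the instance, and the finaliser then sees only the state it declared.

```python
def clear_except_preserved(obj):
    """Simulate the proposed GC behaviour for one instance (hypothetical).

    Deletes every instance attribute not named in the class's __preserve__
    tuple, then calls the instance's close() so it can finalise using only
    the attributes it promised to need.
    """
    keep = set(getattr(type(obj), '__preserve__', ()))
    for name in list(vars(obj)):
        if name not in keep:
            delattr(obj, name)
    finalise = getattr(obj, 'close', None)
    if finalise is not None:
        finalise()


class Handle:
    __preserve__ = ('fd', 'closed')  # the only state close() may touch

    def __init__(self, fd):
        self.fd = fd
        self.closed = False
        self.parent = None  # may point into a cycle; cleared by default

    def close(self):
        if not self.closed:
            self.closed = True
            # Release self.fd here.
```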

Daniel Stutzbach wrote:
True, but as Adam pointed out, something along those lines is necessary to avoid accessing already destroyed objects in __del__ methods. My point was mainly that the idiom works, we know it works, we just don't have a particularly nice way of spelling it. (Even I thought that recipe was fairly ugly - I just linked it because it does a good job of explaining the requirements of the technique.)

For the diagnostic task you're talking about, wouldn't it be better to create your own collection of weakrefs to instances of the class you're interested in, checking that the object has been closed in the callback, and then iterate over that collection in an atexit handler? You wouldn't have any __del__ methods of your own interfering with garbage collection then, and the atexit handler would be a final diagnostic to check if the objects were getting caught up in cycles with *other* objects with __del__ methods.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Sat, Oct 17, 2009 at 9:34 AM, Daniel Stutzbach <daniel@stutzbachenterprises.com> wrote:
I should add that my personal use case runs something like this. Obviously, most uses of __del__ revolve around objects with special resources that need to be freed "by hand" (such as a resource allocated via ctypes). To guarantee the timely freeing of resources, it's always better to close the object by hand with a .close() method or a context manager. That works great, except there's often no easy way to test a program to verify that .close() is always called when appropriate.

There's a simple recipe to make sure that .close() was called on an object, using __del__ to check that the object was closed "by hand":

    class Widget:
        closed = False

        def close(self):
            if self.closed:
                return
            self.closed = True
            # Free various resources from ctypes, etc.

        def __del__(self):
            if not self.closed:
                log_an_error('%s not closed properly' % repr(self))
            self.close()

Unfortunately, if the object ends up in a cycle, then the recipe can't do its job because __del__ will never be called.

I think I'll experiment by having my program periodically check gc.garbage and call any __clear__ methods it finds. For that matter, I may just call any __del__ methods it finds and see what happens. If the order of calls to __del__ matters, I'll treat it as a bug in my __del__ methods.

--
Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprises, LLC <http://stutzbachenterprises.com>

Daniel Stutzbach wrote:
Unfortunately, if the object ends up in a cycle, then the recipe can't do its job because __del__ will never be called.
The way I have always dealt with these problems is to not have overly-generalized objects. If I have a resource to acquire that needs to have a __del__ to clean up after itself (as you suggest, in case a close() was not called), then that is *all* that object does. The only way you can get trapped into a cycle is if you design a resource-managing object that holds references to other objects, which was bad design.

I don't see the need for a recipe like the one Nick linked to. Everyone can agree it is ugly, I think. But the real goal was to split out the core of the object required for clean-up. So, I am saying, actually split it out! Classes are cheap.

    class _CoreWidget:
        def __init__(self, *args, **kwds):
            self.closed = False
            # Acquire resources

        def close(self):
            if self.closed:
                return
            self.closed = True
            # Free resources

        def __del__(self):
            if not self.closed:
                print('%s not closed properly' % repr(self))
                self.close()

    class Widget:
        def __init__(self, *args, **kwds):
            self._core_widget = _CoreWidget(*args, **kwds)

    x, y = Widget(), Widget()
    x.y, y.x = y, x
    del x; del y
    # <__main__._CoreWidget instance at 0x00B3F418> not closed properly
    # <__main__._CoreWidget instance at 0x00B3E328> not closed properly

--
Scott Dial
scott@scottdial.com
scodial@cs.indiana.edu

Daniel Stutzbach <daniel@...> writes:
Unfortunately, if the object ends up in a cycle, then the recipe can't do its job because __del__ will never be called.
You don't need __del__; a weakref is good enough, since you only need to remember an adequate string representation of the object:

    import weakref

    class Widget:
        closed = False

        def __init__(self):
            # ...
            def error_not_closed(_, r=repr(self)):
                log_an_error('%s not closed properly' % r)
            self._wr_not_closed = weakref.ref(self, error_not_closed)

        def close(self):
            if self.closed:
                return
            self.closed = True
            # Destroy weakref
            self._wr_not_closed = None
            # Free various resources from ctypes, etc.

Regards

Antoine.

On Tue, Oct 20, 2009 at 8:01 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:
Thank you! This is a much simpler recipe than the others I have seen. It should probably be placed right in the documentation for __del__, as a suggested alternative. Having direct support from the garbage collector would still be more memory efficient (instead of requiring a weakref and a bound function object), but it will do for many purposes in a pinch. -- Daniel Stutzbach, Ph.D. President, Stutzbach Enterprises, LLC <http://stutzbachenterprises.com>
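For readers on Python 3.4 or later: the standard library now packages this pattern as weakref.finalize, which keeps each finalizer alive in an internal registry, so the callback also runs when the object dies inside a reference cycle. A minimal sketch of Antoine's recipe on top of it, with a hypothetical `problems` list standing in for log_an_error:

```python
import weakref

problems = []  # hypothetical stand-in for log_an_error()


def _not_closed(r):
    problems.append('%s not closed properly' % r)


class Widget:
    def __init__(self):
        self.closed = False
        # Pass repr(self), never self: the callback must not keep the widget alive.
        self._finalizer = weakref.finalize(self, _not_closed, repr(self))

    def close(self):
        if self.closed:
            return
        self.closed = True
        self._finalizer.detach()  # closed by hand, so cancel the warning
        # Free various resources from ctypes, etc.
```

By default, a finalizer that is still alive at interpreter exit is also called then; its atexit attribute controls that.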

I'm playing around with this, and in the following variant the weakref callback is never called (in Python 2.6 and 3.1). I do not see why. I know I've created a cycle between the callback and the object itself, but the garbage collector should detect and free the cycle.

    import weakref, gc

    class Foo:
        def __init__(self):
            self._weakref = weakref.ref(self, self.__free__)

        def __free__(self):
            print("I'm free!")

    x = Foo()
    del x
    gc.collect()
    print('test')
    print(gc.garbage)

--
Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprises, LLC <http://stutzbachenterprises.com>

On Tue, Oct 20, 2009 at 10:58, Daniel Stutzbach <daniel@stutzbachenterprises.com> wrote:
Your weakref callback is a method of your object. The callback requires that method still be alive, but isn't triggered until self is deleted. If the GC does anything it'll have to include clearing the weakref, deleting the callback.

Note that, for robustness, it's preferable to make the weakref globally reachable until it's called, such as in a set. Also note that Antoine's example saved a snapshot of the object's id. Storing state is harder, and requires you to encapsulate it in some other object. A list for Antoine's 'r' argument might be a decently simple bodge.

--
Adam Olsen, aka Rhamphoryncus

Adam Olsen wrote:
This is made clear in the gcmodule.c comments, but is severely lacking from the actual documentation in the weakref module. But Antoine's example is also flawed in the same manner. It works merely because the gc module is left out of the issue (there are no cycles in it). If you introduce a cycle, then it falls apart, just like Daniel's:

    import weakref, gc

    class Foo:
        def __init__(self):
            def call_free(_, r=repr(self)):
                print(r)
            self._weakref = weakref.ref(self, call_free)

    x, y = Foo(), Foo()
    x.y, y.x = y, x
    del x
    del y
    gc.collect()
    print(gc.garbage)

The whole source of the confusion is documented in handle_weakrefs() in Modules/gcmodule.c at line 600:

    /* Headache time. `op` is going away, and is weakly referenced by
     * `wr`, which has a callback. Should the callback be invoked? If wr
     * is also trash, no:
     *
     * 1. There's no need to call it. The object and the weakref are
     *    both going away, so it's legitimate to pretend the weakref is
     *    going away first. The user has to ensure a weakref outlives its
     *    referent if they want a guarantee that the wr callback will get
     *    invoked.
     *
     * 2. It may be catastrophic to call it. If the callback is also in
     *    cyclic trash (CT), then although the CT is unreachable from
     *    outside the current generation, CT may be reachable from the
     *    callback. Then the callback could resurrect insane objects.
     *
     * Since the callback is never needed and may be unsafe in this case,
     * wr is simply left in the unreachable set. Note that because we
     * already called _PyWeakref_ClearRef(wr), its callback will never
     * trigger.
     *
     * OTOH, if wr isn't part of CT, we should invoke the callback: the
     * weakref outlived the trash. Note that since wr isn't CT in this
     * case, its callback can't be CT either -- wr acted as an external
     * root to this generation, and therefore its callback did too. So
     * nothing in CT is reachable from the callback either, so it's hard
     * to imagine how calling it later could create a problem for us. wr
     * is moved to wrcb_to_call in this case.
     */
    if (IS_TENTATIVELY_UNREACHABLE(wr))
        continue;

The only way to guarantee the callback occurs is if you attach it to some other object that will outlive your object, *but* must also not get pulled into the same gc generation, otherwise it will *still not be called*. I believe this is the source of your advice to store the weakref in some globally reachable set. I believe given the way modules are currently deallocated, this is guaranteed to work. Should modules ever be included in the gc, then perhaps this would have to be revisited.

--
Scott Dial
scott@scottdial.com
scodial@cs.indiana.edu
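The rule in that comment can be checked directly. In this sketch (the names `Node`, `make_cycle`, `fired` and `held` are hypothetical), the same two-object cycle is built twice: once with the weakref reachable only from inside the cycle, and once with the caller holding it. This reflects CPython's collector as described in the comment above; other implementations may behave differently.

```python
import gc
import weakref

fired = []


class Node:
    pass


def make_cycle(hold_outside):
    """Build a two-node cycle with a weakref (plus callback) to one node.

    If hold_outside is true the weakref is returned, so the caller keeps it
    alive outside the cycle; otherwise it is stored on the node itself and
    becomes part of the cyclic trash.
    """
    a, b = Node(), Node()
    a.peer, b.peer = b, a
    ref = weakref.ref(a, lambda _r: fired.append(hold_outside))
    if hold_outside:
        return ref
    a.ref = ref
    return None


gc.collect()
make_cycle(False)
gc.collect()            # the weakref is trash too: its callback is skipped
held = make_cycle(True)
gc.collect()            # the weakref outlived the trash: its callback runs
```

Only the second callback runs, so `fired` ends up holding a single `True`.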

On Tue, Oct 20, 2009 at 12:59 PM, Scott Dial <scott+python-ideas@scottdial.com> wrote:
Should modules ever be included in the gc, then perhaps this would have to be revisited.
You mean like this? http://bugs.python.org/issue812369 -- Daniel Stutzbach, Ph.D. President, Stutzbach Enterprises, LLC <http://stutzbachenterprises.com>

On Tue, Oct 20, 2009 at 11:59, Scott Dial <scott+python-ideas@scottdial.com> wrote:
Right, but if you want to guarantee your method will be called on shutdown it's better to fall back on an atexit handler. Once we start tearing down modules we can't promise any sane state, so it'd be better to disable weakref callbacks entirely. -- Adam Olsen, aka Rhamphoryncus

Scott Dial <scott+python-ideas@...> writes:
You're right, I've been too quick in posting this. The following works, however:

    import weakref, gc

    class Foo:
        _w = {}

        def __init__(self):
            k = id(self)
            def not_closed(_, d=Foo._w, k=k, r=repr(self)):
                del d[k]
                print("%s not closed!" % r)
            Foo._w[k] = weakref.ref(self, not_closed)

        def close(self):
            # Close...
            # and then destroy weakref
            Foo._w.pop(id(self), None)

    x, y = Foo(), Foo()
    x.y, y.x = y, x
    y.close()
    del x
    del y
    gc.collect()
    print(gc.garbage)

Cheers

Antoine.

participants (6)

- Adam Olsen
- Antoine Pitrou
- Arnaud Delobelle
- Daniel Stutzbach
- Nick Coghlan
- Scott Dial