Re: [Python-ideas] CPython's cyclic garbage collector (was Automatic context managers)
On 27/04/2013 02:56, Chris Angelico wrote:
On Sat, Apr 27, 2013 at 9:45 AM, Dave Angel wrote:
I didn't know there was a callback that a user could hook into. That's very interesting.
On Sat, Apr 27, 2013 at 10:22 AM, Skip Montanaro wrote:
Whenever the GC finds a cycle that is unreferenced but uncollectable, it stores those objects in the list gc.garbage. At that point, if the user wishes to clean up those cycles, it is up to them to delve into gc.garbage, untangle the objects contained within, break the cycles, and remove them from the list so that they can be freed by the ref counter.
I wonder if it would be useful to provide a gc.garbagehook analogous to sys.excepthook? Users could assign a function of their choice to munch the cyclic garbage periodically.
Just a thought, flying out of my fingers before my brain could stop it...
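For what it's worth, something close to the suggested hook can be approximated with gc.callbacks (available since Python 3.3). This is only a rough sketch: install_garbage_hook, munch and Node are made-up names, not anything in the gc module, and gc.DEBUG_SAVEALL is switched on purely to force objects into gc.garbage, since CPython 3.4 and later (PEP 442) normally collect such cycles without leaving anything there.

```python
import gc

def install_garbage_hook(hook):
    """Hypothetical helper: call hook(gc.garbage) after any collection
    that left something in gc.garbage."""
    def _callback(phase, info):
        if phase == "stop" and gc.garbage:
            hook(gc.garbage)
    gc.callbacks.append(_callback)
    return _callback

class Node:
    def __init__(self, name):
        self.name = name
        self.other = None

seen = []  # type names of everything the hook was handed

def munch(garbage):
    seen.extend(type(obj).__name__ for obj in garbage)
    for obj in garbage:
        if isinstance(obj, Node):
            obj.other = None      # break the cycle by hand
    del garbage[:]                # let the ref counter free the objects

cb = install_garbage_hook(munch)
gc.set_debug(gc.DEBUG_SAVEALL)    # assumption: only here to populate gc.garbage

a, b = Node("a"), Node("b")
a.other, b.other = b, a           # two-object reference cycle
del a, b
gc.collect()

gc.set_debug(0)
gc.callbacks.remove(cb)
```

The hook still has to know how to untangle each kind of object it is handed, which is the separate-piece-of-code drawback raised in the reply that follows.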
As far as I know, Dave, there isn't currently one; Skip, that's close to what I'm talking about - it saves on the periodic check. But burying it in gc.garbagehook implies having a separate piece of code that knows how to break the reference cycles, whereas the __del__ method puts the code right there in the code that has the problem. Actually, *ANY* solution to this problem implies having __del__ able to cope with the cycle being broken. Here's an example, perhaps a silly one, but not far different in nature from some things I've done in C++. (Granted, all the Python implementations of those same algorithms have involved built-in types rather than linked lists, but still.)
class DLCircList:
    def __init__(self,payload):
        self.payload=payload
        self.next=self.prev=self
        print("Creating node: %s"%self.payload)
    def __del__(self):
        print("Deleting node %s from cycle %s"%(self.payload,self.enum()))
        self.prev.next=self.next
        self.next.prev=self.prev
    def attach(self,other):
        assert(self.next==self) # Don't attach twice
        self.prev=other
        self.next=other.next
        other.next=self
        self.next.prev=self
        print("Adding node %s to cycle %s"%(self.payload,self.enum()))
    def enum(self):
        """Return a list of all node payloads in this cycle."""
        ptr=self.next
        nodes=[self.payload]
        while ptr!=self:
            nodes.append(ptr.payload)
            ptr=ptr.next
        return nodes

lst=DLCircList("foo")
DLCircList("bar").attach(lst)
DLCircList("quux").attach(lst)
DLCircList("asdf").attach(lst)
DLCircList("qwer").attach(lst)
DLCircList("zxcv").attach(lst)
print("Enumerating list: %s"%lst.enum())

del lst
import gc
gbg=gc.collect()
print("And we have garbage: %s"%gbg)
print(gc.garbage)
Supposing you did this many many times, and you wanted decent garbage collection. How would you write a __del__ method, how would you write something to clean up gc.garbage? One way or another, something will have to deal with the possibility that the invariants have been broken, so my theory is that that possibility should be entirely within __del__. (Since __del__ calls enum(), it's possible for enum() to throw DestructedObject or whatever, but standard exception handling will deal with that.)
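A minimal sketch of what "entirely within __del__" might look like, under some assumptions: DestructedObject is a hypothetical exception (nothing in Python provides it), the broken ring is simulated by hand with b.next = None, and the code relies on CPython 3.4 or later, where __del__ is run for objects in cycles. Results go into a list rather than being printed, so the order in which the collector finalizes the nodes doesn't matter.

```python
import gc

class DestructedObject(Exception):
    """Hypothetical: raised when the ring has already been torn apart."""

log = []  # (payload, members or None), appended to by __del__

class DLCircList:
    def __init__(self, payload):
        self.payload = payload
        self.next = self.prev = self

    def attach(self, other):
        assert self.next is self  # Don't attach twice
        self.prev = other
        self.next = other.next
        other.next = self
        self.next.prev = self

    def enum(self):
        """Return a list of all node payloads in this cycle."""
        nodes = [self.payload]
        ptr = self.next
        while ptr is not self:
            if ptr is None:
                # The invariant "next always leads back to self" is gone.
                raise DestructedObject(self.payload)
            nodes.append(ptr.payload)
            ptr = ptr.next
        return nodes

    def __del__(self):
        try:
            members = self.enum()
        except DestructedObject:
            members = None  # ring already broken; carry on regardless
        log.append((self.payload, members))
        # Unlink only if both neighbours are still there.
        if self.prev is not None and self.next is not None:
            self.prev.next = self.next
            self.next.prev = self.prev
        self.prev = self.next = None

a = DLCircList("a")
b = DLCircList("b")
b.attach(a)
b.next = None   # simulate something having broken the ring already
del a, b
gc.collect()
```

Both nodes end up in log, and at least one of them records None for its members because its enumeration ran into the break. All of the coping logic lives in the class itself; nothing outside it needs to know how the ring is wired.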
How about this:

If an object has a __collect__ method, then that method will be called whenever the object is collected, either because its reference count has reached 0 (or maybe this should be done explicitly), or because it has been detected by the GC as being part of a cycle.

If the method is called (and doesn't raise an exception?), then the object is not added to the garbage list.

The principal purpose of the method is to give the object the chance to break any cycles. It should be noted that the method could be called more than once.

Here is a modified version of the code:

class DLCircList:
    def __init__(self,payload):
        self._collected = False
        self.payload = payload
        self.next = self.prev = self
        print("Creating node: %s" % self.payload)
    def __del__(self):
        self.__collect__() # Implicit or explicit?
        print("Deleting node %s" % self.payload)
    def attach(self,other):
        assert self.next == self # Don't attach twice
        self.prev = other
        self.next = other.next
        other.next = self
        self.next.prev = self
        print("Adding node %s to cycle %s" % (self.payload, self.enum()))
    def enum(self):
        """Return a list of all node payloads in this cycle."""
        ptr = self.next
        nodes = [self.payload]
        while ptr != self:
            nodes.append(ptr.payload)
            ptr = ptr.next
        return nodes
    def __collect__(self):
        print("Collecting node %s" % self.payload)
        if self.prev is None:
            # Already broken the cycle.
            print("Already collected %s" % self.payload)
            return
        self.prev.next = self.next
        self.next.prev = self.prev
        # Break the cycle.
        self.prev = self.next = None

def callback(phase, info):
    if phase == "stop":
        new_garbage = []
        for obj in gc.garbage:
            if hasattr(obj, "__collect__"):
                obj.__collect__()
            else:
                new_garbage.append(obj)
        gc.garbage[:] = new_garbage

import gc
gc.callbacks.append(callback)

lst = DLCircList("foo")
DLCircList("bar").attach(lst)
DLCircList("quux").attach(lst)
DLCircList("asdf").attach(lst)
DLCircList("qwer").attach(lst)
DLCircList("zxcv").attach(lst)
print("Enumerating list: %s" % lst.enum())

del lst
print("And we have garbage #1: %s" % gc.collect())
print("And we have garbage #2: %s" % gc.collect())
participants (2): Chris Angelico, MRAB