[Python-ideas] CPython's cyclic garbage collector (was Automatic context managers)

MRAB python at mrabarnett.plus.com
Mon Apr 29 18:49:03 CEST 2013


On 27/04/2013 02:56, Chris Angelico wrote:
> On Sat, Apr 27, 2013 at 9:45 AM, Dave Angel <davea at davea.name> wrote:
>> I didn't know there was a callback that a user could hook into.  That's very
>> interesting.
>>
>
> On Sat, Apr 27, 2013 at 10:22 AM, Skip Montanaro <skip at pobox.com> wrote:
>>> Whenever the GC finds a cycle that is unreferenced but uncollectable,
>>> it stores those objects in the list gc.garbage.  At that point, if the
>>> user wishes to clean up those cycles, it is up to them to delve into
>>> gc.garbage, untangle the objects contained within, break the cycles,
>>> and remove them from the list so that they can be freed by the ref
>>> counter.
>>
>> I wonder if it would be useful to provide a gc.garbagehook analogous
>> to sys.excepthook?
>> Users could assign a function of their choice to munch the cyclic
>> garbage periodically.
>>
>> Just a thought, flying out of my fingers before my brain could stop it...
>
> As far as I know, Dave, there isn't currently one; Skip, that's close
> to what I'm talking about - it saves on the periodic check. But
> burying it in gc.garbagehook implies having a separate piece of code
> that knows how to break the reference cycles, whereas the __del__
> method puts the code right there in the code that has the problem.
> Actually, *ANY* solution to this problem implies having __del__ able
> to cope with the cycle being broken. Here's an example, perhaps a
> silly one, but not far different in nature from some things I've done
> in C++. (Granted, all the Python implementations of those same
> algorithms have involved built-in types rather than linked lists, but
> still.)
>
> class DLCircList:
> 	def __init__(self,payload):
> 		self.payload=payload
> 		self.next=self.prev=self
> 		print("Creating node: %s"%self.payload)
> 	def __del__(self):
> 		print("Deleting node %s from cycle %s"%(self.payload,self.enum()))
> 		self.prev.next=self.next
> 		self.next.prev=self.prev
> 	def attach(self,other):
> 		assert(self.next==self) # Don't attach twice
> 		self.prev=other
> 		self.next=other.next
> 		other.next=self
> 		self.next.prev=self
> 		print("Adding node %s to cycle %s"%(self.payload,self.enum()))
> 	def enum(self):
> 		"""Return a list of all node payloads in this cycle."""
> 		ptr=self.next
> 		nodes=[self.payload]
> 		while ptr!=self:
> 			nodes.append(ptr.payload)
> 			ptr=ptr.next
> 		return nodes
>
> lst=DLCircList("foo")
> DLCircList("bar").attach(lst)
> DLCircList("quux").attach(lst)
> DLCircList("asdf").attach(lst)
> DLCircList("qwer").attach(lst)
> DLCircList("zxcv").attach(lst)
> print("Enumerating list: %s"%lst.enum())
>
> del lst
> import gc
> gbg=gc.collect()
> print("And we have garbage: %s"%gbg)
> print(gc.garbage)
>
>
>
> Supposing you did this many many times, and you wanted decent garbage
> collection. How would you write a __del__ method, how would you write
> something to clean up gc.garbage? One way or another, something will
> have to deal with the possibility that the invariants have been
> broken, so my theory is that that possibility should be entirely
> within __del__. (Since __del__ calls enum(), it's possible for enum()
> to throw DestructedObject or whatever, but standard exception handling
> will deal with that.)
>
How about this:

If an object has a __collect__ method, then that method will be called whenever
the object is collected, either because its reference count has reached 0 (or
maybe this should be done explicitly), or because it has been detected by the
GC as being part of a cycle. If the method is called (and doesn't raise an
exception?), then the object is not added to the garbage list.

The principal purpose of the method is to give the object the chance to break
any cycles.

It should be noted that the method could be called more than once.
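
To pin down the rule about exceptions, here is a rough pure-Python sketch of
the behaviour I have in mind. The helper name run_collect_hooks and the three
demo classes are made up for illustration; nothing like this exists in the gc
module, and the real thing would presumably live inside the collector itself.

```python
def run_collect_hooks(candidates):
    """Return the objects that would still end up on the garbage list.

    Hypothetical helper: it only models the proposed __collect__ rule.
    """
    leftover = []
    for obj in candidates:
        hook = getattr(obj, "__collect__", None)
        if hook is None:
            leftover.append(obj)   # no hook: treated as garbage, as today
            continue
        try:
            hook()                 # give the object a chance to break cycles
        except Exception:
            leftover.append(obj)   # hook failed: still counts as garbage
    return leftover


class Cooperative:
    def __init__(self):
        self.calls = 0
    def __collect__(self):
        self.calls += 1


class Failing:
    def __collect__(self):
        raise RuntimeError("could not break the cycle")


class Plain:
    pass


ok, bad, plain = Cooperative(), Failing(), Plain()
remaining = run_collect_hooks([ok, bad, plain])
print(remaining == [bad, plain], ok.calls)
```

Only the object whose __collect__ returned normally is dropped from the list;
whether a raised exception should be swallowed like this, or reported, is
still an open question.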

Here is a modified version of the code:


class DLCircList:
     def __init__(self,payload):
         self._collected = False
         self.payload = payload
         self.next = self.prev = self
         print("Creating node: %s" % self.payload)
     def __del__(self):
         self.__collect__() # Implicit or explicit?
         print("Deleting node %s" % self.payload)
     def attach(self,other):
         assert self.next == self # Don't attach twice
         self.prev = other
         self.next = other.next
         other.next = self
         self.next.prev = self
         print("Adding node %s to cycle %s" % (self.payload, self.enum()))
     def enum(self):
         """Return a list of all node payloads in this cycle."""
         ptr = self.next
         nodes = [self.payload]
         while ptr != self:
             nodes.append(ptr.payload)
             ptr = ptr.next
         return nodes
     def __collect__(self):
         print("Collecting node %s" % self.payload)
         if self.prev is None:
             # Already broken the cycle.
             print("Already collected %s" % self.payload)
             return

         self.prev.next = self.next
         self.next.prev = self.prev

         # Break the cycle.
         self.prev = self.next = None

def callback(phase, info):
     if phase == "stop":
         new_garbage = []

         for obj in gc.garbage:
             if hasattr(obj, "__collect__"):
                 obj.__collect__()
             else:
                 new_garbage.append(obj)

         gc.garbage[:] = new_garbage

import gc
gc.callbacks.append(callback)

lst = DLCircList("foo")
DLCircList("bar").attach(lst)
DLCircList("quux").attach(lst)
DLCircList("asdf").attach(lst)
DLCircList("qwer").attach(lst)
DLCircList("zxcv").attach(lst)
print("Enumerating list: %s" % lst.enum())

del lst
print("And we have garbage #1: %s" % gc.collect())
print("And we have garbage #2: %s" % gc.collect())
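
For anyone who hasn't met gc.callbacks (added in Python 3.3): each callback is
called with a phase string, "start" or "stop", and an info dict. The
stand-alone sketch below just records what the collector passes in. The names
events, log_gc and Node are mine, purely for illustration, and the counts you
see will depend on the interpreter version.

```python
import gc

events = []  # (phase, generation, collected, uncollectable) tuples


def log_gc(phase, info):
    # phase is "start" or "stop"; info is a dict. The counts are only
    # meaningful in the "stop" phase, so use .get() to stay on the safe side.
    events.append((phase, info.get("generation"),
                   info.get("collected"), info.get("uncollectable")))


class Node:
    """Illustrative self-referencing object with no __del__."""
    def __init__(self):
        self.me = self  # a one-object reference cycle


gc.callbacks.append(log_gc)
try:
    Node()        # unreachable straight away, but kept alive by its cycle
    gc.collect()  # full collection; fires "start", then "stop"
finally:
    gc.callbacks.remove(log_gc)  # don't leave the hook installed

for event in events:
    print(event)
```

The callback in my code above only acts in the "stop" phase because that is
the point at which gc.garbage has been updated for that collection.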




