[Python-ideas] CPython's cyclic garbage collector (was Automatic context managers)
MRAB
python at mrabarnett.plus.com
Mon Apr 29 18:49:03 CEST 2013
On 27/04/2013 02:56, Chris Angelico wrote:
> On Sat, Apr 27, 2013 at 9:45 AM, Dave Angel <davea at davea.name> wrote:
>> I didn't know there was a callback that a user could hook into. That's very
>> interesting.
>>
>
> On Sat, Apr 27, 2013 at 10:22 AM, Skip Montanaro <skip at pobox.com> wrote:
>>> Whenever the GC finds a cycle that is unreferenced but uncollectable,
>>> it stores those objects in the list gc.garbage. At that point, if the
>>> user wishes to clean up those cycles, it is up to them to delve into
>>> gc.garbage, untangle the objects contained within, break the cycles,
>>> and remove them from the list so that they can be freed by the ref
>>> counter.
>>
>> I wonder if it would be useful to provide a gc.garbagehook analogous
>> to sys.excepthook?
>> Users could assign a function of their choice to munch the cyclic
>> garbage periodically.
>>
>> Just a thought, flying out of my fingers before my brain could stop it...
>
> As far as I know, Dave, there isn't currently one; Skip, that's close
> to what I'm talking about - it saves on the periodic check. But
> burying it in gc.garbagehook implies having a separate piece of code
> that knows how to break the reference cycles, whereas the __del__
> method puts the code right there in the code that has the problem.
> Actually, *ANY* solution to this problem implies having __del__ able
> to cope with the cycle being broken. Here's an example, perhaps a
> silly one, but not far different in nature from some things I've done
> in C++. (Granted, all the Python implementations of those same
> algorithms have involved built-in types rather than linked lists, but
> still.)
>
> class DLCircList:
>     def __init__(self,payload):
>         self.payload=payload
>         self.next=self.prev=self
>         print("Creating node: %s"%self.payload)
>     def __del__(self):
>         print("Deleting node %s from cycle %s"%(self.payload,self.enum()))
>         self.prev.next=self.next
>         self.next.prev=self.prev
>     def attach(self,other):
>         assert(self.next==self) # Don't attach twice
>         self.prev=other
>         self.next=other.next
>         other.next=self
>         self.next.prev=self
>         print("Adding node %s to cycle %s"%(self.payload,self.enum()))
>     def enum(self):
>         """Return a list of all node payloads in this cycle."""
>         ptr=self.next
>         nodes=[self.payload]
>         while ptr!=self:
>             nodes.append(ptr.payload)
>             ptr=ptr.next
>         return nodes
>
> lst=DLCircList("foo")
> DLCircList("bar").attach(lst)
> DLCircList("quux").attach(lst)
> DLCircList("asdf").attach(lst)
> DLCircList("qwer").attach(lst)
> DLCircList("zxcv").attach(lst)
> print("Enumerating list: %s"%lst.enum())
>
> del lst
> import gc
> gbg=gc.collect()
> print("And we have garbage: %s"%gbg)
> print(gc.garbage)
>
>
>
> Supposing you did this many many times, and you wanted decent garbage
> collection. How would you write a __del__ method, how would you write
> something to clean up gc.garbage? One way or another, something will
> have to deal with the possibility that the invariants have been
> broken, so my theory is that that possibility should be entirely
> within __del__. (Since __del__ calls enum(), it's possible for enum()
> to throw DestructedObject or whatever, but standard exception handling
> will deal with that.)
>
How about this:

If an object has a __collect__ method, then that method will be called
whenever the object is collected, either because its reference count has
reached 0 (or maybe this should be done explicitly), or because it has
been detected by the GC as being part of a cycle. If the method is called
(and doesn't raise an exception?), then the object is not added to the
garbage list.

The principal purpose of the method is to give the object the chance to
break any cycles.

It should be noted that the method could be called more than once.
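To make the "called more than once" point concrete, here is a minimal sketch of an unlink step that is safe to call repeatedly. It is only an illustration, not part of the proposal: the class name Node and the method name unlink are made up, and the boolean return value is an assumption added so the second call can be observed.

```python
class Node:
    """Hypothetical stand-in for a node in a circular doubly linked list."""

    def __init__(self, payload):
        self.payload = payload
        self.next = self.prev = self  # a lone node forms a ring of one

    def attach(self, other):
        # Insert self immediately after `other` in other's ring.
        self.prev = other
        self.next = other.next
        other.next = self
        self.next.prev = self

    def unlink(self):
        """Remove this node from its ring; return True only on the first call."""
        if self.prev is None:
            return False  # already unlinked, so there is nothing left to do
        self.prev.next = self.next
        self.next.prev = self.prev
        # Drop our own references so this node no longer takes part in a cycle.
        self.prev = self.next = None
        return True


a = Node("a")
b = Node("b")
b.attach(a)
first = b.unlink()   # actually removes b from the ring
second = b.unlink()  # no-op: b was already removed
```

The guard on self.prev plays the same role as the "Already broken the cycle" check in the modified code: a second call finds the links already cleared and returns without touching the neighbours.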
Here is a modified version of Chris's code:
import gc

class DLCircList:
    def __init__(self, payload):
        self._collected = False
        self.payload = payload
        self.next = self.prev = self
        print("Creating node: %s" % self.payload)

    def __del__(self):
        self.__collect__() # Implicit or explicit?
        print("Deleting node %s" % self.payload)

    def attach(self, other):
        assert self.next == self # Don't attach twice
        self.prev = other
        self.next = other.next
        other.next = self
        self.next.prev = self
        print("Adding node %s to cycle %s" % (self.payload, self.enum()))

    def enum(self):
        """Return a list of all node payloads in this cycle."""
        ptr = self.next
        nodes = [self.payload]
        while ptr != self:
            nodes.append(ptr.payload)
            ptr = ptr.next
        return nodes

    def __collect__(self):
        print("Collecting node %s" % self.payload)
        if self.prev is None:
            # Already broken the cycle.
            print("Already collected %s" % self.payload)
            return
        # Splice this node out of the ring.
        self.prev.next = self.next
        self.next.prev = self.prev
        # Break the cycle.
        self.prev = self.next = None

def callback(phase, info):
    # Emulate the proposal: after each collection, let objects in
    # gc.garbage break their own cycles and keep only the rest.
    if phase == "stop":
        new_garbage = []
        for obj in gc.garbage:
            if hasattr(obj, "__collect__"):
                obj.__collect__()
            else:
                new_garbage.append(obj)
        gc.garbage[:] = new_garbage

gc.callbacks.append(callback)

lst = DLCircList("foo")
DLCircList("bar").attach(lst)
DLCircList("quux").attach(lst)
DLCircList("asdf").attach(lst)
DLCircList("qwer").attach(lst)
DLCircList("zxcv").attach(lst)
print("Enumerating list: %s" % lst.enum())

del lst
print("And we have garbage #1: %s" % gc.collect())
print("And we have garbage #2: %s" % gc.collect())
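For anyone unfamiliar with the hook used above, here is a small sketch of how gc.callbacks (available from Python 3.3) invokes a callback. The names record and events are made up for illustration. Each callback receives a phase string ("start" or "stop") and an info dict; the sketch just records what it is given. Note that on CPython 3.4 and later, cycles whose objects define __del__ are normally collected rather than placed in gc.garbage, so the list may well stay empty there.

```python
import gc

events = []  # (phase, sorted info keys) pairs recorded by the callback


def record(phase, info):
    # phase is "start" before a collection and "stop" after it;
    # info is a dict describing that collection.
    events.append((phase, sorted(info)))


gc.callbacks.append(record)
try:
    gc.collect()
finally:
    # Always unregister, so later collections are not recorded by accident.
    gc.callbacks.remove(record)

phases = [phase for phase, _ in events]
```

The "stop" phase is the natural place for cleanup like the callback above, since by then gc.garbage reflects whatever the collection could not free.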