python-like garbage collector & workaround
I am writing a garbage collector that is similar to Python's. I want to know what you think, what problems I may encounter, and what kind of value I'm looking at. For my review of literature, I have read excerpts from, and stepped through, Python's GC. I'm picturing it as a specialized breadth-first search. I am concerned by the inability to call user-defined finalization methods. I'm considering a workaround that performs GC in two steps. First, it requests the objects to drop their references that participate in the cycle. Then, it enqueues the decref'ed object for an unnested destruction. Here is a proof-of-concept implementation. http://groups.google.com/group/comp.lang.python/browse_thread/thread/d3bb410...
castironpi-ng@comcast.net wrote:
I'm considering a workaround that performs GC in two steps. First, it requests the objects to drop their references that participate in the cycle. Then, it enqueues the decref'ed object for an unnested destruction.
I don't see how that solves anything. The problem is that the destructors might depend on other objects in the cycle that have already been deallocated. Deferring the calling of the destructors doesn't help with that. The only thing that will help is decoupling the destructor from the object being destroyed. You can do that now by storing a weak reference to the object with the destructor as a callback. But the destructor needs to be designed so that it can work without holding any reference to the object being destroyed, since it will no longer exist by the time the destructor is called. -- Greg
----- Original Message ----- From: "Greg Ewing" < greg.ewing@canterbury.ac.nz > To: castironpi-ng@comcast.net Cc: Python-ideas@python.org Sent: Saturday, March 28, 2009 5:32:27 PM GMT -06:00 US/Canada Central Subject: Re: [Python-ideas] python-like garbage collector & workaround castironpi-ng@comcast.net wrote:
I'm considering a workaround that performs GC in two steps. First, it requests the objects to drop their references that participate in the cycle. Then, it enqueues the decref'ed object for an unnested destruction.
I don't see how that solves anything. The problem is that the destructors might depend on other objects in the cycle that have already been deallocated. Deferring the calling of the destructors doesn't help with that. The only thing that will help is decoupling the destructor from the object being destroyed. You can do that now by storing a weak reference to the object with the destructor as a callback. But the destructor needs to be designed so that it can work without holding any reference to the object being destroyed, since it will no longer exist by the time the destructor is called. -- Greg ==================================== Nice response time.
Deferring the calling of the destructors doesn't help with that.
I beg to differ. There is a complex example in the test code at the address. Here is a simple one. 'A' has a reference to 'B' and 'B' has a reference to 'A'. They both need to call each other's methods during their respective finalizations. 1. Ref counts: A-1, B-1 2. Request A to drop ref. to B. 3. Ref counts: A-1, B-0. 4. Finalize & deallocate B. 5. ... B drops ref. to A 6. Ref counts: A-0 7. Finalize & deallocate A. 'A' performs its final call to 'B' in step 2, still having a reference to it. It empties the attribute of its own that refers to B. 'B's reference count goes to 0. 'B' performs its final call to 'A' in step 5, still having a reference to it. 'A' still has control of its fields, and can make remaining subordinate calls if necessary. 'B' releases its reference to 'A', and 'A's reference count goes to zero. 'B' is deallocated. 'A' performs its finalization, and should check its field to see if it still has the reference to B. If it did, it would perform the call in step 2. In this case, it doesn't, and it can keep a record of the fact that it already made that final call. 'A's finalizer exits without any calls to 'B', because the field that held its reference to 'B' is clear. 'A' is deallocated.
But the destructor needs to be designed so that it can work without holding any reference to the object being destroyed
I want to give 'A' control of that. To accomplish this, I bring it to A's attention the fact that it has left reachability, /and/ is in a cycle with B. It can perform its normal finalization at this time and maintain its consistency of state. I believe it solves the problem of failing to call the destructor at all, but I may have just shirked it. Will it work?
2009/3/29 <castironpi-ng@comcast.net>:
----- Original Message ----- From: "Greg Ewing" <greg.ewing@canterbury.ac.nz> To: castironpi-ng@comcast.net Cc: Python-ideas@python.org Sent: Saturday, March 28, 2009 5:32:27 PM GMT -06:00 US/Canada Central Subject: Re: [Python-ideas] python-like garbage collector & workaround
castironpi-ng@comcast.net wrote:
> I'm considering a workaround that performs GC in two steps. First, it
requests the objects to drop their references that participate in the cycle. Then, it enqueues the decref'ed object for an unnested destruction.
Castironpi, I don't think your solution solves the problem. In a single stage finalization design, it is allways possible to call the destructors of the objects in the cycle in random order. The problem is that now when A gets finalized, it cannot use its reference to B anymore because B may have already been finalized, and thus we cannot assume B can still be used for anything usefull. The problem, of course, is that one of A or B may still need the other during its finalization. In your solution, the real question is what the state of an object is supposed to be when it is in between the two stages of finalization. Is it still supposed to be a fully functional object, that handles all operations just as if it were still fully alive? In that case the object can only drop the references that it doesn't actually need to perform any of its operations (not just finalization). But if we assume that an object has all its references for a reason, there is nothing it can drop. (except if it uses a reference for caching or similar things. But I think that is only a minority of all use cases.) If you propose an object counts as 'finalized' (or at least, no longer fully functional) when it is in between stages of finalization, we have the same problem as in the single stage random order finalization: other objects that refer to it can no longer use it for anything usefull. The only option that is left is to have the object be in some in-between state. But that really complicates Pythons object model, because every object now has two visible states: alive and about-to-die. So every object that wants to support this form of finalization has to specify what kind of operations are still available in its about-to-die state, and all destructors of all objects need to restrict themselves to only these kind of operations. And then, of course, there is still the question of what to do if there are still cycles left after the first stage. If you still think your proposal is usefull, you'll probably need to explain why these problems don't matter enough or whether there are important use cases that it solves.
On Mar 30, 7:36 pm, Jan Kanis <jan.ka...@phil.uu.nl> wrote:
2009/3/29 <castironpi...@comcast.net>:
----- Original Message ----- From: "Greg Ewing" <greg.ew...@canterbury.ac.nz> To: castironpi...@comcast.net Cc: Python-id...@python.org Sent: Saturday, March 28, 2009 5:32:27 PM GMT -06:00 US/Canada Central Subject: Re: [Python-ideas] python-like garbage collector & workaround
castironpi...@comcast.net wrote:
> I'm considering a workaround that performs GC in two steps. First, it
requests the objects to drop their references that participate in the cycle. Then, it enqueues the decref'ed object for an unnested destruction.
Castironpi, I don't think your solution solves the problem. In a single stage finalization design, it is allways possible to call the destructors of the objects in the cycle in random order. The problem is that now when A gets finalized, it cannot use its reference to B anymore because B may have already been finalized, and thus we cannot assume B can still be used for anything usefull. The problem, of course, is that one of A or B may still need the other during its finalization.
In your solution, the real question is what the state of an object is supposed to be when it is in between the two stages of finalization. Is it still supposed to be a fully functional object, that handles all operations just as if it were still fully alive? In that case the object can only drop the references that it doesn't actually need to perform any of its operations (not just finalization). But if we assume that an object has all its references for a reason, there is nothing it can drop. (except if it uses a reference for caching or similar things. But I think that is only a minority of all use cases.) If you propose an object counts as 'finalized' (or at least, no longer fully functional) when it is in between stages of finalization, we have the same problem as in the single stage random order finalization: other objects that refer to it can no longer use it for anything usefull.
The only option that is left is to have the object be in some in-between state. But that really complicates Pythons object model, because every object now has two visible states: alive and about-to-die. So every object that wants to support this form of finalization has to specify what kind of operations are still available in its about-to-die state, and all destructors of all objects need to restrict themselves to only these kind of operations. And then, of course, there is still the question of what to do if there are still cycles left after the first stage.
If you still think your proposal is usefull, you'll probably need to explain why these problems don't matter enough or whether there are important use cases that it solves. _______________________________________________ Python-ideas mailing list Python-id...@python.orghttp://mail.python.org/mailman/listinfo/python-ideas
Hi, sorry for the delay. Not to beat a dead horse, but I've tried c- l- py for the technical discussion. For user-defined types, the only option is currently to scan the 'gc.garbage' collection. I want to provide more options for the users of my system. I was thinking about implementing weak-refs, so long as it's worth it. That's one approach that I understand doesn't have the same problem. However, that risks losing an object in the middle of execution, not just when both have left reachability. Another option is to add a 'is reachable' system call. Weak-refs would be akin to an 'still exists' call. class StarCrossedLover: def __init__( self, other ): self.other= other def __del__( self ): kill( self.other ) romero= StarCrossedLover( ) joriet= StarCrossedLover( ) In this example, I think weak-refs would be most accurate. However, that litters the entire class code with weak-ref testing. Under my model, self.other may or may not be valid in the destructor. With weak-refs, self.other may or may not be valid anywhere at all! However, they are consistent and __del__ methods can always get called. Should I add them? Thanks in advance.
participants (4)
-
Aaron Brady
-
castironpi-ng@comcast.net
-
Greg Ewing
-
Jan Kanis