gcmodule issue w/adding __del__ to generator objects

Working on the PEP 342/343 generator enhancements, I've got working send/throw/close() methods, but am not sure how to deal with getting __del__ to invoke close(). Naturally, I can add a "__del__" entry to its methods list easily enough, but the 'has_finalizer()' function in gcmodule.c only checks for a __del__ attribute on instance objects, and for tp_del on heap types.
It looks to me like the correct fix would be to check for tp_del always, not just on heap types. However, when I tried this, I started getting warnings from the tests, saying that 22 uncollectable objects were being created (all generators, in test_generators).
It seems that the tests create cycles via globals(), since they define a bunch of generator functions and then call them, saving the generator iterators (or objects that reference them) in global variables.
After investigating this a bit, it seems to me that either has_finalizer() needs to

At 05:50 PM 6/18/2005 -0400, Phillip J. Eby wrote:
Working on the PEP 342/343 generator enhancements, I've got working send/throw/close() methods, but am not sure how to deal with getting __del__ to invoke close(). Naturally, I can add a "__del__" entry to its methods list easily enough, but the 'has_finalizer()' function in gcmodule.c only checks for a __del__ attribute on instance objects, and for tp_del on heap types.
It looks to me like the correct fix would be to check for tp_del always, not just on heap types. However, when I tried this, I started getting warnings from the tests, saying that 22 uncollectable objects were being created (all generators, in test_generators).
It seems that the tests create cycles via globals(), since they define a bunch of generator functions and then call them, saving the generator iterators (or objects that reference them) in global variables
after investigating this a bit, it seems to me that either has_finalizer() needs to
Whoops. I hit send by accident. Anyway, the issue seems to mostly be that the tests create generator-iterators in global variables. With a bit of effort, I've been able to stomp most of the cycles.
_______________________________________________ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/pje%40telecommunity.com
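For readers unfamiliar with the methods being discussed, here is a minimal sketch of send(), throw() and close() as they behave in a current CPython 3. The `averager` generator and the `events` list are hypothetical names invented for illustration; this is not code from the patch under discussion.

```python
events = []  # hypothetical log used only for this illustration


def averager():
    """Yield the running average of every value passed in via send()."""
    total, count = 0, 0
    try:
        while True:
            try:
                value = yield (total / count if count else None)
            except ValueError:
                # throw(ValueError(...)) lands here: reset the running state.
                events.append("reset")
                total, count = 0, 0
            else:
                total += value
                count += 1
    finally:
        # close() raises GeneratorExit at the paused yield, so this runs.
        events.append("closed")


g = averager()
primed = next(g)                        # advance to the first yield
after_10 = g.send(10)                   # running average of [10]
after_20 = g.send(20)                   # running average of [10, 20]
after_reset = g.throw(ValueError("x"))  # handled inside the generator
g.close()                               # triggers the finally clause
```

The question in this thread is how to get the same close() call made automatically when the generator object is reclaimed, rather than explicitly as in the last line.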

Argh! My email client's shortcut for Send is Ctrl-E, which is the same as end-of-line in the editor I've been using all day. Anyway, the problem is that it seems to me as though actually checking for tp_del is too aggressive (conservative?) for generators, because sometimes a generator object is finished or un-started, and therefore can't resurrect objects during close(). However, I don't really know how to implement another strategy; gcmodule isn't exactly my forte. :) Any input from the GC gurus would be appreciated. Thanks!
At 05:56 PM 6/18/2005 -0400, Phillip J. Eby wrote:
Whoops. I hit send by accident. Anyway, the issue seems to mostly be that the tests create generator-iterators in global variables. With a bit of effort, I've been able to stomp most of the cycles.
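The observation that an un-started or finished generator has nothing left to run during close() can be illustrated with a small sketch. It assumes a current CPython 3; `resource_user` and `log` are made-up names, not part of the patch.

```python
log = []  # hypothetical record of what the generator body actually ran


def resource_user():
    log.append("started")
    try:
        yield "working"
    finally:
        log.append("cleanup")


# Never started: close() has no generator code to run, so nothing is logged.
unstarted = resource_user()
unstarted.close()
log_after_unstarted = list(log)

# Run to completion: the finally clause runs during iteration, and a
# later close() is a no-op.
finished = resource_user()
items = list(finished)
log_after_finished = list(log)
finished.close()
log_after_second_close = list(log)

# Suspended inside the try block: only here does close() execute user code.
suspended = resource_user()
next(suspended)
suspended.close()
```

Only the third case runs arbitrary code at close() time, which is why treating every generator as an object with a finalizer is more pessimistic than strictly necessary.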

One more note; I tried changing generators to set their gi_frame to None whenever the generator finishes normally or with an error; this eliminated most of the reference cycles, and I was able to make test_generators work correctly with only 3 explicit close() calls, for the "fun" tests that use objects which hold references to generators that in turn reference the object.
So, I think I've got this sorted out, assuming that I'm not doing something hideously insane by having 'has_finalizer()' always check tp_del even for non-heap types, and defining a tp_del slot for generators to call close() in.
I ended up having to copy a bunch of stuff from typeobject.c in order to make this work, as there doesn't appear to be any way to share stuff like subtype_del and subtype_dealloc in a meaningful way with the generator code.
At 06:00 PM 6/18/2005 -0400, Phillip J. Eby wrote:
Argh! My email client's shortcut for Send is Ctrl-E, which is the same as end-of-line in the editor I've been using all day. Anyway, the problem is that it seems to me as though actually checking for tp_del is too aggressive (conservative?) for generators, because sometimes a generator object is finished or un-started, and therefore can't resurrect objects during close(). However, I don't really know how to implement another strategy; gcmodule isn't exactly my forte. :) Any input from the GC gurus would be appreciated. Thanks!
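The gi_frame change described in this message can be observed from Python. The sketch below assumes a current CPython 3, where a generator that has finished (normally or with an error) reports gi_frame as None; the generator functions are hypothetical examples, and the sketch says nothing about how the 2005 patch implemented this.

```python
def countdown(n):
    while n:
        yield n
        n -= 1


def failing():
    yield 1
    raise RuntimeError("boom")


g = countdown(2)
frame_before = g.gi_frame        # a live frame object while not finished
values = list(g)                 # run the generator to completion
frame_after_normal = g.gi_frame  # None once the generator has finished

f = failing()
next(f)
try:
    next(f)                      # the generator dies with RuntimeError
except RuntimeError:
    pass
frame_after_error = f.gi_frame   # also None after an error
```

Once the frame is dropped, the generator no longer holds its locals, so any cycle that ran through those locals is broken without help from the GC.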

On Sat, Jun 18, 2005 at 06:24:48PM -0400, Phillip J. Eby wrote:
So, I think I've got this sorted out, assuming that I'm not doing something hideously insane by having 'has_finalizer()' always check tp_del even for non-heap types, and defining a tp_del slot for generators to call close() in.
That sounds like the right thing to do.
I suspect the "uncollectable cycles" problem will not be completely solvable. With this change, all generators become objects with finalizers. In reality, a 'file' object, for example, has a finalizer as well but it gets away without telling the GC that because its finalizer doesn't do anything "evil". Since generators can do arbitrary things, the GC must assume the worst.
Most cycles involving enhanced generators can probably be broken by the GC because the generator is not in the strongly connected part of the cycle. The GC will have to work a little harder to figure that out but that's probably not too significant.
The real problem is that some cycles involving enhanced generators will not be breakable by the GC. I think some programs that used to work okay are now going to start leaking memory because objects will accumulate in gc.garbage.
Now, I could be wrong about all this. I have not been following the PEP 343 discussion too closely. Maybe Guido has some clever idea. Also, I find it difficult to hold in my head a complete model of how the GC now works. It's an incredibly subtle piece of code. Perhaps Tim can comment.
Neil
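The kind of cycle at issue, a generator whose frame refers to an object that in turn holds the generator, can be built in a few lines. The sketch below uses hypothetical names (`Holder`, `worker`, `make_cycle`) and assumes CPython 3.4 or later, which postdates this thread: there, as far as I know, the collector finalizes such a generator and then frees the cycle, rather than parking it in gc.garbage as the 2005 collector discussed here would.

```python
import gc

log = []  # hypothetical record of whether the finally clause ran


class Holder:
    """Object that keeps a reference to the generator that references it."""


def worker(holder):
    try:
        yield holder
    finally:
        log.append("finally ran")


def make_cycle():
    h = Holder()
    h.gen = worker(h)  # holder -> generator -> frame locals -> holder
    next(h.gen)        # suspend inside the try block


make_cycle()           # the cycle is now unreachable from the program
gc.collect()

# On the assumed interpreter nothing from this cycle is left as garbage.
holders_in_garbage = [o for o in gc.garbage if isinstance(o, Holder)]
```

On the interpreter of the thread's era, the generator's pending finally clause is exactly what would have made this cycle uncollectable once generators counted as objects with finalizers.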

At 06:50 PM 6/18/2005 -0600, Neil Schemenauer wrote:
On Sat, Jun 18, 2005 at 06:24:48PM -0400, Phillip J. Eby wrote:
So, I think I've got this sorted out, assuming that I'm not doing something hideously insane by having 'has_finalizer()' always check tp_del even for non-heap types, and defining a tp_del slot for generators to call close() in.
That sounds like the right thing to do.
I suspect the "uncollectable cycles" problem will not be completely solvable. With this change, all generators become objects with finalizers. In reality, a 'file' object, for example, has a finalizer as well but it gets away without telling the GC that because its finalizer doesn't do anything "evil". Since generators can do arbitrary things, the GC must assume the worst.
Yep. It's too bad that there's no simple way to guarantee that the generator won't resurrect anything. On the other hand, close() is guaranteed to give the generator at most one chance to do this. So, perhaps there's some way we could have the GC close() generators in unreachable cycles. No, wait, that would mean they could resurrect things, right? Argh.
Most cycles involving enhanced generators can probably be broken by the GC because the generator is not in the strongly connected part of the cycle. The GC will have to work a little harder to figure that out but that's probably not too significant.
Yep; by setting the generator's frame to None, I was able to significantly reduce the number of generator cycles in the tests.
The real problem is that some cycles involving enhanced generators will not be breakable by the GC. I think some programs that used to work okay are now going to start leaking memory because objects will accumulate in gc.garbage.
Yep, unless we .close() generators after adding them to gc.garbage, which *might* be an option. Although, I suppose if it *were* an option, then why doesn't GC already have some sort of ability to do this? (i.e. run __del__ methods on items in gc.garbage, then remove them if their refcount drops to 1 as a result).
[...pause to spend 5 minutes working it out in pseudocode...]
Okay, I think I see why you can't do it. You could guarantee that all relevant __del__ methods get called, but it's bloody difficult to end up with only unreachable items in gc.garbage afterwards. I think gc would have to keep a new list for items reachable from finalizers, that don't themselves have finalizers. Then, before creating gc.garbage, you walk the finalizers and call their finalization (__del__) methods. Then, you put any remaining items that are in either the finalizer list or the reachable-from-finalizers list into gc.garbage.
This approach might need a new type slot, but it seems like it would let us guarantee that finalizers get called, even if the object ends up in garbage as a result. In the case of generators, however, close() guarantees that the generator releases all its references, and so can no longer be part of a cycle. Thus, it would guarantee eventual cleanup of all generators. And, it would lift the general limitation on __del__ methods.
Hm. Sounds too good to be true. Surely if this were possible, Uncle Timmy would've thought of it already, no? Guess we'll have to wait and see what he thinks.
Now, I could be wrong about all this. I have not been following the PEP 343 discussion too closely. Maybe Guido has some clever idea. Also, I find it difficult to hold in my head a complete model of how the GC now works. It's an incredibly subtle piece of code. Perhaps Tim can comment.
I'm hoping Uncle Timmy can work his usual algorithmic magic here and provide us with a brilliant but impossible-for-mere-mortals-to-understand solution. (The impossible-to-understand part being optional, of course. :) )
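The two-list idea floated in this message (finalizers, plus objects reachable from finalizers) can be mocked up as a toy model. Everything below is hypothetical: `Node`, `collect` and the helper functions are invented for illustration, reference counting is simulated by scanning lists, and none of it reflects how gcmodule.c is actually written.

```python
class Node:
    """Toy stand-in for a GC-tracked object (not a CPython type)."""

    def __init__(self, name, on_finalize=None):
        self.name = name
        self.refs = []                  # outgoing references
        self.on_finalize = on_finalize  # plays the role of __del__ / close()


def reachable_from(roots):
    """Return every node reachable from roots, roots included."""
    seen, stack = [], list(roots)
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.append(node)
            stack.extend(node.refs)
    return seen


def free_by_refcount(nodes):
    """Repeatedly free nodes that nothing still alive points at."""
    alive = list(nodes)
    changed = True
    while changed:
        changed = False
        for node in list(alive):
            if not any(node in other.refs for other in alive):
                alive.remove(node)
                node.refs = []  # a freed object drops its references
                changed = True
    return alive


def collect(unreachable):
    """Run all finalizers first, then report what is still stuck."""
    finalizers = [n for n in unreachable if n.on_finalize]
    tainted = [n for n in reachable_from(finalizers)
               if n in unreachable and not n.on_finalize]
    for node in finalizers:
        node.on_finalize(node)
    return free_by_refcount(finalizers + tainted)  # would go to gc.garbage


def release_all(node):
    """Generator-like finalizer: close() drops every reference held."""
    node.refs = []


# A generator-like object in a cycle with a plain holder object.
gen, holder = Node("gen", release_all), Node("holder")
gen.refs, holder.refs = [holder], [gen]
garbage_after_close = collect([gen, holder])

# A finalizer that keeps its references leaves the cycle in place.
stubborn, partner = Node("stubborn", lambda node: None), Node("partner")
stubborn.refs, partner.refs = [partner], [stubborn]
garbage_stubborn = [n.name for n in collect([stubborn, partner])]
```

In this model the generator-like cycle disappears because its finalizer releases every reference, while the second cycle still ends up in the garbage list even though its finalizer was called.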

At 10:15 PM 6/18/2005 -0400, Phillip J. Eby wrote:
Okay, I think I see why you can't do it. You could guarantee that all relevant __del__ methods get called, but it's bloody difficult to end up with only unreachable items in gc.garbage afterwards. I think gc would have to keep a new list for items reachable from finalizers, that don't themselves have finalizers. Then, before creating gc.garbage, you walk the finalizers and call their finalization (__del__) methods. Then, you put any remaining items that are in either the finalizer list or the reachable-from-finalizers list into gc.garbage.
This approach might need a new type slot, but it seems like it would let us guarantee that finalizers get called, even if the object ends up in garbage as a result. In the case of generators, however, close() guarantees that the generator releases all its references, and so can no longer be part of a cycle. Thus, it would guarantee eventual cleanup of all generators. And, it would lift the general limitation on __del__ methods.
Hm. Sounds too good to be true. Surely if this were possible, Uncle Timmy would've thought of it already, no? Guess we'll have to wait and see what he thinks.
Or maybe not. After sleeping on it, I realized that the problems are all in when and how often __del__ is called. The idea I had above would end up calling __del__ twice on non-generator objects. For generators it's not a problem because the first call ends up ensuring that the second call is a no-op.
However, the *order* of __del__ calls makes a difference, even for generators. What good is a finally: clause if all the objects reachable from it have been finalized already, anyway?
Ultimately, I'm thinking that maybe we were right not to allow try-finally to cross yield boundaries in generators. It doesn't seem like you can guarantee anything about the behavior in the presence of cycles, so what's the point?
For a while I played around with the idea that maybe we could still support 'with:' in generators, though, because to implement that we could make frames call __exit__ on any pending 'with' blocks as part of their tp_clear operation. This would only work, however, if the objects with __exit__ methods don't have any references back to the frame. In essence, you'd need a way to put the __exit__ objects on a GC-managed list that wouldn't run until after all the tp_clear calls had finished.
But even that is tough to make guarantees about. For example, can you guarantee in that case that a generator's 'with:' blocks are __exit__-ed in the proper order?
Really, if we do allow 'with' and 'try-finally' to surround yield, I think we're going to have to tell people that it only works if you use a with or try-finally in some non-generator code to ensure that the generator.close() gets called, and that if you end up creating a garbage cycle, we either have to let it end up in gc.garbage, or just not execute its finally clause or __exit__ methods.
Of course, this sort of happens right now for other things with __del__; if it's part of a cycle the __del__ method never gets called. The only difference is that it hangs around in gc.garbage, doing nothing useful. If it's garbage, it's not reachable from anywhere else, so it does nobody any good to have it around. So, maybe we should just say, "sucks to be you" and tp_clear anything that we'd otherwise have put in gc.garbage. :)
In other words, since we're not going to call those __del__ methods anyway, maybe it just needs to be part of the language semantics that __del__ isn't guaranteed to be called, and a garbage collector that can't find a safe way to call it, doesn't have to. tp_dealloc for classic classes and heap types could then just skip calling __del__ if they've already been cleared... oh wait, how do you know you've been cleared? Argh. Another nice idea runs up on the rocks of reality.
On the other hand, if you go ahead and run __del__ after tp_clear, the __del__ method will quickly run afoul of an AttributeError and die with only a minor spew to sys.stderr, thus encouraging people to get rid of their silly useless __del__ methods on objects that normally end up in cycles. :)
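The advice to use a with or try-finally in non-generator code so that close() is guaranteed to run can be sketched as follows. This assumes a current CPython 3 and uses contextlib.closing, a standard-library helper that postdates this thread and is not mentioned in it; `lines` and `log` are hypothetical names.

```python
from contextlib import closing

log = []  # hypothetical record of cleanup activity


def lines():
    try:
        yield "a"
        yield "b"
    finally:
        log.append("closed")


# The caller, not the garbage collector, guarantees that close() runs:
# closing() calls it.close() when the with block exits, however it exits.
with closing(lines()) as it:
    first = next(it)

# The same guarantee spelled out with try-finally.
it2 = lines()
try:
    second_first = next(it2)
finally:
    it2.close()
```

Either form runs the generator's finally clause deterministically, so nothing depends on whether the generator later ends up in a reference cycle.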

Sigh. Looks like Guido already used the time machine to bring up these ideas five years ago:
http://mail.python.org/pipermail/python-dev/2000-March/002514.html
And apparently you went back with him:
http://mail.python.org/pipermail/python-dev/2000-March/002478.html
So I give up, 'cause there's no way I can compete with you time travellers. :)
Although I do wonder -- why was __cleanup__ never implemented? The only clue seems to be Guido's comment that he "[finds] having a separate __cleanup__ protocol cumbersome." It certainly seems to me that having a __cleanup__ that allows an object to handle itself being garbage would be handy, although it's only meaningful to have a __cleanup__ if you also have a __del__; otherwise, there would never be a reason to call it. Maybe that's the reason it was considered cumbersome.
At 04:16 PM 6/19/2005 -0400, Phillip J. Eby wrote:
Or maybe not. After sleeping on it, I realized that the problems are all in when and how often __del__ is called. The idea I had above would end up calling __del__ twice on non-generator objects. For generators it's not a problem because the first call ends up ensuring that the second call is a no-op.
However, the *order* of __del__ calls makes a difference, even for generators. What good is a finally: clause if all the objects reachable from it have been finalized already, anyway?
Ultimately, I'm thinking that maybe we were right not to allow try-finally to cross yield boundaries in generators. It doesn't seem like you can guarantee anything about the behavior in the presence of cycles, so what's the point?
For a while I played around with the idea that maybe we could still support 'with:' in generators, though, because to implement that we could make frames call __exit__ on any pending 'with' blocks as part of their tp_clear operation. This would only work, however, if the objects with __exit__ methods don't have any references back to the frame. In essence, you'd need a way to put the __exit__ objects on a GC-managed list that wouldn't run until after all the tp_clear calls had finished.
But even that is tough to make guarantees about. For example, can you guarantee in that case that a generator's 'with:' blocks are __exit__-ed in the proper order?
Really, if we do allow 'with' and 'try-finally' to surround yield, I think we're going to have to tell people that it only works if you use a with or try-finally in some non-generator code to ensure that the generator.close() gets called, and that if you end up creating a garbage cycle, we either have to let it end up in gc.garbage, or just not execute its finally clause or __exit__ methods.
Of course, this sort of happens right now for other things with __del__; if it's part of a cycle the __del__ method never gets called. The only difference is that it hangs around in gc.garbage, doing nothing useful. If it's garbage, it's not reachable from anywhere else, so it does nobody any good to have it around. So, maybe we should just say, "sucks to be you" and tp_clear anything that we'd otherwise have put in gc.garbage. :)
In other words, since we're not going to call those __del__ methods anyway, maybe it just needs to be part of the language semantics that __del__ isn't guaranteed to be called, and a garbage collector that can't find a safe way to call it, doesn't have to. tp_dealloc for classic classes and heap types could then just skip calling __del__ if they've already been cleared... oh wait, how do you know you've been cleared? Argh. Another nice idea runs up on the rocks of reality.
On the other hand, if you go ahead and run __del__ after tp_clear, the __del__ method will quickly run afoul of an AttributeError and die with only a minor spew to sys.stderr, thus encouraging people to get rid of their silly useless __del__ methods on objects that normally end up in cycles. :)
participants (2)
- Neil Schemenauer
- Phillip J. Eby