DEBUG_SAVEALL feature for gc not in 2.0b1?
Neil sent me a patch a week or two ago that implemented a DEBUG_SAVEALL flag for the gc module. If set, it assigns all cyclic garbage to gc.garbage instead of deleting it, thus resurrecting the garbage so you can inspect it. This seems not to have made it into the CVS repository. I think this is good mojo and deserves to be in the distribution, if not for the release, then for 2.1 at least. I've attached the patch Neil sent me (which includes code, doc and test updates). It's helped me track down one (stupid) cyclic trash bug in my own code. Neil, unless there are strong arguments to the contrary, I recommend you submit a patch to SF. Skip
On Fri, Sep 01, 2000 at 10:03:30AM -0500, Skip Montanaro wrote:
Neil sent me a patch a week or two ago that implemented a DEBUG_SAVEALL flag for the gc module.
I didn't submit the patch to SF yet because I am thinking of redesigning the gc module API. I really don't like the current bitmask interface for setting options. The redesign could wait for 2.1, but it would be nice not to have to change a published API. Does anyone have any ideas on a good interface for setting various GC options? There may be many options and they may change with the evolution of the collector. My current idea is to use something like:
gc.get_option(<name>)
gc.set_option(<name>, <value>, ...)
with the module defining constants for options. For example:
gc.set_option(gc.DEBUG_LEAK, 1)
would enable leak debugging. Does this look okay? Should I try to get it done for 2.0? Neil
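A name-based interface like the one proposed could be layered over the existing bitmask calls, gc.get_debug() and gc.set_debug(). The sketch below assumes a modern Python 3 interpreter. get_option and set_option are hypothetical plain functions borrowing the proposal's names. They are not part of the gc module, and they only cover on/off debug flags.

```python
import gc

# Hypothetical helpers modelled on the proposed gc.get_option/gc.set_option.
# They are thin wrappers over the real bitmask calls gc.get_debug()/gc.set_debug().

def get_option(flag):
    """Return True only if every bit of `flag` is set.

    Checking all bits matters for combined flags such as gc.DEBUG_LEAK."""
    return (gc.get_debug() & flag) == flag

def set_option(flag, value):
    """Turn the bits of `flag` on (truthy value) or off (falsy value)."""
    flags = gc.get_debug()
    gc.set_debug((flags | flag) if value else (flags & ~flag))
```

With these helpers, `set_option(gc.DEBUG_LEAK, 1)` would enable leak debugging as in the example. The bitmask remains the storage format. The wrapper only changes how callers name the options.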
Neil Schemenauer wrote:
I didn't submit the patch to SF yet because I am thinking of redesigning the gc module API. I really don't like the current bitmask interface for setting options.
Why? There's nothing wrong with it.
Does anyone have any ideas on a good interface for setting various GC options? There may be many options and they may change with the evolution of the collector. My current idea is to use something like:
gc.get_option(<name>)
gc.set_option(<name>, <value>, ...)
with the module defining constants for options. For example:
gc.set_option(gc.DEBUG_LEAK, 1)
would enable leak debugging. Does this look okay? Should I try to get it done for 2.0?
This is too much. Don't worry, it's perfect as is. Also, I support the idea of exporting the collected garbage for debugging -- haven't looked at the patch though. Is it possible to collect it subsequently? -- Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252
On Fri, Sep 01, 2000 at 11:08:14PM +0200, Vladimir Marangozov wrote:
Also, I support the idea of exporting the collected garbage for debugging -- haven't looked at the patch though. Is it possible to collect it subsequently?
No. Once objects are in gc.garbage they are back under the user's control. How do you see things working otherwise? Neil
Neil Schemenauer wrote:
On Fri, Sep 01, 2000 at 11:08:14PM +0200, Vladimir Marangozov wrote:
Also, I support the idea of exporting the collected garbage for debugging -- haven't looked at the patch though. Is it possible to collect it subsequently?
No. Once objects are in gc.garbage they are back under the user's control. How do you see things working otherwise?
By putting them in gc.collected_garbage. The next collect() should be able to empty this list if the DEBUG_SAVEALL flag is not set. Do you see any problems with this? -- Vladimir MARANGOZOV
On Fri, Sep 01, 2000 at 11:47:59PM +0200, Vladimir Marangozov wrote:
By putting them in gc.collected_garbage. The next collect() should be able to empty this list if the DEBUG_SAVEALL flag is not set. Do you see any problems with this?
I don't really see the point. If someone has set the SAVEALL flag then they are obviously debugging a program. I don't see much point in the GC cleaning up this garbage. The user can do it if they like. I have an idea for an alternate interface. What if there was a gc.handle_garbage hook which could be set to a function? The collector would pass garbage objects to this function one at a time. If the function returns true then it means that the garbage was handled and the collector should not call tp_clear. These handlers could be chained together like import hooks. The default handler would simply append to the gc.garbage list. If a debugging flag was set then all found garbage would be passed to this function rather than just uncollectable garbage. Skip, would a hook like this be useful to you? Neil
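The handle_garbage hook was only a proposal and does not exist in the gc module. The following is therefore just a pure-Python model of the dispatch described above, written for Python 3. A hypothetical `fake_gc` namespace stands in for the module. `pass_to_hook` stands in for the collector handing objects over one at a time. A handler that returns a false value leaves the object to the collector, which would then call tp_clear on it. Chaining uses a default argument that captures the previous handler when the new one is defined.

```python
from types import SimpleNamespace

def _default_handler(obj):
    # Default behaviour described in the proposal: keep the object in the
    # garbage list and report it as handled.
    fake_gc.garbage.append(obj)
    return True

# Hypothetical stand-in for the gc module; gc.handle_garbage never existed.
fake_gc = SimpleNamespace(garbage=[], handle_garbage=_default_handler)

def pass_to_hook(found):
    """Model of the collector side: offer each garbage object to the hook.

    Returns the objects the hook did not handle; a real collector would
    go on to call tp_clear on those."""
    unhandled = []
    for obj in found:
        if not fake_gc.handle_garbage(obj):
            unhandled.append(obj)
    return unhandled

# A chained handler (hypothetical): leave dicts to the collector and pass
# everything else on to whichever handler was installed before it.
def clear_dicts(obj, next_handler=fake_gc.handle_garbage):
    if isinstance(obj, dict):
        return False
    return next_handler(obj)

fake_gc.handle_garbage = clear_dicts
```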
Neil> On Fri, Sep 01, 2000 at 11:47:59PM +0200, Vladimir Marangozov wrote:
>> By putting them in gc.collected_garbage. The next collect() should be
>> able to empty this list if the DEBUG_SAVEALL flag is not set. Do you
>> see any problems with this?

Neil> I don't really see the point. If someone has set the SAVEALL flag
Neil> then they are obviously debugging a program. I don't see much
Neil> point in the GC cleaning up this garbage. The user can do it if
Neil> they like.

Agreed.

Neil> I have an idea for an alternate interface. What if there was a
Neil> gc.handle_garbage hook which could be set to a function? The
Neil> collector would pass garbage objects to this function one at a
Neil> time. If the function returns true then it means that the garbage
Neil> was handled and the collector should not call tp_clear. These
Neil> handlers could be chained together like import hooks. The default
Neil> handler would simply append to the gc.garbage list. If a
Neil> debugging flag was set then all found garbage would be passed to
Neil> this function rather than just uncollectable garbage.

Neil> Skip, would a hook like this be useful to you?

Sounds too complex for my feeble brain... ;-)

What's the difference between "found garbage" and "uncollectable garbage"? What sort of garbage are you appending to gc.garbage now? I thought by the very nature of your garbage collector, anything it could free was otherwise "uncollectable".

S
On Fri, Sep 01, 2000 at 09:03:51PM -0500, Skip Montanaro wrote:
What's the difference between "found garbage" and "uncollectable garbage"?
I use the term uncollectable garbage for objects that the collector cannot call tp_clear on because of __del__ methods. These objects are added to gc.garbage (actually, just the instances). If SAVEALL is enabled then all objects found are saved in gc.garbage and tp_clear is not called. Here is an example of how to use my proposed handle_garbage hook:

    class Vertex:
        def __init__(self):
            self.edges = []
        def add_edge(self, e):
            self.edges.append(e)
        def __del__(self):
            do_something()

    class Edge:
        def __init__(self, vertex_in, vertex_out):
            self.vertex_in = vertex_in
            vertex_in.add_edge(self)
            self.vertex_out = vertex_out
            vertex_out.add_edge(self)

This graph structure contains cycles and will not be collected by reference counting. It is also "uncollectable" because it contains a finalizer on a strongly connected component (i.e. other objects in the cycle are reachable from the __del__ method). With the current garbage collector, instances of Edge and Vertex will appear in gc.garbage when found to be unreachable by the rest of Python. The application could then periodically do:

    for obj in gc.garbage:
        if isinstance(obj, Vertex):
            obj.__dict__.clear()
    del gc.garbage[:]   # drop gc.garbage's references so the objects can be freed

which would break the reference cycles. If a handle_garbage hook existed the application could do:

    def break_graph_cycle(obj, next=gc.handle_garbage):
        if isinstance(obj, Vertex):
            obj.__dict__.clear()
            return 1
        else:
            return next(obj)

    gc.handle_garbage = break_graph_cycle

If you had a leaking program you could use this hook to debug it:

    def debug_cycle(obj, next=gc.handle_garbage):
        print "garbage:", repr(obj)
        return next(obj)

    gc.handle_garbage = debug_cycle

The hook seems to be more general than the gc.garbage list. Neil
What sort of garbage are you appending to gc.garbage now? I thought by the very nature of your garbage collector, anything it could free was otherwise "uncollectable".
S
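Since Python 3.4 (PEP 442), CPython can collect cycles whose objects define __del__. Neil's Vertex/Edge graph therefore no longer ends up in gc.garbage as "uncollectable". The closest runnable equivalent today is to set gc.DEBUG_SAVEALL, which still exists, so that the collector saves the unreachable graph in gc.garbage. The cleanup loop from the Vertex/Edge example can then break the cycles. This is a sketch for a modern Python 3 interpreter. `collect_graph_garbage` is a hypothetical helper name, and the __del__ method is left out because it plays no role here.

```python
import gc

class Vertex:
    def __init__(self):
        self.edges = []

    def add_edge(self, e):
        self.edges.append(e)

class Edge:
    def __init__(self, vertex_in, vertex_out):
        self.vertex_in = vertex_in
        vertex_in.add_edge(self)
        self.vertex_out = vertex_out
        vertex_out.add_edge(self)

def collect_graph_garbage():
    """Build one unreachable Vertex/Edge cycle, let the collector save it
    with DEBUG_SAVEALL, then break the cycles as in the cleanup loop.

    Returns the number of Vertex and Edge objects that were saved."""
    gc.collect()                      # start from a clean slate
    old_flags = gc.get_debug()
    gc.set_debug(old_flags | gc.DEBUG_SAVEALL)
    try:
        a, b = Vertex(), Vertex()
        Edge(a, b)                    # a <-> edge <-> b: a reference cycle
        del a, b                      # now unreachable from the program
        gc.collect()                  # found, but saved instead of freed
    finally:
        gc.set_debug(old_flags)       # restore the caller's debug flags
    saved = [obj for obj in gc.garbage if isinstance(obj, (Vertex, Edge))]
    for obj in saved:
        if isinstance(obj, Vertex):
            obj.__dict__.clear()      # drop the edges list: breaks the cycles
    del gc.garbage[:]                 # release the collector's references too
    return len(saved)
```

Once `saved` goes out of scope, plain reference counting frees the Edge and both vertices, because no cycle is left.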
Neil Schemenauer wrote:
On Fri, Sep 01, 2000 at 11:47:59PM +0200, Vladimir Marangozov wrote:
By putting them in gc.collected_garbage. The next collect() should be able to empty this list if the DEBUG_SAVEALL flag is not set. Do you see any problems with this?
I don't really see the point. If someone has set the SAVEALL flag then they are obviously debugging a program. I don't see much point in the GC cleaning up this garbage. The user can do it if they like.
The point is that we have two types of garbage: collectable and uncollectable. Uncollectable garbage is already saved in gc.garbage with or without debugging. Uncollectable garbage is the most harmful, and fixing the program to avoid it is supposed to have top priority. The discussion is now taking that one step further, i.e. making sure that no cycles are created at all, ever. This is what Skip wants. Skip wants to have access to the collectable garbage and clean up the code w.r.t. cycles as much as possible. Fine, but collectable garbage is priority 2, and mixing the two types of garbage is not nice, because the collector can deal with collectable garbage but gives up on the uncollectable kind. This distinction in functionality is important. That's why I suggested saving the collectable garbage in gc.collected. In this context, the name SAVEALL is a bit misleading. Uncollectable garbage is already saved. What's missing is a flag & support to save the collectable garbage. SAVECOLLECTED is a name right on target. Further, the collect() function should be able to clear gc.collected if it is not empty and if SAVECOLLECTED is not set. This should not be perceived as a big deal, though. I see it as a nicety for overall consistency.
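To make the proposed split concrete, here is a toy pure-Python model, not the real collector. `ToyCollector`, the `SAVECOLLECTED` flag and the `collected_garbage` list are all hypothetical. The thread calls the list both gc.collected_garbage and gc.collected, and the model picks the first name. One collection pass is simulated by passing the two disjoint sets in directly.

```python
from dataclasses import dataclass, field

# Hypothetical flag from the proposal; it never existed in the gc module.
SAVECOLLECTED = 1

@dataclass
class ToyCollector:
    flags: int = 0
    garbage: list = field(default_factory=list)            # uncollectable: always saved
    collected_garbage: list = field(default_factory=list)  # collectable: saved on request

    def collect(self, collectable, uncollectable):
        """Pretend one pass found these two disjoint sets of objects."""
        if not self.flags & SAVECOLLECTED:
            # Proposal: collect() empties the list when the flag is off.
            self.collected_garbage.clear()
        self.garbage.extend(uncollectable)
        if self.flags & SAVECOLLECTED:
            self.collected_garbage.extend(collectable)
        # Otherwise the collectable objects would simply be freed.
        return len(collectable) + len(uncollectable)
```

Keeping two lists preserves the distinction argued for above. gc.garbage holds only what the collector gave up on, whether or not debugging is enabled.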
I have an idea for an alternate interface. What if there was a gc.handle_garbage hook which could be set to a function? The collector would pass garbage objects to this function one at a time.
This is too much. The idea here is to detect garbage earlier, but given that one can call gc.set_threshold(1, 0, 0), thus invoking the collector on every allocation, one gets the same effect with DEBUG_LEAK. There's little to no added value. Such a hook may also exercise the latest changes Jeremy checked in: if an exception is raised after GC, Python will scream at you with a fatal error. I don't think it's a good idea to mix Python and C too much in such low-level machinery as the garbage collector. -- Vladimir MARANGOZOV
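The alternative suggested here is to run the collector very often instead of adding a hook. It can be tried with the real calls gc.get_threshold(), gc.set_threshold(), gc.get_debug() and gc.set_debug(). The context manager below is a sketch for Python 3, and `collect_often` is a hypothetical name. The exact meaning of the thresholds has changed between CPython versions. Read threshold0 = 1 as "collect as often as the collector allows" rather than as a guarantee of one collection per allocation.

```python
import gc
from contextlib import contextmanager

@contextmanager
def collect_often(debug_flags=gc.DEBUG_LEAK):
    """Temporarily make the collector run as often as possible with the
    given debug flags turned on; restore the previous settings on exit."""
    old_threshold = gc.get_threshold()
    old_flags = gc.get_debug()
    gc.set_threshold(1, 0, 0)
    gc.set_debug(old_flags | debug_flags)
    try:
        yield
    finally:
        gc.set_debug(old_flags)
        gc.set_threshold(*old_threshold)
```

gc.DEBUG_LEAK also prints a line for each object it finds. Passing gc.DEBUG_SAVEALL instead keeps the saved objects in gc.garbage without the report.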
Vlad> The discussion is now taking that one step further, i.e.
Vlad> making sure that no cycles are created at all, ever. This is what
Vlad> Skip wants. Skip wants to have access to the collectable garbage
Vlad> and clean up the code w.r.t. cycles as much as possible.

If I read my (patched) version of gcmodule.c correctly, with the gc.DEBUG_SAVEALL bit set, gc.garbage *does* acquire all garbage, not just the stuff with __del__ methods. In delete_garbage I see

    if (debug & DEBUG_SAVEALL) {
        PyList_Append(garbage, op);
    } else {
        ... usual collection business here ...
    }

Skip
Skip Montanaro wrote:
If I read my (patched) version of gcmodule.c correctly, with the gc.DEBUG_SAVEALL bit set, gc.garbage *does* acquire all garbage, not just the stuff with __del__ methods.
Yes. And you don't know which objects are collectable by this collector and which ones are not. That is, SAVEALL transforms the collector into a cycle detector. The collectable and uncollectable objects belong to two disjoint sets. I was arguing about this distinction because collectable garbage is not considered garbage any more, while uncollectable garbage is the real garbage left over; but if you think this distinction doesn't serve any purpose for you, fine. -- Vladimir MARANGOZOV
Vlad> Skip Montanaro wrote:
>>
>> If I read my (patched) version of gcmodule.c correctly, with the
>> gc.DEBUG_SAVEALL bit set, gc.garbage *does* acquire all garbage, not
>> just the stuff with __del__ methods.

Vlad> Yes. And you don't know which objects are collectable and which
Vlad> ones are not by this collector. That is, SAVEALL transforms the
Vlad> collector into a cycle detector.

Which is precisely what I want. I'm trying to locate cycles in a long-running program. In that environment collectable and uncollectable garbage are just as bad, since I still use 1.5.2, which has no cycle collector, in production.

Skip
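When SAVEALL is used as a cycle detector like this, a small helper can summarize what one full collection found. The sketch below is for Python 3, and `report_cycles` is a hypothetical name. It sets gc.DEBUG_SAVEALL for a single gc.collect() call and counts the saved objects by type name with collections.Counter. The objects stay in gc.garbage for inspection, so the caller has to empty gc.garbage afterwards. The counts also include anything that was already in gc.garbage.

```python
import gc
from collections import Counter

def report_cycles():
    """Run one full collection in cycle-detector mode and return a Counter
    mapping type names to the number of objects saved in gc.garbage."""
    old_flags = gc.get_debug()
    gc.set_debug(old_flags | gc.DEBUG_SAVEALL)
    try:
        gc.collect()
    finally:
        gc.set_debug(old_flags)   # restore the caller's debug flags
    return Counter(type(obj).__name__ for obj in gc.garbage)
```

In a long-running program, calling this now and then and logging `report_cycles().most_common(10)` shows which types keep getting caught in cycles.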
Skip Montanaro wrote:
Vlad> Skip Montanaro wrote:
>>
>> If I read my (patched) version of gcmodule.c correctly, with the
>> gc.DEBUG_SAVEALL bit set, gc.garbage *does* acquire all garbage, not
>> just the stuff with __del__ methods.
Vlad> Yes. And you don't know which objects are collectable and which
Vlad> ones are not by this collector. That is, SAVEALL transforms the
Vlad> collector into a cycle detector.
Which is precisely what I want.
All right! Since I haven't seen any votes, here's a +1. I'm willing to handle Neil's patch at SF and let it in after some minor cleanup that we'll discuss on the patch manager. Any objections or other opinions on this? -- Vladimir MARANGOZOV
Vladimir.Marangozov@inrialpes.fr (Vladimir Marangozov):
The point is that we have two types of garbage: collectable and uncollectable.
I don't think these are the right terms. The collector can collect the "uncollectable" garbage all right -- what it can't do is *dispose* of it. So it should be called "undisposable" or "unrecyclable" or "undigestable" or something.

Greg Ewing, Computer Science Dept,   +--------------------------------------+
University of Canterbury,            | A citizen of NewZealandCorp, a       |
Christchurch, New Zealand            | wholly-owned subsidiary of USA Inc.  |
greg@cosc.canterbury.ac.nz           +--------------------------------------+
"Neil" == Neil Schemenauer <nascheme@enme.ucalgary.ca> writes:
Neil> On Fri, Sep 01, 2000 at 11:08:14PM +0200, Vladimir Marangozov wrote:
>> Also, I support the idea of exporting the collected garbage for
>> debugging -- haven't looked at the patch though. Is it possible
>> to collect it subsequently?

Neil> No. Once objects are in gc.garbage they are back under the user's
Neil> control. How do you see things working otherwise?

Can't you just turn off gc.DEBUG_SAVEALL and reinitialize gc.garbage to []?

Skip
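Skip's suggestion can be checked directly on a modern CPython 3 interpreter. Once DEBUG_SAVEALL is switched off and the saved objects are removed from gc.garbage, the next collection frees the cycle normally. The sketch below empties the existing list in place (`gc.garbage[:] = []`) rather than rebinding the name gc.garbage, because the collector keeps appending to the list object it already holds. `save_then_release` and `Node` are hypothetical names.

```python
import gc

class Node:
    pass

def save_then_release():
    """Return (saved, remaining): how many Node objects DEBUG_SAVEALL kept
    in gc.garbage, and how many still exist after the list is emptied and
    the collector runs again with the flag off."""
    gc.collect()
    old_flags = gc.get_debug()
    gc.set_debug(old_flags | gc.DEBUG_SAVEALL)
    try:
        node = Node()
        node.self = node          # reference cycle: refcounting alone can't free it
        del node
        gc.collect()              # cycle found, but saved in gc.garbage
        saved = sum(isinstance(o, Node) for o in gc.garbage)
    finally:
        gc.set_debug(old_flags)   # turn DEBUG_SAVEALL back off
    gc.garbage[:] = []            # "reinitialize gc.garbage to []", in place
    gc.collect()                  # the cycle is unreachable again and is freed
    remaining = sum(isinstance(o, Node) for o in gc.get_objects())
    return saved, remaining
```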
participants (4)
- Greg Ewing
- Neil Schemenauer
- Skip Montanaro
- Vladimir.Marangozov@inrialpes.fr