Re: [Patches] Reference cycle collection for Python
[I don't like to cross-post to patches and python-dev, but I think this belongs in patches because it's a followup to Neil's post there, and also in -dev because of its longer-term importance.]

Thanks for the new patches, Neil!

We had a visitor here at CNRI today, Eric Tiedemann <est@hyperreal.org>, who had a look at your patches before. Eric knows his way around the Scheme, Lisp and GC literature, and presented a variant on your approach which takes the bite out of the recursive passes. Eric had commented earlier on Neil's previous code, and I had used the morning to make myself familiar with Neil's code. This was relatively easy because Neil's code is very clear.

Today, Eric proposed to do away with Neil's hash table altogether -- as long as we're wasting memory, we might as well add 3 fields to each container object rather than allocating the same amount in a separate hash table. Eric expects that this will run faster, although this obviously needs to be tried.

Container types are: dict, list, tuple, class, instance; plus potentially user-defined container types such as kjbuckets. I have a feeling that function objects should also be considered container types, because of the cycle involving globals.

Eric's algorithm, then, consists of the following parts. Each container object has three new fields: gc_next, gc_prev, and gc_refs. (Eric calls the gc_refs "refcount-zero".) We color objects white (initial), gray (root), black (scanned root). (The terms are explained later; we believe we don't actually need bits in the objects to store the color; see later.)

All container objects are chained together in a doubly-linked list -- this is the same as Neil's code except Neil does it only for dicts. (Eric postulates that you need a list header.)

When GC is activated, all objects are colored white; we make a pass over the entire list and set gc_refs equal to the refcount for each object.

Next, we make another pass over the list to collect the internal references. Internal references are (just like in Neil's version) references from other container types. In Neil's version, this was recursive; in Eric's version, we don't need recursion, since the list already contains all containers. So we simply visit the containers in the list in turn, and for each one we go over all the objects it references and subtract one from *its* gc_refs field. (Eric left out the little detail that we need to be able to distinguish between container and non-container objects amongst those references; this can be a flag bit in the type field.)

Now, similar to Neil's version, all objects for which gc_refs == 0 have only internal references, and are potential garbage; all objects for which gc_refs > 0 are "roots". These have references to them from other places, e.g. from globals or stack frames in the Python virtual machine.

We now start a second list, to which we will move all roots. The way to do this is to go over the first list again and to move each object that has gc_refs > 0 to the second list. Objects placed on the second list in this phase are considered colored gray (roots).

Of course, some roots will reference some non-roots, which keeps those non-roots alive. We now make a pass over the second list, where for each object on the second list, we look at every object it references. If a referenced object is a container and is still in the first list (colored white) we *append* it to the second list (colored gray).
Because we append, objects thus added to the second list will eventually be considered by this same pass; when we stop finding objects that are still white, we stop appending to the second list, and we will eventually terminate this pass. Conceptually, objects on the second list that have been scanned in this pass are colored black (scanned root); but there is no need to actually make the distinction.

(How do we know whether an object pointed to is white (in the first list) or gray or black (in the second)? We could use an extra bitfield, but that's a waste of space. Better: we could set gc_refs to a magic value (e.g. 0xffffffff) when we move the object to the second list. During the meeting, I proposed to set the back pointer to NULL; that might work too but I think the gc_refs field is more elegant. We could even just test for a non-zero gc_refs field; the roots moved to the second list initially all have a non-zero gc_refs field already, and for the objects with a zero gc_refs field we could indeed set it to something arbitrary.)

Once we reach the end of the second list, all objects still left in the first list are garbage. We can destroy them in a way similar to how Neil does this in his code. Neil calls PyDict_Clear on the dictionaries, and ignores the rest. Under Neil's assumption that all cycles (that he detects) involve dictionaries, that is sufficient. In our case, we may need a type-specific "clear" function for containers in the type object.

We discussed more things, but not as thoroughly. Eric & Eric stressed the importance of making excellent statistics available about the rate of garbage collection -- probably as data structures that Python code can read rather than debugging print statements.

Eric T also sketched an incremental version of the algorithm, usable for real-time applications. This involved keeping the gc_refs field ("external" reference counts) up-to-date at all times, which would require two different versions of the INCREF/DECREF macros: one for adding/deleting a reference from a container, and another for adding/deleting a root reference. Also, a 4th color (red) was added, to distinguish between scanned roots and scanned non-roots. We decided not to work this out in more detail because the overhead cost appeared to be much higher than for the previous algorithm; instead, we recommend that for real-time requirements the whole GC is disabled (there should be run-time controls for this, not just compile-time).

We also briefly discussed possibilities for generational schemes. The general opinion was that we should first implement and test the algorithm as sketched above, and then changes or extensions could be made.

I was pleasantly surprised to find Neil's code in my inbox when we came out of the meeting; I think it would be worthwhile to compare and contrast the two approaches. (Hm, maybe there's a paper in it?)

The rest of the afternoon was spent discussing continuations, coroutines and generators, and the fundamental reason why continuations are so hard (the C stack getting in the way everywhere). But that's a topic for another mail, maybe.

--Guido van Rossum (home page: http://www.python.org/~guido/)
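To make the accounting concrete, here is a toy Python sketch of the algorithm as described above. All the names (Container, link, collect) are invented for illustration; plain Python lists stand in for the doubly-linked gc lists, and refcounts are simulated rather than real. The eventual implementation would of course be C.

    class Container:
        def __init__(self, name):
            self.name = name
            self.refs = []        # containers this object references
            self.ob_refcnt = 0    # total refcount, internal + external
            self.gc_refs = 0

    def link(a, b):
        # Record "a references b" and bump b's refcount.
        a.refs.append(b)
        b.ob_refcnt = b.ob_refcnt + 1

    def collect(containers):
        # Step 1: copy refcounts into gc_refs.
        for c in containers:
            c.gc_refs = c.ob_refcnt
        # Step 2: subtract internal references -- each container
        # decrements the gc_refs of every container *it* references.
        for c in containers:
            for r in c.refs:
                r.gc_refs = r.gc_refs - 1
        # Step 3: gc_refs > 0 means externally referenced: a root (gray).
        first = []    # white
        second = []   # gray
        for c in containers:
            if c.gc_refs > 0:
                second.append(c)
            else:
                first.append(c)
        # Step 4: everything reachable from a root moves to the second
        # list; appending while scanning gives a breadth-first traversal
        # with no recursion, terminating when nothing white is found.
        for c in second:                  # 'second' grows as we append
            for r in c.refs:
                if r in first:
                    first.remove(r)
                    second.append(r)      # white -> gray
        return first                      # what's left is cyclic garbage

    a = Container("a"); b = Container("b")
    root = Container("root"); kept = Container("kept")
    root.ob_refcnt = 1                    # e.g. referenced from a frame
    link(a, b); link(b, a)                # a trash cycle
    link(root, kept)                      # reachable, hence not garbage
    for c in collect([a, b, root, kept]):
        print(c.name)                     # prints "a" then "b"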
Very briefly: [Guido]
... Today, Eric proposed to do away with Neil's hash table altogether -- as long as we're wasting memory, we might as well add 3 fields to each container object rather than allocating the same amount in a separate hash table. Eric expects that this will run faster, although this obviously needs to be tried.
No, it doesn't <wink>: it will run faster.
Container types are: dict, list, tuple, class, instance; plus potentially user-defined container types such as kjbuckets. I have a feeling that function objects should also be considered container types, because of the cycle involving globals.
Note that the list-migrating steps you sketch later are basically the same as (but hairier than) the ones JimF and I worked out for M&S-on-RC a few years ago, right down to using appending to effect a breadth-first traversal without requiring recursion -- except M&S doesn't have to bother accounting for sources of refcounts. Since *this* scheme does more work per item per scan, to be as fast in the end it has to touch less stuff than M&S. But the more kinds of types you track, the more stuff this scheme will have to chase.

The tradeoffs are complicated & unclear, so I'll just raise an uncomfortable meta-point <wink>: you balked at M&S the last time around because of the apparent need for two link fields + a bit or two per object of a "chaseable type". If that's no longer perceived as being a showstopper, M&S should be reconsidered too. I happen to be a fan of both approaches <wink>.

The worst part of M&S-on-RC (== the one I never had a good answer for) is that a non-cooperating extension type E can't be chased, hence objects reachable only from objects of type E never get marked, so are vulnerable to bogus collection. In the Neil/Toby scheme, objects of type E merely act as sources of "external" references, so the scheme fails safe (in the sense of never doing a bogus collection due to non-cooperating types).

Hmm ... if both approaches converge on keeping a list of all chaseable objects, and being careful of uncooperating types, maybe the only real difference in the end is whether the root set is given explicitly (as in traditional M&S) or inferred indirectly (but where "root set" has a different meaning in the scheme you sketched).
... In our case, we may need a type-specific "clear" function for containers in the type object.
I think definitely, yes. full-speed-sideways<wink>-ly y'rs - tim
Guido van Rossum wrote:
Thanks for the new patches, Neil!
Thanks from me too! I notice, however, that hash_resize() still uses a malloc call instead of PyMem_NEW. Neil, please correct this in your version immediately ;-)
We had a visitor here at CNRI today, Eric Tiedemann <est@hyperreal.org>, who had a look at your patches before. Eric knows his way around the Scheme, Lisp and GC literature, and presented a variant on your approach which takes the bite out of the recursive passes.
Avoiding the recursion is valuable, as long as we're optimizing the implementation of one particular scheme. It doesn't bother me that Neil's scheme is recursive, because I still perceive his code as a proof of concept. You're presenting here another scheme based on refcount arithmetic, generalized for all container types. The linked list implementation of this generalized scheme is not directly related to the logic. I have some suspicions about the logic, so you'll probably want to elaborate a bit more on it, and convince me that this scheme would actually work.
Today, Eric proposed to do away with Neil's hash table altogether -- as long as we're wasting memory, we might as well add 3 fields to each container object rather than allocating the same amount in a separate hash table.
I cannot agree so easily with this statement, but you should have expected this from me :-) If we're about to optimize storage, I have good reasons to believe that we don't need 3 additional slots per container (but 1 for gc_refs, yes). We could certainly envision allocating the containers within memory pools of 4K (just as it is done in pymalloc, and close to what we have for ints & floats). These pools would be labeled as "container's memory", they would obviously be under our control, and we'd have additional slots per pool, not per object. As long as we isolate the containers from the rest, we can enumerate them easily by walking through the pools.

But I'm willing to defer this question for now, as it involves the object allocators (the builtin allocators + PyObject_NEW for extension types E -- user objects of type E would be automatically taken into account for GC if there's a flag in the type struct which identifies them as containers).
Eric expects that this will run faster, although this obviously needs to be tried.
Definitely, although I trust Eric & Tim :-)
Container types are: dict, list, tuple, class, instance; plus potentially user-defined container types such as kjbuckets. I have a feeling that function objects should also be considered container types, because of the cycle involving globals.
+ other extension container types. And I insist. Don't forget that we're planning to merge types and classes...
Eric's algorithm, then, consists of the following parts.
Each container object has three new fields: gc_next, gc_prev, and gc_refs. (Eric calls the gc_refs "refcount-zero".)
We color objects white (initial), gray (root), black (scanned root). (The terms are explained later; we believe we don't actually need bits in the objects to store the color; see later.)
All container objects are chained together in a doubly-linked list -- this is the same as Neil's code except Neil does it only for dicts. (Eric postulates that you need a list header.)
When GC is activated, all objects are colored white; we make a pass over the entire list and set gc_refs equal to the refcount for each object.
Step 1: for all containers, c->gc_refs = c->ob_refcnt
Next, we make another pass over the list to collect the internal references. Internal references are (just like in Neil's version) references from other container types. In Neil's version, this was recursive; in Eric's version, we don't need recursion, since the list already contains all containers. So we simply visit the containers in the list in turn, and for each one we go over all the objects it references and subtract one from *its* gc_refs field. (Eric left out the little detail that we need to be able to distinguish between container and non-container objects amongst those references; this can be a flag bit in the type field.)
Step 2: c->gc_refs = c->gc_refs - Nb_referenced_containers_from_c

I guess that you realize that after this step, gc_refs can be zero or negative. I'm not sure that you collect "internal" references here (references from other container types). A list referencing 20 containers, being itself referenced by one container + one static variable + two times from the runtime stack, has an initial refcount == 4, so we'll end up with gc_refs == -16. A tuple referencing 1 list, referenced once by the stack, will end up with gc_refs == 0. Neil's scheme doesn't seem to have this "property".
Now, similar to Neil's version, all objects for which gc_refs == 0 have only internal references, and are potential garbage; all objects for which gc_refs > 0 are "roots". These have references to them from other places, e.g. from globals or stack frames in the Python virtual machine.
Agreed, some roots have gc_refs > 0. I'm not sure that all of them have it, though... Do they?
We now start a second list, to which we will move all roots. The way to do this is to go over the first list again and to move each object that has gc_refs > 0 to the second list. Objects placed on the second list in this phase are considered colored gray (roots).
Step 3: Roots with gc_refs > 0 go to the 2nd list. All c->gc_refs <= 0 stay in the 1st list.
Of course, some roots will reference some non-roots, which keeps those non-roots alive. We now make a pass over the second list, where for each object on the second list, we look at every object it references. If a referenced object is a container and is still in the first list (colored white) we *append* it to the second list (colored gray). Because we append, objects thus added to the second list will eventually be considered by this same pass; when we stop finding objects that are still white, we stop appending to the second list, and we will eventually terminate this pass. Conceptually, objects on the second list that have been scanned in this pass are colored black (scanned root); but there is no need to actually make the distinction.
Step 4: Closure on reachable containers which are all moved to the 2nd list. (Assuming that the objects are checked only via their type, without involving gc_refs)
(How do we know whether an object pointed to is white (in the first list) or gray or black (in the second)?
Good question? :-)
We could use an extra bitfield, but that's a waste of space. Better: we could set gc_refs to a magic value (e.g. 0xffffffff) when we move the object to the second list.
I doubt that this would work for the reasons mentioned above.
During the meeting, I proposed to set the back pointer to NULL; that might work too but I think the gc_refs field is more elegant. We could even just test for a non-zero gc_refs field; the roots moved to the second list initially all have a non-zero gc_refs field already, and for the objects with a zero gc_refs field we could indeed set it to something arbitrary.)
Not sure that "arbitrary" is a good choice if the differentiation is based solely on gc_refs.
Once we reach the end of the second list, all objects still left in the first list are garbage. We can destroy them in a way similar to how Neil does this in his code. Neil calls PyDict_Clear on the dictionaries, and ignores the rest. Under Neil's assumption that all cycles (that he detects) involve dictionaries, that is sufficient. In our case, we may need a type-specific "clear" function for containers in the type object.
Couldn't this be done in the object's dealloc function?

Note that both Neil's and this scheme assume that garbage _detection_ and garbage _collection_ is an atomic operation. I must say that I don't care about having some living garbage if it doesn't hurt my work. IOW, the criterion used for triggering the detection phase _may_ eventually differ from the one used for the collection phase. But this is where we reach the incremental approaches, implying different reasoning as a whole. My point is that the introduction of a "clear" function depends on the adopted scheme, whose logic depends on pertinent statistics on memory consumption of the cyclic garbage.

To make it simple, we first need stats on memory consumption, then we can discuss objectively on how to implement some particular GC scheme. I second Eric on the need for excellent statistics.
The general opinion was that we should first implement and test the algorithm as sketched above, and then changes or extensions could be made.
I'd like to see it discussed first in conjunction with (1) the possibility of having a proprietary malloc, (2) the envisioned type/class unification. Perhaps I'm getting too deep, but once something gets in, it's difficult to take it out, even when a better solution is found subsequently. Although I'm enthusiastic about this work on GC, I'm not in a position to evaluate the true benefits of the proposed schemes, as I still don't have a basis for evaluating how much garbage my program generates and whether it hurts the interpreter compared to its overall memory consumption.
I was pleasantly surprised to find Neil's code in my inbox when we came out of the meeting; I think it would be worthwhile to compare and contrast the two approaches. (Hm, maybe there's a paper in it?)
I'm all for it!

-- 
Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr
http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252
"VM" == Vladimir Marangozov <marangoz@python.inrialpes.fr> writes:
[">>" == Guido explaining Eric Tiedemann's GC design]
Next, we make another pass over the list to collect the internal references. Internal references are (just like in Neil's version) references from other container types. In Neil's version, this was recursive; in Eric's version, we don't need recursion, since the list already contains all containers. So we simply visit the containers in the list in turn, and for each one we go over all the objects it references and subtract one from *its* gc_refs field. (Eric left out the little detail that we need to be able to distinguish between container and non-container objects amongst those references; this can be a flag bit in the type field.)
VM> Step 2: c->gc_refs = c->gc_refs - Nb_referenced_containers_from_c
VM> I guess that you realize that after this step, gc_refs can be
VM> zero or negative.

I think Guido's explanation is slightly ambiguous. When he says, "subtract one from *its* gc_refs field" he means subtract one from the _contained_ object's gc_refs field.

VM> I'm not sure that you collect "internal" references here
VM> (references from other container types). A list referencing 20
VM> containers, being itself referenced by one container + one
VM> static variable + two times from the runtime stack, has an
VM> initial refcount == 4, so we'll end up with gc_refs == -16.

The strategy is not that the container's gc_refs is decremented once for each object it contains. Rather, the container decrements each contained object's gc_refs by one. So you should never end up with gc_refs < 0.
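Plugging Vladimir's numbers into the toy Container/link/collect sketch from earlier in the thread shows the direction of the subtraction (the 'owner' name is invented for illustration):

    L = Container("L")
    L.ob_refcnt = 3              # 1 static variable + 2 stack references
    inner = []
    for i in range(20):          # the 20 containers L references
        c = Container("inner%d" % i)
        inner.append(c)
        link(L, c)
    owner = Container("owner")
    owner.ob_refcnt = 1          # something external keeps 'owner' alive
    link(owner, L)               # the one container referencing L

    # After step 2, L.gc_refs == 4 - 1 == 3, never -16: L's own gc_refs
    # is decremented only once, by 'owner'.  The 20 decrements that L
    # performs land on the *inner* objects' gc_refs, not on L's.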
During the meeting, I proposed to set the back pointer to NULL; that might work too but I think the gc_refs field is more elegant. We could even just test for a non-zero gc_refs field; the roots moved to the second list initially all have a non-zero gc_refs field already, and for the objects with a zero gc_refs field we could indeed set it to something arbitrary.)
I believe we discussed this further and concluded that setting the back pointer to NULL would not work. If we make the second list doubly-linked (like the first one), it is trivial to end GC by swapping the first and second lists. If we've zapped the back pointers to NULL, then we have to go back and re-set them all.

Jeremy
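The list plumbing under discussion is the standard O(1) circular doubly-linked list operations; a Python rendition for reference (GCHead and the function names are invented, gc_next/gc_prev are as in Guido's message):

    class GCHead:
        # list header; an empty list's header points at itself
        def __init__(self):
            self.gc_next = self
            self.gc_prev = self

    def gc_list_remove(node):
        # unlink in O(1); both link fields stay usable afterwards
        node.gc_prev.gc_next = node.gc_next
        node.gc_next.gc_prev = node.gc_prev

    def gc_list_append(node, header):
        # link 'node' in front of 'header' in O(1)
        node.gc_prev = header.gc_prev
        node.gc_next = header
        header.gc_prev.gc_next = node
        header.gc_prev = node

    # Moving an object from the first list to the second is just
    # gc_list_remove(obj) followed by gc_list_append(obj, second).
    # Because gc_prev is never zapped to NULL, ending a collection can
    # swap the two headers wholesale instead of re-walking the lists.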
On Wed, Mar 01, 2000 at 06:07:07PM +0100, Vladimir Marangozov wrote:
Guido van Rossum wrote:
Once we reach the end of the second list, all objects still left in the first list are garbage. We can destroy them in a way similar to how Neil does this in his code. Neil calls PyDict_Clear on the dictionaries, and ignores the rest. Under Neil's assumption that all cycles (that he detects) involve dictionaries, that is sufficient. In our case, we may need a type-specific "clear" function for containers in the type object.
Couldn't this be done in the object's dealloc function?
No, I don't think so. The object still has references to it. You have to be careful about how you break cycles so that memory is not accessed after it is freed.

Neil

-- 
"If elected mayor, my first act will be to kill the whole lot of you, and burn your town to cinders!" -- Groundskeeper Willie
We now have two implementations of Eric Tiedemann's idea: Neil and I both implemented it. It's too soon to post the patch sets (both are pretty rough) but I've got another design question.

Once we've identified a bunch of objects that are only referring to each other (i.e., one or more cycles) we have to dispose of them. The question is, how? We can't just call free on each of the objects; some may not be allocated with malloc, and some may contain pointers to other malloc'ed memory that also needs to be freed.

So we have to get their destructors involved. But how? Calling ob->ob_type->tp_dealloc(ob) for an object whose reference count is nonzero is unsafe -- this will destroy the object while there are still references to it! Those references are all coming from other objects that are part of the same cycle; those objects will also be deallocated and they will reference the deallocated objects (if only to DECREF them).

Neil uses the same solution that I use when finalizing the Python interpreter -- find the dictionaries and call PyDict_Clear() on them. (In his unpublished patch, he also clears the lists using PyList_SetSlice(list, 0, list->ob_size, NULL). He's also generalized this so that *every* object can define a tp_clear function in its type object.) As long as every cycle contains at least one dictionary or list object, this will break cycles reliably and get rid of all the garbage.

(If you wonder why: clearing the dict DECREFs the next object(s) in the cycle; if the last dict referencing a particular object is cleared, the last DECREF will deallocate that object, which will in turn DECREF the objects it references, and so forth. Since none of the objects in the cycle has incoming references from outside the cycle, we can prove that this will delete all objects as long as there's a dict or list in each cycle.)

However, there's a snag. It's the same snag as what finalizing the Python interpreter runs into -- it has to do with __del__ methods and the undefined order in which the dictionaries are cleared. For example, it's quite possible that the first dictionary we clear is the __dict__ of an instance, so this zaps all its instance variables. Suppose this breaks the cycle, so then the instance itself gets DECREFed to zero. Its deallocator will be called. If it's got a __del__, this __del__ will be called -- but all the instance variables have already been zapped, so it will fail miserably! It's also possible that the __dict__ of a class involved in a cycle gets cleared first, in which case the __del__ no longer "exists", and again the cleanup is skipped.

So the question is: what to *do*? My solution is to make an extra pass over all the garbage objects *before* we clear dicts and lists, and for those that are instances and have __del__ methods, call their __del__ ("by magic", as Tim calls it in another post). The code in instance_dealloc() already does the right thing here: it calls __del__, then discovers that the reference count is > 0 ("I'm not dead yet" :-), and returns without freeing the object. (This is also why I want to introduce a flag ensuring that __del__ gets called by instance_dealloc at most once: later when the instance gets DECREFed to 0, instance_dealloc is called again and will correctly free the object; but we don't want __del__ called again.)

[Note for Neil: somehow I forgot to add this logic to the code; in_del_called isn't used! The change is obvious though.]
This still leaves a problem for the user: if two class instances reference each other and both have a __del__, we can't predict whose __del__ is called first when they are called as part of cycle collection. The solution is to write each __del__ so that it doesn't depend on the other __del__.

Someone (Tim?) in the past suggested a different solution (probably found in another language): for objects that are collected as part of a cycle, the destructor isn't called at all. The memory is freed (since it's no longer reachable), but the destructor is not called -- it is as if the object lives on forever. This is theoretically superior, but not practical: when I have an object that creates a temp file, I want to be able to reliably delete the temp file in my destructor, even when I'm part of a cycle!

--Guido van Rossum (home page: http://www.python.org/~guido/)
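The snag is easy to reproduce at the Python level. A toy illustration (TempHolder and its attributes are invented): clearing an instance's __dict__ behind its back makes a later __del__ fail.

    import os, tempfile

    class TempHolder:
        def __init__(self):
            self.path = tempfile.mktemp()   # a resource to clean up
            open(self.path, 'w').close()
        def __del__(self):
            os.remove(self.path)            # relies on instance variables

    a = TempHolder()
    b = TempHolder()
    a.partner = b
    b.partner = a            # a trash cycle with two finalizers
    del b                    # still alive: a.partner holds a reference

    a.__dict__.clear()       # what clearing the first dict would do;
                             # this frees b, whose __del__ runs fine...
    del a                    # ...but a's __del__ now fails miserably:
                             # self.path was zapped along with the cycle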
[Guido about ways to clean up cyclic garbage]

FYI, I'm using a special protocol for disposing of cyclic garbage: the __cleanup__ protocol. The purpose of this call is probably similar to Neil's tp_clear: it is intended to let objects break possible cycles in their own storage scope, e.g. instances can delete instance variables which they know can cause cyclic garbage. The idea is simple: give all power to the objects rather than try to solve everything with one magical master plan.

The mxProxy package has details on the protocol. The __cleanup__ method is called by the Proxy when the Proxy is about to be deleted. If all references to an object go through the Proxy, the __cleanup__ method call can easily break cycles to have the refcount reach zero, in which case __del__ is called. Since the object knows about this scheme it can take precautions to make sure that __del__ still works after __cleanup__ was called.

Anyway, just a thought... there are probably many ways to do all this.

-- 
Marc-Andre Lemburg
______________________________________________________________________
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/
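A sketch of what the protocol looks like from the class author's side (the Node class is invented; only the __cleanup__/__del__ pairing follows the description above):

    class Node:
        def __init__(self, name):
            self.name = name
            self.peer = None                 # may end up in a cycle
        def __cleanup__(self):
            # Break cycles in our own storage scope, keeping enough
            # state around that __del__ still works afterwards.
            self.peer = None
        def __del__(self):
            print("finalizing " + self.name)   # uses no cyclic state

    a = Node("a"); b = Node("b")
    a.peer = b; b.peer = a        # cycle: neither refcount can reach 0
    a.__cleanup__()               # whoever owns the cycle breaks it...
    del a; del b                  # ...then plain refcounting finalizes
                                  # both, b first and then a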
[Guido]
... Someone (Tim?) in the past suggested a different solution (probably found in another language): for objects that are collected as part of a cycle, the destructor isn't called at all. The memory is freed (since it's no longer reachable), but the destructor is not called -- it is as if the object lives on forever.
Stroustrup has written in favor of this for C++. It's exactly the kind of overly slick "good argument" he would never accept from anyone else <0.1 wink>.
This is theoretically superior, but not practical: when I have an object that creates a temp file, I want to be able to reliably delete the temp file in my destructor, even when I'm part of a cycle!
A member of the C++ committee assured me Stroustrup is overwhelmingly opposed on this. I don't even agree it's theoretically superior: it relies on the fiction that gc "may never occur", and that's just silly in practice.

You're moving down the Java path. I can't possibly do a better job of explaining the Java rules than the Java Language Spec. does for itself. So pick that up and study section 12.6 (Finalization of Class Instances). The end result makes little sense to users, but is sufficient to guarantee that Java itself never blows up.

Note, though, that there is NO good answer to finalizers in cycles! The implementation cannot be made smart enough to both avoid trouble and "do the right thing" from the programmer's POV, because the latter is unknowable. Somebody has to lose, one way or another.

Rather than risk doing a wrong thing, the BDW collector lets cycles with finalizers leak. But it also has optional hacks to support exceptions for use with C++ (which sometimes creates self-cycles) and Java. See http://reality.sgi.com/boehm_mti/finalization.html for Boehm's best concentrated <wink> thoughts on the subject.

The only principled approach I know of comes out of the Scheme world. Scheme has no finalizers, of course. But it does have gc, and the concept of "guardians" was invented to address all gc finalization problems in one stroke. It's extremely Scheme-like in providing a perfectly general mechanism with no policy whatsoever. You (the Scheme programmer) can create guardian objects, and "register" other objects with a guardian. At any time, you can ask a guardian whether some object registered with it is "ready to die" (i.e., the only thing keeping it alive is its registration with the guardian). If so, you can ask it to give you one. Everything else is up to you: if you want to run a finalizer, your problem. If there are cycles, also your problem. Even if there are simple non-cyclic dependencies, your problem. Etc.

So those are the extremes: BDW avoids blame by refusing to do anything. Java avoids blame by exposing an impossibly baroque implementation-driven finalization model. Scheme avoids blame by refusing to do anything "by magic", but helps you to shoot yourself with the weapon of your choice. The bad news is that I don't know of a scheme *not* at an extreme! It's extremely un-Pythonic to let things leak (despite that it has let things leak for a decade <wink>), but also extremely un-Pythonic to make some wild-ass guess.

So here's what I'd consider doing: explicit is better than implicit, and in the face of ambiguity refuse the temptation to guess. If a trash cycle contains a finalizer (my, but that has to be rare in practice, in well-designed code!), don't guess, but make it available to the user. A gc.guardian() call could expose such beasts, or perhaps a callback could be registered, invoked when gc finds one of these things. Anyone crazy enough to create cyclic trash with finalizers then has to take responsibility for breaking the cycle themself. This puts the burden on the person creating the problem, and they can solve it in the way most appropriate to *their* specific needs. IOW, the only people who lose under this scheme are the ones begging to lose, and their "loss" consists of taking responsibility.

when-a-problem-is-impossible-to-solve-favor-sanity<wink>-ly y'rs - tim
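For reference, the guardian interface is tiny. A skeleton in Python (the Guardian class and method names are invented, and -- as noted later in this thread -- a real implementation would need weak references, so pop_dead is left as a placeholder):

    class Guardian:
        def __init__(self):
            self.registered = []      # would have to hold weak references

        def register(self, obj):
            # Registration imposes no policy; the object stays fully alive.
            self.registered.append(obj)

        def pop_dead(self):
            # Return one registered object whose only remaining reference
            # is its registration here, or None if there is none.  The
            # caller then finalizes, resurrects, or drops it -- their call.
            return None               # placeholder: needs weak references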
On Fri, 3 Mar 2000, Tim Peters wrote:
... Note, though, that there is NO good answer to finalizers in cycles! The
"Note" ?? Not just a note, but I'd say an axiom :-) By definition, you have two objects referring to each other in some way. How can you *definitely* know how to break the link between them? Do you call A's finalizer or B's first? If they're instances, do you just whack their __dict__ and hope for the best?
... So here's what I'd consider doing: explicit is better than implicit, and in the face of ambiguity refuse the temptation to guess. If a trash cycle contains a finalizer (my, but that has to be rare. in practice, in well-designed code!), don't guess, but make it available to the user. A gc.guardian() call could expose such beasts, or perhaps a callback could be registered, invoked when gc finds one of these things. Anyone crazy enough to create cyclic trash with finalizers then has to take responsibility for breaking the cycle themself. This puts the burden on the person creating the problem, and they can solve it in the way most appropriate to *their* specific needs. IOW, the only people who lose under this scheme are the ones begging to lose, and their "loss" consists of taking responsibility.
I'm not sure if Tim is saying the same thing, but I'll write down a concrete idea for cleaning garbage cycles. First, a couple of observations:

* Some objects can always be reliably "cleaned": lists, dicts, tuples. They just drop their contents, with no invocations against any of them. Note that an instance without a __del__ has no opinion on how it is cleaned. (This is related to Tim's point about whether a cycle has a finalizer.)

* The other objects may need to *use* their referenced objects in some way to clean out cycles.

Since the second set of objects (possibly) need more care during their cleanup, we must concentrate on how to solve their problem. Back up a step: to determine where an object falls, let's define a tp_clean type slot. It returns an integer and takes one parameter: an operation integer.

    Py_TPCLEAN_CARE_CHECK   /* check whether care is needed */
    Py_TPCLEAN_CARE_EXEC    /* perform the careful cleaning */
    Py_TPCLEAN_EXEC         /* perform a non-careful cleaning */

Given a set of objects that require special cleaning mechanisms, there is no way to tell where to start first. So... just pick the first one. Call its tp_clean type slot with CARE_EXEC. For instances, this maps to __clean__. If the instance does not have a __clean__, then tp_clean returns FALSE, meaning that it could not clean this object. The algorithm moves on to the next object in the set. If tp_clean returns TRUE, then the object has been "cleaned" and is moved to the "no special care needed" list of objects, awaiting its reference count to hit zero. Note that objects in the "care" and "no care" lists may disappear during the careful-cleaning process.

If the careful-cleaning algorithm hits the end of the careful set of objects and the set is non-empty, then throw an exception: GCImpossibleError. The objects in this set each said they could not be cleaned carefully AND they were not dealloc'd during other objects' cleaning.

[ it could be possible to define a *dynamic* CARE_EXEC that will succeed if you call it during a second pass; I'm not sure this is a Good Thing to allow, however. ]

This also implies that a developer should almost *always* consider writing a __clean__ method whenever they write a __del__ method. That method MAY be called when cycles need to be broken; the object should delete any non-essential variables in such a way that integrity is retained (e.g. it fails gracefully when methods are called and __del__ won't raise an error). For example, __clean__ could call a self.close() to shut down its operation. Whatever... you get the idea.

At the end of the iteration over the "care" set, you may have objects remaining in the "no care" set. By definition, these objects don't care about their internal references to other objects (they don't need them during deallocation). We iterate over this set, calling tp_clean(EXEC). For lists, dicts, and tuples, the tp_clean(EXEC) call simply clears out the references to other objects (but does not dealloc the object!). Again: objects in the "no care" set will go away during this process. By the end of the iteration over the "no care" set, it should be empty.

[ note: the iterations over these sets should probably INCREF/DECREF across the calls; otherwise, the object could be dealloc'd during the tp_clean call. ]

[ if the set is NOT empty, then tp_clean(EXEC) did not remove all possible references to other objects; not sure what this means. Is it an error? Maybe you just force a tp_dealloc on the remaining objects. ]

Note that the tp_clean mechanism could probably be used during the Python finalization, where Python does a bunch of special-casing to clean up modules. Specifically: a module does not care about its contents during its deallocation, so it is a "no care" object; it responds to tp_clean(EXEC) by clearing its dictionary. Class objects are similar: they can clear their dict (which contains a module reference which usually causes a loop) during tp_clean(EXEC). Module cleanup is easy once objects with CARE_CHECK have been handled -- all that funny logic in there is to deal with "care" objects.

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/
[Tim]
Note, though, that there is NO good answer to finalizers in cycles! The
[Greg Stein]
"Note" ?? Not just a note, but I'd say an axiom :-)
An axiom is accepted without proof: we have plenty of proof that there's no thoroughly good answer (i.e., every language that has ever addressed this issue -- along with every language that ever will <wink>).
By definition, you have two objects referring to each other in some way. How can you *definitely* know how to break the link between them? Do you call A's finalizer or B's first? If they're instances, do you just whack their __dict__ and hope for the best?
Exactly. The *programmer* may know the right thing to do, but the Python implementation can't possibly know. Facing both facts squarely constrains the possibilities to the only ones that are all of understandable, predictable and useful. Cycles with finalizers must be a Magic-Free Zone else you lose at least one of those three: even Guido's kung fu isn't strong enough to outguess this.

[a nice implementation sketch, of what seems an overly elaborate scheme if you believe cycles with finalizers are rare in intelligently designed code]

Provided Guido stays interested in this, he'll make his own fun. I'm just inviting him to move in a sane direction <0.9 wink>. One caution:
... If the careful-cleaning algorithm hits the end of the careful set of objects and the set is non-empty, then throw an exception: GCImpossibleError.
Since gc "can happen at any time", this is very severe (c.f. Guido's objection to making resurrection illegal). Hand a trash cycle back to the programmer instead, via callback or request or whatever, and it's all explicit without more cruft in the implementation. It's alive again when they get it back, and they can do anything they want with it (including resurrecting it, or dropping it again, or breaking cycles -- anything). I'd focus on the cycles themselves, not on the types of objects involved. I'm not pretending to address the "order of finalization at shutdown" question, though (although I'd agree they're deeply related: how do you follow a topological sort when there *isn't* one? well, you don't, because you can't). realistically y'rs - tim
On Fri, 3 Mar 2000, Tim Peters wrote:
... [a nice implementation sketch, of what seems an overly elaborate scheme if you believe cycles with finalizers are rare in intelligently designed code]
Nah. Quite simple to code up, but a bit longer to explain in English :-) The hardest part is finding the cycles, but Guido already posted a long explanation about that. Once that spits out the doubly-linked list of objects, then you're set.

1) scan the list calling tp_clean(CARE_CHECK), shoving "care needed" objects to a second list
2) scan the care-needed list calling tp_clean(CARE_EXEC). if TRUE is returned, then the object was cleaned and moves to the "no care" list.
3) assert len(care-needed list) == 0
4) scan the no-care list calling tp_clean(EXEC)
5) (questionable) assert len(no-care list) == 0

The background makes it longer. The short description of the algorithm is easy. Step (1) could probably be merged right into one of the scans in the GC algorithm (e.g. during the placement into the "these are cyclical garbage" list).
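Condensing steps 1-5 into a Python sketch (the operation constants and GCImpossibleError-as-a-MemoryError are from the proposal; everything else is invented, objects are assumed to expose tp_clean as a method, and the INCREF/DECREF and objects-vanishing-mid-scan caveats bracketed above are ignored):

    PY_TPCLEAN_CARE_CHECK = 0     # does this object need careful cleaning?
    PY_TPCLEAN_CARE_EXEC  = 1     # try a careful clean; may report failure
    PY_TPCLEAN_EXEC       = 2     # non-careful clean: just drop references

    class GCImpossibleError(MemoryError):
        pass

    def clean_cycles(objects):
        care, nocare = [], []
        for o in objects:                          # step 1
            if o.tp_clean(PY_TPCLEAN_CARE_CHECK):
                care.append(o)
            else:
                nocare.append(o)
        leftovers = []
        for o in care:                             # step 2
            if o.tp_clean(PY_TPCLEAN_CARE_EXEC):   # e.g. calls __clean__
                nocare.append(o)
            else:
                leftovers.append(o)
        if leftovers:                              # step 3
            raise GCImpossibleError("uncleanable cycle")
        for o in nocare:                           # steps 4-5
            o.tp_clean(PY_TPCLEAN_EXEC)            # drop contained refs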
Provided Guido stays interested in this, he'll make his own fun. I'm just inviting him to move in a sane direction <0.9 wink>.
hehe... Agreed.
One caution:
... If the careful-cleaning algorithm hits the end of the careful set of objects and the set is non-empty, then throw an exception: GCImpossibleError.
Since gc "can happen at any time", this is very severe (c.f. Guido's objection to making resurrection illegal).
GCImpossibleError would simply be a subclass of MemoryError. Makes sense to me, and definitely allows for its "spontaneity."
Hand a trash cycle back to the programmer instead, via callback or request or whatever, and it's all explicit without more cruft in the implementation. It's alive again when they get it back, and they can do anything they want with it (including resurrecting it, or dropping it again, or breaking cycles -- anything). I'd focus on the cycles themselves, not on the types of objects involved. I'm not pretending to address the "order of finalization at shutdown" question, though (although I'd agree they're deeply related: how do you follow a topological sort when there *isn't* one? well, you don't, because you can't).
I disagree. I don't think a Python-level function is going to have a very good idea of what to do. IMO, this kind of semantics belongs down in the interpreter with a specific, documented algorithm. Throwing it out to Python won't help -- that function will still have to use a "standard pattern" for getting the cyclical objects to toss themselves. I think that standard pattern should be a language definition. Without a standard pattern, then you're saying the application will know what to do, but that is kind of weird -- what happens when an unexpected cycle arrives?

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/
On Sat, 4 Mar 2000, Greg Stein wrote:
I disagree. I don't think a Python-level function is going to have a very good idea of what to do <snip>
Much better then the Python interpreter... <snip>
Throwing it out to Python won't help <snip> what happens when an unexpected cycle arrives?
Don't delete it. It's as simple as that, since it's a bug.

-- 
Moshe Zadka <mzadka@geocities.com>.
http://www.oreilly.com/news/prescod_0300.html
On Sat, 4 Mar 2000, Moshe Zadka wrote:
On Sat, 4 Mar 2000, Greg Stein wrote:
I disagree. I don't think a Python-level function is going to have a very good idea of what to do <snip>
Much better then the Python interpreter...
If your function receives two instances (A and B), what are you going to do? How can you know what their policy is for cleaning up in the face of a cycle? I maintain that you would call the equivalent of my proposed __clean__. There isn't much else you'd be able to do, unless you had a completely closed system, you expected cycles between specific types of objects, and you knew a way to clean them up. Even then, you would still be calling something like __clean__ to let the objects do whatever they needed. I'm suggesting that __clean__ should be formalized (as part of tp_clean). Throwing the handling "up to Python" isn't going to do much for you. Seriously... I'm all for coding more stuff in Python rather than C, but this just doesn't feel right. Getting the objects GC'd is a language feature, and a specific pattern/method/recommendation is best formulated as an interpreter mechanism.
<snip>
Throwing it out to Python won't help <snip> what happens when an unexpected cycle arrives?
Don't delete it. It's as simple as that, since it's a bug.
The point behind this stuff is to get rid of it, rather than let it linger on. If the objects have finalizers (which is how we get to this step!), then it typically means there is a resource they must release. Getting the object cleaned and dealloc'd becomes quite important.

Cheers,
-g

p.s. did you send in a patch for the instance_contains() thing yet?

-- 
Greg Stein, http://www.lyra.org/
[Tim sez "toss insane cycles back on the user"] [Greg Stein]
I disagree. I don't think a Python-level function is going to have a very good idea of what to do.
You've already assumed that Python coders know exactly what to do, else they couldn't have coded the new __clean__ method your proposal relies on. I'm taking what strikes me as the best part of Scheme's Guardian idea: don't assume *anything* about what users "should" do to clean up their trash. Leave it up to them: their problem, their solution. I think finalizers in trash cycles should be so rare in well-written code that it's just not worth adding much of anything in the implementation to cater to it.
IMO, this kind of semantics belong down in the interpreter with a specific, documented algorithm. Throwing it out to Python won't help -- that function will still have to use a "standard pattern" for getting the cyclical objects to toss themselves.
They can use any pattern they want, and if the pattern doesn't *need* to be coded in C as part of the implementation, it shouldn't be.
I think that standard pattern should be a language definition.
I distrust our ability to foresee everything users may need over the next 10 years: how can we know today that the first std pattern you dreamed up off the top of your head is the best approach to an unbounded number of problems we haven't yet seen a one of <wink>?
Without a standard pattern, then you're saying the application will know what to do, but that is kind of weird -- what happens when an unexpected cycle arrives?
With the hypothetical gc.get_cycle() function I mentioned before, they should inspect objects in the list they get back, and if they find they don't know what to do with them, they can still do anything <wink> they want. Examples include raising an exception, dialing my home pager at 3am to insist I come in to look at it, or simply letting the list go away (at which point the objects in the list will again become a trash cycle containing a finalizer).

If several distinct third-party modules get into this act, I *can* see where it could become a mess. That's why Scheme "guardians" is plural: a given module could register its "problem objects" in advance with a specific guardian of its own, and query only that guardian later for things ready to die. This probably can't be implemented in Python, though, without support for weak references (or lots of brittle assumptions about specific refcount values).

agreeably-disagreeing-ly y'rs - tim
I'm beginning to believe that handing cycles with finalizers to the user is better than calling __del__ with a different meaning, and I tentatively withdraw my proposal to change the rules for when __del__ is called (even when __init__ fails; I haven't had any complaints about that either).

There seem to be two competing suggestions for solutions: (1) call some kind of __cleanup__ (Marc-Andre) or tp_clean (Greg) method of the object; (2) Tim's proposal of an interface to ask the garbage collector for a trash cycle with a finalizer (or for an object with a finalizer in a trash cycle?).

Somehow Tim's version looks less helpful to me, because it *seems* that whoever gets to handle the cycle (the main code of the program?) isn't necessarily responsible for creating it (some library you didn't even know was used under the covers of some other library you called). Of course, it's also possible that a trash cycle is created by code outside the responsibility of the finalizer. But still, I have a hard time understanding how Tim's version would be used. Greg or Marc-Andre's version I understand.

What keeps nagging me though is what to do when there's a finalizer but no cleanup method. I guess the trash cycle remains alive. Is this acceptable? (I guess so, because we've given the programmer a way to resolve the trash: provide a cleanup method.)

If we detect individual cycles (the current algorithm doesn't do that yet, though it seems easy enough to do another scan), could we special-case cycles with only one finalizer and no cleaner-upper? (I'm tempted to call the finalizer because it seems little harm can be done -- but then of course there's the problem of the finalizer being called again when the refcount really goes to zero. :-( )
Exactly. The *programmer* may know the right thing to do, but the Python implementation can't possibly know. Facing both facts squarely constrains the possibilities to the only ones that are all of understandable, predictable and useful. Cycles with finalizers must be a Magic-Free Zone else you lose at least one of those three: even Guido's kung fu isn't strong enough to outguess this.
[a nice implementation sketch, of what seems an overly elaborate scheme if you believe cycles with finalizers are rare in intelligently designed code]
Provided Guido stays interested in this, he'll make his own fun. I'm just inviting him to move in a sane direction <0.9 wink>.
My current tendency is to go with the basic __cleanup__ and nothing more, calling each instance's __cleanup__ before clobbering dictionaries and lists -- which should break all cycles safely.
One caution:
... If the careful-cleaning algorithm hits the end of the careful set of objects and the set is non-empty, then throw an exception: GCImpossibleError.
Since gc "can happen at any time", this is very severe (c.f. Guido's objection to making resurrection illegal).
Not quite. Cycle detection is presumably only called every once in a while on memory allocation, and memory *allocation* (as opposed to deallocation) is allowed to fail. Of course, this will probably run into various coding bugs where allocation failure isn't dealt with properly, because in practice this happens so rarely...
Hand a trash cycle back to the programmer instead, via callback or request or whatever, and it's all explicit without more cruft in the implementation. It's alive again when they get it back, and they can do anything they want with it (including resurrecting it, or dropping it again, or breaking cycles -- anything).
That was the idea with calling the finalizer too: it would be called between INCREF/DECREF, so the object would be considered alive for the duration of the finalizer call.

Here's another way of looking at my error: for dicts and lists, I would call a special *clear* function; but for instances, I would call *dealloc*, however intending it to perform a *clear*.

I wish we didn't have to special-case finalizers on class instances (since each dealloc function is potentially a combination of a finalizer and a deallocation routine), but the truth is that they *are* special -- __del__ has no responsibility for deallocating memory, only for deallocating external resources (such as temp files).

And even if we introduced a tp_clean protocol that would clear dicts and lists and call __cleanup__ for instances, we'd still want to call it first for instances, because an instance depends on its __dict__ for its __cleanup__ to succeed (but the __dict__ doesn't depend on the instance for its cleanup). Greg's 3-phase tp_clean protocol seems indeed overly elaborate but I guess it deals with such dependencies in the most general fashion.
I'd focus on the cycles themselves, not on the types of objects involved. I'm not pretending to address the "order of finalization at shutdown" question, though (although I'd agree they're deeply related: how do you follow a topological sort when there *isn't* one? well, you don't, because you can't).
In theory, you just delete the last root (a C global pointing to sys.modules) and you run the garbage collector. It might be more complicated in practice to track down all roots.

Another practical consideration is that now there are cycles of the form

    <function object> <=> <module dict>

which suggests that we should make function objects traceable. Also, modules can cross-reference, so module objects should be made traceable. I don't think that this will grow the sets of traced objects by too much (since the dicts involved are already traced, and a typical program has way fewer functions and modules than it has class instances). On the other hand, we may also have to trace (un)bound method objects, and these may be tricky because they are allocated and deallocated at high rates (once per typical method call).

Back to the drawing board...

--Guido van Rossum (home page: http://www.python.org/~guido/)
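The function/module cycle is visible from Python directly (func_globals is the attribute name of the day; later Pythons spell it __globals__):

    def f():
        pass

    # f's func_globals *is* the module dict, and the module dict holds f,
    # so the two form a reference cycle:
    print(f.func_globals['f'] is f)     # prints a true value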
Guido> What keeps nagging me though is what to do when there's a
Guido> finalizer but no cleanup method. I guess the trash cycle remains
Guido> alive. Is this acceptable? (I guess so, because we've given the
Guido> programmer a way to resolve the trash: provide a cleanup method.)

That assumes the programmer even knows there's a cycle, right? I'd like to see this scheme help provide debugging assistance. If a cycle is discovered but the programmer hasn't declared a cleanup method for the object it wants to clean up, a default cleanup method is called if it exists (e.g. sys.default_cleanup), which would serve mostly as an alert (print magic hex values to stderr, popup a Tk bomb dialog, raise the blue screen of death, ...) as opposed to actually breaking any cycles. Presumably the programmer would define sys.default_cleanup during development and leave it undefined during production.

Skip Montanaro | http://www.mojam.com/
skip@mojam.com | http://www.musi-cal.com/
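A sketch of the hook (sys.default_cleanup is Skip's proposed name, not an existing attribute; the collector-side lookup is invented):

    import sys

    def alert_cycle(obj):
        # Development-time alert: don't break anything, just complain.
        sys.stderr.write("unbroken cycle through %r\n" % (obj,))

    sys.default_cleanup = alert_cycle    # defined only during development

    # Collector side, when a cycle member has no __cleanup__ of its own:
    def cleanup_for(obj):
        try:
            return obj.__cleanup__
        except AttributeError:
            return getattr(sys, 'default_cleanup', None)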
[Guido]
I'm beginning to believe that handing cycles with finalizers to the user is better than calling __del__ with a different meaning,
You won't be sorry: Python has the chance to be the first language that's both useful and sane here!
and I tentatively withdraw my proposal to change the rules for when __del__ is called (even when __init__ fails; I haven't had any complaints about that either).
Well, everyone liked the parenthetical half of that proposal, although Jack's example did point out a real surprise with it.
There seem to be two competing suggestions for solutions: (1) call some kind of __cleanup__ (Marc-Andre) or tp_clean (Greg) method of the object; (2) Tim's proposal of an interface to ask the garbage collector for a trash cycle with a finalizer (or for an object with a finalizer in a trash cycle?).
Or a maximal strongly-connected component, or *something* -- unsure.
Somehow Tim's version looks less helpful to me, because it *seems* that whoever gets to handle the cycle (the main code of the program?) isn't necessarily responsible for creating it (some library you didn't even know was used under the covers of some other library you called).
Yes, to me too. This is the Scheme "guardian" idea in a crippled form (Scheme supports as many distinct guardians as the programmer cares to create), and even in its full-blown form it supplies "a perfectly general mechanism with no policy whatsoever". Greg convinced me (although I haven't admitted this yet <wink>) that "no policy whatsoever" is un-Pythonic too. *Some* policy is helpful, so I won't be pushing the guardian idea any more (although see immediately below for an immediate backstep on that <wink>).
... What keeps nagging me though is what to do when there's a finalizer but no cleanup method. I guess the trash cycle remains alive. Is this acceptable? (I guess so, because we've given the programmer a way to resolve the trash: provide a cleanup method.)
BDW considers it better to leak than to risk doing a wrong thing, and I agree wholeheartedly with that. GC is one place you want to have a "100% language".

This is where something like a guardian can remain useful: while leaking is OK because you've given them an easy & principled alternative, leaking without giving them a clear way to *know* about it is not OK. If gc pushes the leaked stuff off to the side, the gc module should (say) supply an entry point that returns all the leaked stuff in a list. Then users can *know* they're leaking, know how badly they're leaking, and examine exactly the objects that are leaking. Then they've got the info they need to repair their program (or at least track down the 3rd-party module that's leaking). As with a guardian, they *could* also build a reclamation scheme on top of it, but that would no longer be the main (or even an encouraged) thrust.
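In interface terms, the "push it off to the side and let users look" idea might be used like this (all names invented; this is only a stand-in for the proposed entry point, not an existing gc module):

    import sys

    class LeakingGC:
        # Stand-in for the proposed interface: trash cycles containing
        # finalizers are parked here instead of being guessed at.
        def __init__(self):
            self.leaked = []

    collector = LeakingGC()
    # ... the collector appends refused cycles to collector.leaked ...

    def report_leaks():
        if collector.leaked:
            sys.stderr.write("leaking %d object(s)\n" % len(collector.leaked))
            for obj in collector.leaked:
                sys.stderr.write("  %r\n" % (obj,))  # user repairs from here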
If we detect individual cycles (the current algorithm doesn't do that yet, though it seems easy enough to do another scan), could we special-case cycles with only one finalizer and no cleaner-upper? (I'm tempted to call the finalizer because it seems little harm can be done -- but then of course there's the problem of the finalizer being called again when the refcount really goes to zero. :-( )
"Better safe than sorry" is my immediate view on this -- you can't know that the finalizer won't resurrect the cycle, and "finalizer called iff refcount hits 0" is a wonderfully simple & predictable rule. That's worth a lot to preserve, unless & until it proves to be a disaster in practice. As to the details of cleanup, I haven't succeeded in making the time to understand all the proposals. But I've done my primary job here if I've harassed everyone into not repeating the same mistakes all previous languages have made <0.9 wink>.
... I wish we didn't have to special-case finalizers on class instances (since each dealloc function is potentially a combination of a finalizer and a deallocation routine), but the truth is that they *are* special -- __del__ has no responsibility for deallocating memory, only for deallocating external resources (such as temp files).
And the problem is that __del__ can do anything whatsoever than can be expressed in Python, so there's not a chance in hell of outguessing it.
... Another practical consideration is that now there are cycles of the form
<function object> <=> <module dict>
which suggests that we should make function objects traceable. Also, modules can cross-reference, so module objects should be made traceable. I don't think that this will grow the sets of traced objects by too much (since the dicts involved are already traced, and a typical program has way fewer functions and modules than it has class instances). On the other hand, we may also have to trace (un)bound method objects, and these may be tricky because they are allocated and deallocated at high rates (once per typical method call).
This relates to what I was trying to get at with my response to your gc implementation sketch: mark-&-sweep needs to chase *everything*, so the set of chased types is maximal from the start. Adding chased types to the "indirectly infer what's unreachable via accounting for internal refcounts within the transitive closure" scheme can end up touching nearly as much as a full M-&-S pass per invocation. I don't know where the break-even point is, but the more stuff you chase in the latter scheme the less often you want to run it.

About high rates, so long as a doubly-linked list allows efficient removal of stuff that dies via refcount exhaustion, you won't actually *chase* many bound method objects (i.e., they'll usually go away by themselves). Note in passing that bound method objects often showed up in cycles in IDLE, although you usually managed to break those in other ways.
Back to the drawing board...
Good! That means you're making real progress <wink>. glad-someone-is-ly y'rs - tim
[Tim Peters]
...If a trash cycle contains a finalizer (my, but that has to be rare in practice, in well-designed code!),
This shows something Tim himself has often said -- he never programmed a GUI. It's very hard to build a GUI (especially with Tkinter) which is cycle-less, but the classes implementing the GUI often have __del__'s to free system-allocated resources.

So, it's not as rare as we would like to believe, which is the reason I haven't given this answer.

which-is-not-the-same-thing-as-disagreeing-with-it-ly y'rs, Z.
-- 
Moshe Zadka <mzadka@geocities.com>.
http://www.oreilly.com/news/prescod_0300.html
[Tim]
...If a trash cycle contains a finalizer (my, but that has to be rare in practice, in well-designed code!),
[Moshe Zadka]
This shows something Tim himself has often said -- he never programmed a GUI. It's very hard to build a GUI (especially with Tkinter) which is cycle-less, but the classes implementing the GUI often have __del__'s to free system-allocated resources.
So, it's not as rare as we would like to believe, which is the reason I haven't given this answer.
I wrote Cyclops.py when trying to track down leaks in IDLE. The extraordinary thing we discovered is that "even real gc" would not have reclaimed the cycles. They were legitimately reachable, because, indeed, "everything points to everything else". Guido fixed almost all of them by explicitly calling new "close" methods. I believe IDLE has no __del__ methods at all now. Tkinter.py currently contains two. so-they-contained-__del__-but-weren't-trash-ly y'rs - tim
On Fri, Mar 03, 2000 at 08:38:43PM -0500, Tim Peters wrote:
So here's what I'd consider doing: explicit is better than implicit, and in the face of ambiguity refuse the temptation to guess.
I like Marc's suggestion. Here is my proposal:

Allow classes to have a new method, __cleanup__ or whatever you want to call it. When tp_clear is called for an instance, it checks for this method. If it exists, call it, otherwise delete the container objects from the instance's dictionary. When collecting cycles, call tp_clear for instances first.

It's simple and allows the programmer to cleanly break cycles if they insist on creating them and using __del__ methods.

Neil
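As Python pseudocode for the instance case (a hypothetical helper; the real check would live in C behind tp_clear, and instances and other container types would qualify as containers too):

    def instance_tp_clear(inst):
        cleanup = getattr(inst, '__cleanup__', None)
        if cleanup is not None:
            cleanup()          # the class knows how to break its cycles
            return
        # Default: drop only the container objects from the instance
        # dict, leaving scalars alone so a later __del__ still sees them.
        for key in list(inst.__dict__.keys()):
            if type(inst.__dict__[key]) in (list, dict, tuple):
                del inst.__dict__[key]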
nascheme@enme.ucalgary.ca wrote:
On Fri, Mar 03, 2000 at 08:38:43PM -0500, Tim Peters wrote:
So here's what I'd consider doing: explicit is better than implicit, and in the face of ambiguity refuse the temptation to guess.
I like Marc's suggestion. Here is my proposal:
Allow classes to have a new method, __cleanup__ or whatever you want to call it. When tp_clear is called for an instance, it checks for this method. If it exists, call it, otherwise delete the container objects from the instance's dictionary. When collecting cycles, call tp_clear for instances first.
It's simple and allows the programmer to cleanly break cycles if they insist on creating them and using __del__ methods.
Right :-)

-- 
Marc-Andre Lemburg
______________________________________________________________________
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/
participants (9):
- Greg Stein
- Guido van Rossum
- Jeremy Hylton
- M.-A. Lemburg
- Moshe Zadka
- nascheme@enme.ucalgary.ca
- Skip Montanaro
- Tim Peters
- Vladimir Marangozov