
Since the spring of 2003, I have been developing Crux, which is a computer program for phylogenetic inferencing (bioinformatics) research. In March of 2004, I switched Crux to using Python from having used a different embeddable interpreter. For the most part, I have been very happy with Python, but Python's garbage collector has been a major source of frustration. Below, I describe my trials and tribulations with Python's GC. I also offer some suggestions for changes to Python; if any of the proposed changes receive favorable feedback, I am willing to help develop patches to Python. Naturally, if I am somehow abusing Python, and there are better ways to do things, I'd be happy to hear how to improve Crux. The important aspect of Crux is that it deals with trees. These trees are unrooted (there is no up or down), and multifurcating (nodes can have an arbitrary number of neighboring nodes). Thus, the trees are self-referential, and without the cyclic GC capabilities of Python, there would be little hope of making these trees integrate well with Python. Following is a diagram that illustrates the linkage between various objects for a simple tree. Crux exposes all of the components of the trees as Python objects. All lines in the diagram represent bi-directional references (except for the T-->N line). Every object refers to the tree object; those lines are left out in order to reduce clutter. T: Tree N N N: Node \ / E: Edge R R R: Ring \ / E E \ / R---------R / \ / \ / \ / \ | \ / | | \ / | | T--->N | | | | \ | / \ | / \----R----/ | E | R | N At the C (not Python object) level, the R-E-R construct is actually a set of structures that are allocated/deallocated as a single unit. Edges are *always* connected to two rings, so there's no point in allocating these separately. Also, lone ring objects actually are rings with one member; they refer to themselves (prev and next pointers). That should be enough information to understand the problems I encountered. 1) I originally had lone rings refer to themselves (ob_refcnt started out at 3; 2 self-references and one reference held by the associated edge). This didn't work. It appears that the cyclic GC does not actually calculate the number of live references to objects (references that are reachable by traversing all objects accessible from the root set); instead it assumes that if tp_clear() doesn't drop the reference count to a number that equals the number of references from live objects, there must still be references from live objects. Unfortunately, visit()ing self does not work, so there is no way to convince Python that all references are from unreachable objects. Working around this in Crux requires a lot of extra reference counting complexity, because there are three different cases for reference counts, depending on how many members there are in a ring (1, 2, or 3+ members). 2) This issue is really just a different manifestation of issue (1). At the C (not Python object) level, each node only actually stores a pointer to a single member of the associated ring. Given a single ring member, it is possible to traverse the ring and reach all other ring members. As mentioned in issue (1), the cyclic GC expects tp_traverse() to call visit() once for each reference held. It is not enough for a node to visit() one ring member; it must visit() all ring members, in order for the GC to count how many references are from unreachable objects, versus reachable from the root set. In summary, issues (1) and (2) are due to how the cyclic GC does the "marking" phase of mark/sweep GC. My expectation of mark/sweep GC is that it should be sufficient to assure that all objects reachable from the root set are visit()ed at least once; it should not be important how many times each unreachable object is visit()ed. I don't have a deep enough understanding of the Python interpreter to give a detailed suggestion for improvement. I suspect that determining the root set is not an easy operation; if this is the case, then I think we're stuck with the current design. If determining the root set *is* easy (or even possible), then I would suggest using a standard mark/sweep collector, where unreachable objects are scheduled for destruction. tp_traverse(), tp_clear(), and tp_dealloc() would retain the same structure; the only difference would be the logic that determines which objects can be destroyed. 3) A strange thing can happen when tp_clear() is called. Suppose that an edge object is being cleared, and it drops its references to the associated rings. If ob_refcnt of one of the rings drops to 0 as a result, Python will tp_dealloc() the ring *right* *now*, without ever calling tp_clear() on the ring. That means that I have to keep track of whether tp_clear() has been called on objects, if it is at all important that tp_clear() be called, so that I can manually do so in tp_dealloc(), if necessary. It is in my opinion reasonable to have cleanup code in tp_clear(), with the assumption that it will be called precisely once, but Python makes me do extra work to make sure that this happens. This should be pretty easy to change. A single bit per object is needed to keep track of whether tp_clear() has been called. I think this only needs to be done for object types that support cyclic GC. 4) There is no way to control the order in which objects are tp_dealloc()ed. This is a problem for the R-E-R construct, since at a low level, these objects are always linked together. What I would like to do is defer tp_dealloc() on the edge until after both rings have been deleted. Instead, I am forced to use a reference-counted deletion function. Not calling self->ob_type->tp_free() on the edge in tp_dealloc() until later is not a reasonable option, because this defers deletion of the edge until a later round of garbage collection. This could be addressed in the Python interpreter by paying heed to the return value of tp_dealloc(). If the return value is non-zero, move the object to the end of the list of objects to be destroyed, so that destruction is tried later. This allows the module to impose its own destruction ordering. I look forward to feedback. Thank you, Jason Evans

1) I originally had lone rings refer to themselves (ob_refcnt started out at 3; 2 self-references and one reference held by the associated edge).
I would have thought the refcount should have been 4: two self references, plus one from E, plus one from N (maybe I don't understand what make a ring "lone", though).
This didn't work. It appears that the cyclic GC does not actually calculate the number of live references to objects (references that are reachable by traversing all objects accessible from the root set);
That's correct. It doesn't need to.
instead it assumes that if tp_clear() doesn't drop the reference count to a number that equals the number of references from live objects, there must still be references from live objects.
That's also correct. tp_clear is called for objects that are not reachable from the root set, anymore, so it *should* break all non-reachable cycles.
Unfortunately, visit()ing self does not work, so there is no way to convince Python that all references are from unreachable objects.
Wrong. If tp_clear is called, you already know for certain that "self" is garbage. So you need to clear all references - regardless of whether you think you should: trust us, we know what we do.
Working around this in Crux requires a lot of extra reference counting complexity, because there are three different cases for reference counts, depending on how many members there are in a ring (1, 2, or 3+ members).
Why is that? Just DECREF every PyObject* in "self", then set the PyObject* to NULL. No need for any additional reference counts.
2) This issue is really just a different manifestation of issue (1).
At the C (not Python object) level, each node only actually stores a pointer to a single member of the associated ring. Given a single ring member, it is possible to traverse the ring and reach all other ring members. As mentioned in issue (1), the cyclic GC expects tp_traverse() to call visit() once for each reference held. It is not enough for a node to visit() one ring member; it must visit() all ring members, in order for the GC to count how many references are from unreachable objects, versus reachable from the root set.
Hmmm. Even if a ring member holds only a single reference to another ring member, would not calling visit() on this member in turn invoke visit() on the other one, which would, in turn, invoke visit() on the next one, and so on?
In summary, issues (1) and (2) are due to how the cyclic GC does the "marking" phase of mark/sweep GC. My expectation of mark/sweep GC is that it should be sufficient to assure that all objects reachable from the root set are visit()ed at least once; it should not be important how many times each unreachable object is visit()ed.
Correct. Indeed, cyclic GC makes sure visit is not called more often than needed.
I don't have a deep enough understanding of the Python interpreter to give a detailed suggestion for improvement. I suspect that determining the root set is not an easy operation; if this is the case, then I think we're stuck with the current design. If determining the root set *is* easy (or even possible), then I would suggest using a standard mark/sweep collector, where unreachable objects are scheduled for destruction.
That is already the case. tp_clear is the procedure that performs the destruction.
tp_traverse(), tp_clear(), and tp_dealloc() would retain the same structure; the only difference would be the logic that determines which objects can be destroyed.
I don't see a need for change. The logic that determines the objects to be destroyed is already "precise", determining only unreachable objects.
3) A strange thing can happen when tp_clear() is called. Suppose that an edge object is being cleared, and it drops its references to the associated rings. If ob_refcnt of one of the rings drops to 0 as a result, Python will tp_dealloc() the ring *right* *now*, without ever calling tp_clear() on the ring. That means that I have to keep track of whether tp_clear() has been called on objects, if it is at all important that tp_clear() be called, so that I can manually do so in tp_dealloc(), if necessary. It is in my opinion reasonable to have cleanup code in tp_clear(), with the assumption that it will be called precisely once, but Python makes me do extra work to make sure that this happens.
Your expectation is wrong. tp_clear is not the place to perform cleanup. It is the place to break cycles. tp_dealloc is the place to perform cleanup.
4) There is no way to control the order in which objects are tp_dealloc()ed.
Correct. Unfortunately, there is no way to change that. For cyclic structures, there is no natural "starting point". Python picks an arbitrary starting point, assuming that the order does not matter. If class objects have an __del__, it assumes that the order does matter, and gives up - putting the objects into gc.garbage.
This could be addressed in the Python interpreter by paying heed to the return value of tp_dealloc(). If the return value is non-zero, move the object to the end of the list of objects to be destroyed, so that destruction is tried later. This allows the module to impose its own destruction ordering.
I don't understand. tp_dealloc does not have a return value, so what does "pay heed" mean? Regards, Martin

[Martin v. Löwis]
Hmmm. Even if a ring member holds only a single reference to another ring member, would not calling visit() on this member in turn invoke visit() on the other one, which would, in turn, invoke visit() on the next one, and so on?
Yes, but not recursively (which that description appears to imply), and not because of what's passed *to* visit(). More precisely, visit(some_pointer) does different things during different phases of gc (different phases pass different gcmodule.c callback functions to tp_traverse() implementations). None of them lead to recursive calls to the callback. Maybe a bit ironically, refcount-based deletion can require unbounded stack depth (hence the "trashcan" mechanism to avoid blowing the C stack), but Python's cyclic gc isn't recursive at all. I'm guessing that Jason is confusing the update_refs() and subtract_refs() phases of Python's gc with the "mark" phase of a mark/sweep collector. If so, the visit() callbacks used by those are actually irrelevant to the set of objects whose tp_traverse() is called during those phases. The tp_traverse() of *every* object in the generation being collected is invoked by that pair of phases, and the set of objects passed *to* the visit callbacks during those phases has no effect on the set of objects whose tp_traverse() gets called during those phases. So one of Jason's objects will get its tp_traverse() called by those phases if and only if it's in the generation currently being collected, and it's currently tracked by gc. It doesn't matter whether, e.g., anything points at it, or whether it points at anything (although if nothing anywhere points at it, it would already have been destroyed via refcounting).

On Sun, Oct 10, 2004 at 11:36:11PM +0200, "Martin v. L?wis" wrote:
1) I originally had lone rings refer to themselves (ob_refcnt started out at 3; 2 self-references and one reference held by the associated edge).
I would have thought the refcount should have been 4: two self references, plus one from E, plus one from N (maybe I don't understand what make a ring "lone", though).
You understand correctly, except that the R-E-R construct starts out not being attached to any nodes. The "lone" ring term was my attempt to differentiate this: R from these: R-R R---R R-R \ / | | R R-R
Working around this in Crux requires a lot of extra reference counting complexity, because there are three different cases for reference counts, depending on how many members there are in a ring (1, 2, or 3+ members).
Why is that? Just DECREF every PyObject* in "self", then set the PyObject* to NULL. No need for any additional reference counts.
The low level tree code is implemented completely separately from the Python object wrapper code. This means that, for example, the Python "Node" object does not actually store PyObject* pointers to Ring objects; instead it uses the low level tree code to find out what Ring objects are attached to it. Crux was designed this way in order to be able to implement various low level algorithms such that they never have to call interpreter-related code. As such, reference breaking code must actually tear down the low level tree in such a way that it is always "consistent" (simply setting pointers to NULL doesn't fit well with this way of doing things).
Hmmm. Even if a ring member holds only a single reference to another ring member, would not calling visit() on this member in turn invoke visit() on the other one, which would, in turn, invoke visit() on the next one, and so on?
Yes, this is true. The problem I ran into had to do with the node logically holding references to all the ring objects, but only reporting one of them. That is, the node's tp_traverse function was not reporting all references, though it was reporting enough of them to guarantee that all reachable objects were visited. At this point, it is clear to me that Python's cyclic GC does not do mark/sweep GC, and that this won't change, so there's probably not much point in discussing this issue further.
3) A strange thing can happen when tp_clear() is called. Suppose that an edge object is being cleared, and it drops its references to the associated rings. If ob_refcnt of one of the rings drops to 0 as a result, Python will tp_dealloc() the ring *right* *now*, without ever calling tp_clear() on the ring. That means that I have to keep track of whether tp_clear() has been called on objects, if it is at all important that tp_clear() be called, so that I can manually do so in tp_dealloc(), if necessary. It is in my opinion reasonable to have cleanup code in tp_clear(), with the assumption that it will be called precisely once, but Python makes me do extra work to make sure that this happens.
Your expectation is wrong. tp_clear is not the place to perform cleanup. It is the place to break cycles. tp_dealloc is the place to perform cleanup.
What if breaking cycles and cleanup are the same thing? In the case of Crux, they are one and the same, since the low level tree must be torn down one piece at a time. Apparently I was mistaken in viewing tp_clear and tp_dealloc as two stages of destruction.
4) There is no way to control the order in which objects are tp_dealloc()ed.
Correct. Unfortunately, there is no way to change that. For cyclic structures, there is no natural "starting point". Python picks an arbitrary starting point, assuming that the order does not matter. If class objects have an __del__, it assumes that the order does matter, and gives up - putting the objects into gc.garbage.
This could be addressed in the Python interpreter by paying heed to the return value of tp_dealloc(). If the return value is non-zero, move the object to the end of the list of objects to be destroyed, so that destruction is tried later. This allows the module to impose its own destruction ordering.
I don't understand. tp_dealloc does not have a return value, so what does "pay heed" mean?
I should have said tp_clear here. In Modules/gcmodule.c:delete_garbage(), the clear() call does not pay attention to the return value. It would be possible to check the return value, and defer deallocation if the return value were non-zero, and ob_refcnt were 1. (However, it would probably have to be a separate list of objects, contrary to what I claimed before.) Thanks for your comments, Martin. I'm starting to understand the Python GC world view a bit better. Jason

On Oct 11, 2004, at 2:59 AM, Jason Evans wrote:
Working around this in Crux requires a lot of extra reference counting complexity, because there are three different cases for reference counts, depending on how many members there are in a ring (1, 2, or 3+ members).
Why is that? Just DECREF every PyObject* in "self", then set the PyObject* to NULL. No need for any additional reference counts.
The low level tree code is implemented completely separately from the Python object wrapper code. This means that, for example, the Python "Node" object does not actually store PyObject* pointers to Ring objects; instead it uses the low level tree code to find out what Ring objects are attached to it. Crux was designed this way in order to be able to implement various low level algorithms such that they never have to call interpreter-related code. As such, reference breaking code must actually tear down the low level tree in such a way that it is always "consistent" (simply setting pointers to NULL doesn't fit well with this way of doing things).
You've now reached the point where the Python-GC-ease-of-implementation breaks down. We have encountered the same problem - the GC is fine as long as you're solely extending Python, but if you're embedding it, chances are you'll encounter some interesting issues like this along the way. It's not that you can't make it work - you can, but need to do a lot more work yourself. Basically, if I understand you correctly (and I may not be), there are times when you do not want tp_dealloc to actually destroy the low-level data structure, because somebody else (not living in the Python reference-counted world) isn't done with it yet. The problem is in knowing when those times are, and when they aren't. In this case, you may need to implement basic reference counting in your own low-level data structures, so you know when you're really done with a ring and ready to destroy it. The other option is to attempt to make sure that you're not creating cycles (in python) and avoid interfacing with GC entirely, but that may not be possible in your case. -- Nick

[Jason Evans] ...
The low level tree code is implemented completely separately from the Python object wrapper code. This means that, for example, the Python "Node" object does not actually store PyObject* pointers to Ring objects; instead it uses the low level tree code to find out what Ring objects are attached to it.
I think you're in for a rude awakening if you expect any language's gc support to help you manage arbitrary blobs of memory obtained from the system malloc(). If all the nodes in your object graph derived from PyObject, Python's gc would do it all for you. If you want to plant pointers in and among arbitrary C structs, then you'll have to implement your own gc, or hope that you can bash a general gc thingie (like BDW -- and I'm not sure there is anything else of that sort usefully available, so "like" is probably optimistic) into guessing the layout of your structs.
Crux was designed this way in order to be able to implement various low level algorithms such that they never have to call interpreter-related code.
I don't know why that was thought to be desirable, or even exactly what it means. One plausible meaning is that you don't want to tie the "various low level algorithms" to Python, because you hope to reuse them in other, non-Python contexts. And, in addition to that, you don't want, or don't know how, to abstract the necessary operations into macros or functions that are pluggable for the different contexts you have in mind. As a very simple example, using Py_INCREF in a function doesn't tie that function to Python -- if you want to use it in a non-Python context, #define Py_INCREF(x) gets rid of such calls. But I'm guessing too much about what you might mean, so I'll stop there.
As such, reference breaking code must actually tear down the low level tree in such a way that it is always "consistent" (simply setting pointers to NULL doesn't fit well with this way of doing things).
Martin had in mind that the pointers were to properly refcounted PyObject*, in which case his advice was sound. [Martin]
Hmmm. Even if a ring member holds only a single reference to another ring member, would not calling visit() on this member in turn invoke visit() on the other one, which would, in turn, invoke visit() on the next one, and so on?
[Jason]
Yes, this is true.
It's false, but I covered that one yesterday. What's passed back to visit() has nothing to do with which objects' tp_traverse()s get called. Registering ("tracking") PyObject*s of gc'able types is what determines which objects' tp_traverse()s get called. Python's gc doesn't try to determine what's reachable, it determines what isn't reachable from pointers it *hasn't* been told about. It infers "reachable from outside" from the refcounts: if all the refcounts on an object are not accounted for by the objects gc *has* been told about, then that object is directly reachable from something gc hasn't been told about, so gc dare not touch it. The transitive closure of what's reachable from such objects is the set of all objects gc knows about that are reachable from outside. gc leaves those alone. Everything else gc knows about is necessarily cyclic trash, or reachable only from cyclic trash. It can't know anything about objects that aren't tracked by gc, so in particular can't know anything about pointers that don't point at a PyObject.
The problem I ran into had to do with the node logically holding references to all the ring objects, but only reporting one of them.
If a ring object doesn't derive from PyObject, then visit() must not be passed a pointer to a ring object. Doing so would account for many catastrophic meanings of "doesn't work" <wink>.
That is, the node's tp_traverse function was not reporting all references, though it was reporting enough of them to guarantee that all reachable objects were visited.
No, the visit() callback has nothing to do with which objects are traversed. gc determines the set of objects to traverse, before any object's tp_traverse() has been called, as a "generational" subset of all objects that have registered with gc. The pointers passed back to visit() callbacks have no effect on that set: they can't add anything to, or remove anything from, that set. The set of objects to traverse is predetermined.
At this point, it is clear to me that Python's cyclic GC does not do mark/sweep GC,
RIght, although we're only talking about CPython here (a particular implementation of Python),
and that this won't change,
In CPython that's a safe bet.
so there's probably not much point in discussing this issue further.
We can help you use Python's gc, but not if you're unwilling or unable to change anything about what you do.
... What if breaking cycles and cleanup are the same thing? In the case of Crux, they are one and the same, since the low level tree must be torn down one piece at a time.
Again Martin had in mind that all your objects derived from PyObject. If that were so, then Python's gc would guarantee not to tear down anything until every cycle it was part of became unreachable, in which case internal invariants no longer matter (since they're no longer reachable form anything except other unreachable trash). Python's gc is very nice to work with, for Python objects. If you can't make your objects PyObjects, then you get all the pain of implementing a form of gc that does work with your objects. You really should look at how core objects implement tp_traverse() and tp_clear(), to see how easy life *can* be, even in the presence of much hairier object graphs than you're working with.
I should have said tp_clear here. In Modules/gcmodule.c:delete_garbage(), the clear() call does not pay attention to the return value.
Heh. So true! See http://mail.python.org/pipermail/python-dev/2003-April/034435.html
It would be possible to check the return value, and defer deallocation if the return value were non-zero, and ob_refcnt were 1. (However, it would probably have to be a separate list of objects, contrary to what I claimed before.)
I'm not sure, but I think your desire for this was still based on misunderstanding of what Python's gc does.

On Mon, Oct 11, 2004 at 09:21:57PM -0400, Tim Peters wrote:
[Jason Evans]
I think you're in for a rude awakening if you expect any language's gc support to help you manage arbitrary blobs of memory obtained from the system malloc().
I don't expect that. What I do expect is for there to be clean and efficient mechanisms for integrating custom Python object types into Python. Python more or less meets this expectation.
If all the nodes in your object graph derived from PyObject, Python's gc would do it all for you. If you want to plant pointers in and among arbitrary C structs, then you'll have to implement your own gc, or hope that you can bash a general gc thingie (like BDW -- and I'm not sure there is anything else of that sort usefully available, so "like" is probably optimistic) into guessing the layout of your structs.
No, this is not necessary. What is necessary is for my code to accurately maintain and report all references to Python objects during traversal. The Crux code does so, and works as designed.
Crux was designed this way in order to be able to implement various low level algorithms such that they never have to call interpreter-related code.
I don't know why that was thought to be desirable, or even exactly what it means. One plausible meaning is that you don't want to tie the "various low level algorithms" to Python, because you hope to reuse them in other, non-Python contexts. And, in addition to that, you don't want, or don't know how, to abstract the necessary operations into macros or functions that are pluggable for the different contexts you have in mind.
First, "hoping" to be able use my code in other contexts is what allowed me to transition from using a different embeddable interpreter to using Python inside of two days. Had I melded all of the scripting glue code with the low level tree code, it would not have been nearly as easy to transition to using Python. Second, I have the scripting language glue and low level tree code separated for two reasons: 1) performance, and 2) modularization. There's no question of whether I want to use cpp magic to be able to munge two large piles of code into one huge pile of code, nor whether I'm capable of doing so. To do so would simply take the code in a bad direction.
[Martin]
Hmmm. Even if a ring member holds only a single reference to another ring member, would not calling visit() on this member in turn invoke visit() on the other one, which would, in turn, invoke visit() on the next one, and so on?
[Jason]
Yes, this is true.
It's false, but I covered that one yesterday.
It is true, given my interpretation of the original context of the dialogue. We were discussing whether it would work for a node object to hold a reference to a single ring object, and rely on the ring objects to report their neighbors. Indeed this would work.
The problem I ran into had to do with the node logically holding references to all the ring objects, but only reporting one of them.
If a ring object doesn't derive from PyObject, then visit() must not be passed a pointer to a ring object. Doing so would account for many catastrophic meanings of "doesn't work" <wink>.
If you want to see the Crux source code, I'll be happy to provide you with a source archive. In the meanwhile, please rest assured that the code doesn't suck nearly as badly as you are insinuating here.
We can help you use Python's gc, but not if you're unwilling or unable to change anything about what you do.
Prior to switching to using Python, Crux integrated with a language that implemented atomic mark and sweep garbage collection. I have had to change a tremendous amount about Crux in order to use a reference counted interpreter. In fact, I made some changes to Crux this morning that were a direct result of your original response to my email (ring objects now report references to themselves, which makes reference counting of ring objects much simpler). Many of the issues I ran into had to do with the impedence mismatch between my previous experience and the way Python works. I appreciate you taking the time to respond to me, especially since you've helped me gain a deeper understanding of how Python's GC works. However, please get it out of your mind that I am incompetent and/or set in my ways.
I should have said tp_clear here. In Modules/gcmodule.c:delete_garbage(), the clear() call does not pay attention to the return value.
Heh. So true! See
http://mail.python.org/pipermail/python-dev/2003-April/034435.html
It would be possible to check the return value, and defer deallocation if the return value were non-zero, and ob_refcnt were 1. (However, it would probably have to be a separate list of objects, contrary to what I claimed before.)
I'm not sure, but I think your desire for this was still based on misunderstanding of what Python's gc does.
Of the original issues I brought up, I think this one potentially still has merit, though I'm not sure that it's of sufficiently general utility. The idea is to allow extension modules some control over the order in which objects are deallocated, so that it is possible to tear down interconnections in a particular order. However, it occurs to me that since tp_clear can leave references in place, it might be possible to control the object destruction order by leaving a DAG of references in place, and letting Python deallocate objects in dependency order. Does this work? Thanks, Jason

[Tim Peters]
I think you're in for a rude awakening if you expect any language's gc support to help you manage arbitrary blobs of memory obtained from the system malloc().
[Jason Evans]
I don't expect that. What I do expect is for there to be clean and efficient mechanisms for integrating custom Python object types into Python. Python more or less meets this expectation.
OK ... but you must want more than just that, else this thread wouldn't have started <wink>.
If all the nodes in your object graph derived from PyObject, Python's gc would do it all for you. If you want to plant pointers in and among arbitrary C structs, then you'll have to implement your own gc, or hope that you can bash a general gc thingie (like BDW -- and I'm not sure there is anything else of that sort usefully available, so "like" is probably optimistic) into guessing the layout of your structs.
No, this is not necessary. What is necessary is for my code to accurately maintain and report all references to Python objects during traversal. The Crux code does so, and works as designed.
Isn't it also necessary for your code to clean up object graphs where the nodes are *not* Python objects? That's what I thought you were saying to Martin, If I misunderstood that, I apologize. Almost everything else I said was based on that belief.
Crux was designed this way in order to be able to implement various low level algorithms such that they never have to call interpreter-related code.
I don't know why that was thought to be desirable, or even exactly what it means. One plausible meaning is that you don't want to tie the "various low level algorithms" to Python, because you hope to reuse them in other, non-Python contexts. And, in addition to that, you don't want, or don't know how, to abstract the necessary operations into macros or functions that are pluggable for the different contexts you have in mind.
First, "hoping" to be able use my code in other contexts is what allowed me to transition from using a different embeddable interpreter to using Python inside of two days. Had I melded all of the scripting glue code with the low level tree code, it would not have been nearly as easy to transition to using Python.
So the "you don't want to tie ... to Python" guess was right (please realize that you haven't *said* so before), and the "don't want to abstract ... into macros or functions" part of the other guess was on target (ditto). I think you're reading judgments into my attempts to extract answers. It's fine by me if you don't want to make everything a PyObject, but then I'm also trying to explain consequences of that choice wrt what Python's gc will and won't do for you.
Second, I have the scripting language glue and low level tree code separated for two reasons: 1) performance, and 2) modularization. There's no question of whether I want to use cpp magic to be able to munge two large piles of code into one huge pile of code, nor whether I'm capable of doing so. To do so would simply take the code in a bad direction.
Maybe. I resist some here because adding just enough PyObject hair to enable a C struct to work smoothly with Python's native gc doesn't usually count as a "large pile" of code by any metric I use, and really does make gc life easy. [Martin]
Hmmm. Even if a ring member holds only a single reference to another ring member, would not calling visit() on this member in turn invoke visit() on the other one, which would, in turn, invoke visit() on the next one, and so on?
[Jason]
Yes, this is true.
It's false, but I covered that one yesterday.
It is true, given my interpretation of the original context of the dialogue. We were discussing whether it would work for a node object to hold a reference to a single ring object, and rely on the ring objects to report their neighbors. Indeed this would work.
Regardless of context, in Python's gc there is no sense in which calling a visit() callback "in turn" invokes visit() on anything. If there is a sense here in which that is true, the context must be something other than Python's gc. If the ring objects are not PyObject themselves (I thought it was clear that they weren't, but maybe that was wrong), then Python's gc gives them no way to "report their neighbors".
The problem I ran into had to do with the node logically holding references to all the ring objects, but only reporting one of them.
If a ring object doesn't derive from PyObject, then visit() must not be passed a pointer to a ring object. Doing so would account for many catastrophic meanings of "doesn't work" <wink>.
If you want to see the Crux source code, I'll be happy to provide you with a source archive.
I'd like to say "yes", but I'll never make time to look at it. I would have been happier here if you had confirmed or denied that ring objects don't derive from PyObject. I pressed before on that "doesn't work" isn't helpful, and that never got clarified. Possible meanings include segfaults, error messages, Python exceptions. and memory leaks.
In the meanwhile, please rest assured that the code doesn't suck nearly as badly as you are insinuating here.
I'm not insinuating anything about the code, I'm trying to guess what you meant in your original post by "doesn't work" this-and-that. Passing a non-PyObject* to visit() remains a possible cause for "doesn't work" in my head, because you haven't said anything to rule it out. If you in fact *had* been passing a non-PyObject* to visit(), then the paragraph above could have been helpful ("ah, that's why it didn't work!").
We can help you use Python's gc, but not if you're unwilling or unable to change anything about what you do.
Prior to switching to using Python, Crux integrated with a language that implemented atomic mark and sweep garbage collection. I have had to change a tremendous amount about Crux in order to use a reference counted interpreter. In fact, I made some changes to Crux this morning that were a direct result of your original response to my email (ring objects now report references to themselves, which makes reference counting of ring objects much simpler).
So Ring objects do derive from PyObject*. Forget everything I said then <wink>.
Many of the issues I ran into had to do with the impedence mismatch between my previous experience and the way Python works.
I sure believe that. Python's cyclic gc is unusual in several respects.
I appreciate you taking the time to respond to me, especially since you've helped me gain a deeper understanding of how Python's GC works. However, please get it out of your mind that I am incompetent and/or set in my ways.
Those aren't in my head. I do think you're so steeped in the details of your code that you're not nearly as successful at communicating the *salient* points as you think you are, though. I'm so steeped in the details of Python's gc that a similar claim could easily be made of me. I don't expect it to be clear the first time, though, so play 20-questions trying to tease out what you really need to know about it. If it's any consolation, this process has been painful on both sides <wink>.
I should have said tp_clear here. In Modules/gcmodule.c:delete_garbage(), the clear() call does not pay attention to the return value.
Heh. So true! See
http://mail.python.org/pipermail/python-dev/2003-April/034435.html
It would be possible to check the return value, and defer deallocatin if the return value were non-zero, and ob_refcnt were 1. (However, it would probably have to be a separate list of objects, contrary to what I claimed before.)
I'm not sure, but I think your desire for this was still based on misunderstanding of what Python's gc does.
Of the original issues I brought up, I think this one potentially still has merit, though I'm not sure that it's of sufficiently general utility. The idea is to allow extension modules some control over the order in which objects are deallocated, so that it is possible to tear down interconnections in a particular order.
Well, think about it some more then. "A separate list of objects" is very easy to create efficiently in Python's gc internals -- each gc'able object is in *some* doubly-linked list, so the cost of a new list is just a new header node to identify it, and moving a gc'able object from one list to another is 6 pointer stores.
However, it occurs to me that since tp_clear can leave references in place, it might be possible to control the object destruction order by leaving a DAG of references in place, and letting Python deallocate objects in dependency order. Does this work?
It could. Here it should help to know that Python's cyclic gc never calls an object's tp_dealloc directly. The only way tp_dealloc ever triggers in Python is as a side effect of a refcount falling to zero -- no exceptions. So, indeed, "all" cyclic gc does is break cycles, and the only places garbage actually gets recycled in gcmodule.c are as side effects of the Py_DECREF(op); lines in delete_garbage() and release_weakrefs(). Weakrefs are probably irrelevant to you at this point, so there's only one Py_DECREF() in the codebase you care about. So, e.g, if you have a simple cycle like A -> B -> C -> ... -> Z -> A and the tp_clear() calls break only the link from Z to A, that's all they *need* to do, and then A's tp_dealloc() will be called first, then B's tp_dealloc, ..., and finally Z's tp_dealloc. You should be aware that long chains of that sort can end up overflowing the C stack, depeding on how tp_dealloc is coded. That's in CPython, of course. Jython uses Java's native gc, and I suppose (but don't know that) IronPython uses .NET's.

On Mon, Oct 11, 2004 at 11:34:34PM -0400, Tim Peters wrote:
[Tim Peters]
I think you're in for a rude awakening if you expect any language's gc support to help you manage arbitrary blobs of memory obtained from the system malloc().
[Jason Evans]
I don't expect that. What I do expect is for there to be clean and efficient mechanisms for integrating custom Python object types into Python. Python more or less meets this expectation.
OK ... but you must want more than just that, else this thread wouldn't have started <wink>.
I initially started this thread thinking that I had some useful suggestions about how Python's GC might be modified. Had I realized in advance that my suggestions were based on my own misconceptions about Python's GC, I would not have bothered the people in this forum. I apologize for reducing the signal to noise ratio of python-dev.
Many of the issues I ran into had to do with the impedence mismatch between my previous experience and the way Python works.
I sure believe that. Python's cyclic gc is unusual in several respects.
Indeed. This brings up a question about documentation: as far as I know, I've read all of the available documentation regarding cyclic GC (the "Supporting cyclic garbage collection" section of "Extending and Embedding the Python Interpreter"), but those docs came nowhere near giving me an adequate understanding of the design rationale behind the cyclic GC. Did I overlook some documentation? Of course, it may be that I am actually at a cognitive disadvantage to someone who has only experienced Python's memory management strategy...
I appreciate you taking the time to respond to me, especially since you've helped me gain a deeper understanding of how Python's GC works. However, please get it out of your mind that I am incompetent and/or set in my ways.
Those aren't in my head. I do think you're so steeped in the details of your code that you're not nearly as successful at communicating the *salient* points as you think you are, though. I'm so steeped in the details of Python's gc that a similar claim could easily be made of me. I don't expect it to be clear the first time, though, so play 20-questions trying to tease out what you really need to know about it. If it's any consolation, this process has been painful on both sides <wink>.
My critical error in trying to communicate the salient points of the Crux code was that I didn't realize the importance of comminicating the fact that the trees are implemented as a combination of separated low level tree data structures and Python wrappers to the low level implementation. Thanks for your help, Tim. Jason

On Wed, Oct 13, 2004, Jason Evans wrote:
Indeed. This brings up a question about documentation: as far as I know, I've read all of the available documentation regarding cyclic GC (the "Supporting cyclic garbage collection" section of "Extending and Embedding the Python Interpreter"), but those docs came nowhere near giving me an adequate understanding of the design rationale behind the cyclic GC. Did I overlook some documentation?
Of course, it may be that I am actually at a cognitive disadvantage to someone who has only experienced Python's memory management strategy...
As Tim said earlier, though less pungently, "Read the source, Luke." -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ WiFi is the SCSI of the 21st Century -- there are fundamental technical reasons for sacrificing a goat. (with no apologies to John Woods)

Aahz <aahz@pythoncraft.com> writes:
On Wed, Oct 13, 2004, Jason Evans wrote:
Indeed. This brings up a question about documentation: as far as I know, I've read all of the available documentation regarding cyclic GC (the "Supporting cyclic garbage collection" section of "Extending and Embedding the Python Interpreter"), but those docs came nowhere near giving me an adequate understanding of the design rationale behind the cyclic GC. Did I overlook some documentation?
Of course, it may be that I am actually at a cognitive disadvantage to someone who has only experienced Python's memory management strategy...
As Tim said earlier, though less pungently, "Read the source, Luke."
Indeed, but also: don't be scared! Modules/gcmodule.c isn't that long, and much of what it is is comments. Mind you, it was probably easier to understand before it worked properly with weak references... Cheers, mwh -- Unfortunately, nigh the whole world is now duped into thinking that silly fill-in forms on web pages is the way to do user interfaces. -- Erik Naggum, comp.lang.lisp

On Wed, Oct 13, 2004 at 07:54:04AM -0700, Jason Evans wrote:
This brings up a question about documentation: as far as I know, I've read all of the available documentation regarding cyclic GC (the "Supporting cyclic garbage collection" section of "Extending and Embedding the Python Interpreter"), but those docs came nowhere near giving me an adequate understanding of the design rationale behind the cyclic GC. Did I overlook some documentation?
It's a pretty old document and perhaps a little out of date but here's something I wrote while working on the Python GC: http://arctrix.com/nas/python/gc/ Looking back, I realize that the constraints were pretty extreme which probably accounts for the unique implementation. A few off the top of my head: * binary compatible with previous version of Python * optional * small overhead for programs that don't create reference cycles * __del__ methods should still work Cheers, Neil

[Jason Evans]
Since the spring of 2003, I have been developing Crux, which is a computer program for phylogenetic inferencing (bioinformatics) research. In March of 2004, I switched Crux to using Python from having used a different embeddable interpreter. For the most part, I have been very happy with Python, but Python's garbage collector has been a major source of frustration.
Then you're not using it correctly -- it's easy if you are. You didn't show actual code, though, and I don't have time to guess what you might have done.
... The important aspect of Crux is that it deals with trees.
The details shouldn't matter; Python's cyclic gc handles arbitrary graph structures, including multiple self-loops if you want them. The Python core creates graph structures of boggling complexity, so it might help you to study how core objects implement their tp_traverse and tp_clear methods. They're mostly very simple. Where they're not very simple, it's usually because someone had a silly idea of "optimization" in mind. ...
1) I originally had lone rings refer to themselves (ob_refcnt started out at 3; 2 self-references and one reference held by the associated edge). This didn't work.
"Didn't work" leaves the reader guessing about everything. Showing code, and being specific about both what you expected and what you observed, are necessary. python-dev isn't really the right place for that, though.
It appears that the cyclic GC does not actually calculate the number of live references to objects (references that are reachable by traversing all objects accessible from the root set);
Python's gc has no "root set" concept, so thinking in those terms won't help. It won't grow a root-set concept, either: "foreign" extension modules (like Tcl/Tk, or library packages written in Fortran) can hold what would normally be considered root-set objects for Python, and that's fine. They're not required to cooperate with Python's memory management beyond respecting Python's refcount rules. Forcing modules (via some new API) to identify root-set objects they own would be backward-incompatible (not to mention a major new burden on extension authors and users), so won't happen.
instead it assumes that if tp_clear() doesn't drop the reference count to a number that equals the number of references from live objects, there must still be references from live objects. Unfortunately, visit()ing self does not work, so there is no way to convince Python that all references are from unreachable objects.
Again I can't guess what "does not work" means. But I can assure you it "would work" if I wrote it <wink>. It's easy: an object's tp_traverse must call the visit() callback exactly once for each non-NULL PyObject* *directly* (in one step) reachable from the object, and that's all. If self is reachable from self via a refcounted pointer held by self, then self's tp_traverse() implementation must call visit(self, arg). But I think you're getting off track even *caring* what your pointers point at. If some object type T has three Py_Object* pointers pa, pb, pc, then using Python 2.4's Py_VISIT macro, T's tp_traverse should be coded static int T_traverse(T *self, visitproc visit, void *arg) { Py_VISIT(self->pa); Py_VISIT(self->pb); Py_VISIT(self->pc); return 0; } It doesn't matter what pa (etc) points at. If pa points to self, fine; if it doesn't, also fine.
Working around this in Crux requires a lot of extra reference counting complexity, because there are three different cases for reference counts, depending on how many members there are in a ring (1, 2, or 3+ members).
Please show code. If, e.g., you have a vector v of length n holding the PyObject* "members", then int i; for (i = 0; i < self->n; ++i) Py_VISIT(self->v[i]); return 0; is an appropriate tp_traverse(). For example, that's exactly what the tp_traverse() for Python's builtin list objects does, and a Python list L can certainly be an element of itself:
L = [] L.append(L) L[0] is L True
There's no special-casing needed for the number of containees, or for the nature of what they do or don't point at. It's also irrelevant what can be reached *from* the direct containees.
2) This issue is really just a different manifestation of issue (1).
At the C (not Python object) level, each node only actually stores a pointer to a single member of the associated ring.
Please show code. If the C object contains exactly one pointer, then its tp_traverse should call visit() exactly once; etc. The correct C-level code has entirely to do with what the C struct contains, and nothing to do with the view presented at the Python level.
Given a single ring member, it is possible to traverse the ring and reach all other ring members.
That's irrelevant to gc. tp_traverse() is only responsible for revealing the objects *directly* reachable from self, or, more precisely, for revealing the PyObject* pointers on which self owns a reference. gc computes the transitive closure itself; tp_traverse() isn't responsible for that.
As mentioned in issue (1), the cyclic GC expects tp_traverse() to call visit() once for each reference held.
Yes.
It is not enough for a node to visit() one ring member;
Don't know what it means for "a node to visit()". If the C struct has exactly one pointer, then it is enough for visit() to be invoked exactly once with that pointer. It would be incorrect not to visit() that pointer, and it would be incorrect to visit() more pointers than just that one.
it must visit() all ring members,
If and only if for each ring member M, self contains a pointer directly to M. Showing code is much better than English. View your C-level objects as a directed graph. tp_traverse at a node `self` is reponsible for revealing self's outgoing edges, no less and no more than that.
in order for the GC to count how many references are from unreachable objects, versus reachable from the root set.
In summary, issues (1) and (2) are due to how the cyclic GC does the "marking" phase of mark/sweep GC.
It's not a mark/sweep collector.
My expectation of mark/sweep GC is
You expectation is irrelevant <wink>, since it's not a mark/sweep collector.
that it should be sufficient to assure that all objects reachable from the root set are visit()ed at least once; it should not be important how many times each unreachable object is visit()ed.
Forget roots sets and mark/sweep collectors. If you want to know how Python's gc actually works, read the comments in gcmodule.c's collect() function, and drill down from there as deep as your curiosity takes you.
I don't have a deep enough understanding of the Python interpreter to give a detailed suggestion for improvement. I suspect that determining the root set is not an easy operation; if this is the case, then I think we're stuck with the current design.
We are, but I don't feel "stuck" with it -- it's a very good design.
If determining the root set *is* easy (or even possible), then I would suggest using a standard mark/sweep collector, where unreachable objects are scheduled for destruction.
The root set is impossible to determine without active cooperation from extension modules. Python's gc can infer the existence of roots, but not their identies; it can infer the transitive closure of what's reachable from the root set intersected with the set of objects that have registered with gc, but knows nothing about the roots themselves.
tp_traverse(), tp_clear(), and tp_dealloc() would retain the same structure; the only difference would be the logic that determines which objects can be destroyed.
3) A strange thing can happen when tp_clear() is called. Suppose that an edge object is being cleared, and it drops its references to the associated rings. If ob_refcnt of one of the rings drops to 0 as a result, Python will tp_dealloc() the ring *right* *now*, without ever calling tp_clear() on the ring.
That depends on how the tp_dealloc() for the ring is coded, so is up to you. The only *necessary* thing for tp_clear() to do is to clear enough pointers to break cycles. Some people do only that much. For example, Python tuples don't even implement tp_clear, because a cycle involving tuples necessarily involves non-tuple objects too. Other people write tp_clear() so that it can be called from their tp_dealloc() function too. That's up to them. If you want to do the latter, it's good to write tp_clear() in such a way that calling it more than once is harmless; e.g., if (self->pa) { Py_DECREF(self->pa); self->pa = NULL; } etc or, using 2.4's Py_CLEAR macro, Py_CLEAR(self->pa); etc.
That means that I have to keep track of whether tp_clear() has been called on objects, if it is at all important that tp_clear() be called, so that I can manually do so in tp_dealloc(),
As above, make your tp_clear() idempotent and then you don't have to keep track of anything. It's usually good practice for a tp_clear() to be robust against NULL pointers anyway.
if necessary. It is in my opinion reasonable to have cleanup code in tp_clear(), with the assumption that it will be called precisely once, but Python makes me do extra work to make sure that this happens.
You won't get a guarantee that tp_clear() will be called exactly once. You have a guarantee that tp_dealloc() will be called no more than once. It's actually unusual to find a tp_dealloc() that calls its type's tp_clear(), but I think it's a nice pattern when you can get away with it. In such cases, tp_clear() should be coded so that it can be called multiple times, and that's generally very easy to do.
This should be pretty easy to change. A single bit per object is needed to keep track of whether tp_clear() has been called. I think this only needs to be done for object types that support cyclic GC.
There's no such thing as "one bit" in reality: because of malloc memory padding, "one more bit" would require adding 4 bytes to every gc'able object. That's an extraordinary expense. Luckily, it's not needed.
4) There is no way to control the order in which objects are tp_dealloc()ed.
It's true that there's no direct way to control the order.
This is a problem for the R-E-R construct, since at a low level, these objects are always linked together.
Many things are always linked together in the core too. Why that's "a problem" for you isn't clear. It's not a problem in the core, for example.
What I would like to do is defer tp_dealloc() on the edge until after both rings have been deleted. Instead, I am forced to use a reference-counted deletion function.
Sorry, don't know what that means, and (as above) don't know why you care.
Not calling self->ob_type->tp_free() on the edge in tp_dealloc() until later is not a reasonable option, because this defers deletion of the edge until a later round of garbage collection.
This could be addressed in the Python interpreter by paying heed to the return value of tp_dealloc(). If the return value is non-zero, move the object to the end of the list of objects to be destroyed,
Except there is no list of objects to be destroyed.
so that destruction is tried later. This allows the module to impose its own destruction ordering.
That part would need clearer motivation, preferably in the form of a PEP.
participants (7)
-
"Martin v. Löwis"
-
Aahz
-
Jason Evans
-
Michael Hudson
-
Neil Schemenauer
-
Nick Bastin
-
Tim Peters