
I've found what I consider a major problem with the garbage collector in the Python 2.3 CVS tree. Here is a small kernel that demonstrates the problem:
lst = [] for i in range(100000): lst.append( (1,) )
The key ingredients are:
1) A method is called on a container (rather than __setitem__ or __setattr__).
2) A new object is allocated while the method object lives on the Python VM stack, as shown by the disassembled bytecodes:
40 LOAD_FAST 1 (lst) 43 LOAD_ATTR 3 (append) 46 LOAD_CONST 2 (1) 49 BUILD_TUPLE 1 52 CALL_FUNCTION 1
These ingredients combine in the following way to trigger quadratic-time behavior in the Python garbage collector:
* First, the LOAD_ATTR on "lst" for "append" is called, and a PyCFunction is returned from this code in descrobject.c:method_get:
return PyCFunction_New(descr->d_method, obj);
Thus, a _new_ PyCFunction is allocated every time the method is requested.
* This new method object is added to generation 0 of the garbage collector, which holds a reference to "lst".
* The BUILD_TUPLE call may then trigger a garbage collection cycle.
* Since the "append" method is in generation 0, the reference traversal must also follow all objects within "lst", even if "lst" is in generation 1 or 2. This traversal requires time linear in the number of objects in "lst", thus increasing the overall time complexity of the code to quadratic in the number of elements in "lst".
Also note that this is a much more general problem than this small example. It can affect many types of objects in addition to methods, including descriptors, iterator objects, and any other object that contains a "back reference".
So, what can be done about this.... One simple solution would be to not traverse some "back references" if we are collecting objects in generation 0. This will avoid traversing virtually all of these ephemoral objects that will trigger such expensive behavior. If they live long enough to pass through to generation one or two, then clearly they should be traversed.
So, what do all of you GC gurus think? Provided that my analysis is sound, I can rapidly propose a patch to demonstrate this approach if there is sufficient positive sentiment.
There is a bug open on sourceforge on this issue, so feel free to reply via python-dev or via the bug -- I read both. As usual sourceforge is buggered, so I have not been able to update the bug with the contents of this e-mail.
http://sourceforge.net/tracker/?func=detail&atid=105470&aid=572567&a...
Regards, -Kevin
PS: I have not looked into why this doesn't happen in Python 2.2.x or before. I suspect that it must be related to the recent GC changes in methodobject.py. I'm not motivated to spend much time looking into this, because the current GC behavior is technically correct, though clearly sub-optimal.
-- Kevin Jacobs The OPAL Group - Enterprise Systems Architect Voice: (216) 986-0710 x 19 E-mail: jacobs@theopalgroup.com Fax: (216) 986-0714 WWW: http://www.theopalgroup.com

I had a different ideas to solve this performance problem and perhaps others. It's only half baked, but I thought it was at least worth mentioning in an e-mail. The premise is that the garbage collector tracks a lot of objects that will never participate in cycles and can never participate in cycles. The idea is to avoid tracking objects until it becomes possible for them to participate in a collectible cycle.
For example, an object referenced from a local variable will never be collected until after the frame releases its reference. So what if we did not track objects that were stored in local variables? To make this work, we would need to change the SETLOCAL macro in ceval to track the object that it was DECREFing. There are a lot of little details that would make this complicated unfortunately. All new container objects are tracked, so we would need to untrack ones that are stored in local variables. To track objects on DECREF, we would also need to ask if the object type was GC-enabled.
Another kind of object that is never going to participate in a cycle, I think, is an object that lives only temporarily on the ceval stack. For example, a bound method object created on the stack in order to be called. If it's never bound to another object as an attribute or stored in local variable, it can never participate in the cycle.
How hard would it be to add logic that avoided tracking objects until it was plausible that they would participate in a cycle?
Jeremy

On Fri, 28 Jun 2002, Jeremy Hylton wrote:
I had a different ideas to solve this performance problem and perhaps others. It's only half baked, but I thought it was at least worth mentioning in an e-mail. The premise is that the garbage collector tracks a lot of objects that will never participate in cycles and can never participate in cycles. The idea is to avoid tracking objects until it becomes possible for them to participate in a collectible cycle.
Hi Jeremy,
You have an interesting idea, though I'd state the premise slightly differently. How about:
The premise is that the garbage collector tracks a lot of objects that will never participate in collectible cycles, because untraceable references are held. The idea is to avoid tracking these objects until it becomes possible for them to participate in a collectible cycle.
(virtually any object _can_ participate in a cycle -- most just never do)
Offhand, I am not sure if my idea of ignoring certain references in generation 0 or your idea will work better in practice. Both require adding more intelligence to the garbage collection system via careful annotations. I wouldn't be surprised if the optimal approach involved both methods.
How hard would it be to add logic that avoided tracking objects until it was plausible that they would participate in a [collectable] cycle?
I can work up a patch that does this. Can anyone else think of places where this makes sense, other than frame objects and the ceval stack?
Also, any thoughts on my approach? I have a hard time thinking of any situation that generates enough cyclic garbage where delaying collection until generation 1 would be a serious problem.
-Kevin
PS: The bug Tim spotted makes a big difference too.
-- Kevin Jacobs The OPAL Group - Enterprise Systems Architect Voice: (216) 986-0710 x 19 E-mail: jacobs@theopalgroup.com Fax: (216) 986-0714 WWW: http://www.theopalgroup.com

"KJ" == Kevin Jacobs jacobs@penguin.theopalgroup.com writes:
KJ> You have an interesting idea, though I'd state the premise KJ> slightly differently. How about:
KJ> The premise is that the garbage collector tracks a lot of KJ> objects that will never participate in collectible cycles, KJ> because untraceable references are held. The idea is to avoid KJ> tracking these objects until it becomes possible for them to KJ> participate in a collectible cycle.
I guess the untraced reference to the current frame is the untraceable reference you refer to. The crucial issue is that local variables of the current frame can't be collected, so there's little point in tracking and traversing the objects they refer to.
I agree, of course, that the concern is collectible cycles.
KJ> (virtually any object _can_ participate in a cycle -- most just KJ> never do)
Right. There ought to be some way to exploit that.
KJ> Also, any thoughts on my approach? I have a hard time thinking KJ> of any situation that generates enough cyclic garbage where KJ> delaying collection until generation 1 would be a serious KJ> problem.
If I take your last statement literally, it sounds like we ought to avoid doing anything until an object gets to generation 1 <0.7 wink>.
Your suggestion seems to be that we should treat references from older generations to newer generations as external roots. So a cycle that spans generations will not get collected until everything is in the same generation. Indeed, that does not seem harmful.
On the other hand, it's hard to reconcile an intuitive notion of generation with what we're doing by running GC over and over as you add more elements to your list. It doesn't seem right that your list becomes an "old" object just because a single function allocates 100k young objects. That is, I wish the notion of generations accommodated a baby boom in a generation.
Jeremy

On Fri, 28 Jun 2002, Jeremy Hylton wrote:
Your suggestion seems to be that we should treat references from older generations to newer generations as external roots. So a cycle that spans generations will not get collected until everything is in the same generation. Indeed, that does not seem harmful.
Not really -- I'm saying that certain types of containers tend to hold references to other, much larger, containers. These small containers tend to be ephemoral -- they appear and disappear quickly -- but sometimes are unlucky enough to be around when a collection is triggered. In my example, the small containers were bound-method objects, which store back-references to their class instance, a huge list, which will live in generation 2 very quickly.
I do not advocate making objects store which generation they belong to, but rather to delay the traversal of certain containers until after generation 0. This means that they've been around the block a few times, and may have fallen in with a bad cyclical crowd.
This annotation should be added to objects that tend to shadow other containers, like bound-methods, iterators, generators, descriptors, etc.
In some tests using real workloads, I've found that upwards of 99% of these ephemoral objects never make it to a generation 1 collection anyway.
Haulin' garbage, -Kevin
-- Kevin Jacobs The OPAL Group - Enterprise Systems Architect Voice: (216) 986-0710 x 19 E-mail: jacobs@theopalgroup.com Fax: (216) 986-0714 WWW: http://www.theopalgroup.com

On 28 Jun 2002 at 16:07, Kevin Jacobs wrote:
In some tests using real workloads, I've found that upwards of 99% of these ephemoral objects never make it to a generation 1 collection anyway.
Um, the objects are ephemeral. You were probably thinking of Uncle Timmy.
-- Gordon http://www.mcmillan-inc.com/

[Jeremy, to Kevin Jacobs]
... Your suggestion seems to be that we should treat references from older generations to newer generations as external roots.
That's the way it works now: an object in gen N *is* an external root wrt any object it references in gen I with I < N.
So a cycle that spans generations will not get collected until everything is in the same generation.
Right, and that's what happens (already). When gen K is collected, all gens <= K are smushed into gen K at the start, and all trash cycles are collected except for those that contain at least one object in gen K+1 or higher.
Indeed, that does not seem harmful.
It hasn't been so far <wink>, although you can certainly construct cases where it causes an inconvenient delay in trash collection.
On the other hand, it's hard to reconcile an intuitive notion of generation with what we're doing by running GC over and over as you add more elements to your list. It doesn't seem right that your list becomes an "old" object just because a single function allocates 100k young objects. That is, I wish the notion of generations accommodated a baby boom in a generation.
I don't think you do. Pushing the parent object into an older generation is exactly what's supposed to save us from needing to scan all its children every time a gen0 collection occurs.
Under 2.2.1, Kevin's test case pushes "the list" into gen2 early on, and those of the list's children that existed at that time are never scanned again until another gen2 collection occurs. For a reason I still haven't determined, under current CVS "the whole list" is getting scanned by move_root_reachable() every time a gen0 collection occurs. It's also getting scanned by both subtract_refs() and move_root_reachable() every time a gen1 collection occurs. I'm not yet sure whether the mystery is why this happens in 2.3, or why it doesn't happen in 2.2.1 <0.5 wink>.

[Tim]
... I'm not yet sure whether the mystery is why this happens in 2.3, or why it doesn't happen in 2.2.1 <0.5 wink>.
Knock that down to 0.1 wink <0.3 wink>: Kevin's problem goes away in current CVS if I change the guard in visit_decref() to
if (IS_TRACKED(op) && !IS_MOVED(op)) ^^^^^^^^^^^^^^^^ added this
I've no real idea why, as 2.2.1 didn't need this to prevent "the list" from getting continually pulled back into a younger generation.
Without this change in current CVS, it looks like, in each gen0 collection:
a. The bound method object in gen0 knocks "the list"'s gc_refs down to -124 when visit_decref() is called by the bound method object traverse via subtract_refs(). Therefore IS_MOVED("the list") is no longer true.
b. move_root_reachable() then moves "the list" into the list of reachable things, because visit_move's has-always-been-there
if (IS_TRACKED(op) && !IS_MOVED(op)) {
guard doesn't believe "the list" has already been moved. vist_move then restores the list's gc_refs to the magic -123.
c. move_root_reachable() goes on to scan all of "the list"'s entries too.
d. "the list" itself gets moved into gen1, just because it's in the list of reachable things.
e. The next gen0 collection starts at #a again, and does the same stuff all over again.
Adding the new guard in visit_decref() breaks this at #a: IS_MOVED("the list") remains true, and so #b doesn't move "the list" into the set of reachable objects again, and so the list stays in whichever older generation it was in, and doesn't get scanned again (until the next gen2 traversal).
The mystery to me now is why the a,b,c,d,e loop didn't happen in 2.2.1.

[Tim]
... The mystery to me now is why the a,b,c,d,e loop didn't happen in 2.2.1.
Because 2.2.1 has a bug in PyCFunction_New(), which ends with
op->m_self = self; PyObject_GC_Init(op); return (PyObject *)op;
But also in 2.2.1,
/* This is here for the sake of backwards compatibility. Extensions that * use the old GC API will still compile but the objects will not be * tracked by the GC. */ #define PyGC_HEAD_SIZE 0 #define PyObject_GC_Init(op) #define PyObject_GC_Fini(op) #define PyObject_AS_GC(op) (op) #define PyObject_FROM_GC(op) (op)
IOW, PyObject_GC_Init(op) is a nop in 2.2.1, and the bound method object never gets tracked. Therefore the a,b,c,d,e loop never gets started.
In current CVS, the function ends with
op->m_self = self; _PyObject_GC_TRACK(op); return (PyObject *)op;
and a world of fun follows <wink>.

Another idea would be exploit the fact that we know most of the root objects (e.g. sys.modules and the current stack of frames). I haven't figured out a good use for this knowledge though.
Neil

On Fri, 28 Jun 2002, Neil Schemenauer wrote:
Another idea would be exploit the fact that we know most of the root objects (e.g. sys.modules and the current stack of frames). I haven't figured out a good use for this knowledge though.
If the root objects cannot be reached by the GC traversal, you get the approach that Jeremy is suggesting. (Though I just looked, and frame objects aren't exempt from tracking)
-Kevin
-- Kevin Jacobs The OPAL Group - Enterprise Systems Architect Voice: (216) 986-0710 x 19 E-mail: jacobs@theopalgroup.com Fax: (216) 986-0714 WWW: http://www.theopalgroup.com

"KJ" == Kevin Jacobs jacobs@penguin.theopalgroup.com writes:
KJ> On Fri, 28 Jun 2002, Neil Schemenauer wrote:
Another idea would be exploit the fact that we know most of the root objects (e.g. sys.modules and the current stack of frames). I haven't figured out a good use for this knowledge though.
KJ> If the root objects cannot be reached by the GC traversal, you KJ> get the approach that Jeremy is suggesting. (Though I just KJ> looked, and frame objects aren't exempt from tracking)
Right. My suggestion is to not track a set of objects that otherwise would be tracked -- the current frame and its local variables.
Jeremy

[Kevin Jacobs, working hard!]
I don't know what causes this. The little time I've been able to spend on it ended up finding an obvious buglet in some new-in-2.3 gcmodule code:
for (i = 0; i <= generation; i++) generations[generation].count = 0;
That was certainly intended to index by "i", not by "generation".
Fixing that makes the gc.DEBUG_STATS output less surprising, and cuts down on the number of collections, but doesn't really cure anything.
Note that bound methods in 2.2 also create new objects, etc; that was good deduction, but not yet good enough <wink>.

On Fri, 28 Jun 2002, Tim Peters wrote:
I don't know what causes this. The little time I've been able to spend on it ended up finding an obvious buglet in some new-in-2.3 gcmodule code:
for (i = 0; i <= generation; i++) generations[generation].count = 0;
That was certainly intended to index by "i", not by "generation".
Good catch! I missed that in spite of reading those lines 20 times.
Note that bound methods in 2.2 also create new objects, etc; that was good deduction, but not yet good enough <wink>.
That is why I added my "PS" about not looking into why it didn't blow up in Python 2.2. In reality, I did look, but only for 30 seconds, and then decided I didn't want to know.
-Kevin
-- Kevin Jacobs The OPAL Group - Enterprise Systems Architect Voice: (216) 986-0710 x 19 E-mail: jacobs@theopalgroup.com Fax: (216) 986-0714 WWW: http://www.theopalgroup.com

[Kevin Jacobs]
Nice job, Kevin! You learned a lot in a hurry here. I'll try to fill in some blanks.
... lst = [] for i in range(100000): lst.append( (1,) )
The key ingredients are:
A method is called on a container (rather than __setitem__ or __setattr__).
A new object is allocated while the method object lives on the Python VM stack, as shown by the disassembled bytecodes:
40 LOAD_FAST 1 (lst) 43 LOAD_ATTR 3 (append) 46 LOAD_CONST 2 (1) 49 BUILD_TUPLE 1 52 CALL_FUNCTION 1
These ingredients combine in the following way to trigger quadratic-time behavior in the Python garbage collector:
First, the LOAD_ATTR on "lst" for "append" is called, and a PyCFunction is returned from this code in descrobject.c:method_get:
return PyCFunction_New(descr->d_method, obj);
Thus, a _new_ PyCFunction is allocated every time the method is requested.
In outline, so far all that has been true since 0 AP (After Python).
- This new method object is added to generation 0 of the garbage collector, which holds a reference to "lst".
It's a bug in 2.2.1 that the method object isn't getting added to gen0. It is added in current CVS.
The BUILD_TUPLE call may then trigger a garbage collection cycle.
Since the "append" method is in generation 0,
Yes.
the reference traversal must also follow all objects within "lst", even if "lst" is in generation 1 or 2.
According to me, it's a bug that it does so in current CVS, and a bug that's been in cyclic gc forever. This kind of gc scheme isn't "supposed to" chase old objects (there's no point to doing so -- if there is a reclaimable cycle in the young generation, the cycle is necessarily composed of pure young->young pointers, so chasing a cross-generation pointer can't yield any useful info). It's not a *semantic* error if it chases old objects too, but it does waste time, and can (as it does here) yank old objects back to a younger generation. I attached a brief patch to your bug report that stops this.
This traversal requires time linear in the number of objects in "lst", thus increasing the overall time complexity of the code to quadratic in the number of elements in "lst".
Yes. Do note that this class of program is quadratic-time anyway, just because the rare gen2 traversals have to crawl over an ever-increasing lst too. BTW, the "range(100000)" in your test program also gets crawled over every time a gen2 collection occurs! That's why Neil made them rare <wink>.
Also note that this is a much more general problem than this small example.
Sure, although whether it's still "a real problem" after the patch is open to cost-benefit ridicule <wink>.
It can affect many types of objects in addition to methods, including descriptors, iterator objects, and any other object that contains a "back reference".
So, what can be done about this.... One simple solution would be to not traverse some "back references" if we are collecting objects in generation 0.
This will avoid traversing virtually all of these ephemoral objects that will trigger such expensive behavior. If they live long enough to pass through to generation one or two, then clearly they should be traversed.
So, what do all of you GC gurus think? Provided that my analysis is sound, I can rapidly propose a patch to demonstrate this approach if there is sufficient positive sentiment.
Seeing a patch is the only way I'd understand your intent. You can understand my intent by reading my patch <wink>.
...
PS: I have not looked into why this doesn't happen in Python 2.2.x or before.
It's a bug in 2.2.1 (well, two bugs, if Neil accepts my claim that the patch I put up "fixes a bug" too). In 2.1, method objects hadn't yet been added to cyclic gc.

On Sat, 29 Jun 2002, Tim Peters wrote:
Nice job, Kevin! You learned a lot in a hurry here. I'll try to fill in some blanks.
Thanks for the great sleuthing, Tim. I missed a few critical details about how the GC system was intended to work. It was not initially clear that most GC traversals were not recursive. i.e., I had assumed that functions like update_refs and subtract_refs did a DFS through all reachable references, instead of a shallow 1-level search. Of course, it all makes much more sense now.
Here are the results of my test program (attached to the SF bug) with and without your patch installed (2.3a0+ and 2.3a0-, respectively) and GC enabled:
N 20000 40000 80000 160000 240000 320000 480000 640000 Ver. -------- -------- -------- -------- -------- -------- -------- -------- 1.5.2 316450/s 345590/s 349609/s 342895/s 351352/s 353734/s 345362/s 350978/s 2.0 183723/s 192671/s 174146/s 151661/s 154592/s 127181/s 114903/s 99469/s 2.2.1 228553/s 234018/s 197809/s 166019/s 171306/s 137840/s 122835/s 105785/s 2.3a0- 164968/s 111752/s 68220/s 38129/s 26098/s 19678/s 13488/s 10396/s 2.3a0+ 291286/s 287168/s 284857/s 233244/s 196731/s 170759/s 135541/s 129851/s
There is still room for improvement, but overall I'm happy with the performance of 2.3a0+.
So, what do all of you GC gurus think? Provided that my analysis is sound, I can rapidly propose a patch to demonstrate this approach if there is sufficient positive sentiment.
Seeing a patch is the only way I'd understand your intent. You can understand my intent by reading my patch <wink>.
When functioning correctly, the current garbage collector already does what I was suggesting (in more generality, to boot). No need for a patch.
Thanks again, Tim. It was a lively chase through some of the strange and twisted innards of my favorite language.
Off to write boring code again, -Kevin
-- Kevin Jacobs The OPAL Group - Enterprise Systems Architect Voice: (216) 986-0710 x 19 E-mail: jacobs@theopalgroup.com Fax: (216) 986-0714 WWW: http://www.theopalgroup.com
participants (6)
-
Gordon McMillan
-
Jeremy Hylton
-
jeremy@zope.com
-
Kevin Jacobs
-
Neil Schemenauer
-
Tim Peters