[Python-Dev] Garbage collector problem

Tim Peters tim.one@comcast.net
Sat, 29 Jun 2002 08:22:28 -0400


[Kevin Jacobs]

Nice job, Kevin!  You learned a lot in a hurry here.  I'll try to fill in
some blanks.

> ...
> lst = []
> for i in range(100000):
>   lst.append( (1,) )
>
> The key ingredients are:
>
>   1) A method is called on a container (rather than __setitem__ or
>      __setattr__).
>
>   2) A new object is allocated while the method object lives on the Python
>      VM stack, as shown by the disassembled bytecodes:
>
>          40 LOAD_FAST                1 (lst)
>          43 LOAD_ATTR                3 (append)
>          46 LOAD_CONST               2 (1)
>          49 BUILD_TUPLE              1
>          52 CALL_FUNCTION            1
>
> These ingredients combine in the following way to trigger quadratic-time
> behavior in the Python garbage collector:
>
>   * First, the LOAD_ATTR on "lst" for "append" is called, and a
>     PyCFunction is returned from this code in descrobject.c:method_get:
>
>         return PyCFunction_New(descr->d_method, obj);
>
>     Thus, a _new_ PyCFunction is allocated every time the method is
>     requested.

In outline, so far all that has been true since 0 AP (After Python).

>   * This new method object is added to generation 0 of the garbage
>     collector, which holds a reference to "lst".

It's a bug in 2.2.1 that the method object isn't getting added to gen0.  It
is added in current CVS.

>   * The BUILD_TUPLE call may then trigger a garbage collection cycle.
>
>   * Since the "append" method is in generation 0,

Yes.

>     the reference traversal must also follow all objects within "lst",
>     even if "lst" is in generation 1 or 2.

According to me, it's a bug that it does so in current CVS, and a bug that's
been in cyclic gc forever.  This kind of gc scheme isn't "supposed to" chase
old objects (there's no point to doing so -- if there is a reclaimable cycle
in the young generation, the cycle is necessarily composed of pure
young->young pointers, so chasing a cross-generation pointer can't yield any
useful info).  It's not a *semantic* error if it chases old objects too, but
it does waste time, and can (as it does here) yank old objects back to a
younger generation.  I attached a brief patch to your bug report that stops
this.

>     This traversal requires time linear in the number of
>     objects in "lst", thus increasing the overall time complexity of the
>     code to quadratic in the number of elements in "lst".

Yes.  Do note that this class of program is quadratic-time anyway, just
because the rare gen2 traversals have to crawl over an ever-increasing lst
too.  BTW, the "range(100000)" in your test program also gets crawled over
every time a gen2 collection occurs!  That's why Neil made them rare <wink>.

> Also note that this is a much more general problem than this
> small example.

Sure, although whether it's still "a real problem" after the patch is open
to cost-benefit ridicule <wink>.

> It can affect many types of objects in addition to methods, including
> descriptors, iterator objects, and any other object that contains a "back
> reference".
>
> So, what can be done about this.... One simple solution would be to not
> traverse some "back references" if we are collecting objects in generation
> 0.
>
> This will avoid traversing virtually all of these ephemoral
> objects that will trigger such expensive behavior.  If they live long
> enough to pass through to generation one or two, then clearly they
> should be traversed.
>
> So, what do all of you GC gurus think?  Provided that my analysis
> is sound, I can rapidly propose a patch to demonstrate this approach if
> there is sufficient positive sentiment.

Seeing a patch is the only way I'd understand your intent.  You can
understand my intent by reading my patch <wink>.

> ...
>
> PS: I have not looked into why this doesn't happen in Python 2.2.x or
>     before.

It's a bug in 2.2.1 (well, two bugs, if Neil accepts my claim that the patch
I put up "fixes a bug" too).  In 2.1, method objects hadn't yet been added
to cyclic gc.