[Python-Dev] Re: [Patches] Reference cycle collection for Python

Vladimir Marangozov Vladimir.Marangozov@inrialpes.fr
Wed, 1 Mar 2000 18:07:07 +0100 (CET)

Guido van Rossum wrote:
> Thanks for the new patches, Neil!

Thanks from me too!
I notice, however, that hash_resize() still uses a malloc call
instead of PyMem_NEW. Neil, please correct this in your version
immediately ;-)

> We had a visitor here at CNRI today, Eric Tiedemann
> <est@hyperreal.org>, who had a look at your patches before.  Eric
> knows his way around the Scheme, Lisp and GC literature, and presented
> a variant on your approach which takes the bite out of the recursive
> passes.

Avoiding the recursion is valuable, as long we're optimizing the
implementation of one particular scheme. It doesn't bother me that
Neil's scheme is recursive, because I still perceive his code as a
proof of concept.

You're presenting here another scheme based on refcounts arithmetic,
generalized for all container types. The linked list implementation
of this generalized scheme is not directly related to the logic.

I have some suspitions on the logic, so you'll probably want to elaborate
a bit more on it, and convince me that this scheme would actually work.

> Today, Eric proposed to do away with Neil's hash table altogether --
> as long as we're wasting memory, we might as well add 3 fields to each
> container object rather than allocating the same amount in a separate
> hash table.

I cannot agree so easily with this statement, but you should have expecting
this from me :-)  If we're about to opimize storage, I have good reasons
to believe that we don't need 3 additional slots per container (but 1 for
gc_refs, yes).

We could certainly envision allocating the containers within memory pools
of 4K (just as it is done in pymalloc, and close to what we have for
ints & floats). These pools would be labaled as "container's memory",
they would obviously be under our control, and we'd have additional slots
per pool, not per object. As long as we isolate the containers from the
rest, we can enumerate them easily by walking though the pools.

But I'm willing to defer this question for now, as it involves the object
allocators (the builtin allocators + PyObject_NEW for extension types E --
user objects of type E would be automatically taken into account for GC
if there's a flag in the type struct which identifies them as containers).

> Eric expects that this will run faster, although this obviously needs
> to be tried.

Definitely, although I trust Eric & Tim :-)

> Container types are: dict, list, tuple, class, instance; plus
> potentially user-defined container types such as kjbuckets.  I have a
> feeling that function objects should also be considered container
> types, because of the cycle involving globals.

+ other extension container types. And I insist.
Don't forget that we're planning to merge types and classes...

> Eric's algorithm, then, consists of the following parts.
> Each container object has three new fields: gc_next, gc_prev, and
> gc_refs.  (Eric calls the gc_refs "refcount-zero".)
> We color objects white (initial), gray (root), black (scanned root).
> (The terms are explained later; we believe we don't actually need bits
> in the objects to store the color; see later.)
> All container objects are chained together in a doubly-linked list --
> this is the same as Neil's code except Neil does it only for dicts.
> (Eric postulates that you need a list header.)
> When GC is activated, all objects are colored white; we make a pass
> over the entire list and set gc_refs equal to the refcount for each
> object.

Step 1:  for all containers, c->gc_refs = c->ob_refcnt

> Next, we make another pass over the list to collect the internal
> references.  Internal references are (just like in Neil's version)
> references from other container types.  In Neil's version, this was
> recursive; in Eric's version, we don't need recursion, since the list
> already contains all containers.  So we simple visit the containers in
> the list in turn, and for each one we go over all the objects it
> references and subtract one from *its* gc_refs field.  (Eric left out
> the little detail that we ened to be able to distinguish between
> container and non-container objects amongst those references; this can
> be a flag bit in the type field.)

Step 2:  c->gc_refs = c->gc_refs - Nb_referenced_containers_from_c

I guess that you realize that after this step, gc_refs can be zero
or negative.

I'm not sure that you collect "internal" references here (references
from other container types). A list referencing 20 containers, being
itself referenced by one container + one static variable + two times
from the runtime stack, has an initial refcount == 4, so we'll end
up with gc_refs == -16.

A tuple referencing 1 list, referenced once by the stack, will end up
with gc_refs == 0.

Neil's scheme doesn't seem to have this "property".

> Now, similar to Neil's version, all objects for which gc_refs == 0
> have only internal references, and are potential garbage; all objects
> for which gc_refs > 0 are "roots".  These have references to them from
> other places, e.g. from globals or stack frames in the Python virtual
> machine.

Agreed, some roots have gc_refs > 0
I'm not sure that all of them have it, though... Do they?

> We now start a second list, to which we will move all roots.  The way
> to do this is to go over the first list again and to move each object
> that has gc_refs > 0 to the second list.  Objects placed on the second
> list in this phase are considered colored gray (roots).

Step 3: Roots with gc_refs > 0 go to the 2nd list.
        All c->gc_refs <= 0 stay in the 1st list.

> Of course, some roots will reference some non-roots, which keeps those
> non-roots alive.  We now make a pass over the second list, where for
> each object on the second list, we look at every object it references.
> If a referenced object is a container and is still in the first list
> (colored white) we *append* it to the second list (colored gray).
> Because we append, objects thus added to the second list will
> eventually be considered by this same pass; when we stop finding
> objects that sre still white, we stop appending to the second list,
> and we will eventually terminate this pass.  Conceptually, objects on
> the second list that have been scanned in this pass are colored black
> (scanned root); but there is no need to to actually make the
> distinction.

Step 4: Closure on reachable containers which are all moved to the 2nd list.

(Assuming that the objects are checked only via their type, without
involving gc_refs)

> (How do we know whether an object pointed to is white (in the first
> list) or gray or black (in the second)?

Good question? :-)

> We could use an extra  bitfield, but that's a waste of space.
> Better: we could set gc_refs to a magic value (e.g. 0xffffffff) when
> we move the object to the second list.

I doubt that this would work for the reasons mentioned above.

> During the meeting, I proposed to set the back pointer to NULL; that
> might work too but I think the gc_refs field is more elegant. We could
> even just test for a non-zero gc_refs field; the roots moved to the
> second list initially all have a non-zero gc_refs field already, and
> for the objects with a zero gc_refs field we could indeed set it to
> something arbitrary.)

Not sure that "arbitrary" is a good choice if the differentiation
is based solely on gc_refs.

> Once we reach the end of the second list, all objects still left in
> the first list are garbage.  We can destroy them in a similar to the
> way Neil does this in his code.  Neil calls PyDict_Clear on the
> dictionaries, and ignores the rest.  Under Neils assumption that all
> cycles (that he detects) involve dictionaries, that is sufficient.  In
> our case, we may need a type-specific "clear" function for containers
> in the type object.

Couldn't this be done in the object's dealloc function?

Note that both Neil's and this scheme assume that garbage _detection_
and garbage _collection_ is an atomic operation. I must say that
I don't care of having some living garbage if it doesn't hurt my work.
IOW, the used criterion for triggering the detection phase _may_ eventually
differ from the one used for the collection phase. But this is where we
reach the incremental approaches, implying different reasoning as a
whole. My point is that the introduction of a "clear" function depends
on the adopted scheme, whose logic depends on pertinent statistics on
memory consumption of the cyclic garbage.

To make it simple, we first need stats on memory consumption, then we
can discuss objectively on how to implement some particular GC scheme.
I second Eric on the need for excellent statistics.

> The general opinion was that we should first implement and test the
> algorithm as sketched above, and then changes or extensions could be
> made.

I'd like to see it discussed first in conjunction with (1) the possibility
of having a proprietary malloc, (2) the envisioned type/class unification.
Perhaps I'm getting too deep, but once something gets in, it's difficult
to take it out, even when a better solution is found subsequently. Although
I'm enthousiastic about this work on GC, I'm not in a position to evaluate
the true benefits of the proposed schemes, as I still don't have a basis
for evaluating how much garbage my program generates and whether it hurts
the interpreter compared to its overal memory consumption.

> I was pleasantly surprised to find Neil's code in my inbox when we
> came out of the meeting; I think it would be worthwhile to compare and
> contrast the two approaches.  (Hm, maybe there's a paper in it?)

I'm all for it!

       Vladimir MARANGOZOV          | Vladimir.Marangozov@inrialpes.fr
http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252