[Python-Dev] GC Changes

Jeremy Hylton jeremy at alum.mit.edu
Mon Oct 1 15:45:04 CEST 2007


On 10/1/07, Justin Tulloss <tulloss2 at uiuc.edu> wrote:
> Hello,
>
> I've been doing some tests on removing the GIL, and it's becoming clear that
> some basic changes to the garbage collector may be needed in order for this
> to happen efficiently. Reference counting as it stands today is not very
> scalable.
>
> I've been looking into a few options, and I'm leaning towards the
> implementing IBMs recycler GC (
> http://www.research.ibm.com/people/d/dfb/recycler-publications.html
> ) since it is very similar to what is in place now from the users'
> perspective. However, I haven't been around the list long enough to really
> understand the feeling in the community on GC in the future of the
> interpreter. It seems that a full GC might have a lot of benefits in terms
> of performance and scalability, and I think that the current gc module is of
> the mark-and-sweep variety. Is the trend going to be to move away from
> reference counting and towards the mark-and-sweep implementation that
> currently exists, or is reference counting a firmly ingrained tradition?
>
> On a more immediately relevant note, I'm not certain I understand the full
> extent of the gc module. From what I've read, it sounds like it's fairly
> close to a fully functional GC, yet it seems to exist only as a
> cycle-detecting backup to the reference counting mechanism. Would somebody
> care to give me a brief overview on how the current gc module interacts with
> the interpreter, or point me to a place where that is done? Why isn't the
> mark-and-sweep mechanism used for all memory management?

There are probably lots of interesting things to say about the gc
cycle collector, but I'll just pick a few things that seem relevant.
First off, the gc doesn't have a root set.  It traces all the
container objects, subtracting known references from the refcount, and
is left with a root set, i.e. those objects that have some references
that can't be accounted for among the known container objects.  (see
update_refs and substract_refs)  In the end, we make three traversals
of the heap to detect the objects that appear to be unreachable.
(move_unreachable is the third.)

The cycle detection relies on having the reference counts correct, so
it doesn't really represent a move away from refcounting.

I skipped the generations.  The GC divides the heap into three
generations and tends to focus on the youngest generation.  So that
limits the portion of the heap that is scanned, but I don't understand
the magnitude of that effect in practice.

The current collector works in collaboration with ref counting.  In
particular, refcounting probably handles the majority of
deallocations.  If the cycle detection system were responsible for all
deallocations, the gc module would have a lot more work to do.

I do think it would be interesting to experiment with the recycler
approach, but I think it would take a lot of work to do a decent
experiment.  But please give it a shot!

Jeremy


More information about the Python-Dev mailing list