[This should be on python-ideas, so I'm replying to there instead of python-dev] On 10/1/07, Justin Tulloss <tulloss2@uiuc.edu> wrote:
Hello,
I've been doing some tests on removing the GIL, and it's becoming clear that some basic changes to the garbage collector may be needed in order for this to happen efficiently. Reference counting as it stands today is not very scalable.
I've been looking into a few options, and I'm leaning towards implementing IBM's Recycler GC ( http://www.research.ibm.com/people/d/dfb/recycler-publications.html ) since it is very similar to what is in place now from the users' perspective. However, I haven't been around the list long enough to really understand the feeling in the community on GC in the future of the interpreter. It seems that a full GC might have a lot of benefits in terms of performance and scalability, and I think that the current gc module is of the mark-and-sweep variety. Is the trend going to be to move away from reference counting and towards the mark-and-sweep implementation that currently exists, or is reference counting a firmly ingrained tradition?
Refcounting is fairly firmly ingrained in CPython, but there are conservative GCs for C that mostly work, and other implementations aren't so restricted.

The problem with Python is that it produces a *lot* of garbage. Pystone on my box allocates around a million objects per second and fills up available RAM in about 10 seconds. Not only do you need to collect often enough to not fill up the RAM, but for *good* performance you need to collect often enough to keep your L1 cache hot. That would seem to demand a generational GC at least.

You might as well assume it'll be more expensive than refcounting[1]. The real advantage would be in scalability. Concurrent, parallel GCs are an active field of research, though. If you're really interested you should research conservative GCs aimed at C in general, and only minimally interact with CPython (such as disabling the custom allocators).

A good stepping-off point is The Memory Management Reference (although it looks like it hasn't been updated in the last few years). If some of my terms are unfamiliar to you, go start reading. ;) http://www.memorymanagement.org/

[1] This statement is only in the context of CPython, of course. There are certainly many situations where a tracing GC performs better.

-- Adam Olsen, aka Rhamphoryncus
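The refcounting scheme the thread keeps referring to can be sketched as a toy. This is an illustrative simulation only, not CPython's actual Py_INCREF/Py_DECREF machinery; the class and attribute names are invented:

```python
class Obj:
    """Toy refcounted object: freed the instant the last reference drops."""

    def __init__(self, name):
        self.name = name
        self.refcount = 1   # the creator holds the first reference
        self.freed = False

    def incref(self):
        self.refcount += 1

    def decref(self):
        self.refcount -= 1
        if self.refcount == 0:
            self.freed = True   # stand-in for returning memory to the allocator

o = Obj("x")
o.incref()            # a second owner takes a reference
o.decref()            # ... and drops it again
assert not o.freed    # still one live reference
o.decref()            # last reference gone: reclaimed immediately
assert o.freed
```

The point of the sketch is the property the thread debates: reclamation is immediate and deterministic, but every reference copy costs a counter update, which is what makes the scheme hard to scale across threads without the GIL.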
On 10/1/07, Adam Olsen <rhamph@gmail.com> wrote:
[This should be on python-ideas, so I'm replying to there instead of python-dev]
On 10/1/07, Justin Tulloss <tulloss2@uiuc.edu> wrote:
Hello,
I've been doing some tests on removing the GIL, and it's becoming clear that some basic changes to the garbage collector may be needed in order for this to happen efficiently. Reference counting as it stands today is not very scalable.
I've been looking into a few options, and I'm leaning towards implementing IBM's Recycler GC ( http://www.research.ibm.com/people/d/dfb/recycler-publications.html ) since it is very similar to what is in place now from the users' perspective. However, I haven't been around the list long enough to really understand the feeling in the community on GC in the future of the interpreter. It seems that a full GC might have a lot of benefits in terms of performance and scalability, and I think that the current gc module is of the mark-and-sweep variety. Is the trend going to be to move away from reference counting and towards the mark-and-sweep implementation that currently exists, or is reference counting a firmly ingrained tradition?
Refcounting is fairly firmly ingrained in CPython, but there are conservative GCs for C that mostly work, and other implementations aren't so restricted.
The problem with Python is that it produces a *lot* of garbage. Pystone on my box allocates around a million objects per second and fills up available RAM in about 10 seconds. Not only do you need to collect often enough to not fill up the RAM, but for *good* performance you need to collect often enough to keep your L1 cache hot. That would seem to demand a generational GC at least.
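The generational idea above can be sketched as a toy two-generation collector. This is a deliberately simplified simulation with an explicit root set; a real generational GC would also need a write barrier and a remembered set so that minor collections never have to traverse the old generation, which this omits:

```python
class Node:
    """Toy heap object: newly created objects always start in the nursery."""

    def __init__(self, gc):
        self.refs = []          # outgoing references to other Nodes
        gc.nursery.append(self)

class GenGC:
    """Toy two-generation collector: minor collections scan only the nursery,
    so frequently-collected young objects stay cache-hot."""

    def __init__(self):
        self.nursery = []   # young objects, collected often
        self.old = []       # survivors, collected rarely
        self.roots = []     # explicit root set (a real GC finds roots itself)

    def _reachable(self):
        # Plain graph traversal from the roots.  (A remembered set would let
        # a real collector skip the old generation here.)
        seen, stack = set(), list(self.roots)
        while stack:
            o = stack.pop()
            if id(o) not in seen:
                seen.add(id(o))
                stack.extend(o.refs)
        return seen

    def minor(self):
        # Dead young objects are simply dropped; survivors are promoted.
        live = self._reachable()
        self.old += [o for o in self.nursery if id(o) in live]
        self.nursery = []

gc = GenGC()
a, b, garbage = Node(gc), Node(gc), Node(gc)
gc.roots.append(a)
a.refs.append(b)        # b is reachable through a; garbage is not
gc.minor()
assert len(gc.old) == 2 and gc.nursery == []
```

The design choice this illustrates is the one argued for above: if most objects die young, a frequent scan of a small nursery reclaims most garbage while touching only memory that is likely still in cache.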
You might as well assume it'll be more expensive than refcounting[1]. The real advantage would be in scalability. Concurrent, parallel GCs are an active field of research, though. If you're really interested you should research conservative GCs aimed at C in general, and only minimally interact with CPython (such as disabling the custom allocators).
Ahh, I forgot a major alternative. You could instead work on an exact GC for PyPy. I personally think it's more interesting to get cooperation from the compiler. ;)
A good stepping-off point is The Memory Management Reference (although it looks like it hasn't been updated in the last few years). If some of my terms are unfamiliar to you, go start reading. ;) http://www.memorymanagement.org/
[1] This statement is only in the context of CPython, of course. There are certainly many situations where a tracing GC performs better.
-- Adam Olsen, aka Rhamphoryncus
On Oct 1, 2007, at 12:59 PM, Adam Olsen wrote:
Ahh, I forgot a major alternative. You could instead work on an exact GC for PyPy. I personally think it's more interesting to get cooperation from the compiler. ;)
That would be great; they are looking for people who want to do that. It is hard work, but if you think you can do it for CPython, you can do it on PyPy. Please contact the people using http://codespeak.net/pypy/dist/pypy/doc/contact.html. Usually the IRC channel (#pypy on freenode.net) is the fastest way to get started. -- Leonardo Santagada
Adam Olsen wrote:
Ahh, I forgot a major alternative. You could instead work on an exact GC for PyPy. I personally think it's more interesting to get cooperation from the compiler. ;)
The open source world sorely needs an exact garbage collector, which is why I am in the process of writing one. Right now it's going very slowly, as I've only been able to commit a few hours a week to the work.

One difficulty in making a FOSS garbage collector is that exact collectors tend to be tightly coupled to the specific object model of the runtime environment. The design of the collector puts a lot of constraints on the design of the object model, and vice versa. This makes it difficult to create a collector that works with a lot of different languages.

As you already know, PyPy is currently targeting LLVM as a backend. LLVM provides an abstract interface for garbage collection that solves some of these problems. Because the collector interface is defined at the virtual instruction level, before optimization takes place, some of the overhead that would be incurred by a completely generic callback interface can be avoided; that is, the abstraction layers needed for a loosely-coupled collector can be optimized away by LLVM's optimizer.

Based on a study of the LLVM code and docs, I have come to believe that it would be possible to create a generic collector implementation that would work with a fairly broad class of both object models and collection algorithms. In other words, while it might not support every possible language out there, it could support a fairly wide subset.

So far, what I've got is a general-purpose heap implementation, loosely inspired by (although not copied from) the popular dlmalloc implementation and a few others, which also incorporates a number of features that would be needed by a collector algorithm (such as reserved space for collector state bits and type tags in the object allocation header). I've also got some utility classes that would be useful to a collector, such as a lock-free implementation of work queues.
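The "collector state bits and type tags in the object allocation header" idea can be illustrated with a toy header word. The bit layout here (one mark bit, one pin bit, tag in the high bits) is invented purely for illustration and is not Talin's actual design:

```python
# Hypothetical header-word layout:
#   bit 0      mark bit (set during tracing)
#   bit 1      pin bit (object may not be moved)
#   bits 8+    type tag
MARK_BIT = 1 << 0
PIN_BIT = 1 << 1
TAG_SHIFT = 8

def make_header(tag, marked=False):
    """Pack a type tag and GC state bits into one integer word."""
    return (tag << TAG_SHIFT) | (MARK_BIT if marked else 0)

def type_tag(header):
    return header >> TAG_SHIFT

def is_marked(header):
    return bool(header & MARK_BIT)

h = make_header(tag=42)
assert type_tag(h) == 42 and not is_marked(h)
h |= MARK_BIT           # collector marks the object during a trace
assert is_marked(h)
```

Packing this state into the allocation header is what makes a collector "exact": given any object address, the runtime can recover its type (and thus its pointer layout) without guessing, which is precisely what conservative collectors cannot do.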
I don't have an actual collector working yet, and I'm not going to post a link to the code here because it's not ready for public viewing yet. In any case, if anyone is interested in discussing this further, feel free to email me privately. Since this isn't directly related to CPython or PyPy, I don't want to clutter up the discussion lists here with issues related to garbage collection. -- Talin
Adam Olsen wrote:
Refcounting is fairly firmly ingrained in CPython, but there are conservative GCs for C that mostly work,
Unfortunately, "mostly works" isn't good enough for something to be incorporated into CPython. It needs to work completely reliably and very portably.

--
Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,          | Carpe post meridiem!                 |
Christchurch, New Zealand          | (I'm not a morning person.)          |
greg.ewing@canterbury.ac.nz        +--------------------------------------+
On 10/1/07, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Adam Olsen wrote:
Refcounting is fairly firmly ingrained in CPython, but there are conservative GCs for C that mostly work,
Unfortunately, "mostly works" isn't good enough for something to be incorporated into CPython. It needs to work completely reliably and very portably.
Indeed. An example of how they can fail: Python links in the Boehm GC (which overrides malloc and free), and is then loaded dynamically from another app. By the time Boehm overrides malloc and free, they've already been used significantly.

Another example is any sort of custom allocator. At best, the conservative GC will see its heap as a single giant object (and be unable to release objects within it). At worst (if it allocates the heap via mmap rather than the overridden malloc), the GC won't even know the heap exists, won't include it in the root set, and will prematurely free objects it references.

-- Adam Olsen, aka Rhamphoryncus
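The premature-free failure mode described above can be simulated abstractly. This is a toy model of a conservative scan: the "collector" treats any word inside the regions it knows about that matches a live object's address as a reference, and the addresses and regions are entirely made up:

```python
# Live objects keyed by their (made-up) addresses.
objects = {0x1000: "A", 0x2000: "B"}

# A region the conservative GC knows about: contains a real reference to A
# and an integer (0xDEAD) that merely looks address-like.
scanned_heap = [0x1000, 0xDEAD]

# A region obtained via mmap behind the GC's back: holds the ONLY reference
# to B, but the collector never scans it.
hidden_mmap = [0x2000]

def live(regions):
    """Conservative scan: any word matching a known object address keeps
    that object alive."""
    return {objects[w] for region in regions for w in region if w in objects}

# The collector only sees the registered region, so B looks unreachable
# and would be prematurely freed:
assert live([scanned_heap]) == {"A"}

# Registering the mmap'd region as a root (what Boehm's GC_add_roots is for)
# makes B visible again:
assert live([scanned_heap, hidden_mmap]) == {"A", "B"}
```

Note the toy also shows the opposite conservative failure in passing: 0xDEAD is just an integer, but if it had happened to equal a live address, the object would have been retained spuriously.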