[This should be on python-ideas, so I'm replying there instead of on python-dev]
On 10/1/07, Justin Tulloss wrote:
Hello,
I've been doing some tests on removing the GIL, and it's becoming clear that some basic changes to the garbage collector may be needed in order for this to happen efficiently. Reference counting as it stands today is not very scalable.
I've been looking into a few options, and I'm leaning towards implementing IBM's Recycler GC ( http://www.research.ibm.com/people/d/dfb/recycler-publications.html ), since it is very similar to what is in place now from the users' perspective. However, I haven't been around the list long enough to really understand the feeling in the community on GC in the future of the interpreter. It seems that a full GC might have a lot of benefits in terms of performance and scalability, and I think that the current gc module is of the mark-and-sweep variety. Is the trend going to be to move away from reference counting and towards the mark-and-sweep implementation that currently exists, or is reference counting a firmly ingrained tradition?
Refcounting is fairly firmly ingrained in CPython, but there are conservative GCs for C that mostly work, and other implementations aren't so restricted.

The problem with Python is that it produces a *lot* of garbage. Pystones on my box allocates around a million objects per second and would fill up available RAM in about 10 seconds if nothing were collected. Not only do you need to collect often enough to not fill up the RAM, but for *good* performance you need to collect often enough to keep your L1 cache hot. That would seem to demand a generational GC at least.

You might as well assume it'll be more expensive than refcounting[1]. The real advantage would be in scalability. Concurrent, parallel GCs are an active field of research though.

If you're really interested, you should research conservative GCs aimed at C in general that only minimally interact with CPython (such as to disable the custom allocators). A good jumping-off point is The Memory Management Reference (although it looks like it hasn't been updated in the last few years). If some of my terms are unfamiliar to you, go start reading. ;)

http://www.memorymanagement.org/

[1] This statement is only in the context of CPython, of course. There are certainly many situations where a tracing GC performs better.

-- 
Adam Olsen, aka Rhamphoryncus
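
P.S. For anyone following along who hasn't looked at how the two mechanisms coexist today, here is a minimal sketch, assuming CPython and its standard gc and weakref modules. The Node class and the demo() function are made up purely for illustration. It shows that an object with no cycles is freed the moment its last reference goes away, while a reference cycle survives until the gc module's collector runs.

```python
import gc
import weakref


class Node:
    """Hypothetical container type, used only for this illustration."""

    def __init__(self):
        self.other = None


def demo():
    # Turn off automatic cycle collection so that only the explicit
    # gc.collect() call below can reclaim the cycle.
    gc.disable()
    try:
        # Case 1: no cycle. Dropping the last reference frees the
        # object immediately through reference counting.
        a = Node()
        a_alive = weakref.ref(a)
        del a
        freed_by_refcount = a_alive() is None

        # Case 2: two objects that point at each other. Their
        # refcounts never reach zero, so 'del' alone frees nothing.
        b, c = Node(), Node()
        b.other, c.other = c, b
        b_alive = weakref.ref(b)
        del b, c
        cycle_survives_del = b_alive() is not None

        # The cycle collector in the gc module finds and frees the pair.
        gc.collect()
        freed_by_gc = b_alive() is None
    finally:
        gc.enable()
    return freed_by_refcount, cycle_survives_del, freed_by_gc


if __name__ == "__main__":
    print(demo())
```

On CPython all three values should come back True. Other implementations that don't use refcounting may well report something different for the first one, which is rather the point of this thread.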