[Python-Dev] Fwd: Removal of GIL through refcounting removal.

Sigurd Torkel Meldgaard stm at daimi.au.dk
Thu Oct 30 17:13:38 CET 2008


Hi to all Python developers

For a student project in a course on virtual machines, we are
evaluating the possibility to
experiment with removing the GIL from CPython

We have read the arguments against doing this at
http://www.python.org/doc/faq/library/#can-t-we-get-rid-of-the-global-interpreter-lock.

But we think it might be possible to do this with a different approach
than what has been tried till now.

The main reason for the necessity of the GIL is reference counting.

We believe that most of the slowdown in the free threading
implementation of Greg Stein was due to the need of atomic
refcounting, as this mail seems to confirm:
http://mail.python.org/pipermail/python-ideas/2007-April/000414.html

So we want to change CPython into having a "real" garbage collector -
removing all reference counting, and then the need for locks (or
atomic inc/dec ops) should be
highly alleviated.

Preferably the GC should be a high-performance one for instance a
generational one.

We believe that it can run quite a lot faster than ref-counting.

Shared datastructures would get their lock obviously.
Immutable objects (especially shared global objects, like True, False, Null)
would not.

Most of the interpreter structure would be per-thread, at that point.

We do not know how Greg Stein did his locking in the free threads
patch, but as a part of the course we learned there exists much faster
ways of locking than using OS-locks (faster for the uncontented case)
that are used in e.g. the HOT-SPOT java-compiler. This might make
"free threading" in python more attractive than some pessimists think.
(http://blogs.sun.com/dave/entry/biased_locking_in_hotspot)
In particular, we are talking about making the uncontended case go fast,
not about the independent part of stack-allocating the mutex
structure, which can only be done and is only needed in Java.

These ideas are similar to the ones used by Linux fast mutexes
(futexes), the implementation of mutexes in NPTL.

We have read this mail thread - so it seems that our idea surfaced,
but Greg didn't completely love it (he wanted to optimize refcounting
instead):
http://mail.python.org/pipermail/python-ideas/2007-April/000436.html

He was not totally negative however. His main objections are about:
- cache locality (He is in our opinion partially right, as seen in some
other paper time ago - any GC, copying GC in particular, doubles the
amount of used memory, so it's less cache-friendly). But still GCs are
overall competitive or faster than explicit management, and surely
much faster of refcounting.

We know it is the plan for PyPy to work in this way, and also that
Jython and Ironpython works like that (using the host vm's GC), so it
seems to be somehow agreeable with the python semantics (perhaps not
really with __del__ but they are not really nice anyway).

Was this ever tried for CPython?

Any other comments, encouragements or warnings on the project-idea?

Best regards: Paolo, Sigurd


More information about the Python-Dev mailing list