[pypy-dev] GIL removal in PyPy (was: Re: Threaded interpretation)

Paolo Giarrusso p.giarrusso at gmail.com
Sat Jan 3 07:18:22 CET 2009

On Fri, Jan 2, 2009 at 14:23, Armin Rigo <arigo at tunes.org> wrote:

> About PyPy, the lesson that we learned is different: it's that
> implementing a GIL-free model requires a bit of tweaking of the whole
> interpreter -- as opposed to doing it in a nicely separable manner.  So
> far we only implemented threads-with-a-GIL, which can be done in this
> separable way (i.e. without changing a single line of most of the
> interpreter).

> Given enough interest we can implement full GIL-free
> multithreading the slow way: by going over most features of the
> interpreter and thinking.

That'll take more time now than it would have taken to do it in the first place.

IMHO, introducing fine-grained locking is one of the things costing x
if done from the beginning and costing 10x to do afterwards. Switching
from refcounting to GC is another of those things (the factor here is
maybe even higher), and your EU report on GC acknowledges this when
talking about CPython's choice of refcounting. The location of all
pointers on the C stack would have to be registered to convert
So, I can understand what you mean by "nicely separable".

> Note that it would also be possible to do
> that in CPython (but it would be even more work).

That has been proven to be false :-).
It has been tried in 2004 (look for "Python free threading", I guess),
but free (i.e. GIL-less) threading requires atomic manipulation of
reference counts, and that gave a 2x overall slowdown; nowadays, the
relative impact would be even higher, just because . The reaction from
Guido van Rossum and/or the list (IIRC) was "ok, it can't be done,
we're not gonna have real multithreading, and there are better
programming models anyway".
There was, luckily, somebody on the mailing list who said "maybe we
should drop refcounting", but people didn't listen for some reason.

> It's just that doing so is not really "in the spirit" of PyPy :-)
> In our style, we would rather start by thinking harder about some
> unpolished ideas that float around about doing it in a separable
> manner anyway -- but this has not been tried at all.

That's interesting, can I get some pointers to these ideas?
I've heard about some work on this on the AOP front, but to me it
looks like their work is still not relevant, except for explaining us
that locking is a crosscutting concern, so I'd be interested about
something that could actually work.

And an _interpreter_ is not special at all for this, I think, it is
just another concurrent application (a compiler might be a different
case) - if you want to keep it simple, making everything in the data
space a monitor is the best bet. The problem is, that anything simple
will probably perform horribly and need to be optimized. And for me,
it's easier to write code in a much more sensible way from day one,
than optimizing it afterward.

=== Why AOP is broken for locking ===
I'm simply too used to advanced (and even not advanced) locking
pattern that can't be expressed in AOP. Given the following
independent snippets in an application:



how can an aspect introduce the calls to locks if they are omitted
from the source? I can't add an advice on a() or b(), the a()-b()
sequence must be atomic. And often I can't advice the containing
methods, unless it begins and ends with a locking call; obj might not
even be available at method entry. The only solution is to split each
critical sections out of the containing method into another one, but
that's not cheaper than AOP usage.

But the real challenge is how to implement, for instance, the
Collection.addAll(Collection b) Java method, if it has to lock both
dictionaries. Thread A might call a.addAll(b) while thread B calls
b.addAll(a) and any thread might attempt any concurrent modification
on any of a and b. So, which is the correct locking order to avoid

If I had really to face this problem (this particular example is
partially unrealistic - locking for addAll should be a responsibility
of the caller, probably), the only solution is to find some total
order on all collections and lock first the smaller object. In C, I
would compare their pointers, here I'd have to find some trick (like
comparing their identityHashCode() in Java).

I've been used to similar issues while coding in the Linux kernel, and
that's why I think that the locking concern should not be separated
from the application code. Unless one uses message passing, but I
can't see any way to implement multithreading by using message passing
that is not horribly inefficient.

The first Google hit on "AOP locking" is this article:
and the example it makes is just too simple and unrealistic - that
pattern might apply to a minority of real-world cases. The author
seems unaware of real issues in concurrent code.

Moreover, I'm concerned about the quality of the article. When
discussing the "classic" OOP solution, it makes a really basic mistake
about the usage of inheritance. What follows is a copy-n-paste, with
added emphasis:
"The traditional approach is to introduce an (abstract) base class,
from which ALL "WORKER" CLASSES INHERIT. This class defines a method
lock() and a method unlock() which must be called before and after the
actual work is done (semaphores, basically)."
In fact, the lock() and unlock() method are not part of the API of the
Worker classes, so one would just use an instance of such a Lock
class, not inherit from it. And even writing "semaphore" is slightly

This article is much better:
still, it shares the same fundamental limitations. All method in a
class share the same lock, and locking only happens on method
boundaries. In fact, the article seems to acknowledge that AOP is just
a cheap way to introduce really simplistic locking.

Having said that, I'm really interested to see how JBoss uses AOP for
locking. But those problems seem really fundamental to me, so probably
just the simple cases are handled through AOP.

Paolo Giarrusso

More information about the Pypy-dev mailing list