GIL removal in PyPy (was: Re: Threaded interpretation)
On Fri, Jan 2, 2009 at 14:23, Armin Rigo <arigo@tunes.org> wrote:
About PyPy, the lesson that we learned is different: it's that implementing a GIL-free model requires a bit of tweaking of the whole interpreter -- as opposed to doing it in a nicely separable manner. So far we only implemented threads-with-a-GIL, which can be done in this separable way (i.e. without changing a single line of most of the interpreter).
Given enough interest we can implement full GIL-free multithreading the slow way: by going over most features of the interpreter and thinking.
That'll take more time now than it would have taken to do it in the first place. IMHO, introducing fine-grained locking is one of those things that costs X if done from the beginning and 10X if retrofitted afterwards. Switching from refcounting to a tracing GC is another of them (the factor here is maybe even higher), and your EU report on GC acknowledges this when talking about CPython's choice of refcounting: to convert CPython, the location of every pointer on the C stack would have to be registered with the collector. So, I can understand what you mean by "nicely separable".
Note that it would also be possible to do that in CPython (but it would be even more work).
That has been proven to be false :-). It was tried in 2004 (search for "Python free threading", I guess), but free (i.e. GIL-less) threading requires atomic manipulation of reference counts, and that gave a 2x overall slowdown; nowadays the relative impact would likely be even higher. The reaction from Guido van Rossum and/or the list (IIRC) was "ok, it can't be done, we're not gonna have real multithreading, and there are better programming models anyway". There was, luckily, somebody on the mailing list who said "maybe we should drop refcounting", but people didn't listen for some reason.
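[To make the cost concrete, here is a minimal C sketch of what "atomic manipulation of reference counts" means in practice. This is not CPython source; the struct and function names are invented for illustration. Under a GIL an incref is a plain add, while a free-threaded build needs an atomic read-modify-write, which is considerably more expensive, especially when several cores touch the same counter.]

    /* Hypothetical sketch, not CPython code: contrast a GIL-protected
       refcount increment with the atomic one that free threading needs. */
    #include <stdatomic.h>
    #include <stdio.h>

    typedef struct {
        long refcnt;            /* protected by the GIL: plain arithmetic suffices */
    } gil_object;

    typedef struct {
        atomic_long refcnt;     /* free-threaded: every touch must be atomic */
    } ft_object;

    static void gil_incref(gil_object *op)
    {
        op->refcnt++;                         /* one ordinary add instruction */
    }

    static void ft_incref(ft_object *op)
    {
        /* An atomic read-modify-write (e.g. LOCK XADD on x86); much more
           expensive than a plain add, and worse on contended cache lines. */
        atomic_fetch_add_explicit(&op->refcnt, 1, memory_order_relaxed);
    }

    int main(void)
    {
        gil_object a = { .refcnt = 1 };
        ft_object b;
        atomic_init(&b.refcnt, 1);
        gil_incref(&a);
        ft_incref(&b);
        printf("%ld %ld\n", a.refcnt, atomic_load(&b.refcnt));
        return 0;
    }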
It's just that doing so is not really "in the spirit" of PyPy :-) In our style, we would rather start by thinking harder about some unpolished ideas that float around about doing it in a separable manner anyway -- but this has not been tried at all.
That's interesting - can I get some pointers to these ideas? I've heard about some work on this on the AOP front, but to me their work still looks irrelevant, except for explaining to us that locking is a crosscutting concern, so I'd be interested in something that could actually work. And an _interpreter_ is not special at all for this, I think: it is just another concurrent application (a compiler might be a different case). If you want to keep it simple, making everything in the data space a monitor is the best bet. The problem is that anything simple will probably perform horribly and need to be optimized, and for me it's easier to write code in a much more sensible way from day one than to optimize it afterwards.

=== Why AOP is broken for locking ===

I'm simply too used to advanced (and even not so advanced) locking patterns that can't be expressed in AOP. Given the following independent snippets in an application:

    LOCK(obj)
    obj.a()
    obj.b()
    UNLOCK(obj)

    LOCK(obj)
    obj.b()
    obj.c()
    UNLOCK(obj)

how can an aspect introduce the calls to the locks if they are omitted from the source? I can't add advice on a() or b() alone: the a()-b() sequence must be atomic. And often I can't advise the containing methods either, unless each begins and ends with a locking call; obj might not even be available at method entry. The only solution is to split each critical section out of its containing method into a method of its own, but then using AOP is no cheaper than writing the locking calls by hand.

But the real challenge is how to implement, for instance, the Collection.addAll(Collection b) Java method, if it has to lock both collections. Thread A might call a.addAll(b) while thread B calls b.addAll(a), and any thread might attempt any concurrent modification on either a or b. So, which is the correct locking order to avoid deadlock? If I really had to face this problem (this particular example is partially unrealistic - locking for addAll should probably be a responsibility of the caller), the only solution is to find some total order on all collections and lock the "smaller" object first. In C I would compare their pointers; here I'd have to find some trick, like comparing their identityHashCode() in Java (a small C sketch of this trick follows below). I've run into similar issues while coding in the Linux kernel, and that's why I think that the locking concern should not be separated from the application code. Unless one uses message passing - but I can't see any way to implement multithreading through message passing that is not horribly inefficient.

The first Google hit for "AOP locking" is this article:

http://www.voelter.de/data/articles/aop/aop.html

and the example it gives is just too simple and unrealistic - that pattern might apply to a minority of real-world cases. The author seems unaware of the real issues in concurrent code. Moreover, I'm concerned about the quality of the article: when discussing the "classic" OOP solution, it makes a really basic mistake about the usage of inheritance. What follows is a copy-n-paste, with added emphasis:

"The traditional approach is to introduce an (abstract) base class, from which ALL "WORKER" CLASSES INHERIT. This class defines a method lock() and a method unlock() which must be called before and after the actual work is done (semaphores, basically)."

In fact, the lock() and unlock() methods are not part of the API of the worker classes, so one would just use an instance of such a Lock class, not inherit from it. And even writing "semaphore" is slightly incorrect.
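[As a concrete illustration of the ordering trick mentioned above, here is a hedged C sketch; the coll_t type and the function names are invented for this example. Both objects are always locked in address order, so concurrent add_all(a, b) and add_all(b, a) calls cannot deadlock.]

    /* Hypothetical sketch: deadlock-free locking of two objects by imposing
       a total order (their addresses), the C analogue of comparing
       identityHashCode() in Java.  Assumes dst != src. */
    #include <pthread.h>
    #include <stdint.h>

    typedef struct coll {
        pthread_mutex_t lock;
        /* ... elements ... */
    } coll_t;

    static void lock_both(coll_t *x, coll_t *y)
    {
        if ((uintptr_t)x < (uintptr_t)y) {   /* lock the "smaller" object first */
            pthread_mutex_lock(&x->lock);
            pthread_mutex_lock(&y->lock);
        } else {
            pthread_mutex_lock(&y->lock);
            pthread_mutex_lock(&x->lock);
        }
    }

    static void unlock_both(coll_t *x, coll_t *y)
    {
        pthread_mutex_unlock(&x->lock);
        pthread_mutex_unlock(&y->lock);
    }

    void add_all(coll_t *dst, coll_t *src)
    {
        lock_both(dst, src);
        /* ... copy the elements of src into dst ... */
        unlock_both(dst, src);
    }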
This article is much better:

http://www.kentlai.me/articles-journal/locks-via-spring-aop.html

Still, it shares the same fundamental limitations: all methods in a class share the same lock, and locking only happens at method boundaries. In fact, the article seems to acknowledge that AOP is just a cheap way to introduce really simplistic locking. Having said that, I'm really interested to see how JBoss uses AOP for locking. But those problems seem really fundamental to me, so probably only the simple cases are handled through AOP.

Regards
--
Paolo Giarrusso
On Jan 3, 2009, at 4:18 AM, Paolo Giarrusso wrote:
There was, luckily, somebody on the mailing list who said "maybe we should drop refcounting", but people didn't listen for some reason.
You repeated this meme many times in your emails, so I thought that maybe you really didn't see the full picture. This is what I understand from the reasoning behind it.

Dropping refcounting and moving to free threading would completely break all C modules, so they would have to be rewritten; it would also make the CPython API much more complex and integration with C libraries harder. That's why no one took it seriously. Think of it like this: breaking all C modules would make CPython about as usable as Haskell :), or just look at the number of libraries not available right now for Python 3.0.

It is not some retarded choice made by GvR, but a pragmatic one. Python as a language used by millions of people can't completely change semantics from version to version.

--
Leonardo Santagada
santagada at gmail.com
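[To see why "break all C modules" is not an exaggeration, here is a hedged sketch of how a typical extension function is written today. The function itself is invented; the API calls (PyArg_ParseTuple, PyList_New, PyLong_FromLong, PyList_Append, Py_DECREF) are the standard CPython ones. Every line of it bakes in the reference-counting contract that free threading would have to change.]

    /* Hypothetical extension module; Py_DECREF bookkeeping like this is
       spread across essentially every existing C module. */
    #include <Python.h>

    static PyObject *
    make_square_list(PyObject *self, PyObject *args)
    {
        long n;
        if (!PyArg_ParseTuple(args, "l", &n))
            return NULL;

        PyObject *list = PyList_New(0);           /* new reference */
        if (list == NULL)
            return NULL;

        PyObject *item = PyLong_FromLong(n * n);  /* new reference */
        if (item == NULL) {
            Py_DECREF(list);
            return NULL;
        }
        if (PyList_Append(list, item) < 0) {      /* the list takes its own ref */
            Py_DECREF(item);
            Py_DECREF(list);
            return NULL;
        }
        Py_DECREF(item);   /* drop ours; the list keeps the object alive */
        return list;       /* ownership passes to the caller */
    }

    static PyMethodDef methods[] = {
        {"make_square_list", make_square_list, METH_VARARGS, "Return [n*n]."},
        {NULL, NULL, 0, NULL}
    };

    static struct PyModuleDef moduledef = {
        PyModuleDef_HEAD_INIT, "sketch", NULL, -1, methods
    };

    PyMODINIT_FUNC
    PyInit_sketch(void)
    {
        return PyModule_Create(&moduledef);
    }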
Hi Leonardo,

Leonardo Santagada wrote:
On Jan 3, 2009, at 4:18 AM, Paolo Giarrusso wrote:
There was, luckily, somebody on the mailing list who said "maybe we should drop refcounting", but people didn't listen for some reason.
You repeated this meme many times in your emails, so I thought that maybe you really didn't see the full picture. This is what I understand from the reasoning behind it.
Dropping refcounting and moving to free threading would completely break all C modules, so they would have to be rewritten; it would also make the CPython API much more complex and integration with C libraries harder. That's why no one took it seriously. Think of it like this: breaking all C modules would make CPython about as usable as Haskell :), or just look at the number of libraries not available right now for Python 3.0.
It is not some retarded choice made by GvR, but a pragmatic one. Python as a language used by millions of people can't completely change semantics from version to version.
While I understand the reasoning above, I don't agree with the last paragraph. Switching to a different GC doesn't lead to that huge a change in semantics. There are subtle differences about finalizers (see the blog posts about that:

http://morepypy.blogspot.com/2008/02/python-finalizers-semantics-part-1.html
http://morepypy.blogspot.com/2008/02/python-finalizers-semantics-part-2.html

). It definitely leads to a different C API, of course. Thus your points above remain valid.

Cheers,

Carl Friedrich
Hi Paolo,

On Sat, Jan 03, 2009 at 07:18:22AM +0100, Paolo Giarrusso wrote:
Given enough interest we can implement full GIL-free multithreading the slow way: by going over most features of the interpreter and thinking.
That'll take more time now than it would have taken to do it in the first place.
Yes, so we probably won't do it (except if we can find another orthogonal technique to try, i.e. a separable way, which is only at the stage of vague discussion so far).
Note that it would also be possible to do that in CPython (but it would be even more work).
That has been proven to be false :-).
Wrong. Jython is such an example, showing that we "can" do it even in C (because anything possible in Java is possible in C, given enough effort). I know it has been tried several times over the course of CPython's development, but it never succeeded. That's why I'm saying it would be even more work in CPython than in PyPy.

A bientot,

Armin.
Hi Armin,

On Sat, Jan 3, 2009 at 16:33, Armin Rigo <arigo@tunes.org> wrote:
Note that it would also be possible to do that in CPython (but it would be even more work).
That has been proven to be false :-).
Wrong. Jython is such an example, showing that we "can" do it even in C (because anything possible in Java is possible in C, given enough effort). I know it has been tried several times over the course of CPython's development, but it never succeeded. That's why I'm saying it would be even more work in CPython than in PyPy.
Obviously what can be done in Java can be done in C, but atomic refcount manipulation is too expensive, so that, again, would require dropping refcounting; in fact, Jython has no refcounting.

Regards
--
Paolo Giarrusso
participants (4)

- Armin Rigo
- Carl Friedrich Bolz
- Leonardo Santagada
- Paolo Giarrusso