[pypy-dev] Threaded interpretation (was: Re: compiler optimizations: collecting ideas)

Thu Dec 25 00:42:18 CET 2008

Hi Armin,

First, thanks for your answer.

On Tue, Dec 23, 2008 at 16:01, Armin Rigo <arigo at tunes.org> wrote:
> On the whole you're making quite some efforts to get Python fast,
> starting with a subset of Python and adding feature after feature until
> it is more or less complete, while benchmarking at every step. This is
> not a new approach: it has been tried before for Python.  Usually, this
> kind of project ends up being not used and forgotten, because it's
> "only" 80% or 90% compatible but not 99% -- and people care much more,
> on average, about 99% compatibility than about 50% performance
> improvement.  PyPy on the other hand starts from the path of 99%
> compatibility and then tries to improve performance (which started as
> 10000 times slower... and is now roughly 1.5 or 2 times slower).

> Just saying that the approach is completely different...  And I have not
> much interest in it -- because you change the language and have to start
> again from scratch.  A strong point of PyPy is that you don't have to;
> e.g. we have, in addition to the Python interpreter, a JavaScript, a
> Smalltalk, etc...
I never said we were going to turn this into a full-featured Python
interpreter, or rewrite all the libraries for it.

So, just a few clarifications:

1) this is a _student project_ which is currently "completed" and has
been handed in. It was written by two students and was our first
interpreter ever (and for one of us, the first really big C project).
I knew that locals are faster than structure fields, but before I
started experimenting with this I had absolutely no idea why, or by
how much.
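
A minimal sketch of the locals-versus-fields difference, not taken from
the project itself: DemoFrame and both functions are hypothetical names
made up for illustration. The first variant keeps its state in a
structure reached through a pointer, so the compiler generally has to
assume the fields may alias other memory and reload/store them on each
step. The second caches the same state in locals, which can stay in
registers, and writes it back once. How much this matters depends on
the compiler and the workload.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical frame layout, for illustration only. */
typedef struct {
    long acc;    /* running result kept in the frame */
    size_t pc;   /* index of the next element to process */
} DemoFrame;

/* Variant A: all state lives in the frame and is accessed through f. */
long sum_via_fields(DemoFrame *f, const long *values, size_t n)
{
    assert(f != NULL);
    f->acc = 0;
    for (f->pc = 0; f->pc < n; f->pc++)
        f->acc += values[f->pc];
    return f->acc;
}

/* Variant B: state is cached in locals and written back once at the end. */
long sum_via_locals(DemoFrame *f, const long *values, size_t n)
{
    long acc = 0;
    size_t pc;

    assert(f != NULL);
    for (pc = 0; pc < n; pc++)
        acc += values[pc];
    f->acc = acc;   /* single write-back */
    f->pc = pc;
    return acc;
}
```

Both variants compute the same result and leave the frame in the same
state; only the number of memory accesses inside the loop differs.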

2) it is intended as a way to learn how to write an interpreter, and
as a proof of concept of how Python can be made faster. The first two
things I'll try to optimize are the assignment to ->f_lasti and the
addition of indirect threading (even if right now I'd guess an impact
of around 5%, if anything, because of the refcounting overhead).
If I want to try something without refcounting, I guess I'd turn
to PyPy, but don't hold your breath for that. The fact that indirect
threading didn't work, that you're 1.5-2x slower than CPython, and
that you store locals in frame objects all suggest that the
abstraction overhead of the interpreter is too high. Since you have
different types of frame objects, I guess you might use virtual calls
to access them (even though I hope not), or that you have some
virtual calls anyhow. And that would be a problem as well.
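
For readers unfamiliar with the two optimizations named in point 2,
here is a rough sketch under stated assumptions. It is not the
project's code: the three-opcode bytecode, MiniFrame and run() are
hypothetical. It uses the "labels as values" extension of GCC (also
accepted by Clang), so it is not standard C. Each handler ends with its
own table-driven jump, which is what this thread calls indirect
threading, and the instruction pointer lives in a local that is written
back to the frame only on exit instead of on every instruction.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical three-instruction bytecode, for illustration only. */
enum { OP_PUSH1, OP_ADD, OP_HALT };

typedef struct {
    size_t lasti;   /* hypothetical "last instruction" index */
} MiniFrame;

long run(MiniFrame *f, const unsigned char *code)
{
    /* Table of handler addresses, indexed by opcode (GCC extension). */
    static void *const dispatch[] = { &&op_push1, &&op_add, &&op_halt };
    long stack[16];
    long *sp = stack;                 /* stack pointer kept in a local */
    const unsigned char *ip = code;   /* instruction pointer kept in a local */

/* Fetch the next opcode and jump straight to its handler. */
#define NEXT() goto *dispatch[*ip++]

    NEXT();

op_push1:
    assert(sp < stack + 16);
    *sp++ = 1;
    NEXT();

op_add:
    assert(sp - stack >= 2);
    sp--;
    sp[-1] += sp[0];
    NEXT();

op_halt:
    /* Write the instruction index back to the frame only here. */
    f->lasti = (size_t)(ip - code) - 1;
    return sp > stack ? sp[-1] : 0;

#undef NEXT
}
```

Running it on the sequence PUSH1, PUSH1, ADD, PUSH1, ADD, HALT leaves 3
on top of the stack. A real interpreter would still have to update the
frame before anything that can observe it (exceptions, tracing), which
this sketch ignores.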

3) still, I do believe that working on it was an interesting way to
gain experience in optimizing an interpreter. And the original idea
was to show that the only reason real multithreading (without a global
interpreter lock) cannot be done in Python is the big design mistakes
of CPython.

Regards
-- 
Paolo Giarrusso