[pypy-dev] Threaded interpretation (was: Re: compiler optimizations: collecting ideas)

Carl Friedrich Bolz cfbolz at gmx.de
Sun Dec 21 19:13:04 CET 2008

Hi Paolo,

Paolo Giarrusso wrote:
> after the completion of our student project, I have enough experience
> to say something more.
> We wrote in C an interpreter for a Python subset and we could make it
> much faster than the Python interpreter (50%-60% faster). That was due
> to the usage of indirect threading, tagged pointers and unboxed
> integers, and a real GC (a copying one, which is enough for the
> current benchmarks we have - a generational GC would be more realistic
> but we didn't manage to do it).

Interesting, but it sounds like you are comparing apples to oranges.
What sort of subset of Python are you implementing, i.e. what things
don't work? It has been shown time and time again that implementing only
a subset of Python makes it possible to get interesting speedups
compared to CPython. Then, as more and more features are implemented,
the difference gets smaller and smaller. This was true for a number of
Python implementations (e.g. IronPython).

I think to get really meaningful comparisons it would be good to modify
an existing Python implementation and compare that. Yes, I know this can
be a lot of work.

I don't have an opinion on the actual techniques you used. I am rather
sure that a copying GC helped performance – it definitely did for PyPy.
Tagged pointers make PyPy slower, but then, we tag integers with 1, not
with 0. This could be changed; it wouldn't even be too much work.
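
To make the difference between the two tagging conventions concrete,
here is a minimal C sketch. It is not PyPy's code: every name in it
(value_t, tag1_*, tag0_*) is hypothetical, overflow checks are left out,
and it assumes that right-shifting a negative signed integer is an
arithmetic shift, which C leaves implementation-defined but mainstream
compilers provide. One commonly cited difference is that with integers
tagged 0, addition needs no fix-up, while with integers tagged 1 the tag
has to be subtracted once. Whether that is what costs PyPy time is an
assumption I have not measured.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical illustration only; not taken from PyPy or any real VM. */
typedef uintptr_t value_t;

/* Convention A: integers carry low bit 1, aligned pointers low bit 0. */
static value_t tag1_box(intptr_t n)    { return ((uintptr_t)n << 1) | 1u; }
static int     tag1_is_int(value_t v)  { return (v & 1u) != 0; }
/* Assumes arithmetic right shift for negative values. */
static intptr_t tag1_unbox(value_t v)  { return (intptr_t)v >> 1; }
/* (2a+1) + (2b+1) - 1 == 2(a+b)+1: one extra subtraction keeps the tag. */
static value_t tag1_add(value_t a, value_t b) { return a + b - 1u; }

/* Convention B: integers carry low bit 0, pointers would carry low bit 1. */
static value_t tag0_box(intptr_t n)    { return (uintptr_t)n << 1; }
static int     tag0_is_int(value_t v)  { return (v & 1u) == 0; }
/* Assumes arithmetic right shift for negative values. */
static intptr_t tag0_unbox(value_t v)  { return (intptr_t)v >> 1; }
/* 2a + 2b == 2(a+b): plain machine addition, no fix-up needed. */
static value_t tag0_add(value_t a, value_t b) { return a + b; }
```

In convention B the cost moves to pointers instead: they carry the tag
bit, so each dereference has to account for it, typically by folding a
constant offset into the load.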

About better implementations of the bytecode dispatch I am unsure. Note,
however, that a while ago we did measurements to see how large the
bytecode dispatch overhead is. I don't recall the exact number, but I
think it was below 10%. That means that even if you somehow managed to
reduce that overhead to nothing at all, you would still get at most a
10% performance win.
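
The bound above is just the usual Amdahl-style arithmetic, sketched here
in C. The function names are hypothetical, and the 10% figure is the
recollected upper estimate from this mail, not a fresh measurement. If a
fraction f of the run time is dispatch and a fraction r of that overhead
is removed, the remaining run time is 1 - f*r of the original.

```c
#include <assert.h>

/* Hypothetical helpers for the back-of-the-envelope bound.
 * f: fraction of total run time spent in bytecode dispatch (0..1)
 * r: fraction of that dispatch overhead that is eliminated (0..1) */
static double remaining_time(double f, double r)
{
    assert(f >= 0.0 && f <= 1.0 && r >= 0.0 && r <= 1.0);
    return 1.0 - f * r;   /* new run time relative to the old one */
}

static double dispatch_speedup(double f, double r)
{
    return 1.0 / remaining_time(f, r);   /* old time / new time */
}
```

With f = 0.10 and r = 1.0 (dispatch made completely free), the run time
drops to 0.9 of the original, which is a speedup factor of about 1.11.
Any realistic r below 1 gives less.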


Carl Friedrich
