[pypy-dev] Threaded interpretation (was: Re: compiler optimizations: collecting ideas)

Antonio Cuni anto.cuni at gmail.com
Wed Dec 24 22:29:33 CET 2008

Paolo Giarrusso wrote:

>> the question is: is it possible for a full python interpreter to be
>> "efficient" as you define it?
> Well, my guess is "if Prolog, Scheme and so on can, why can't Python"?

a possible answer is that python is much more complex than prolog; for 
example, in PyPy we also have RPython implementations of both prolog and 
scheme (though I don't know how complete the latter is).

I quickly counted the number of lines for the interpreters, excluding the 
builtin types/functions, and we have 28188 non-empty lines for python, 5376 
for prolog and 1707 for scheme.

I know that the number of lines doesn't prove anything, but I think it's a 
good hint about the relative complexities of the languages.  I also know that 
being more complex does not necessarily mean it's impossible to write an 
"efficient" interpreter for it; that's still an open question.

Thanks for the interesting email, but unfortunately I don't have time to 
answer right now (xmas is coming :-)), so I'll just drop a few quick notes:

> And while you don't look like that, the mention of "tracking the last
> line executed" seemed quite weird.
> And even tracking the last bytecode executed looks weird, even if it
> is not maybe. I'm inspecting CPython's Python/ceval.c, and the
> overhead for instruction dispatch looks comparable.
> The only real problem I'm getting right now is committing the last
> bytecode executed to memory. If I store it into a local, I have no
> problem at all, if I store it into the interpreter context, it's a
> store to memory, so it hurts performance a lot - I'm still wondering
> about the right road to go.

by "tracking the last bytecode executed" I was really referring to the 
equivalent of f_lasti; are you sure you can store it in a local and still 
implement sys.settrace()?
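(To make the f_lasti point concrete, here is a minimal sketch: a trace 
function installed via sys.settrace reads frame.f_lasti, which only works 
because the interpreter commits the last instruction offset to the frame 
object rather than keeping it in a C local.)

```python
import sys

events = []

def tracer(frame, event, arg):
    # frame.f_lasti is the offset of the last bytecode instruction executed.
    # It is observable here only because the interpreter stores it in the
    # frame object in memory, not just in a local variable of the dispatch loop.
    events.append((event, frame.f_lasti))
    return tracer

def f():
    x = 1
    return x + 1

sys.settrace(tracer)
f()
sys.settrace(None)
# events now holds (event, f_lasti) pairs: a 'call', some 'line's, a 'return'
```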

> Ok, just done it, the speedup given by indirect threading seems to be
> about 18% (see also above). More proper benchmarks are needed though.

that's interesting, thanks for trying it out. I wonder whether I should give 
indirect threading in pypy another try sooner or later.
Btw, are the sources for your project available somewhere?
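(For readers following along: real indirect threading is done in C with 
computed gotos, so each opcode handler ends with its own indirect jump 
instead of returning to one central switch. The following toy stack machine 
with made-up opcodes only mimics that structure in Python, to show where the 
dispatch points sit in each style; it says nothing about the C-level branch 
prediction effects being discussed.)

```python
# Hypothetical three-opcode stack machine, two dispatch styles.
PUSH, ADD, HALT = range(3)

def run_switch(code):
    """Central-loop ("switch") dispatch: one dispatch point shared by all opcodes."""
    pc, stack = 0, []
    while True:
        op = code[pc]
        if op == PUSH:
            stack.append(code[pc + 1]); pc += 2
        elif op == ADD:
            b, a = stack.pop(), stack.pop(); stack.append(a + b); pc += 1
        elif op == HALT:
            return stack.pop()

def run_threaded(code):
    """Threaded-style dispatch: each handler fetches and invokes the next
    handler itself, so the dispatch is replicated per opcode (in C this
    replication is what helps the indirect-branch predictor)."""
    def push(pc, stack):
        stack.append(code[pc + 1])
        return handlers[code[pc + 2]](pc + 2, stack)
    def add(pc, stack):
        b, a = stack.pop(), stack.pop(); stack.append(a + b)
        return handlers[code[pc + 1]](pc + 1, stack)
    def halt(pc, stack):
        return stack.pop()
    handlers = {PUSH: push, ADD: add, HALT: halt}
    return handlers[code[0]](0, [])

prog = [PUSH, 2, PUSH, 3, ADD, HALT]  # computes 2 + 3
```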

> And as you say in the other mail, the overhead given by dispatch is
> quite more than 50% (maybe). 

no, it's less. 50% is the total speedup given by geninterp, which removes 
dispatch overhead but also does other things, like storing variables on the 
stack and turning python-level flow control into C-level flow control (so e.g. 
loops are expressed as C loops).

> Am I correct in assuming that
> "geninterpret"ing _basically_ pastes the opcode handlers together? I
> guess with your infrastructure, you can even embed easily the opcode
> parameters inside the handlers, it's just a trivial partial evaluation

that's (part of) what our JIT is doing/will do.  But it does much more than 
that, of course.
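(A rough sketch of the "pasting handlers together" idea, using the same 
hypothetical toy opcodes as before, not the real geninterp: given a fixed 
bytecode sequence, emit straight-line source with the opcode arguments 
partially evaluated into it, so dispatch disappears entirely.)

```python
# Hypothetical opcodes for a toy stack machine.
PUSH, ADD, HALT = range(3)

def geninterpret(code):
    """Partially evaluate a bytecode sequence into straight-line Python:
    each handler body is pasted in with its argument embedded as a constant."""
    lines = ["def compiled(stack):"]
    pc = 0
    while pc < len(code):
        op = code[pc]
        if op == PUSH:
            # the opcode argument becomes a literal in the generated source
            lines.append(f"    stack.append({code[pc + 1]!r})")
            pc += 2
        elif op == ADD:
            lines.append("    b = stack.pop(); a = stack.pop(); stack.append(a + b)")
            pc += 1
        elif op == HALT:
            lines.append("    return stack.pop()")
            pc += 1
    ns = {}
    exec("\n".join(lines), ns)
    return ns["compiled"]

compiled = geninterpret([PUSH, 2, PUSH, 3, ADD, HALT])  # computes 2 + 3
```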

Merry Christmas to you and all pypyers on the list!


More information about the Pypy-dev mailing list