[pypy-dev] compiler optimizations: collecting ideas
p.giarrusso at gmail.com
Mon Nov 17 15:44:50 CET 2008
Sorry, I sent the unfinished mail by mistake.
>> 2) due to Python's semantics, it's not possible to just jump from one opcode
>> to the next, as we need to do a lot of bookkeeping, like remembering what
>> was the last line executed, etc.
> This is a more likely culprit.
> But... all languages that I know of (Java, compiled languages) just
> have a reverse map from bytecode positions to the original line
> information, to be used when and if they are needed (i.e. as debug
> info, or to unwind the stack when an exception is thrown I guess).
> Isn't that possible for Python?
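For reference, CPython itself already keeps exactly such a reverse map on every code object (historically the `co_lnotab` table), and the standard `dis` module can decode it; this small sketch just prints the decoded (bytecode offset, line number) pairs, so "current line" can be recovered on demand instead of being tracked eagerly:

```python
import dis

def sample(x):
    y = x + 1
    return y * 2

# CPython stores a compact bytecode-offset -> line-number table on each
# code object; dis.findlinestarts() decodes it into pairs. Tracebacks
# and debuggers use this table lazily, rather than the interpreter
# bookkeeping the current line on every instruction.
for offset, lineno in dis.findlinestarts(sample.__code__):
    print(offset, lineno)
```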
Additionally, could you give some more examples?
> In any
[the unfinished sentence]
In any case, have you read "The Structure and Performance of Efficient
Interpreters" by M. Ertl and D. Gregg? It's a paper that really
matters in practice for fast interpreters (and if you don't believe
me, believe http://webkit.org/blog/189/announcing-squirrelfish/).
Basically, what it says is that in efficient interpreters, most of the
runtime cost is caused by indirect branches (they go as far as taking
that as the definition of "efficient interpreter"). And that applies
also to Scheme (which, I think, is no less dynamic than Python: any
arithmetic instruction must still check the argument types and choose
between the fast path and raising an error).
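The cost structure the paper describes can be sketched with a toy bytecode loop. This is purely illustrative (the opcodes `PUSH`/`ADD` and the function `run` are invented for the sketch, not PyPy's): each iteration ends in one dispatch (the interpreter's "indirect branch"), and `ADD` must type-check its operands before taking the fast path, exactly the per-instruction overhead being discussed:

```python
# Toy stack-based bytecode interpreter, for illustration only.
# All names (PUSH, ADD, run) are hypothetical, not PyPy's.
def run(program):
    stack = []
    pc = 0
    while pc < len(program):
        op, arg = program[pc]
        # In a real interpreter this dispatch compiles to an indirect
        # branch -- the dominant cost per Ertl & Gregg.
        if op == "PUSH":
            stack.append(arg)
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            # Dynamic-typing check: fast path for ints, error otherwise.
            if type(a) is int and type(b) is int:
                stack.append(a + b)
            else:
                raise TypeError("ADD expects ints")
        pc += 1
    return stack[-1]

run([("PUSH", 2), ("PUSH", 3), ("ADD", None)])  # evaluates 2 + 3 -> 5
```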
My first impression is that the problem is that the interpreter is
not yet optimized enough, and that impression comes from various hints:
- you do bookkeeping for the last executed line (unless line tables
prove impossible to apply);
- we discussed removing dictionary lookups for attributes a little
while ago, and one PyPy developer said "it doesn't really matter, an
interpreter is so slow anyway".
This is the original quote from Armin Rigo:
"I don't know how much
speed-ups this would give; in fact it is rather unclear to me why people
think that dictionary lookups are seriously bad. Of course they are
more expensive than just reading an item out of a list, but I don't
think it's worse than roughly 3 times slower; this number looks big in
the context of machine code, but insignificant in the context of a
complicated bytecode interpreter. I'm ready to be proven wrong, of
course."
And anyway, Armin himself suggested there is room for optimizing the
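Armin's "roughly 3 times slower" estimate is easy to sanity-check with a quick microbenchmark; the exact ratio varies with the machine and Python version, so the numbers below are only indicative:

```python
import timeit

# Rough comparison of a dict lookup vs. a list index, Armin's point.
# Absolute timings and the exact ratio depend on hardware and the
# Python version; the claim is only that both are cheap next to a full
# bytecode-dispatch iteration.
d = {"attr": 42}
l = [42]
t_dict = timeit.timeit(lambda: d["attr"], number=100_000)
t_list = timeit.timeit(lambda: l[0], number=100_000)
print(f"dict: {t_dict:.4f}s  list: {t_list:.4f}s")
```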