[pypy-dev] array performance?

Fri Jul 2 20:35:46 CEST 2010

Hi Paolo,

On 07/02/2010 02:08 PM, Paolo Giarrusso wrote:
>>> Unsupported claim is for example that fast interpreters are 10x
>>> slower than C.
> That's the only unsupported claim, but it comes from "The Structure
> and Performance of Efficient Interpreters". I studied that paper as a
> student of VMs, and you are writing one, so I (unconsciously) assumed
> that everybody knows it. I know that's a completely broken way of
> writing, but I didn't spot it.

Even if a claim appears in a well-known paper, it is not necessarily 
true. The paper considers a class of interpreters in which each 
individual bytecode does very little work (although it never states 
this assumption explicitly). That is not the case for Python at all, 
so I don't think the paper's conclusions apply directly.
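
To illustrate how much work a single Python bytecode can hide, here is a small sketch using CPython's standard `dis` module (the helper names `add` and `binary_ops` are made up for the example, and opcode names differ between CPython versions). The `+` compiles to exactly one binary-operator instruction, yet what that instruction does depends entirely on the operand types at run time.

```python
import dis

def add(a, b):
    # A single "+" in the Python source.
    return a + b

def binary_ops(func):
    """Return the names of the binary-operator bytecodes in func."""
    return [ins.opname for ins in dis.get_instructions(func)
            if ins.opname.startswith("BINARY")]

ops = binary_ops(add)
print(ops)  # e.g. ['BINARY_OP'] on CPython 3.11+, ['BINARY_ADD'] on older versions

# That one instruction dispatches on the operand types, so the amount of
# work it performs varies enormously:
small = add(1, 2)                      # small integer addition
big = add(list(range(10**6)), [0])     # copies a million list slots
text = add("py", "py")                 # string concatenation
print(small, len(big), text)           # 3 1000001 pypy
```

In an interpreter where each instruction amounts to a few machine instructions, dispatch overhead dominates; when one instruction can do this much, it usually does not.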

This is explained quite clearly in the following paper:

Virtual-Machine Abstraction and Optimization Techniques by Stefan 
Brunthaler in Bytecode 2009.

[...]
> Well, at the abstraction level I'm speaking at, it sounds like in
> the end the JIT will be able to do what is needed. I am not aware
> of the details. But then, at the end of that project, it seems to me
> that it should be possible to write the array module in pure Python
> using this new FFI interface and have the JIT look at it, shouldn't
> it? I am not focusing on array specifically - rewriting a few
> modules at interpreter level is fine. But as a Python developer I
> should have no need for that.

That's a noble goal :-). I agree with it, but I still wanted to 
point out that array is really quite outside the range of what 
typical JIT compilers can do. Consider the hypothetical problem of 
writing a pure-Python array module without using any other module, 
only builtin types. Then you would have to represent arrays as 
normal Python lists, and there would be no way to get around the fact 
that every object in the list is boxed. The JIT is then not helping 
you at all, because it only optimizes code and cannot change the 
way your data is laid out in memory.
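
To make the hypothetical concrete, here is a minimal sketch of an array type built only on builtin lists. The class name `ListArray` and its methods are invented for illustration; this is not how any real array module is written. However well a JIT compiles these methods, the storage is still a list of pointers to separately allocated float objects.

```python
class ListArray:
    """Hypothetical pure-Python stand-in for array('d') using only builtins.

    Every element is kept as a Python float object inside a list, so the
    data stays boxed regardless of how the code below is compiled.
    """

    def __init__(self, values=()):
        self._items = [float(v) for v in values]

    def __len__(self):
        return len(self._items)

    def __getitem__(self, index):
        return self._items[index]

    def __setitem__(self, index, value):
        # Coerce to float, roughly what array('d') does for numbers;
        # the result is still stored as a boxed float object.
        self._items[index] = float(value)

    def append(self, value):
        self._items.append(float(value))

    def sum(self):
        total = 0.0
        for x in self._items:  # each x is a reference to a heap-allocated float
            total += x
        return total


a = ListArray([1, 2, 3])
a.append(4.5)
a[0] = 10
print(len(a), a.sum())  # 4 19.5
```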

I know this is not at all how you propose to write the array module, 
but I still wanted to point out that current JITs don't help you much 
if your data is represented badly. We have some ideas about how data 
representations could be optimized at runtime, but nothing is 
implemented yet.
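
A rough way to see the representation difference is to compare memory use with `sys.getsizeof`. This sketch assumes CPython; the exact byte counts are implementation- and platform-specific, so treat them as indicative only.

```python
import sys
from array import array

N = 1000
values = [i * 0.5 for i in range(N)]

# A plain list holds N pointers, each to a separately allocated (boxed) float.
boxed = list(values)
list_bytes = sys.getsizeof(boxed) + sum(sys.getsizeof(x) for x in boxed)

# array('d') stores the same numbers as raw C doubles in one contiguous buffer.
unboxed = array("d", values)
array_bytes = sys.getsizeof(unboxed)

print(f"list of floats: ~{list_bytes} bytes")
print(f"array('d'):     ~{array_bytes} bytes ({unboxed.itemsize} bytes per item)")
```

The list version pays for a pointer plus a whole float object per element, while the array pays only for the raw double; no amount of code-level optimization changes which layout the program chose.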

Cheers,

Carl Friedrich