[pypy-dev] array performace?

Bengt Richter bokr at oz.net
Sat Jul 3 00:56:39 CEST 2010


On 07/02/2010 11:35 AM Carl Friedrich Bolz wrote:
> Hi Paolo,
> 
> On 07/02/2010 02:08 PM, Paolo Giarrusso wrote:
>>>> Unsupported claim is for example that fast interpreters are 10x
>>>> slower than C.
>> That's the only unsupported claim, but it comes from "The Structure
>> and Performance of Efficient Interpreters". I studied that as a
>> student on VM, you are writing one, so I (unconsciously) guessed
>> that everybody knows that paper - I know that's a completely broken
>> way of writing, but I didn't spot it.
> 
> Even if something is claimed by a well-known paper, it doesn't 
> necessarily have to be true. The paper considers a class of interpreters 
> where each specific bytecode does very little work (the paper does not 
> make this assumption explicit). This is not the case for Python at all, 
> so I think that the conclusions of the paper don't apply directly.
> 
> This is explained quite clearly in the following paper:
> 
> Virtual-Machine Abstraction and Optimization Techniques by Stefan 
> Brunthaler in Bytecode 2009.
> 
> 
> [...]
>> Well, at the abstraction level I'm speaking, it sounds like there in
>> the end, the JIT will be able to do what is needed. I am not aware
>> of the details. But then, at the end of that project, it seems to me
>> that it should be possible to write the array module in pure Python
>> using this new FFI interface and have the JIT look at it, shouldn't
>> it? I do not concentrate on array specifically - rewriting a few
>> modules at interpreter level is fine. But as a Python developer I
>> should have no need for that.
> 
> That's a noble goal :-). I agree with the goal, but I still wanted to 
> point out that the case of array is really quite outside of the range of 
> possibilities of typical JIT compilers. Consider the hypothetical 
> problem of having to write a pure-Python array module without using any 
> other module, only builtin types. Then you would have to map arrays to 
> be normal Python lists, and you would have no way to circumvent the fact 
> that all objects in the lists are boxed. The JIT is now not helping you 
> at all, because it only optimizes on a code level, and cannot change the 
> way your data is structured in memory.
> 
> I know that this is not at all how you are proposing the array module 
> should be written, but I still wanted to point out that current JITs 
> don't help you much if your data is represented in a bad way. We have 
> some ideas how data representations could be optimized at runtime, but 
> nothing implemented yet.

A thought/question:

Could/does JIT make use of information in an assert statement? E.g., could we write
     assert set(type(x) for x in img) == set([float]) and len(img)==640*480
in front of a loop operating on img and have JIT use the info as assumed true
even when "if __debug__:" suites are optimized away?

Could such assertions allow e.g. a list to be implemented as a homogeneous vector
of unboxed representations?

What kind of guidelines for writing assertions would have to exist to make them
useful to JIT most easily?

Regards,
Bengt Richter




More information about the Pypy-dev mailing list