Leonardo Santagada santagada at gmail.com
Fri Jan 23 10:36:52 CET 2009

On Jan 23, 2009, at 4:59 AM, joe wrote:

> So, I've been kicking around some ideas for an optimized python VM. I freely
> admit I'm an amateur at this, but I find the problem of making python code
> run faster fascinating. My ideas arose from observing that google V8's JIT
> compiler and type system are much simpler compared to TraceMonkey, but is
> also faster, and also learning that SquirrelFish is allegedly faster than V8,
> even though it doesn't compile to native code at all (which V8 and I
> believe TraceMonkey both do).

I don't think TraceMonkey is slower than V8. The last time I looked,
TraceMonkey was faster than V8, and it is becoming even faster with
each iteration.

The way TraceMonkey works reminds me a bit of Psyco, although I might
be mixing it up with the PyPy JIT. But speaking of Psyco, why don't
people go help Psyco if all they want is a JIT? It is not as if the
idea of having a JIT for Python is even new... Psyco was optimizing
code even before Webkit/V8 existed.

> This leads me to believe that relatively simple, more general concepts in
> VM design can have a bigger impact than specific, highly complicated
> solutions, in the context of dynamic languages that can't be easily typed
> at compile time.

I believe the exact opposite, and TraceMonkey is probably one of the
proofs of that...

> So I've thought of a few ideas for a more (new) streamlined python VM:
> * Simplify the cpython object model as much as possible, while still allowing
>  most of the power of the current model.

This would modify the language, so it might be interesting, but it
would produce something that is not Python.

> * Either keep reference counting, or experiment with some of the newer
>  techniques such as pointer escaping. Object models that exclusively rely
>  on cyclic GC's have many issues and are hard to get right.

I don't know, but a good GC is way faster than what CPython does
today. Still, it may be a good idea to explore other perspectives on
this.
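
For context on what CPython does today, here is a minimal sketch,
assuming CPython's combination of reference counting plus a separate
cycle detector. The Node class and the demo() function are
hypothetical names used only for this illustration, and the behaviour
shown is specific to CPython rather than to Python in general.

```python
import gc
import weakref


class Node:
    """Hypothetical class, used only for this illustration."""

    def __init__(self):
        self.other = None


def demo():
    gc.disable()  # switch off automatic cycle collection for the demo
    try:
        # Case 1: no cycle. Dropping the last reference frees the object
        # immediately, purely through reference counting.
        a = Node()
        a_ref = weakref.ref(a)
        del a
        freed_by_refcount = a_ref() is None

        # Case 2: two objects pointing at each other. Their reference
        # counts never reach zero on their own.
        b, c = Node(), Node()
        b.other, c.other = c, b
        b_ref = weakref.ref(b)
        del b, c
        cycle_survives = b_ref() is not None

        # An explicit pass of the cycle detector reclaims the pair.
        gc.collect()
        freed_by_gc = b_ref() is None
    finally:
        gc.enable()
    return freed_by_refcount, cycle_survives, freed_by_gc
```

On CPython all three flags should come back True: the lone object dies
as soon as its count drops to zero, while the cycle needs the
collector.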

> * Possibly modify the bytecode to be register-based, as in SquirrelFish.
>  Not sure if this is worth it with python code.

Maybe it would help a bit. I don't think it would help more than 10%
tops (but I am completely guessing here).
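
To illustrate the difference being discussed, here is a minimal
sketch, assuming a toy instruction set invented for this example (it
is neither CPython's nor SquirrelFish's real bytecode). Both
interpreters compute a = b + c; the stack version needs four
instructions, the register version one.

```python
def run_stack(code, env):
    """Stack machine: operands are pushed and popped around each operation."""
    stack = []
    for op, arg in code:
        if op == "LOAD":
            stack.append(env[arg])
        elif op == "ADD":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif op == "STORE":
            env[arg] = stack.pop()
        else:
            raise ValueError("unknown opcode: %r" % (op,))
    return len(code)  # number of instructions dispatched


def run_register(code, regs):
    """Register machine: each instruction names its operands directly."""
    for op, dst, src1, src2 in code:
        if op == "ADD":
            regs[dst] = regs[src1] + regs[src2]
        else:
            raise ValueError("unknown opcode: %r" % (op,))
    return len(code)  # number of instructions dispatched


# a = b + c in both encodings (hypothetical toy bytecode)
STACK_CODE = [("LOAD", "b"), ("LOAD", "c"), ("ADD", None), ("STORE", "a")]
REGISTER_CODE = [("ADD", 0, 1, 2)]  # r0 = r1 + r2
```

Fewer dispatches per statement is the potential gain. Whether it
matters for Python depends on how much time goes into dispatch
compared with the work done inside each bytecode.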

> * Use direct threading (which is basically optimizing switch statements to
>  be only one or two instructions) for the bytecode loop.

The problem with this is (besides the error someone has already
pointed out in your phrasing) that Python has really complex
bytecodes, so this would also only gain around 10%. It also only works
with compilers that accept goto labels as values (computed goto),
which MSVC, for example, does not (there may be other compilers that
don't either).
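
Python itself has no computed goto, so the following is only a model
of the dispatch structure, not real direct threading. It assumes a toy
two-opcode instruction set invented for this example. The switch-style
loop decodes every opcode through a branch chain, while the "threaded"
form is translated once so that the instruction stream holds the
handlers themselves. In real direct-threaded C code each handler would
jump straight to the next one instead of returning to a loop.

```python
OP_ADD = 0  # hypothetical opcodes for this sketch
OP_MUL = 1


def run_switch(code, x):
    """Switch-style dispatch: every instruction goes through the branch chain."""
    for op, arg in code:
        if op == OP_ADD:
            x = x + arg
        elif op == OP_MUL:
            x = x * arg
        else:
            raise ValueError("unknown opcode: %r" % (op,))
    return x


def handle_add(x, arg):
    return x + arg


def handle_mul(x, arg):
    return x * arg


HANDLERS = {OP_ADD: handle_add, OP_MUL: handle_mul}


def thread(code):
    """Translate once: replace opcode numbers with the handler objects."""
    return [(HANDLERS[op], arg) for op, arg in code]


def run_threaded(threaded, x):
    """No opcode decoding at run time, just call what the stream points at."""
    for handler, arg in threaded:
        x = handler(x, arg)
    return x


PROGRAM = [(OP_ADD, 2), (OP_MUL, 5)]  # computes (x + 2) * 5
```

Both loops produce the same result. The threaded one simply skips the
decode step, which is the part that helps least when each bytecode
already does a lot of work.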

> * Remove string lookups for member access entirely, and replace them with a
>  system of unique identifiers. The idea is you would use a hash in the
>  types to map a member id to an index. Hashing ints is faster than strings,
>  and I've even thought about experimenting with using collapsed arrays instead
>  of hashes. Of course, the design would still need to support string lookups
>  when necessary. I've thought about this a lot, and I think you'd need the
>  same general idea as V8's hidden classes for this to work right (though
>  instead of classes, it'd just be member/unique id lookup maps).

A form of hidden classes is already part of PyPy (but I think that
only the JIT does this). But you can't simply remove string lookups,
as people can implement special methods to track attribute access in
current Python. As I said before, I don't believe changing the
semantics of Python for the sake of performance is even possible.
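
As a rough illustration of the hidden-class idea, here is a minimal
sketch. Shape, Obj and CachedGet are hypothetical names invented for
this example; this is not how V8 or PyPy actually implement it.
Objects that gain the same attributes in the same order share one
Shape, and an access site remembers the slot index for the last shape
it saw, so the string lookup only happens on a cache miss.

```python
class Shape:
    """Hypothetical hidden class: maps attribute name -> slot index."""

    def __init__(self, slots=None):
        self.slots = dict(slots or {})
        self.transitions = {}  # attribute name -> next Shape

    def with_attr(self, name):
        """Return the shared Shape reached by adding `name` to this one."""
        nxt = self.transitions.get(name)
        if nxt is None:
            slots = dict(self.slots)
            slots[name] = len(slots)  # next free slot index
            nxt = Shape(slots)
            self.transitions[name] = nxt
        return nxt


EMPTY = Shape()


class Obj:
    """Object whose attribute values live in a flat list indexed via its Shape."""

    def __init__(self):
        self.shape = EMPTY
        self.values = []

    def set(self, name, value):
        index = self.shape.slots.get(name)
        if index is None:
            self.shape = self.shape.with_attr(name)
            self.values.append(value)
        else:
            self.values[index] = value

    def get(self, name):
        # Slow path: a string lookup in the shape's map.
        return self.values[self.shape.slots[name]]


class CachedGet:
    """One access site with a single-entry inline cache."""

    def __init__(self, name):
        self.name = name
        self.shape = None
        self.index = None
        self.hits = 0

    def __call__(self, obj):
        if obj.shape is self.shape:
            self.hits += 1  # fast path: no string lookup at all
            return obj.values[self.index]
        # Miss: do the string lookup once and remember the result.
        self.shape = obj.shape
        self.index = obj.shape.slots[self.name]
        return obj.values[self.index]
```

Note that the string-keyed path is still there as a fallback, which is
the part that cannot go away as long as the language lets attribute
names be computed at run time.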

> I'm not sure I'll have the time anytime soon to prototype these ideas, but I
> thought I'd kick them out there and see what people say. Note, I'm in no way
> suggesting any sort of change to the existing cpython VM (it's way, way too
> early for that kind of talk).

If you are not talking about changing the CPython VM, why not look at
Psyco and PyPy? :)

> references:
> v8's design: http://code.google.com/apis/v8/design.html
> squirrelfish's design: http://blog.mozilla.com/dmandelin/2008/06/03/squirrelfish/
> Joe

Leonardo Santagada
santagada at gmail.com

