On Thu, Jan 22, 2009 at 10:59 PM, joe <joeedh@gmail.com> wrote:
...

* Use direct threading (which is basically optimizing switch statements to
 be only one or two instructions) for the bytecode loop.

fyi - http://bugs.python.org/issue4753 does this (at least when using gcc).

The optimization is not about removing instructions per se, but about getting rid of the single switch statement's unpredictable branch. That one shared indirect jump causes modern CPUs to stall while they determine the correct place to jump. With a separate jump at the end of each opcode's code, the CPU can instead speculatively guess correctly a significant portion of the time.
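For anyone who hasn't seen the technique, here is a minimal sketch contrasting switch dispatch with direct threading via the "labels as values" extension (`&&label`, `goto *ptr`) supported by GCC and Clang. The opcodes (`OP_PUSH`, `OP_ADD`, `OP_MUL`, `OP_HALT`) and function names are made up for illustration. This is a toy stack machine, not the code from issue 4753 or CPython's ceval.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy bytecode; not CPython's opcode set. */
enum { OP_PUSH, OP_ADD, OP_MUL, OP_HALT };

/* Switch dispatch: every handler goes back to the one shared switch, so the
   CPU has a single indirect branch whose target changes on nearly every
   instruction. */
long run_switch(const int *code) {
    long stack[64];
    size_t sp = 0;
    const int *pc = code;
    for (;;) {
        switch (*pc++) {
        case OP_PUSH: stack[sp++] = *pc++; break;
        case OP_ADD:  sp--; stack[sp - 1] += stack[sp]; break;
        case OP_MUL:  sp--; stack[sp - 1] *= stack[sp]; break;
        case OP_HALT: return stack[sp - 1];
        default:      assert(!"unknown opcode"); return -1;
        }
    }
}

/* Direct threading (GCC/Clang extension): each handler ends with its own
   `goto *`, so there is one indirect jump per opcode instead of one shared
   jump. No bounds check on opcodes in this sketch. */
long run_threaded(const int *code) {
    /* Order must match the enum above. */
    static void *const targets[] = { &&do_push, &&do_add, &&do_mul, &&do_halt };
    long stack[64];
    size_t sp = 0;
    const int *pc = code;
#define DISPATCH() goto *targets[*pc++]
    DISPATCH();
do_push:
    stack[sp++] = *pc++;
    DISPATCH();
do_add:
    sp--; stack[sp - 1] += stack[sp];
    DISPATCH();
do_mul:
    sp--; stack[sp - 1] *= stack[sp];
    DISPATCH();
do_halt:
#undef DISPATCH
    return stack[sp - 1];
}
```

Given the program `{OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_PUSH, 7, OP_MUL, OP_HALT}`, both functions return 35. The work done per opcode is the same; what changes is where the indirect jumps live.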
 
...

* Remove string lookups for member access entirely, and replace them with a
 system of unique identifiers.  The idea is you would use a hash in the
 types to map a member id to an index.  Hashing ints is faster than hashing strings,
 and I've even thought about experimenting with using collapsed arrays instead
 of hashes.  Of course, the design would still need to support string lookups
 when necessary.  I've thought about this a lot, and I think you'd need the
 same general idea as V8's hidden classes for this to work right (though
 instead of classes, it'd just be member/unique id lookup maps).

Python strings are already immutable, so their hash is computed only once and cached. Strings used in code as attribute names are interned, so all occurrences of a given name are the same object. That makes the most common attribute lookups a hash table lookup with a pointer equality check.
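A rough C sketch of that lookup path, under loudly stated assumptions: `Str`, `intern()`, `AttrTable`, `attr_set()` and `attr_get()` are hypothetical names, the hash is FNV-1a purely for brevity, and the tables are fixed-size toys rather than CPython's actual string or dict implementation. It illustrates the two points above: the hash is computed once when the string object is created, and because interned names are unique objects, a lookup usually succeeds on a pointer comparison before any bytes are compared.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical immutable string object: hash computed once, stored with the bytes. */
typedef struct {
    uint64_t hash;
    size_t len;
    char text[];            /* C99 flexible array member, NUL-terminated */
} Str;

/* FNV-1a, used here only because it is short. */
static uint64_t hash_bytes(const char *s, size_t n) {
    uint64_t h = 14695981039346656037ULL;
    for (size_t i = 0; i < n; i++) {
        h ^= (unsigned char)s[i];
        h *= 1099511628211ULL;
    }
    return h;
}

/* Toy intern table: fixed size, linear probing, entries never freed. */
#define INTERN_SLOTS 256
static Str *intern_table[INTERN_SLOTS];

/* Returns the single shared Str for this text, creating it on first use.
   Returns NULL if allocation fails or the table is full. */
Str *intern(const char *s) {
    size_t n = strlen(s);
    uint64_t h = hash_bytes(s, n);
    size_t i = (size_t)(h % INTERN_SLOTS);
    for (size_t probes = 0; probes < INTERN_SLOTS; probes++) {
        Str *e = intern_table[i];
        if (e == NULL) {
            e = malloc(sizeof *e + n + 1);
            if (e == NULL) return NULL;
            e->hash = h;
            e->len = n;
            memcpy(e->text, s, n + 1);
            intern_table[i] = e;
            return e;
        }
        if (e->hash == h && e->len == n && memcmp(e->text, s, n) == 0)
            return e;
        i = (i + 1) % INTERN_SLOTS;
    }
    return NULL;
}

/* Pointer equality first; byte comparison only as a fallback. */
static int str_eq(const Str *a, const Str *b) {
    if (a == b) return 1;   /* common case for interned attribute names */
    return a->hash == b->hash && a->len == b->len &&
           memcmp(a->text, b->text, a->len) == 0;
}

/* Hypothetical per-object attribute table keyed by Str pointers. */
#define ATTR_SLOTS 8
typedef struct { const Str *key; long value; } Slot;
typedef struct { Slot slots[ATTR_SLOTS]; } AttrTable;

int attr_set(AttrTable *t, const Str *key, long value) {
    size_t i = (size_t)(key->hash % ATTR_SLOTS);   /* cached hash, no rehashing */
    for (size_t probes = 0; probes < ATTR_SLOTS; probes++) {
        Slot *s = &t->slots[i];
        if (s->key == NULL || str_eq(s->key, key)) {
            s->key = key;
            s->value = value;
            return 1;
        }
        i = (i + 1) % ATTR_SLOTS;
    }
    return 0;   /* table full */
}

int attr_get(const AttrTable *t, const Str *key, long *out) {
    size_t i = (size_t)(key->hash % ATTR_SLOTS);
    for (size_t probes = 0; probes < ATTR_SLOTS; probes++) {
        const Slot *s = &t->slots[i];
        if (s->key == NULL) return 0;
        if (str_eq(s->key, key)) {
            *out = s->value;
            return 1;
        }
        i = (i + 1) % ATTR_SLOTS;
    }
    return 0;
}
```

Calling `intern("spam")` twice returns the same pointer, so a value stored under one result is found through the other via the `a == b` fast path. The `memcmp` fallback only keeps the sketch correct for keys that were not interned.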


I'm not sure I'll have the time anytime soon to prototype these ideas, but I
thought I'd kick them out there and see what people say.  Note, I'm in no way
suggesting any sort of change to the existing cpython VM (it's way, way too
early for that kind of talk).

Anyway, the ideas are great, and they are definitely all things people are considering.  This should be a great year for Python performance, and for language VM performance in general.

happy hacking,
Greg