[Python-Dev] opcode dispatch optimization

Wed Dec 31 14:47:30 CET 2008

Hello,

I would like to mention that I've written a patch which enables "threaded
interpretation" on the ceval loop with gcc (*). On my computer (an Athlon X2
3600+), it is good for a 15-20% speedup of the interpreter on pystone and
pybench. I also had the opportunity to test it on a Core2-derived CPU, where it
doesn't make a difference (I conjecture it's because Core2 CPUs have
hardware-based indirect branch optimizations). It will make no difference if the
interpreter is compiled with something other than gcc (I tested on Windows).

The additional complexity is very small. There's a separate script which is run
to build the dispatch table (only if needed, that is, if dis.py has been
modified). In ceval.c, there are a couple of macros and some #ifdef's. That's
all. It breaks no test in the regression suite.
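
For readers unfamiliar with the technique, here is a minimal sketch of the
idea on a toy stack machine. It is not the patch itself: the opcodes, the
toy_eval function and the TARGET/DISPATCH macro names are made up for this
illustration. It assumes gcc's "labels as values" extension (&&label and
goto *ptr), and falls back to a plain switch on other compilers. The toy does
no checking for unknown opcodes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy opcodes, not CPython's real opcode set. */
enum { OP_PUSH, OP_ADD, OP_MUL, OP_HALT };

#if defined(__GNUC__) && !defined(NO_COMPUTED_GOTOS)
#define USE_COMPUTED_GOTOS 1
#else
#define USE_COMPUTED_GOTOS 0
#endif

#if USE_COMPUTED_GOTOS
/* Every handler ends with its own indirect jump to the next handler. */
#define TARGET(op) TARGET_##op
#define DISPATCH() goto *dispatch_table[*pc++]
#else
/* Portable fallback: one shared switch at the top of the loop. */
#define TARGET(op) case op
#define DISPATCH() continue
#endif

/* Runs the bytecode and returns the top of the stack at OP_HALT. */
int toy_eval(const unsigned char *code)
{
    int stack[16];
    size_t sp = 0;
    const unsigned char *pc = code;

#if USE_COMPUTED_GOTOS
    /* Table of label addresses, indexed by opcode value. */
    static void *dispatch_table[] = {
        &&TARGET_OP_PUSH, &&TARGET_OP_ADD, &&TARGET_OP_MUL, &&TARGET_OP_HALT
    };
    DISPATCH();
#else
    for (;;) {
        switch (*pc++) {
#endif

    TARGET(OP_PUSH):
        assert(sp < 16);
        stack[sp++] = *pc++;      /* operand is the next byte */
        DISPATCH();

    TARGET(OP_ADD):
        assert(sp >= 2);
        sp--;
        stack[sp - 1] += stack[sp];
        DISPATCH();

    TARGET(OP_MUL):
        assert(sp >= 2);
        sp--;
        stack[sp - 1] *= stack[sp];
        DISPATCH();

    TARGET(OP_HALT):
        return sp ? stack[sp - 1] : 0;

#if !USE_COMPUTED_GOTOS
        }
    }
#endif
}
```

Feeding it the program {OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_PUSH, 4, OP_MUL,
OP_HALT} returns 20 in both build modes. The point of the gcc variant is that
each opcode body has its own indirect jump rather than all of them sharing a
single one at the top of the loop, which presumably gives the CPU's branch
predictor more to work with.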

Could other people test and report their results here? (The patch is for py3k,
btw.) Also, what are your thoughts for/against integrating this patch in the
standard interpreter?

Regards

Antoine.

(*) Please note: it has nothing to do with multithreading.