opcode dispatch optimization

Hello,
I would like to mention that I've written a patch which enables "threaded interpretation" on the ceval loop with gcc (*). On my computer (an Athlon X2 3600+), it is good for a 15-20% speedup of the interpreter on pystone and pybench. I also had the opportunity to test it on a Core2-derived CPU, where it doesn't make a difference (I conjecture it's because Core2 CPUs have hardware-based indirect branch optimizations). It will make no difference if the interpreter is compiled with something other than gcc (I tested on Windows).
The additional complexity is very small. There's a separate script which is run to build the dispatch table (only if needed, that is, if dis.py has been modified). In ceval.c, there are a couple of macros and some #ifdef's. That's all. It breaks no tests in the regression suite.
Could other people test and report their results here? (The patch is for py3k, btw.) Also, what are your thoughts for/against integrating this patch in the standard interpreter?
Regards
Antoine.
(*) Please note: it has nothing to do with multithreading.
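
For readers who have not seen the technique, here is a minimal sketch of a dispatch table guarded by an #ifdef, with a plain switch as the fallback. It is not taken from the patch: the three-opcode instruction set, the run() function, and the macro names are made up for illustration, and it assumes a compiler that supports GCC's labels-as-values extension whenever __GNUC__ is defined.

```c
/* Hypothetical three-opcode instruction set, for illustration only. */
enum { OP_INCR, OP_DOUBLE, OP_HALT };

#if defined(__GNUC__)
#define USE_COMPUTED_GOTOS 1
#else
#define USE_COMPUTED_GOTOS 0
#endif

/* Runs a program that must end with OP_HALT; returns the accumulator. */
long run(const unsigned char *code)
{
    long acc = 0;
    const unsigned char *pc = code;

#if USE_COMPUTED_GOTOS
    /* One label address per opcode, indexed by opcode number. */
    static void *const dispatch_table[] = {
        &&TARGET_OP_INCR, &&TARGET_OP_DOUBLE, &&TARGET_OP_HALT
    };
/* Each opcode body gets both a goto label and a case label. */
#define TARGET(op) TARGET_##op: case op
/* Jump directly to the next opcode's body, bypassing the switch. */
#define DISPATCH() goto *dispatch_table[*pc++]
#else
/* Portable fallback: go back to the top of the loop and switch again. */
#define TARGET(op) case op
#define DISPATCH() continue
#endif

    for (;;) {
        switch (*pc++) {
        TARGET(OP_INCR):
            acc += 1;
            DISPATCH();
        TARGET(OP_DOUBLE):
            acc *= 2;
            DISPATCH();
        TARGET(OP_HALT):
            return acc;
        }
    }
}
```

With the table enabled, every opcode body ends in its own indirect jump instead of sharing the single jump at the top of the switch, which is generally what gives the branch predictor more to work with. Without GCC, the same source compiles to the ordinary switch loop.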

Antoine Pitrou <solipsis <at> pitrou.net> writes:
> I would like to mention that I've written a patch which enables "threaded interpretation"
... and I forgot to give the URL: http://bugs.python.org/issue4753
Regards
Antoine.

Antoine Pitrou wrote:
> I would like to mention that I've written a patch which enables "threaded interpretation" on the ceval loop with gcc (*). On my computer (an Athlon X2 3600+), it is good for a 15-20% speedup of the interpreter on pystone and pybench. I also had the opportunity to test it on a Core2-derived CPU, where it doesn't make a difference (I conjecture it's because Core2 CPUs have hardware-based indirect branch optimizations). It will make no difference if the interpreter is compiled with something other than gcc (I tested on Windows).
The patch makes use of a GCC feature where labels can be used as values: http://gcc.gnu.org/onlinedocs/gcc/Labels-as-Values.html . I didn't know about the feature and got confused by the unary && operator.
A happy new year to you all!
Christian
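
For anyone else puzzled by the unary && operator, a minimal sketch of the labels-as-values extension follows. The function pick() and its labels are hypothetical names, and the code only compiles with GCC or a compiler that implements the same extension.

```c
/* Returns 1 or 2 depending on which label was jumped to. */
int pick(int which)
{
    /* Unary && yields the address of a label, as a void pointer. */
    void *target = which ? &&second : &&first;

    /* "goto *expr" jumps to the label whose address expr holds. */
    goto *target;

first:
    return 1;
second:
    return 2;
}
```

Here && is not a logical AND: applied to a label name, it produces a value that can be stored in a variable or an array and later passed to goto.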

On Wed, Dec 31, 2008 at 11:44 AM, Christian Heimes lists@cheimes.de wrote:
> The patch makes use of a GCC feature where labels can be used as values: http://gcc.gnu.org/onlinedocs/gcc/Labels-as-Values.html . I didn't know about the feature and got confused by the unary && operator.
Right. SpiderMonkey (Mozilla's JavaScript interpreter) does this, and it was good for a similar win on platforms that use GCC. (It took me a while to figure out why it was so much faster, so I think this patch would be better with a few very specific comments!)
SpiderMonkey calls this optimization "threaded code" too, but this isn't the standard meaning of that term. See: http://en.wikipedia.org/wiki/Threaded_code
-j

On Wed, 2008-12-31 at 12:51 -0600, Jason Orendorff wrote:
> On Wed, Dec 31, 2008 at 11:44 AM, Christian Heimes lists@cheimes.de wrote:
>> The patch makes use of a GCC feature where labels can be used as values: http://gcc.gnu.org/onlinedocs/gcc/Labels-as-Values.html . I didn't know about the feature and got confused by the unary && operator.
> Right. SpiderMonkey (Mozilla's JavaScript interpreter) does this, and it was good for a similar win on platforms that use GCC. (It took me a while to figure out why it was so much faster, so I think this patch would be better with a few very specific comments!)
> SpiderMonkey calls this optimization "threaded code" too, but this isn't the standard meaning of that term. See: http://en.wikipedia.org/wiki/Threaded_code
FWIW, it's also explained pretty well in the first pages of [1]. WebKit's SquirrelFish is direct-threaded as well [2].
Nicolas
[1] http://citeseer.ist.psu.edu/cache/papers/cs/32018/http:zSzzSzwww.jilp.orgzSz...
[2] http://webkit.org/blog/189/announcing-squirrelfish/
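
To illustrate what "direct-threaded" means in the standard sense, here is a rough sketch in which the instruction stream holds handler addresses rather than opcode numbers, so dispatch needs no table lookup. It says nothing about how SquirrelFish or any other real interpreter is written: the opcodes, run_direct(), and the MAX_PROGRAM limit are hypothetical, the program is assumed to contain only valid opcodes and to end with OP_HALT, and GCC's labels-as-values extension is required.

```c
#include <stddef.h>

/* Hypothetical three-opcode instruction set, for illustration only. */
enum { OP_INCR, OP_DOUBLE, OP_HALT };

#define MAX_PROGRAM 64  /* arbitrary limit for this sketch */

/* Returns the accumulator, or -1 if len is 0 or exceeds MAX_PROGRAM. */
long run_direct(const unsigned char *code, size_t len)
{
    static void *const handlers[] = { &&do_incr, &&do_double, &&do_halt };
    void *threaded[MAX_PROGRAM];
    void **ip = threaded;
    long acc = 0;
    size_t i;

    if (len == 0 || len > MAX_PROGRAM)
        return -1;

    /* Translation pass: replace each opcode by its handler's address. */
    for (i = 0; i < len; i++)
        threaded[i] = handlers[code[i]];

    /* Execution: each handler jumps straight to the next stored address. */
    goto **ip++;

do_incr:
    acc += 1;
    goto **ip++;
do_double:
    acc *= 2;
    goto **ip++;
do_halt:
    return acc;
}
```

The difference from a dispatch table indexed by opcode is that the lookup is paid once, during translation, instead of on every executed instruction.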
participants (4)
- Antoine Pitrou
- Christian Heimes
- Jason Orendorff
- Nicolas Trangez