[issue4753] Faster opcode dispatch on gcc
Paolo 'Blaisorblade' Giarrusso
report at bugs.python.org
Fri Jan 2 03:53:58 CET 2009
Paolo 'Blaisorblade' Giarrusso <p.giarrusso at gmail.com> added the comment:
> I attached some additional benchmarks on SunOS. So far, it seems the
> benefits of the proposed optimization are highly compiler-dependent.
Well, it would be more correct to say that, as you verified for GCC 3.4,
"miscompilation" of the code happens easily.
Any survey of the literature shows that threading does help in a fast
interpreter. In my experience there are two exceptions to this rule:
a) bad compiler output
b) interpreters which are not efficient enough - when other operations
are even slower than instruction dispatch (which is really slow due to
costly mispredictions), threading can't help.
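To make the difference concrete, here is a minimal sketch in C. It assumes
a toy stack machine with four made-up opcodes (OP_PUSH1, OP_ADD, OP_DOUBLE,
OP_HALT); none of these names come from ceval.c or from the patch. The
switch version funnels every opcode through one shared indirect jump, while
the threaded version (using GCC's "labels as values" extension) ends each
handler with its own indirect jump, which is what gives the branch predictor
a separate history per opcode.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy opcodes, for illustration only (not CPython's). */
enum { OP_PUSH1, OP_ADD, OP_DOUBLE, OP_HALT };

#define STACK_SIZE 16

/* Switch dispatch: all opcodes share a single indirect jump at the top
 * of the loop, so one predictor entry has to guess every next opcode. */
static long run_switch(const unsigned char *code)
{
    long stack[STACK_SIZE];
    size_t sp = 0;
    for (;;) {
        switch (*code++) {
        case OP_PUSH1:  assert(sp < STACK_SIZE); stack[sp++] = 1; break;
        case OP_ADD:    sp--; stack[sp - 1] += stack[sp]; break;
        case OP_DOUBLE: stack[sp - 1] *= 2; break;
        case OP_HALT:   return stack[sp - 1];
        default:        return -1;   /* unknown opcode */
        }
    }
}

#if defined(__GNUC__)
/* Threaded dispatch: each handler ends with its own "goto *", so the
 * jump after OP_ADD is predicted separately from the one after OP_PUSH1.
 * Unlike the switch version, this sketch does not validate opcodes. */
static long run_threaded(const unsigned char *code)
{
    static void *const targets[] = {
        &&do_push1, &&do_add, &&do_double, &&do_halt
    };
    long stack[STACK_SIZE];
    size_t sp = 0;
#define DISPATCH() goto *targets[*code++]
    DISPATCH();
do_push1:
    assert(sp < STACK_SIZE);
    stack[sp++] = 1;
    DISPATCH();
do_add:
    sp--;
    stack[sp - 1] += stack[sp];
    DISPATCH();
do_double:
    stack[sp - 1] *= 2;
    DISPATCH();
do_halt:
    return stack[sp - 1];
#undef DISPATCH
}
#else
/* Fallback for compilers without computed gotos. */
static long run_threaded(const unsigned char *code)
{
    return run_switch(code);
}
#endif
```

Both functions compute the same result for a given program; whether the
threaded one is actually faster depends on the code the compiler emits for
it, which is exactly exception a) above.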
This is shown by the number of interpreters using threading.
Wikipedia has more pointers on this:
Note that what I called "indirect threading" is called there instead
Another example of the importance of threading is also shown in this
Some clues about why Python does not use threading:
It is important to note that people in that mail are not aware of why
threading gives a speedup.
For SunCC, I can't say anything without looking at:
a) the generated code; if jump targets were aligned only for switch but
not for computed gotos, for instance, that could maybe explain such a
slowdown. Lots of other details might be relevant.
b) performance counter results, especially regarding mispredictions of
Python tracker <report at bugs.python.org>