Paolo Giarrusso wrote:
Specialized bytecode can be significant, I guess, only if the interpreter is really fast (either a threaded one or a code-copying one). Is the PyPy interpreter threaded?
Some time ago I tried to measure if and how much we could gain with a threaded interpreter. I manually modified the generated C code to make the main loop threaded, but we didn't gain anything. I can think of three possible reasons:

1) In Python a lot of opcodes are quite complex and time-consuming, so the time spent dispatching to them is a small percentage of the total execution time.

2) Due to Python's semantics, it's not possible to just jump from one opcode to the next: we need to do a lot of bookkeeping, like remembering which line was executed last, etc. This means that the trampolines at the end of each opcode contain a lot of duplicated code, leading to a bigger main loop, with possibly bad effects on the cache (I didn't measure this, though).

3) It's possible that I did something wrong, in which case my measurements are completely useless :-).

If anyone wants to try again, it cannot hurt.

ciao,
Anto