On 24/05/17 21:56, Ben Hoyt wrote:
But it's interesting to know you didn't get much of a speedup!
I think that improvements at the hardware level in parallelizing instruction and data fetching (and branch prediction), even in the cheapest processors these days, have largely trivialized the time it takes the interpreter loop to read another opcode and branch to the code that executes it. I think the answer to speeding things up is better algorithms at a higher level rather than micro-optimizations. But it's still fun to _try_ this sort of thing, isn't it? ;)
I'm wondering if the simplest way to test this would be to add the new opcodes and handle them in ceval.c, but then do all the bytecode manipulation in a pure Python peephole optimizer ("bytecode munger").
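Just to sketch the idea, here's roughly what such a pure-Python bytecode munger might look like. This is a toy: the opcodes are plain strings rather than real CPython bytecode, and the single folding rule (collapsing two constant loads plus an add into one constant load) is purely illustrative, not an actual CPython peephole rule.

```python
def peephole(instructions):
    """Toy peephole pass over a simplified instruction stream.

    Folds the pattern LOAD_CONST a; LOAD_CONST b; BINARY_ADD
    into a single LOAD_CONST a+b (constant folding).
    Instructions are tuples: (opcode, *args).
    """
    out = []
    i = 0
    while i < len(instructions):
        window = instructions[i:i + 3]
        if (len(window) == 3
                and window[0][0] == "LOAD_CONST"
                and window[1][0] == "LOAD_CONST"
                and window[2][0] == "BINARY_ADD"):
            # Replace the three-instruction pattern with one folded load.
            out.append(("LOAD_CONST", window[0][1] + window[1][1]))
            i += 3
        else:
            out.append(instructions[i])
            i += 1
    return out


code = [("LOAD_CONST", 2), ("LOAD_CONST", 3), ("BINARY_ADD",),
        ("STORE_NAME", "x")]
print(peephole(code))  # [('LOAD_CONST', 5), ('STORE_NAME', 'x')]
```

A real version would of course operate on code objects (and would have to patch jump targets when instructions are removed), but the pattern-matching shape of the pass is the same.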
There used to be just such a thing in pure Python (though without _new_ opcodes, obviously) - Skip Montanaro wrote it, IIRC, probably around 1997 or so. I think that may be where the current peephole optimizer originated (the way it works is certainly similar to the Python version I remember experimenting with back then).

E.