
Some stats about JUMP_IF_FALSE opcodes
Of the 2768 JUMP_IF_FALSE opcodes encountered, 2429 have a POP_TOP on both branches.
I'd like to propose that JUMP_IF_FALSE consume the top-of-stack.
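To make the proposal concrete, here is a minimal sketch of a toy stack machine that compares the two semantics. It is not CPython's ceval.c: the opcode name POP_JUMP_IF_FALSE for the consuming variant, the helper names, and the use of absolute jump targets are all assumptions made for illustration.

```python
# Toy stack machine for illustration only; this is not CPython's ceval.c.
# Jump targets are absolute indices into the instruction list (an assumption
# made for readability).

def run_branch(code):
    """Run `code`; return (result, number of opcodes dispatched)."""
    stack, pc, dispatched = [], 0, 0
    while True:
        op, arg = code[pc]
        pc += 1
        dispatched += 1
        if op == "LOAD_CONST":
            stack.append(arg)
        elif op == "POP_TOP":
            stack.pop()
        elif op == "JUMP_IF_FALSE":
            # Current semantics: test the top of stack but leave it in place.
            if not stack[-1]:
                pc = arg
        elif op == "POP_JUMP_IF_FALSE":
            # Hypothetical consuming variant: test the value and pop it.
            if not stack.pop():
                pc = arg
        elif op == "JUMP":
            pc = arg
        elif op == "RETURN_VALUE":
            return stack.pop(), dispatched
        else:
            raise ValueError("unknown opcode: %r" % (op,))


def current_form(cond):
    """'yes' if cond else 'no', with a POP_TOP on both branches."""
    return [
        ("LOAD_CONST", cond),
        ("JUMP_IF_FALSE", 5),
        ("POP_TOP", None),
        ("LOAD_CONST", "yes"),
        ("JUMP", 7),
        ("POP_TOP", None),
        ("LOAD_CONST", "no"),
        ("RETURN_VALUE", None),
    ]


def proposed_form(cond):
    """The same expression when the jump itself consumes the condition."""
    return [
        ("LOAD_CONST", cond),
        ("POP_JUMP_IF_FALSE", 4),
        ("LOAD_CONST", "yes"),
        ("JUMP", 5),
        ("LOAD_CONST", "no"),
        ("RETURN_VALUE", None),
    ]


if __name__ == "__main__":
    for cond in (True, False):
        print(cond, run_branch(current_form(cond)), run_branch(proposed_form(cond)))
```

In this toy model the consuming variant dispatches one opcode fewer on either branch. Whether that translates into a measurable gain in the real interpreter is exactly what the replies below dispute.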
I'm all against changing existing opcodes for a minor speed-up. A real speed-up is available from a specializing compiler, which turns things into real code.
If you are referring to Psyco, I expect that it is several years away from maturity. Currently it uses too much memory to be realistic.
If you really want to change the engine, I would consider emitting an extra opcode and testing how the change performs against a couple of applications. I doubt the effect, since the little POP_TOP is pretty fast.
OTOH, compared to the work that POP_TOP does, the work of decoding the opcodes is significant.
Where you really can save some time is to let some of the very short opcodes jump back not to the ticker-counting code, but into a shorter loop. This avoids quite a few register moves and opens up some local optimization possibilities.
Some of that is already done (these say 'continue' instead of 'break'). But I'm sure more can be done.
In any case, I would suggest not changing the semantics of existing opcodes for just a small win.
Why not? The opcodes are an internal detail of the PVM, and they change a bit with almost every Python version.
...
I'd like to propose that the following opcodes be added: LOAD_CONST(NONE), LOAD_CONST(1), LOAD_CONST(0), LOAD_CONST(EMPTY_STR).
I'd be careful here, too. The interpreter loop is already quite large, and there is a good chance of losing locality of reference by adding a little bit of code. I have had that happen several times. You don't think you changed much, but where have those 10 percent gone?
Agreed. This adds more cases to the switch and doesn't reduce the number of opcodes to be decoded (it only reduces the number of bytes per opcode, a very meagre gain indeed).
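A small sketch of that point, assuming the bytecode layout of the time (one byte for an opcode without an argument, three bytes for an opcode with a two-byte argument). LOAD_CONST_NONE is a hypothetical name for one of the proposed specialized opcodes, and the helper names are illustrative.

```python
# Compare encoded size and dispatch count for a generic LOAD_CONST versus a
# hypothetical argument-free specialization. Assumes 1 byte per opcode plus
# 2 bytes for an argument, as in the bytecode format of that era.

def encoded_size(code):
    """Bytes needed to encode `code` under the assumed layout."""
    return sum(3 if arg is not None else 1 for _op, arg in code)


def dispatch_count(code):
    """Opcodes the interpreter must decode for straight-line `code`."""
    return len(code)


# "return None" with the generic opcode (argument = index into co_consts).
GENERIC = [("LOAD_CONST", 0), ("RETURN_VALUE", None)]

# The same code with the hypothetical specialized opcode.
SPECIALIZED = [("LOAD_CONST_NONE", None), ("RETURN_VALUE", None)]

if __name__ == "__main__":
    for name, code in (("generic", GENERIC), ("specialized", SPECIALIZED)):
        print(name, encoded_size(code), "bytes,", dispatch_count(code), "dispatches")
```

The specialized form is two bytes shorter, but the interpreter still decodes the same number of opcodes, and the switch gains a case.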
Not trying to demoralize you completely, but there are limits to what can be gained by optimizing the interpreter loop. There was once the p2c project, which gave an overall improvement of 25-40 percent by totally removing the interpreter loop.
Yes, that's an upper bound for what you can gain by fiddling with opcodes. There are other gains possible though. The PVM isn't just the switch in ceval.c: it is also all the object implementations. While most are pretty lean, there's still fluff, e.g. in the lookup of builtins (SF patch 597907). --Guido van Rossum (home page: http://www.python.org/~guido/)