Re: [Python-Dev] Bytecode analysis

Feb. 25, 2003

...
Some stats about JUMP_IF_FALSE opcodes
Of the 2768 JUMP_IF_FALSE opcodes encountered, 2429 have a POP_TOP on
both branches.
I'd like to propose that JUMP_IF_FALSE consume the top-of-stack.
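A rough sketch of how such a statistic could be gathered with the standard dis module. It records, for every conditional jump, the opcode on the fall-through path and the opcode at the jump target. The helper name branch_successors and the function sample are made up for illustration, and current CPython versions use different opcode names than the 2003 ones discussed here, so the counts will not match the figures quoted above.

```python
import dis
from collections import Counter


def branch_successors(code):
    """Count (jump opname, fall-through opname, target opname) triples.

    Hypothetical helper: it only inspects one code object and treats any
    jump opcode with "IF" in its name as a conditional jump.
    """
    instructions = list(dis.get_instructions(code))
    by_offset = {ins.offset: ins for ins in instructions}
    counts = Counter()
    for ins, following in zip(instructions, instructions[1:]):
        is_jump = ins.opcode in dis.hasjrel or ins.opcode in dis.hasjabs
        if is_jump and "IF" in ins.opname:
            # For jump instructions, argval is the resolved target offset.
            target = by_offset.get(ins.argval)
            target_name = target.opname if target is not None else None
            counts[(ins.opname, following.opname, target_name)] += 1
    return counts


def sample(x):  # made-up function to analyse
    if x:
        return 1
    return 0


if __name__ == "__main__":
    for triple, n in branch_successors(sample.__code__).items():
        print(n, *triple)
```

Running this over a larger body of code (for example every function in a package) would give a table comparable in spirit to the one quoted above.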
I'm all against changing existing opcodes for a minor
speed-up. Real speed-up is available by a specializing
compiler which turns things into real code.
If you are referring to Psyco, I expect that it is several years away
from maturity.  Currently it uses too much memory to be realistic.
...
If you really want to change the engine, I would consider
emitting an extra opcode and testing how the change performs
against a couple of applications. I doubt there will be much
effect, since the little POP_TOP is pretty fast.
OTOH, compared to the work that POP_TOP does, the work of decoding the
opcodes is significant.
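To make the decoding argument concrete, here is a toy stack machine (not ceval.c, and far simpler) that counts how many instructions are dispatched. It compares a JUMP_IF_FALSE that leaves the tested value on the stack, followed by POP_TOP on both branches, with a variant that consumes the value itself. The name POP_JUMP_IF_FALSE and the two program builders are assumptions made for this sketch.

```python
def run(program):
    """Execute a toy stack program and return (result, dispatch_count)."""
    stack, pc, dispatches = [], 0, 0
    while pc < len(program):
        op, arg = program[pc]
        pc += 1
        dispatches += 1  # one trip through the "switch"
        if op == "PUSH":
            stack.append(arg)
        elif op == "POP_TOP":
            stack.pop()
        elif op == "JUMP_IF_FALSE":  # leaves the tested value on the stack
            if not stack[-1]:
                pc = arg
        elif op == "POP_JUMP_IF_FALSE":  # assumed variant: consumes the value
            if not stack.pop():
                pc = arg
        elif op == "RETURN":
            return stack.pop(), dispatches
        else:
            raise ValueError("unknown opcode: %r" % (op,))
    raise RuntimeError("program ended without RETURN")


def classic(flag):
    """'yes' if flag else 'no', with a POP_TOP on both branches."""
    return [
        ("PUSH", flag), ("JUMP_IF_FALSE", 5),
        ("POP_TOP", None), ("PUSH", "yes"), ("RETURN", None),
        ("POP_TOP", None), ("PUSH", "no"), ("RETURN", None),
    ]


def consuming(flag):
    """Same logic, with the jump popping the tested value itself."""
    return [
        ("PUSH", flag), ("POP_JUMP_IF_FALSE", 4),
        ("PUSH", "yes"), ("RETURN", None),
        ("PUSH", "no"), ("RETURN", None),
    ]


if __name__ == "__main__":
    for flag in (True, False):
        print(flag, run(classic(flag)), run(consuming(flag)))
```

In this toy, each path through the classic program takes five dispatches and each path through the consuming program takes four. The saving is one decode per conditional, not the cost of the pop itself.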
...
Where you really can save some time is to shortcut some
of the very short opcodes to not jump back to the ticker
counting code, but into a shorter circle. This avoids
quite some register moves and gives some local optimization
possibilities.
Some of that is already done (these say 'continue' instead of
'break').  But I'm sure more can be done.
...
I would anyway suggest not changing the semantics of
existing opcodes for just a little win.
Why not?  The opcodes are an internal detail of the PVM, and they
change a bit with almost every Python version.
...
I'd like to propose that the following opcodes be added:
LOAD_CONST(NONE)
LOAD_CONST(1)
LOAD_CONST(0)
LOAD_CONST(EMPTY_STR)
I'd be careful here, too. The interpreter loop is already
quite large, and there is a good chance of losing
locality of reference by adding a little bit of code.
I had that several times. You don't think you changed
much, but where are these 10 percent gone now?
Agreed.  This adds more cases to the switch and doesn't reduce the
number of opcodes to be decoded (it only reduces the number of bytes
per opcode, a very meagre gain indeed).
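The point about bytes versus decodes can be checked with simple arithmetic. The sketch below assumes the bytecode encoding of that era, where an opcode with an argument occupies three bytes and one without occupies one byte. LOAD_CONST_NONE is a hypothetical name for one of the proposed argument-free opcodes.

```python
def cost(ops):
    """Return (size_in_bytes, opcodes_to_decode) for a list of (name, has_arg)."""
    size = sum(3 if has_arg else 1 for _name, has_arg in ops)
    return size, len(ops)


# "return None" with the generic opcode and with a specialised one.
generic = [("LOAD_CONST", True), ("RETURN_VALUE", False)]
specialised = [("LOAD_CONST_NONE", False), ("RETURN_VALUE", False)]  # hypothetical

if __name__ == "__main__":
    print("generic:    ", cost(generic))
    print("specialised:", cost(specialised))
```

Under these assumptions the specialised sequence is two bytes shorter, but the interpreter still decodes two opcodes in either case.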
...
Not trying to demoralize you completely, but there are
limits to what can be gained by optimizing the
interpreter loop. There was once the p2c project, which
gave an overall improvement of 25-40 percent, by totally
removing the interpreter loop.
Yes, that's an upper bound for what you can gain by fiddling opcodes.

There are other gains possible though.  The PVM isn't just the switch
in ceval.c: it is also all the object implementations.  While most are
pretty lean, there's still fluff, e.g. in the lookup of builtins (SF
patch 597907).

--Guido van Rossum (home page: http://www.python.org/~guido/)