[Python-Dev] Bytecode analysis

Tue, 25 Feb 2003 20:22:44 -0500

> > Some stats about JUMP_IF_FALSE opcodes
> > 
> > Of the 2768 JUMP_IF_FALSE opcodes encountered, 2429 have a POP_TOP on
> > both branches.
> > 
> > I'd like to propose that JUMP_IF_FALSE consume the top-of-stack.
> 
> I'm all against changing existing opcodes for a minor
> speed-up. Real speed-up is available by a specializing
> compiler which turns things into real code.

If you are referring to Psyco, I expect that it is several years away
from maturity.  Currently it uses too much memory to be realistic.

> If you really want to change the engine, I would consider
> emitting an extra opcode and testing how the change performs
> against a couple of applications. I doubt the effect, since
> the little POP_TOP is pretty fast.

OTOH, compared to the work that POP_TOP does, the work of decoding the
opcodes is significant.
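
To make the dispatch-count argument concrete, here is a minimal sketch of
a toy stack machine. It is not CPython's ceval.c; the opcode names are
borrowed for illustration, and POP_JUMP_IF_FALSE is a hypothetical name
for the proposed variant that consumes the top-of-stack. It only counts
how many opcodes get decoded on each branch of a simple if/else.

```python
# Toy stack machine for illustration only; this is NOT CPython's eval loop.
# POP_JUMP_IF_FALSE is a hypothetical name for the proposed consuming jump.

def run(program):
    """Execute program; return (final stack, number of opcodes dispatched)."""
    stack = []
    pc = 0
    dispatched = 0
    while pc < len(program):
        op, arg = program[pc]
        pc += 1
        dispatched += 1
        if op == "LOAD_CONST":
            stack.append(arg)
        elif op == "POP_TOP":
            stack.pop()
        elif op == "JUMP_IF_FALSE":        # leaves the tested value on the stack
            if not stack[-1]:
                pc = arg
        elif op == "POP_JUMP_IF_FALSE":    # consumes the tested value
            if not stack.pop():
                pc = arg
        elif op == "JUMP":
            pc = arg
        else:
            raise ValueError("unknown opcode: %r" % (op,))
    return stack, dispatched


def old_program(cond):
    """if/else with a POP_TOP on both branches of JUMP_IF_FALSE."""
    return [
        ("LOAD_CONST", cond),
        ("JUMP_IF_FALSE", 5),
        ("POP_TOP", None),
        ("LOAD_CONST", "then"),
        ("JUMP", 7),
        ("POP_TOP", None),
        ("LOAD_CONST", "else"),
    ]


def new_program(cond):
    """The same if/else using the consuming jump; no POP_TOP needed."""
    return [
        ("LOAD_CONST", cond),
        ("POP_JUMP_IF_FALSE", 4),
        ("LOAD_CONST", "then"),
        ("JUMP", 5),
        ("LOAD_CONST", "else"),
    ]


for cond in (True, False):
    print(cond, run(old_program(cond)), run(new_program(cond)))
```

In this toy, each branch decodes one opcode fewer with the consuming
jump. Whether that shows up as a measurable gain in the real interpreter
would still have to be benchmarked.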

> Where you really can save some time is to shortcut some
> of the very short opcodes to not jump back to the ticker
> counting code, but into a shorter circle. This avoids
> quite some register moves and gives some local optimization
> possibilities.

Some of that is already done (these say 'continue' instead of
'break').  But I'm sure more can be done.

> I would anyway suggest not changing the semantics of
> existing opcodes for just a little win.

Why not?  The opcodes are an internal detail of the PVM, and they
change a bit with almost every Python version.
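
Because the opcode set shifts between versions, statistics like the ones
quoted at the top are best regathered on the interpreter at hand. A
rough sketch, assuming a Python 3 where dis.get_instructions is
available; the function names opcode_pairs and sample are made up for
this example, and the opcode names printed depend on the interpreter
version:

```python
import dis
from collections import Counter


def opcode_pairs(code):
    """Count (opname, next opname) pairs in a code object and its nested code objects."""
    pairs = Counter()
    names = [ins.opname for ins in dis.get_instructions(code)]
    pairs.update(zip(names, names[1:]))
    for const in code.co_consts:
        if hasattr(const, "co_code"):      # nested function or class body
            pairs.update(opcode_pairs(const))
    return pairs


def sample(x):
    """Hypothetical function to analyse; any code object would do."""
    if x:
        return 1
    return 0


stats = opcode_pairs(sample.__code__)
for (first, second), count in stats.most_common(5):
    print(first, second, count)
```

Running this over a larger body of code (for example, the compiled
modules of an application) would show which opcode pairs are common
enough to be worth considering for fusing.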

> ...
> 
> > I'd like to propose the following opcodes be added
> > LOAD_CONST(NONE)
> > LOAD_CONST(1)
> > LOAD_CONST(0)
> > LOAD_CONST(EMPTY_STR)
> 
> I'd be careful here, too. The interpreter loop is quite
> large, already, and there is a good chance to lose
> locality of reference by adding a little bit of code.
> I had that several times. You don't think you changed
> much, but where are these 10 percent gone now?

Agreed.  This adds more cases to the switch and doesn't reduce the
number of opcodes to be decoded (it only reduces the number of bytes
per opcode, a very meagre gain indeed).

> Not trying to demoralize you completely, but there are
> limits to what can be gained by optimizing the
> interpreter loop. There was once the p2c project, which
> gave an overall improvement of 25-40 percent, by totally
> removing the interpreter loop.

Yes, that's an upper bound for what you can gain by fiddling with opcodes.

There are other gains possible though.  The PVM isn't just the switch
in ceval.c: it is also all the object implementations.  While most are
pretty lean, there's still fluff, e.g. in the lookup of builtins (SF
patch 597907).

--Guido van Rossum (home page: http://www.python.org/~guido/)