[Python-Dev] Bytecode analysis

Christian Tismer tismer@tismer.com
Wed, 26 Feb 2003 03:09:36 +0100


Guido van Rossum wrote:

[Chris (not the bot)]

> I'm all against changing existing opcodes for a minor
> speed-up. Real speed-up is available by a specializing
> compiler which turns things into real code.

[Guido]
> If you are referring to Psyco, I expect that it is several years before
> maturity.  Currently it uses too much memory to be realistic.

You are probably right about maturity, but I don't think
it will be years. In a month, he will be able to work
full-time on this, and I just spent a week with him
on the PyPy sprint and saw a genius fly... amazing!

...

> OTOH, compared to the work that POP_TOP does, the work of decoding the
> opcodes is significant.

Good point.

>>Where you really can save some time is to shortcut some
>>of the very short opcodes to not jump back to the ticker
>>counting code, but into a shorter circle. This avoids
>>quite some register moves and gives some local optimization
>>possibilities.
> 
> 
> Some of that is already done (these say 'continue' instead of
> 'break').  But I'm sure more can be done.

Hey, this is amazing! This is the kind of stuff that
I did to the interpreter a couple of years ago, when
you really disliked anybody hacking on the core to
save some cycles. Now I see this evolve on its own,
with some pleasure. Every patch that I can dispose
of is a good patch. :-)
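
A minimal sketch of the 'continue' versus 'break' shortcut discussed
above, written as a toy stack machine in Python rather than the C of
ceval.c. The opcode names, the check interval and the slow_passes
counter are assumptions made for illustration only. Cheap opcodes
'continue' the inner loop and skip the ticker code; the more
expensive one 'break's back to it.

```python
def run(code, check_interval=10):
    """Execute a list of (opname, arg) pairs on a toy stack machine."""
    stack, pc = [], 0
    ticker = check_interval
    slow_passes = 0                      # how often the ticker code ran
    while pc < len(code):
        # Slow path: ticker bookkeeping. A real VM would handle
        # signals and thread switches here.
        slow_passes += 1
        ticker -= 1
        if ticker <= 0:
            ticker = check_interval
        # Fast path: the short circle that skips the ticker code.
        while pc < len(code):
            op, arg = code[pc]
            pc += 1
            if op == "PUSH":
                stack.append(arg)
                continue                 # stay in the short circle
            if op == "POP":
                stack.pop()
                continue                 # stay in the short circle
            if op == "ADD":
                right = stack.pop()
                left = stack.pop()
                stack.append(left + right)
                break                    # back to the ticker code
            raise ValueError("unknown opcode: %r" % (op,))
    return stack, slow_passes


program = [("PUSH", 1), ("PUSH", 2), ("ADD", None),
           ("PUSH", 3), ("POP", None)]
result_stack, result_slow_passes = run(program)
print(result_stack, result_slow_passes)
```

Five opcodes are executed here, but the ticker code only runs twice,
because PUSH and POP never leave the inner loop.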

>>I would anyway suggest not to change the semantics of
>>existing opcodes for just a little win.
> 
> 
> Why not?  The opcodes are an internal detail of the PVM, and they
> change a bit with almost every Python version.

Maybe I've become a bit conservative, trying to keep
more compatibility between versions than necessary.
There have been opcode changes from time to time,
but semantics were never changed, and new opcodes
were always added to the end of the table, so I
thought you wanted to avoid unnecessary changes
to dis.dis and friends, and maybe stay able
to run older .pycs with newer interpreters, but
I really don't insist. I just found a minor speedup
not worth the change at all.
(Hey, did you expect such words from me? :-)
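
To see how much the opcodes depend on the interpreter version, one
can list what a small function compiles to. This is a sketch that
assumes a modern Python 3, where dis.get_instructions() is available;
the function names sample and opcode_names are made up for the
example. The printed names differ between Python versions, so no
expected output is given.

```python
import dis


def sample(a, b):
    return a + b


def opcode_names(func):
    """Return the opcode names the running interpreter uses for func."""
    return [ins.opname for ins in dis.get_instructions(func)]


names = opcode_names(sample)
print(names)
```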

[locality of reference and possible code bloat]

> Agreed.  This adds more cases to the switch and doesn't reduce the
> number of opcodes to be decoded (it only reduces the number of bytes
> per opcode, a very meagre gain indeed).

Well, if we are going to refactor opcodes, it might
be worth considering changes that increase neither
the number of opcodes to be executed (shorter code objects)
nor the number of opcodes to be distinguished.
Modifying the JUMP_IF_XXX is a step towards this goal.
There was this other proposal of CALL_METHOD (or so), which
adds an opcode but shortens interpretation and code size.
Maybe it makes sense to get rid of some more rarely used
opcodes like BUILD_CLASS, which could be replaced by a regular
call to a special function? EXEC_STMT is a candidate as well.
It is not used very often and could be a special function
call. All the PRINT_XXX opcodes do some time-consuming
work, so we could replace them by special function calls,
since the call overhead doesn't matter there.
They do not need to sit in the big switch.
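
A rough illustration of "class creation as a regular function call",
using the three-argument form of the builtin type(). The helper
build_class and the body function point_body are hypothetical names
for this sketch; this is not how the compiler emits BUILD_CLASS.

```python
def build_class(name, bases, body):
    """Hypothetical helper: fill a namespace, then build the class
    with an ordinary call instead of a dedicated opcode."""
    namespace = {}
    body(namespace)                      # plays the role of the class body
    return type(name, bases, namespace)


def point_body(ns):
    ns["x"] = 0
    ns["describe"] = lambda self: "Point at %d" % self.x


Point = build_class("Point", (object,), point_body)
p = Point()
print(p.describe())
```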

These were just some random ideas. There are time-critical,
frequently used opcodes alongside rarely used ones that
take quite some time anyway. While compressing patterns
of the frequently used ones into fast combined opcodes
(CALL_METHOD, JUMP_IF_XXX with pop), the space of the others
can be reclaimed.
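
One way to look for patterns worth combining is to count adjacent
opcode pairs in an instruction stream. The sketch below does only
that counting; the opcode list is a hand-written sample for
illustration, not measured data, and adjacent_pairs is a made-up
helper name.

```python
from collections import Counter


def adjacent_pairs(opnames):
    """Count how often each (opcode, next opcode) pair occurs."""
    return Counter(zip(opnames, opnames[1:]))


# Hand-written sample stream, for illustration only.
sample_stream = [
    "LOAD_FAST", "JUMP_IF_FALSE", "POP_TOP",
    "LOAD_FAST", "LOAD_ATTR", "CALL_FUNCTION", "POP_TOP",
    "LOAD_FAST", "JUMP_IF_FALSE", "POP_TOP",
]

pairs = adjacent_pairs(sample_stream)
for pair, count in pairs.most_common():
    print(pair, count)
```

A pair that shows up often, such as JUMP_IF_FALSE followed by
POP_TOP in this sample, is a candidate for a single combined opcode.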

>>Not trying to demoralize you completely, but there are
>>limits to what can be gained by optimizing the
>>interpreter loop. There was once the p2c project, which
>>gave an overall improvement of 25-40 percent, by totally
>>removing the interpreter loop.
> 
> 
> Yes, that's an upper bound for what you can gain by fiddling with opcodes.

Which made opcode optimization rather pointless for me,
and I stopped hacking on this.

> There are other gains possible though.  The PVM isn't just the switch
> in ceval.c: it is also all the object implementations.  While most are
> pretty lean, there's still fluff, e.g. in the lookup of builtins (SF
> patch 597907).

And specialized string-only namespaces, cached method lookups,
melting down local variables into primitive C types, ...
there is a ton of stuff which I could add, but I don't want
to try this in C. Python is much better for prototyping. :-)
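
As an example of such prototyping, here is a small Python sketch of
a cached method lookup. The class MethodCache and its hits/misses
counters are assumptions for illustration. It has no invalidation,
so it would return stale results if a class were modified after the
first lookup; a real implementation would have to handle that.

```python
class MethodCache:
    """Toy cache mapping (class, attribute name) to the attribute
    found by walking the MRO. No invalidation is done."""

    def __init__(self):
        self._cache = {}
        self.hits = 0
        self.misses = 0

    def lookup(self, cls, name):
        key = (cls, name)
        if key in self._cache:
            self.hits += 1
            return self._cache[key]
        self.misses += 1
        # Slow path: search each class dict along the MRO.
        for klass in cls.__mro__:
            namespace = vars(klass)
            if name in namespace:
                self._cache[key] = namespace[name]
                return namespace[name]
        raise AttributeError(name)


class Base:
    def greet(self):
        return "hi"


class Child(Base):
    pass


cache = MethodCache()
first = cache.lookup(Child, "greet")     # walks the MRO
second = cache.lookup(Child, "greet")    # served from the cache
print(first(Child()), cache.hits, cache.misses)
```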

cheers - chris

-- 
Christian Tismer             :^)   <mailto:tismer@tismer.com>
Mission Impossible 5oftware  :     Have a break! Take a ride on Python's
Johannes-Niemeyer-Weg 9a     :    *Starship* http://starship.python.net/
14109 Berlin                 :     PGP key -> http://wwwkeys.pgp.net/
work +49 30 89 09 53 34  home +49 30 802 86 56  pager +49 173 24 18 776
PGP 0x57F3BF04       9064 F4E1 D754 C2FF 1619  305B C09C 5A3B 57F3 BF04
      whom do you want to sponsor today?   http://www.stackless.com/