[Python-Dev] [ANN] superinstructions (VPython 0.1)
cadr4u at gmail.com
Fri Oct 24 09:41:12 CEST 2008
Antoine Pitrou <solipsis at pitrou.net> writes:
> J. Sievers <cadr4u <at> gmail.com> writes:
>> A sequence of code such as LOAD_CONST LOAD_FAST BINARY_ADD will, in
>> CPython, push some constant onto the stack, push some local onto the
>> stack, then pop both off the stack, add them and push the result back
>> onto the stack.
>> Turning this into a superinstruction means inlining LOAD_CONST and
>> LOAD_FAST, modifying them to store the values they'd otherwise push
>> onto the stack in local variables and adding a version of BINARY_ADD
>> which reads its arguments from those local variables rather than the
>> stack (this reduces dispatch time in addition to pops and pushes).
> The problem is that this only optimizes code like "x + 1" but not "1 + x" or "x
> + y". To make this generic a first step would be to try to fuse LOAD_CONST and
> LOAD_FAST into a single opcode (and check it doesn't slow down the VM). This
> could be possible by copying the constants table into the start of the frame's
> variables array when the frame is created, so that the LOAD_FAST code still does
> a single indexed array dereference. Since constants are constants, they don't
> need to be copied again when the frame is re-used by a subsequent call of the
> same function (but this would slow down recursive functions a bit, since those
> have to create new frames each time they are called).
> Then fusing e.g. LOAD_FAST LOAD_FAST BINARY_ADD into ADD_FAST_FAST would cover
> many more cases than the optimization you are writing about, without any
> explosion in the number of opcodes.
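A minimal Python sketch of the frame layout proposed above, for illustration only. It assumes values are plain Python ints rather than CPython objects, and the Frame class, the local_index helper and the numeric opcodes are hypothetical, not CPython's actual frame structure or opcode table. Constants occupy the first slots of the frame's array, so a single LOAD_FAST-style indexed load reaches either a constant or a local, and one fused ADD_FAST_FAST opcode covers "x + 1", "1 + x" and "x + y".

```python
# Hypothetical opcode numbers for a toy VM (not CPython's).
LOAD_FAST, BINARY_ADD, ADD_FAST_FAST, RETURN_VALUE = range(4)


class Frame:
    """Toy frame: constants first, then locals, in one slot array."""

    def __init__(self, consts, nlocals):
        # Constants are copied in once, when the frame is created;
        # local number i lives at slot len(consts) + i.
        self.nconsts = len(consts)
        self.slots = list(consts) + [None] * nlocals

    def local_index(self, i):
        return self.nconsts + i


def run(code, frame):
    """Execute (opcode, *args) tuples; return (result, dispatch_count)."""
    stack = []
    slots = frame.slots
    dispatches = 0
    for op, *args in code:
        dispatches += 1
        if op == LOAD_FAST:          # one indexed load: constant or local
            stack.append(slots[args[0]])
        elif op == BINARY_ADD:
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif op == ADD_FAST_FAST:    # LOAD_FAST a; LOAD_FAST b; BINARY_ADD
            stack.append(slots[args[0]] + slots[args[1]])
        elif op == RETURN_VALUE:
            return stack.pop(), dispatches
        else:
            raise ValueError(f"unknown opcode {op}")
    raise RuntimeError("code ended without RETURN_VALUE")


frame = Frame(consts=[1], nlocals=2)
x, y = frame.local_index(0), frame.local_index(1)
frame.slots[x], frame.slots[y] = 41, 100

unfused = [(LOAD_FAST, x), (LOAD_FAST, 0), (BINARY_ADD,), (RETURN_VALUE,)]
print(run(unfused, frame))                                    # (42, 4)   x + 1
print(run([(ADD_FAST_FAST, x, 0), (RETURN_VALUE,)], frame))   # (42, 2)   x + 1
print(run([(ADD_FAST_FAST, 0, x), (RETURN_VALUE,)], frame))   # (42, 2)   1 + x
print(run([(ADD_FAST_FAST, x, y), (RETURN_VALUE,)], frame))   # (141, 2)  x + y
```

The dispatch counter only shows that the fused form executes two instructions where the unfused one executes four; it says nothing about actual speed in a C interpreter loop, which would still have to be measured.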
I don't know that I'd call it an explosion. Currently there are ~150
superinstructions in all (a problem for a bytecode encoding, but
inconsequential once one is committed to threaded code).
A superinstruction definition in Vmgen, btw, looks as follows:
fcbinary_add = load_fast load_const binary_add
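A toy Python model of what a definition like the one above expresses, under stated assumptions. The bytecode is a list of (name, *args) tuples, threaded code is modeled as call threading (a list of (handler, args) pairs), and the translate function, SUPERS table, VM class and handler names are hypothetical, not Vmgen's generated code. While translating, the sequence load_fast load_const binary_add is replaced by one fcbinary_add handler that keeps both operands in local variables instead of pushing and popping them. Because handlers are referenced directly rather than through an opcode number, adding superinstructions does not use up values in an opcode encoding (a one-byte opcode field has only 256).

```python
def load_fast(vm, i):
    vm.stack.append(vm.fast[i])

def load_const(vm, i):
    vm.stack.append(vm.consts[i])

def binary_add(vm):
    right = vm.stack.pop()
    left = vm.stack.pop()
    vm.stack.append(left + right)

def fcbinary_add(vm, i, j):
    # load_fast i; load_const j; binary_add: operands stay in locals,
    # so there are no intermediate pushes and pops.
    left = vm.fast[i]
    right = vm.consts[j]
    vm.stack.append(left + right)

HANDLERS = {"load_fast": load_fast, "load_const": load_const, "binary_add": binary_add}
# Hypothetical superinstruction table: component sequence -> fused handler.
SUPERS = {("load_fast", "load_const", "binary_add"): fcbinary_add}


def translate(bytecode):
    """Turn [(name, *args), ...] into threaded code [(handler, args), ...],
    greedily fusing any sequence listed in SUPERS."""
    threaded, pc = [], 0
    while pc < len(bytecode):
        for seq, fused in SUPERS.items():
            window = bytecode[pc:pc + len(seq)]
            if tuple(ins[0] for ins in window) == seq:
                # The fused handler takes the components' inline operands in order.
                args = tuple(a for ins in window for a in ins[1:])
                threaded.append((fused, args))
                pc += len(seq)
                break
        else:
            name, *args = bytecode[pc]
            threaded.append((HANDLERS[name], tuple(args)))
            pc += 1
    return threaded


class VM:
    def __init__(self, consts, fast):
        self.consts, self.fast, self.stack = consts, fast, []

    def run(self, threaded):
        # Dispatch is a direct call through the stored handler reference.
        for handler, args in threaded:
            handler(self, *args)
        return self.stack.pop()


code = translate([("load_fast", 0), ("load_const", 0), ("binary_add",)])  # x + 1
print([h.__name__ for h, _ in code])        # ['fcbinary_add']
print(VM(consts=[1], fast=[41]).run(code))  # 42
```

As Antoine points out, a sequence in a different order, such as load_const load_fast binary_add for "1 + x", is left unfused by this table and would need its own entry.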