[Python-Dev] LOAD_SELF and SELF_ATTR opcodes

Fri Oct 14 21:43:43 CEST 2005

I ran across an interesting paper about some VM optimizations yesterday:

    http://www.object-arts.com/Papers/TheInterpreterIsDead.PDF

One thing mentioned was that saving even one cycle in their 'PUSH_SELF' 
opcode improved interpreter performance by 5%.  I thought that was pretty 
cool, and then I realized CPython doesn't even *have* a PUSH_SELF opcode.

So, today, I took a stab at implementing one, by converting "LOAD_FAST 0" 
calls to a "LOAD_SELF" opcode.  Pystone and Parrotbench improved by about 
2% or so.  That wasn't great, so I added a "SELF_ATTR" opcode that combines 
a LOAD_SELF and a LOAD_ATTR in the same opcode while avoiding extra stack 
and refcount manipulation.  This raised the total improvement for pystone 
to about 5%, but didn't seem to improve parrotbench any further.  I guess 
parrotbench doesn't do much self.attr stuff in places that really count, 
and looking at the code it indeed seems that most self.* stuff is done at 
higher levels of the parsing benchmark, not the innermost loops.

Indeed, even pystone doesn't do much attribute access on the first argument 
of most of its functions, especially not those in inner loops.  Only 
Proc1() and the Record.copy() method do anything that would be helped by 
SELF_ATTR.  But it seems to me that this is very unusual for 
object-oriented code, and that more common uses of Python should be helped 
a lot more by this.  Do we have any benchmarks that don't use 'foo = 
self.foo' type shortcuts in their inner loops?

Anyway, my main question is, do these sound like worthwhile 
optimizations?  The code isn't that complex; the only tricky thing I did 
was having the opcodes' error case (unbound local) fall through to the 
LOAD_FAST opcode so as not to duplicate the error handling code, in the 
hopes of keeping the eval loop size down.