[Python-Dev] LOAD_SELF and SELF_ATTR opcodes
Phillip J. Eby
pje at telecommunity.com
Fri Oct 14 21:43:43 CEST 2005
I ran across an interesting paper about some VM optimizations yesterday:
One thing mentioned was that saving even one cycle in their 'PUSH_SELF'
opcode improved interpreter performance by 5%. I thought that was pretty
cool, and then I realized CPython doesn't even *have* a PUSH_SELF opcode.
So, today, I took a stab at implementing one, by converting "LOAD_FAST 0"
instructions to a "LOAD_SELF" opcode. Pystone and Parrotbench improved by about
2%. That wasn't great, so I added a "SELF_ATTR" opcode that combines
a LOAD_SELF and a LOAD_ATTR in the same opcode while avoiding extra stack
and refcount manipulation. This raised the total improvement for pystone
to about 5%, but didn't seem to improve parrotbench any further. I guess
parrotbench doesn't do much self.attr stuff in places that really count,
and looking at the code it indeed seems that most self.* stuff is done at
higher levels of the parsing benchmark, not the innermost loops.
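
To make the two opcodes concrete, here is a toy sketch in Python. It is not the actual patch to the eval loop: the tuple-based instruction format and the `peephole` and `run` helpers are invented for illustration, and only the opcode names LOAD_FAST, LOAD_ATTR, LOAD_SELF and SELF_ATTR come from the description above. It assumes straight-line code with no jump targets between the fused instructions.

```python
# Toy stack machine; an illustration only, not CPython's eval loop.
# Instructions are (opname, arg) tuples. `fastlocals` plays the role of
# the frame's local variable array, with slot 0 holding `self`.

def peephole(code):
    """Rewrite LOAD_FAST 0 as LOAD_SELF, then fuse LOAD_SELF + LOAD_ATTR."""
    out = []
    for op, arg in code:
        if op == "LOAD_FAST" and arg == 0:
            op, arg = "LOAD_SELF", None
        if op == "LOAD_ATTR" and out and out[-1][0] == "LOAD_SELF":
            out[-1] = ("SELF_ATTR", arg)  # one instruction instead of two
        else:
            out.append((op, arg))
    return out

def run(code, fastlocals):
    """Execute `code`; return (result, number of instructions dispatched)."""
    stack, dispatched = [], 0
    for op, arg in code:
        dispatched += 1
        if op == "LOAD_FAST":
            stack.append(fastlocals[arg])
        elif op == "LOAD_SELF":
            stack.append(fastlocals[0])
        elif op == "LOAD_ATTR":
            stack.append(getattr(stack.pop(), arg))
        elif op == "SELF_ATTR":
            # self never touches the value stack here
            stack.append(getattr(fastlocals[0], arg))
        elif op == "BINARY_ADD":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif op == "RETURN_VALUE":
            return stack.pop(), dispatched
        else:
            raise ValueError(f"unknown opcode: {op}")
    raise RuntimeError("code ended without RETURN_VALUE")

class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

# Hand-written instructions for a method body like `return self.x + self.y`.
ORIGINAL = [
    ("LOAD_FAST", 0), ("LOAD_ATTR", "x"),
    ("LOAD_FAST", 0), ("LOAD_ATTR", "y"),
    ("BINARY_ADD", None), ("RETURN_VALUE", None),
]
OPTIMIZED = peephole(ORIGINAL)

result_before = run(ORIGINAL, [Point(3, 4)])
result_after = run(OPTIMIZED, [Point(3, 4)])
```

Both runs compute the same value, but the optimized sequence dispatches four instructions instead of six, which is the kind of saving SELF_ATTR is after on top of the avoided stack and refcount traffic.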
Indeed, even pystone doesn't do much attribute access on the first argument
of most of its functions, especially not those in inner loops. Only
Proc1() and the Record.copy() method do anything that would be helped by
SELF_ATTR. But it seems to me that so little self.attr access is very unusual
for object-oriented code, and that more typical uses of Python should be helped
a lot more by these opcodes. Do we have any benchmarks that don't use 'foo =
self.foo' type shortcuts in their inner loops?
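
For anyone unfamiliar with the shortcut in question, here is a small sketch of the two styles. The `Counter` class and its method names are hypothetical, made up purely for illustration. A benchmark written in the second style does its self.attr lookups outside the loop, so a SELF_ATTR opcode would have little to speed up there.

```python
class Counter:
    def __init__(self, step):
        self.step = step
        self.total = 0

    def run_plain(self, n):
        # Each iteration reads self.total and self.step and writes
        # self.total: the reads are the pattern SELF_ATTR targets.
        for _ in range(n):
            self.total = self.total + self.step
        return self.total

    def run_cached(self, n):
        # The 'foo = self.foo' shortcut: hoist attributes into locals,
        # so the inner loop only touches fast locals.
        step = self.step
        total = self.total
        for _ in range(n):
            total = total + step
        self.total = total
        return total

plain = Counter(2).run_plain(10)
cached = Counter(2).run_cached(10)
```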
Anyway, my main question is, do these sound like worthwhile
optimizations? The code isn't that complex; the only tricky thing I did
was having the opcodes' error case (unbound local) fall through to the
LOAD_FAST opcode so as not to duplicate the error handling code, in the
hopes of keeping the eval loop size down.
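
The fall-through idea can be sketched as follows. This is a toy model in Python, not the C code from the eval loop: the `dispatch` helper and the `UNBOUND` sentinel are assumptions made for illustration, and since Python has no switch fall-through, it is emulated by rewriting the opcode and continuing into the next `if`.

```python
from types import SimpleNamespace

UNBOUND = object()  # stand-in for an empty (NULL) slot in the fast locals

def dispatch(op, arg, stack, fastlocals, names):
    """Execute one instruction of a toy interpreter (hypothetical helper)."""
    if op in ("LOAD_SELF", "SELF_ATTR"):
        owner = fastlocals[0]
        if owner is not UNBOUND:
            # Fast path: nothing here knows how to report an error.
            stack.append(owner if op == "LOAD_SELF" else getattr(owner, arg))
            return
        # Error case: behave as LOAD_FAST 0 and continue into the handler
        # below, which holds the only copy of the unbound-local error code.
        op, arg = "LOAD_FAST", 0
    if op == "LOAD_FAST":
        value = fastlocals[arg]
        if value is UNBOUND:
            raise UnboundLocalError(
                f"local variable {names[arg]!r} referenced before assignment")
        stack.append(value)
        return
    raise ValueError(f"unknown opcode: {op}")

stack = []
dispatch("SELF_ATTR", "x", stack, [SimpleNamespace(x=42)], ["self"])
# stack now holds the attribute value: [42]
```

The point of the arrangement is that the new opcodes add only a fast path; the rarely taken error path is shared with LOAD_FAST rather than duplicated.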