I've been reading some books and papers about optimizing stack-based virtual machines, and playing around with Python's bytecode and the organization of the interpreter's inner loop. As always, I found some interesting results and some frustrating ones.
Recently, I found your paper about peephole optimization, along with your other attempts in the same area. Basically, I discovered that I'm not original and have repeated most of your ideas and mistakes. :-) But that's OK. It gave me a good idea of which paths to follow if I want to keep playing with this.
One thing I had thought about, and also found mentioned in your paper, is that some instruction sequences should be turned into a single opcode. To understand how this would affect the code, I disassembled the whole Python standard library and the whole Zope library. Then I ran a script to count adjacent opcode pairs (excluding SET_LINENO).
Here are the most frequent pairs:

    23632  LOAD_FAST, LOAD_ATTR
    15382  LOAD_CONST, LOAD_CONST
    12842  JUMP_IF_FALSE, POP_TOP
    12397  CALL_FUNCTION, POP_TOP
    12121  LOAD_FAST, LOAD_FAST
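For anyone who wants to reproduce this kind of count, here is a minimal sketch of how adjacent opcode pairs can be tallied. It is not the script I actually ran: it assumes a modern Python with `dis.get_instructions`, the helper names (`adjacent_pairs`, `iter_code_objects`, `count_opcode_pairs`) are made up for illustration, and it assumes that skipped opcodes are dropped before pairing. Opcode names also differ between Python versions, so the numbers will not match the ones above.

```python
import dis
import types
from collections import Counter


def adjacent_pairs(names, skip=frozenset()):
    """Count adjacent (first, second) pairs in a sequence of opcode names.

    Assumption: opcodes listed in `skip` are removed before pairing, so the
    two opcodes on either side of a skipped one count as adjacent.
    """
    kept = [name for name in names if name not in skip]
    return Counter(zip(kept, kept[1:]))


def iter_code_objects(code):
    """Yield a code object and every code object nested in its constants."""
    yield code
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            yield from iter_code_objects(const)


def count_opcode_pairs(source, filename="<string>", skip=frozenset()):
    """Compile `source` and count adjacent opcode pairs in all its code objects."""
    top_level = compile(source, filename, "exec")
    totals = Counter()
    for code in iter_code_objects(top_level):
        names = [ins.opname for ins in dis.get_instructions(code)]
        totals.update(adjacent_pairs(names, skip))
    return totals
```

Running `count_opcode_pairs(open(path).read()).most_common(5)` over each file of a library and summing the counters gives a table like the one above. Note that this sketch also counts pairs that straddle a jump target, which a real fused opcode could not cover.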
Not by coincidence, I found references in your paper to a LOAD_FAST_ATTR opcode. Since you have probably mentioned this to others, I wouldn't like to bother everyone again by asking why it was not implemented. Could you please explain to me the reasons it was left behind?
If you have the time, I'd also like to understand what trouble is involved in getting a peephole optimizer into the Python compiler itself. Is it just about compilation speed? I don't remember reading about this in your paper, but you have probably thought about it as well.