
I implemented LOAD_FAST_n, STORE_FAST_n, and LOAD_CONST_n for n < 16.
Getting a small 2% improvement in speed, going from about 21800 PyStones to 22300 PyStones. It's very hard to get consistent readings on the PyStones - has anyone got any tips on how to get more consistent results under Windows?
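For context, here is a minimal sketch of the compile-time rewrite this implies: a LOAD_FAST whose argument fits in the specialized range collapses from the usual three-byte "opcode + 16-bit arg" encoding into a single argument-less byte. The opcode value for LOAD_FAST_0 and the fold_load_fast() helper below are illustrative assumptions, not taken from the actual patch; the only requirement is that the specialized opcodes sit below HAVE_ARGUMENT so the eval loop doesn't try to fetch an argument for them.

    #define LOAD_FAST    124  /* existing three-byte form: opcode + 16-bit arg */
    #define LOAD_FAST_0   40  /* assumed value; must stay below HAVE_ARGUMENT so
                                 the specialized opcodes carry no explicit arg */

    static int
    fold_load_fast(unsigned char *code, int i)
    {
        /* LOAD_FAST is followed by a little-endian 16-bit argument */
        int oparg = code[i + 1] | (code[i + 2] << 8);
        if (code[i] == LOAD_FAST && oparg < 16) {
            /* fold the argument into the opcode; the two argument bytes
               can then be dropped from the code string, which is where
               the .pyc shrinkage comes from */
            code[i] = (unsigned char)(LOAD_FAST_0 + oparg);
            return 2;       /* number of bytes saved */
        }
        return 0;
    }

The same rewrite applies to STORE_FAST and LOAD_CONST with their own contiguous blocks of sixteen opcodes.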
Upgrade to a real operating system. :-)
Getting a small 3% reduction in .pyc file sizes:

    os.path  24,929 bytes  unmodified
    os.path  24,149 bytes  with modifications
I sort of cheated on the switch statement to avoid the use of a goto.
    opcode = NEXTOP();
    if (HAS_ARG(opcode))
        oparg = NEXTARG();
    ...
    switch (opcode) {
        ...
        case LOAD_FAST_14:
        case LOAD_FAST_15:
            /* recover the implicit argument from the opcode itself,
               then fall through to the ordinary LOAD_FAST code */
            oparg = opcode - LOAD_FAST_0;
        case LOAD_FAST:
            x = GETLOCAL(oparg);
            if (x != NULL) {
                Py_INCREF(x);
                ...
This is ok.
I also altered the opcode.h file to use an enum for the opcodes instead of all those #defines. Much easier to re-arrange things that way. I have a feeling that most of the speedup (such as it is) comes from that re-arrangement, which packs the opcodes into a contiguous numeric space. I suspect that sorting the opcodes by frequency of execution might also have some positive effect, as might organising the opcodes and the switch statement so that frequently co-occurring opcodes are adjacent to each other.
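A minimal sketch of what the enum form of opcode.h could look like; the numbering here is assumed, since the actual values chosen in the patch aren't given. The specialized argument-less opcodes stay below HAVE_ARGUMENT, so the HAS_ARG() test in the eval loop still skips the argument fetch for them, and each family is contiguous so "opcode - LOAD_FAST_0" recovers the implicit index.

    enum opcode {
        STOP_CODE = 0,
        POP_TOP,
        ROT_TWO,
        /* ... the other argument-less opcodes, which could be sorted by
           execution frequency as suggested above ... */
        LOAD_FAST_0 = 40,
        /* LOAD_FAST_1 .. LOAD_FAST_14 */
        LOAD_FAST_15 = 55,
        STORE_FAST_0,           /* 56..71 */
        /* ... */
        LOAD_CONST_0 = 72,      /* 72..87 */
        /* ... */
        HAVE_ARGUMENT = 90,     /* opcodes from here on take a 16-bit argument */
        STORE_NAME = 90,
        /* ... */
        LOAD_CONST = 100,
        LOAD_FAST = 124,
        STORE_FAST = 125
    };

Keeping the case values dense presumably also makes it easier for the C compiler to turn the big switch into a jump table rather than a cascade of comparisons, which may be where most of the gain from the renumbering comes from.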
Try that without the LOAD_XXX_n idea, so you can measure the effect of each idea separately. One problem with renumbering the opcodes is that you have to update the "dis" module, which exports the opcodes as Python constants. Maybe you should add some code there to regenerate the values from the .h file, as is done in keyword.py and symbol.py.

--Guido van Rossum (home page: http://www.python.org/~guido/)