
Guido van Rossum wrote:
[me, on micro-optimizing the main loop]
Fair enough (for one particular CPU at least).
Since we're at it, it's worth mentioning another conclusion we came across at the time: the cache effects in the main loop are significant -- it is more important to try keeping at best the main loop small enough, so that those effects are minimized. An experiment I did at the time which gave some delta-speedup: I folded most of the UNOP & BINOP opcodes since they differ only by the functon they call and they duplicate most of the opcode body. Like this: Original: case BINARY_POWER: w = POP(); v = POP(); x = PyNumber_Power(v, w, Py_None); Py_DECREF(v); Py_DECREF(w); PUSH(x); if (x != NULL) continue; break; case BINARY_MULTIPLY: w = POP(); v = POP(); x = PyNumber_Multiply(v, w); Py_DECREF(v); Py_DECREF(w); PUSH(x); if (x != NULL) continue; break; ... Folded version: case BINARY_POWER: w = POP(); v = POP(); x = PyNumber_Power(v, w, Py_None); goto end_binop; case BINARY_MULTIPLY: w = POP(); v = POP(); x = PyNumber_Multiply(v, w); goto end_binop; ... end_binop: Py_DECREF(v); Py_DECREF(w); PUSH(x); if (x != NULL) continue; break; This reduced the code size of ceval.c, which resulted in less cache effects and gave more speed, despite the additional jumps. It possibly results in less page-faults too, although this is unlikely. Which makes me think that, if we want to do something about cache effects, it is probably not a bad idea to just "reorder" the bytecodes in the big switch by decreasing frequency (we have some stats about this -- I believe Skip and MAL have discussed the opcodes' frequency and the charts lie somewhere in the archives). I remember Marc-Andre had done something in this direction and reported some perf improvements too. Since reordering the opcodes doesn't really hurt, if I'm about to do something with the main loop, it'll be only this.
Micro-optimization doesn't worth the effort. As to the 2-byte arg limit, I think I'd be happy if the compiler raises an exception if this limit is exceeded.
That would be a good first step indeed!
I'll try to have a look, if someone else is not more reactive than me. -- Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252