
Chris Tismer wrote:
Oh, that was not what I meant. I also did this two years ago and tossed it. Function calls are too expensive. What I meant was to fold opcodes by common patterns. Unfortunately this is slower, too.
Anyway, I didn't want to get too deep into this. Stopping wasting time now :-)
Chris already knows this, but it's worth repeating for people who don't. A function call isn't always too expensive; it depends on how much work the opcode is doing. It also depends on lots of other hard-to-predict effects of the generated code and its interaction with the memory system.

The various function call opcodes regularly call out to separate functions. I recall benchmarking various options, and often moving big chunks of code out of the mainloop and into functions improved performance slightly. Except when it didn't <0.3 wink>.

If you are benchmarking various opcode effects, I'd recommend trying to revive the simple cycle counter instrumentation I did for Python 2.2. The idea is to use the Pentium cycle counter to measure the number of cycles spent on each trip through the mainloop. A rough conclusion from the previous measurements was that trivial opcodes like POP_TOP can execute in less than 100 cycles, including opcode dispatch. An opcode that involves calling out to a C function never executes in less than 100 cycles, and often takes hundreds of cycles.

There's a patch floating around SourceForge somewhere.

Jeremy
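
P.S. For anyone who hasn't seen this kind of instrumentation, here is a rough sketch of its shape. It is not the Python 2.2 patch. It is a toy stack machine written in Python, and since Python cannot read the Pentium cycle counter directly, time.perf_counter_ns() stands in for it. The opcode CALL_HELPER and the function helper() are made up for illustration; only POP_TOP is a real opcode name from the discussion above. The absolute numbers mean nothing; the point is the bookkeeping around each trip through the loop.

```python
import time
from collections import defaultdict


def helper(x):
    # Hypothetical stand-in for an opcode that calls out to a separate function.
    return x + 1


def run(code, stats):
    """Execute toy bytecode, recording elapsed nanoseconds per opcode."""
    stack = []
    pc = 0
    while pc < len(code):
        # Stand-in for reading the cycle counter at the top of the mainloop.
        start = time.perf_counter_ns()
        op, arg = code[pc]
        pc += 1
        if op == "LOAD_CONST":
            stack.append(arg)
        elif op == "POP_TOP":
            stack.pop()
        elif op == "CALL_HELPER":
            stack.append(helper(stack.pop()))
        else:
            raise ValueError(f"unknown opcode {op!r}")
        # Second reading at the bottom of the loop; accumulate per opcode.
        elapsed = time.perf_counter_ns() - start
        count, total = stats[op]
        stats[op] = (count + 1, total + elapsed)
    return stack


program = [
    ("LOAD_CONST", 41),
    ("CALL_HELPER", None),
    ("LOAD_CONST", 0),
    ("POP_TOP", None),
] * 1000

stats = defaultdict(lambda: (0, 0))
result = run(program, stats)

for op, (count, total) in sorted(stats.items()):
    print(f"{op:12s} n={count:5d} avg={total / count:8.1f} ns")
```

The timer calls themselves cost more than a trivial opcode here, so treat the output as a demonstration of the method only. In C the two readings would bracket the dispatch switch, which is why the real measurements could include opcode dispatch in the count.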