
Guido van Rossum wrote:
Separating out LOAD_FAST from the switch shows a nice effect. SET_LINENO is removed by -OO anyway, so there's really no use in optimizing this one.
I tried this and found about three percent speed increase on pystone, for what that's worth. This is with python -OO on Linux x86. Note that removing the (now redundant) case from the switch seemed to make a small difference too.
[patch deleted]
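
Since the patch text is not reproduced here, the idea can be illustrated with a toy sketch. It is written in Python rather than C and is neither the deleted patch nor ceval.c. It assumes an interpreter loop where LOAD_FAST is tested before the general dispatch and loops straight back to the opcode fetch, so it also bypasses the periodic check at the top of the loop. The opcode numbers, the run() function and the ticker_checks counter are all invented for illustration.

```python
# Hypothetical opcode numbering, for illustration only.
LOAD_FAST, POP_TOP, STOP = 0, 1, 2


def run(code, fastlocals, fast_path):
    """Execute (opcode, oparg) pairs; return (stack, number of ticker checks)."""
    stack = []
    ticker_checks = 0
    pc = 0
    skip_check = False
    while True:
        # Stand-in for the periodic "ticker" test at the top of the loop.
        if not skip_check:
            ticker_checks += 1
        skip_check = False

        opcode, oparg = code[pc]
        pc += 1

        # Fast path: handle LOAD_FAST before the general dispatch and
        # go straight back to the opcode fetch, bypassing the ticker test.
        if fast_path and opcode == LOAD_FAST:
            stack.append(fastlocals[oparg])
            skip_check = True
            continue

        # General dispatch (the "switch").
        if opcode == LOAD_FAST:
            stack.append(fastlocals[oparg])
        elif opcode == POP_TOP:
            stack.pop()
        elif opcode == STOP:
            return stack, ticker_checks
        else:
            raise ValueError("unknown opcode %r" % (opcode,))


program = [(LOAD_FAST, 0), (LOAD_FAST, 1), (POP_TOP, None), (STOP, None)]
slow = run(program, [10, 20], fast_path=False)
fast = run(program, [10, 20], fast_path=True)
```

Both runs leave the same stack. The fast-path run performs fewer ticker checks, because every instruction that follows a LOAD_FAST is fetched without one.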
Some speedup confirmed on my Linux PIII too. But you saved a round-trip around the ticker test. If this is acceptable, then let's do it for the opcodes involving only stack operations (probably not the JUMPs) _and_ which do not contain DECREFs which may trigger an external call.

Here's the picture on my machine (I get the same picture with -OO):

cvs           - the original ceval.c in CVS
load_fast     - ceval.c with your patch
top5          - ceval.c with my patch at SF moving 5 opcodes to the top
top5-loadfast - my patch and your patch

~/python/dev>./python-cvs Lib/test/pystone.py
Pystone(1.1) time for 120000 passes = 19.85
This machine benchmarks at 6045.34 pystones/second

~/python/dev>./python-load_fast Lib/test/pystone.py
Pystone(1.1) time for 120000 passes = 19.61
This machine benchmarks at 6119.33 pystones/second

~/python/dev>./python-top5 Lib/test/pystone.py
Pystone(1.1) time for 120000 passes = 18.87
This machine benchmarks at 6359.3 pystones/second

~/python/dev>./python-top5-load_fast Lib/test/pystone.py
Pystone(1.1) time for 120000 passes = 19.08
This machine benchmarks at 6289.31 pystones/second

Which shows, among other things, that important cache effects are still here, because "python-top5-load_fast" is slower than "python-top5" alone...

no-more-time-for-it-from-me-either'ly y'rs
--
Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr
http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252
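
For reference, the four pystone rates above can be turned into relative speedups over the CVS baseline. This is only arithmetic on the figures quoted in this message; the dictionary keys and the speedup_percent() helper are names chosen here for illustration.

```python
# Pystone rates (pystones/second) as reported in the runs above.
rates = {
    "cvs": 6045.34,
    "load_fast": 6119.33,
    "top5": 6359.3,
    "top5-load_fast": 6289.31,
}


def speedup_percent(rate, baseline):
    """Relative gain of `rate` over `baseline`, in percent."""
    return (rate / baseline - 1.0) * 100.0


gains = {
    name: speedup_percent(rate, rates["cvs"])
    for name, rate in rates.items()
    if name != "cvs"
}

for name, gain in sorted(gains.items(), key=lambda item: item[1]):
    print("%-15s %+.2f%%" % (name, gain))
```

On these numbers, load_fast alone gains roughly 1.2%, top5 roughly 5.2%, and the combination roughly 4.0%, so combining the two patches gives less than top5 by itself.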