On Nov 30, 2007 7:16 PM, Brett Cannon <brett@python.org> wrote:
On Nov 30, 2007 12:02 PM, Neil Toronto <ntoronto@cs.byu.edu> wrote:
On both of my systems, building with -O2 instead of -O3 reduces execution time in pystone by 9% and in pybench by 8%. The difference comes from function inlining: "-O3 -fno-inline-functions" works just as well as "-O2". Removing "-g" has little effect on the result.
Systems:
- AMD Athlon 64 X2 Dual Core 4600+, 512 KB cache (desktop)
- Intel T2300 Dual Core 1.66GHz, 512 KB cache (laptop)
Both are Ubuntu 7.04, GCC 4.1.2.
Does anybody else see this?
It may be GCC being stupid (which has happened before) or not enough cache on my systems (definitely possible). If it's not one of those, I'd say it's because CPython core functions are already very large, and almost everything that ought to be inlined is already a macro.
That's quite possible. Previous benchmarks by AMK have shown that perhaps -0m (or whatever the flag is to optimize for size) sometimes is the best solution. It has always been believed that the eval loop is already large and manages to hit some cache sweet spot.
The flag is -Os. I suspect you will do better to limit the size of inlining rather than disabling it completely. The option is -finline-limit=number. I don't know the default value or what you should try. I would be interested to hear more results though.

n
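
For anyone who wants to report comparable numbers, here is a minimal sketch of a timing harness. It assumes you already have two interpreters built from the same source with different optimization flags (say one with -O3 and one with -O2 or a particular -finline-limit value) and a benchmark script to run under each. The function names and the interpreter/script paths passed in are hypothetical; nothing here is part of CPython, pystone, or pybench.

```python
import subprocess
import time


def best_time(cmd, repeats=3):
    """Return the shortest wall-clock time in seconds over `repeats` runs of cmd."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        # Discard the benchmark's own output; only the elapsed time is used here.
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        best = min(best, time.perf_counter() - start)
    return best


def percent_reduction(baseline, candidate):
    """Percentage by which `candidate` time is lower than `baseline` time."""
    return 100.0 * (baseline - candidate) / baseline


def compare(baseline_python, candidate_python, script, repeats=3):
    """Time `script` under two interpreter builds (paths are assumptions).

    Returns (baseline_seconds, candidate_seconds, percent_reduction).
    """
    t_base = best_time([baseline_python, script], repeats)
    t_cand = best_time([candidate_python, script], repeats)
    return t_base, t_cand, percent_reduction(t_base, t_cand)
```

Calling compare() with the path of the -O3 build, the path of the -O2 build, and the benchmark script returns both timings and the relative reduction. This measures whole-process wall-clock time, including interpreter startup, so it is coarser than the figures the benchmarks print themselves. Taking the minimum over several runs is one common way to damp noise from other processes.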