[Python-Dev] -O2 faster than -O3?
ntoronto at cs.byu.edu
Sat Dec 1 07:35:09 CET 2007
Neal Norwitz wrote:
> On Nov 30, 2007 7:16 PM, Brett Cannon <brett at python.org> wrote:
>> On Nov 30, 2007 12:02 PM, Neil Toronto <ntoronto at cs.byu.edu> wrote:
>>> On both of my systems, using -O2 reduces execution time in pystone by 9%
>>> and in pybench by 8%. It's function inlining: "-O3
>>> -fno-inline-functions" works just as well as "-O2". Removing "-g" has
>>> little effect on the result.
>>> - AMD Athlon 64 X2 Dual Core 4600+, 512 KB cache (desktop)
>>> - Intel T2300 Dual Core 1.66GHz, 512 KB cache (laptop)
>>> Both are Ubuntu 7.04, GCC 4.1.2.
>>> Does anybody else see this?
>>> It may be GCC being stupid (which has happened before) or not enough
>>> cache on my systems (definitely possible). If it's not one of those, I'd
>>> say it's because CPython core functions are already very large, and
>>> almost everything that ought to be inlined is already a macro.
>> That's quite possible. Previous benchmarks by AMK have shown that
>> perhaps -0m (or whatever the flag is to optimize for size) sometimes
>> is the best solution. It has always been believed that the eval loop
>> is already large and manages to hit some cache sweet spot.
> The flag is -Os. I suspect you will do better to limit the size of
> inlining rather than disabling it completely. The option is
> -finline-limit=number. I don't know the default value or what you
> should try. I would be interested to hear more results though.
I've got some pystones (500000) results for the Athlon. The default for
-finline-limit is 600. This is for the current trunk.
Global options                      pystones/sec (median of 3)
--------------------------------------------------------------
-O3 -fno-inline-functions           54824.6
-O3 -finline-limit=300              51229.7
-O3 -finline-limit=150              51177.7
-O3 -finline-limit=75               51759.8
-O3 -finline-limit=25               53821.3
ceval.c options (-O3 for others)    pystones/sec (median of 3)
--------------------------------------------------------------
-O3 -fno-inline-functions           55679.3
-O3 -finline-limit=300              51440.3
-O3 -finline-limit=150              50916.5
-O3 -finline-limit=75               51387.5
-O3 -finline-limit=25               52631.6
Now that's interesting. -O2 seems to be the best global option, and -Os
seems to be best for ceval.c. One more test then:
Global -O2, ceval.c -Os             56753.7
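
To put the numbers side by side, here is a small sketch that compares the combined build against each of the global -O3 variants listed above. The figures are copied from this message; the helper name percent_faster is only illustrative.

```python
# pystones/sec figures copied from the "Global options" results above
# (Athlon, median of 3 runs).
global_results = {
    "-O3 -fno-inline-functions": 54824.6,
    "-O3 -finline-limit=300": 51229.7,
    "-O3 -finline-limit=150": 51177.7,
    "-O3 -finline-limit=75": 51759.8,
    "-O3 -finline-limit=25": 53821.3,
}

# Result for "Global -O2, ceval.c -Os".
combined = 56753.7


def percent_faster(new, old):
    """Return how much faster `new` is than `old`, in percent."""
    return (new / old - 1.0) * 100.0


# Print the global variants from fastest to slowest, with the gap to the
# combined build.
for flags, rate in sorted(global_results.items(),
                          key=lambda item: item[1], reverse=True):
    gap = percent_faster(combined, rate)
    print(f"{flags:<28} {rate:>9.1f}   combined build: {gap:+.1f}%")
```

These are single-machine numbers, so small gaps should be read with some caution.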
If you're going to run these benchmarks yourself, make sure you "make
clean" before building with different options. (I don't know why it's
necessary, but it is.) To change options for just ceval.c, add this to
Makefile.pre.in under "Special rules":
Python/ceval.o: $(srcdir)/Python/ceval.c
	$(CC) -c $(PY_CFLAGS) -Os \
		-o $@ $(srcdir)/Python/ceval.c
The last -O flag should override any other.
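
For anyone repeating the measurements, a "median of 3" figure can be collected with something like the sketch below. This is not the script used for the results above. The toy_workload function is a hypothetical stand-in for the pystone loop, and the timing approach is an assumption.

```python
import statistics
import time


def rate_per_sec(workload, loops):
    """Time one call of workload(loops) and return loops per second."""
    start = time.perf_counter()
    workload(loops)
    elapsed = time.perf_counter() - start
    return loops / elapsed


def median_rate(workload, loops, repeats=3):
    """Run the workload `repeats` times and return the median rate."""
    rates = [rate_per_sec(workload, loops) for _ in range(repeats)]
    return statistics.median(rates)


def toy_workload(loops):
    # Hypothetical stand-in for the benchmark's main loop; swap in the
    # real benchmark entry point when measuring an actual build.
    total = 0
    for i in range(loops):
        total += i * i
    return total


if __name__ == "__main__":
    rate = median_rate(toy_workload, 500000)
    print(f"{rate:.1f} loops/sec (median of 3)")
```

Taking the median rather than the mean keeps one unusually slow run (for example, from background activity) from skewing the reported figure.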