Python 2.6a2 execution times with various compilers

I did some performance comparisons of Python 2.6a2 built with various compilers, using pybench.
I put the details on http://www.in-nomine.org/2008/04/11/python-26a2-execution-times-with-various...
Of course, take benchmark results with a grain of salt, but it seems ICC can give Python users an added performance edge. In short: I measured a speedup of roughly 21%-29% with ICC compared to GCC.
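For anyone wanting to reproduce the comparison, the builds boil down to something of this shape (a rough sketch only; the tarball name, the plain CC=icc configure line and the pybench options are illustrative, not the exact invocations from my run):

    # GCC baseline build and benchmark run
    tar xzf Python-2.6a2.tgz && cd Python-2.6a2
    ./configure CC=gcc
    make
    ./python Tools/pybench/pybench.py -f gcc.pyb

    # ICC build in a second, clean tree (assumes icc is on the PATH)
    ./configure CC=icc
    make
    ./python Tools/pybench/pybench.py -f icc.pyb

    # Show the ICC results compared against the GCC baseline
    ./python Tools/pybench/pybench.py -s icc.pyb -c gcc.pyb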

I did some more tests concentrating on GCC, partly based on the feedback I got, results at http://www.in-nomine.org/2008/04/12/python-26-compiler-options-results/
Executive summary: Python needs to be compiled with -O2 or -O3. Building with no optimization level at all roughly doubles execution time with GCC 4.2.1, and using just -O1 is still ~15% slower than -O2.
Using -mtune=native -march=native can shave off 0.1-0.2 seconds, but otherwise I did not find much difference whether -march or -mfpmath was present.
Profile-guided optimization did not help much; as might be expected, it delivered about the same kind of improvement as the -mtune/-march combination.
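To illustrate how the optimization levels were varied: CPython's configure picks a default OPT when none is given and honours an OPT setting passed in from the environment (if your tree behaves differently, the OPT line in the generated Makefile can be edited instead). The variants look roughly like this; the exact flag sets are illustrative:

    # No optimization level at all -- roughly twice as slow with GCC 4.2.1
    ./configure OPT="-g" && make
    ./python Tools/pybench/pybench.py -f O0.pyb

    # -O2, the level the summary above recommends as a minimum
    make distclean
    ./configure OPT="-g -O2" && make
    ./python Tools/pybench/pybench.py -f O2.pyb

    # -O2 plus CPU-specific tuning; shaved off about 0.1-0.2 s here
    make distclean
    ./configure OPT="-g -O2 -march=native -mtune=native" && make
    ./python Tools/pybench/pybench.py -f O2-native.pyb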

On Sat, Apr 12, 2008 at 11:09 AM, Jeroen Ruigrok van der Werven <asmodai@in-nomine.org> wrote:
> I did some more tests concentrating on GCC, partly based on the feedback I got, results at http://www.in-nomine.org/2008/04/12/python-26-compiler-options-results/
> Executive summary: Python needs to be compiled with -O2 or -O3. Building with no optimization level at all roughly doubles execution time with GCC 4.2.1, and using just -O1 is still ~15% slower than -O2.
> Using -mtune=native -march=native can shave off 0.1-0.2 seconds, but otherwise I did not find much difference whether -march or -mfpmath was present.
> Profile-guided optimization did not help much; as might be expected, it delivered about the same kind of improvement as the -mtune/-march combination.
With GCC 4.1.3 I'm finding that profile-guided optimization, when trained on pybench or regrtest, does make a measurable difference (2-5% in overall time, with 10-20% on some pybench tests). I haven't run the benchmarks enough times to be confident in my results yet; I'll report back with data once I have it. I'm testing both pybench and regrtest as profiling training runs.
I will check in a special makefile target for easy GCC profile-guided compiles shortly, so that those who want faster builds can easily produce them.
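For reference, a manual GCC profile-guided build boils down to roughly the following three steps (a sketch only; the training workload, the use of EXTRA_CFLAGS/LDFLAGS and the way recompilation is forced are illustrative choices, not the exact recipe behind the numbers above):

    # 1. Build an instrumented interpreter
    ./configure
    make EXTRA_CFLAGS="-fprofile-generate" LDFLAGS="-fprofile-generate"

    # 2. Training run: pybench and/or the regression test suite
    ./python Tools/pybench/pybench.py
    ./python Lib/test/regrtest.py

    # 3. Force recompilation but keep the collected .gcda profile data,
    #    then rebuild with the profile feedback
    find . -name '*.o' -delete
    make EXTRA_CFLAGS="-fprofile-use" LDFLAGS="-fprofile-use"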

On [20080413 00:47], Gregory P. Smith (greg@krypto.org) wrote:
> With GCC 4.1.3 I'm finding that profile-guided optimization, when trained on pybench or regrtest, does make a measurable difference (2-5% in overall time, with 10-20% on some pybench tests). I haven't run the benchmarks enough times to be confident in my results yet; I'll report back with data once I have it. I'm testing both pybench and regrtest as profiling training runs.
It seems GCC 4.2.4 yields worse code for Python than 4.2.1 with the same options; I measured roughly a 7%-8% slowdown (~0.5 seconds) in my test.
Granted, in general this might all be nitpicking, but for our friends in the number-crunching departments it might be quite useful to know. The differences are generally not concentrated in specific sections of pybench but are uniformly distributed. I know my employer can use such additional free optimizations, since our jobs span many hours of execution. In addition to optimizing the source code itself, this will of course also shave off quite a lot of execution time.
> I will check in a special makefile target for easy GCC profile-guided compiles shortly, so that those who want faster builds can easily produce them.
That would be interesting, I think. I went with -fprofile-generate and -fprofile-use in my small test.

Jeroen Ruigrok van der Werven wrote:
> Profile-guided optimization did not help much; as might be expected, it delivered about the same kind of improvement as the -mtune/-march combination.
Gregory P. Smith wrote:
> With GCC 4.1.3 I'm finding that profile-guided optimization, when trained on pybench or regrtest, does make a measurable difference (2-5% in overall time, with 10-20% on some pybench tests). I haven't run the benchmarks enough times to be confident in my results yet; I'll report back with data once I have it. I'm testing both pybench and regrtest as profiling training runs.
It seems that profile-guided optimization also offers some benefits on Windows (e.g. http://mail.python.org/pipermail/python-dev/2007-May/072970.html), so it might be worth trying to coordinate such efforts between platforms. For example, a central document holding results for all supported platforms sounds worthwhile, as does sharing the top-level script that generates the profile data.
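As a strawman for such a shared script, the profile-generation step could be as small as something like this (purely illustrative; it assumes it is run from a CPython source checkout and is handed the freshly built, instrumented interpreter):

    #!/bin/sh
    # Run the profiling training workloads against the interpreter given
    # as the first argument (defaults to the in-tree build).
    PYTHON=${1:-./python}
    $PYTHON Tools/pybench/pybench.py
    $PYTHON Lib/test/regrtest.py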
Cheers,
Mark
Participants (3): Gregory P. Smith, Jeroen Ruigrok van der Werven, Mark Hammond