[Python-Dev] Compiling Python on Linux with Intel's icc

Alex Leach albl500 at york.ac.uk
Sat Apr 14 17:35:07 CEST 2012


I thought I'd tie this thread up with a method that worked: I've just compiled Python-2.7.3 with icc, and the benchmarks run slightly faster than with the system Python :D

** First benchmark **

metabuntu:benchmarks> python perf.py -r -b apps /usr/bin/python ../Python-2.7.3/python
Running 2to3...
INFO:root:Running ../Python-2.7.3/python lib/2to3/2to3 -f all lib/2to3_data
INFO:root:Running `['../Python-2.7.3/python', 'lib/2to3/2to3', '-f', 'all', 'lib/2to3_data']` 5 times
INFO:root:Running /usr/bin/python lib/2to3/2to3 -f all lib/2to3_data
INFO:root:Running `['/usr/bin/python', 'lib/2to3/2to3', '-f', 'all', 'lib/2to3_data']` 5 times
Running html5lib...
INFO:root:Running ../Python-2.7.3/python performance/bm_html5lib.py -n 1
INFO:root:Running `['../Python-2.7.3/python', 'performance/bm_html5lib.py', '-n', '1']` 10 times
INFO:root:Running /usr/bin/python performance/bm_html5lib.py -n 1
INFO:root:Running `['/usr/bin/python', 'performance/bm_html5lib.py', '-n', '1']` 10 times
Running rietveld...
INFO:root:Running ../Python-2.7.3/python performance/bm_rietveld.py -n 100
INFO:root:Running /usr/bin/python performance/bm_rietveld.py -n 100
Running spambayes...
INFO:root:Running ../Python-2.7.3/python performance/bm_spambayes.py -n 100
INFO:root:Running /usr/bin/python performance/bm_spambayes.py -n 100

Report on Linux metabuntu 3.0.0-19-server #32-Ubuntu SMP Thu Apr 5 20:05:13 UTC 2012 x86_64 x86_64
Total CPU cores: 12

### html5lib ###
Min: 8.132508 -> 7.316457: 1.11x faster
Avg: 8.297318 -> 7.460066: 1.11x faster
Significant (t=11.15)
Stddev: 0.21605 -> 0.09843: 2.1950x smaller
Timeline: http://tinyurl.com/bqql4oa

### rietveld ###
Min: 0.297604 -> 0.276587: 1.08x faster
Avg: 0.302667 -> 0.279202: 1.08x faster
Significant (t=37.06)
Stddev: 0.00529 -> 0.00348: 1.5188x smaller
Timeline: http://tinyurl.com/brb3dk5

### spambayes ###
Min: 0.152264 -> 0.143518: 1.06x faster
Avg: 0.156512 -> 0.146559: 1.07x faster
Significant (t=6.66)
Stddev: 0.00847 -> 0.01232: 1.4547x larger
Timeline: http://tinyurl.com/d2dzz6k

The following not significant results are hidden, use -v to show them:
2to3.

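(I just noticed that the date in the above report looks wrong, but I did run it just now, on April 14th 2012, ~1300 GMT. The date shown is part of the kernel's uname string, i.e. the kernel build date, not the date of the run.)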
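For reading these reports: each "Nx faster" figure is the system Python's time divided by the icc build's time, and the Stddev line is the same kind of ratio. Below is a minimal Python 3 sketch that reproduces those lines from the printed numbers. The function names are mine, and the formatting (two decimals for times, four for stddev) is inferred from the output above rather than taken from perf.py's source.

```python
def time_delta(old, new):
    """Describe a timing change as the report does: ratio of old to new."""
    if new < old:
        return "%.2fx faster" % (old / new)
    if new > old:
        return "%.2fx slower" % (new / old)
    return "no change"


def stddev_delta(old, new):
    """Describe a change in standard deviation, ratio to four decimals."""
    if new < old:
        return "%.4fx smaller" % (old / new)
    if new > old:
        return "%.4fx larger" % (new / old)
    return "no change"


# html5lib figures from the first report: system Python -> icc build
print(time_delta(8.132508, 7.316457))   # Min line
print(stddev_delta(0.21605, 0.09843))   # Stddev line
```

Working from the rounded figures in the report can shift the last digit. For example, the printed spambayes stddevs give 1.4545x rather than the reported 1.4547x, presumably because perf.py uses the unrounded values.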



** Required patch **

The only file that breaks compilation is Modules/_ctypes/libffi/src/x86/ffi64.c
I uploaded a patch to http://bugs.python.org/issue4130 that corrects the __int128_t issue.



** Compilation method **

I used a two-step compilation process, with Profile-Guided Optimisation. Relevant environment variables are at the bottom.
In the build directory, make a separate directory for the PGO files.
 mkdir PGO
Then, the configure command:-
CFLAGS="-O3 -fomit-frame-pointer -shared-intel -fpic -prof-gen -prof-dir $PWD/PGO -fp-model strict -no-prec-div -xHost" \
        ./configure --with-libm="-limf" --with-libc="-lirc" --with-signal-module --with-cxx-main="icpc" --without-gcc --build=x86_64-linux-intel
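The two configure runs differ only in CFLAGS. Below is a minimal Python 3 sketch that builds both variants from one shared list, so the stages can't drift apart. The function name is hypothetical and the flags are the ones used here. The ordering is mine; I'm assuming icc doesn't care about the order of these particular options.

```python
# Flags shared by both PGO stages, taken from the configure commands above
# (-fomit-frame-pointer appears only once here).
BASE_FLAGS = ["-O3", "-fomit-frame-pointer", "-shared-intel", "-fpic",
              "-fp-model", "strict", "-no-prec-div", "-xHost"]


def icc_cflags(stage, prof_dir):
    """Return the CFLAGS string for one stage of the two-step PGO build.

    "gen": instrumented build that records profiles (-prof-gen).
    "use": optimised rebuild from those profiles (-prof-use plus -ipo).
    prof_dir should be an absolute path, like $PWD/PGO above; paths
    containing spaces are not handled.
    """
    if stage == "gen":
        extra = ["-prof-gen"]
    elif stage == "use":
        extra = ["-ipo", "-prof-use"]
    else:
        raise ValueError("stage must be 'gen' or 'use', not %r" % (stage,))
    return " ".join(BASE_FLAGS + extra + ["-prof-dir", prof_dir])


print(icc_cflags("gen", "/usr/local/src/pysrc/Python-2.7.3/PGO"))
print(icc_cflags("use", "/usr/local/src/pysrc/Python-2.7.3/PGO"))
```

For the -static-intel variant, swap that entry in BASE_FLAGS.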

Then I ran `make -j9` and `make test`. Running the tests ensures that (almost) every module is exercised at least once.
Because -prof-gen was used, the instrumented binaries write PGO information to files in the -prof-dir directory as they run.
To exercise the code even more thoroughly, I also ran the benchmark suite, which generates even more PGO information.
The benchmark results from this instrumented build are useless, though, since the profiling instrumentation slows it down.
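Before starting the second stage, it is worth checking that the training runs actually left profile data behind. Below is a minimal Python 3 sketch that counts the profile files in the PGO directory. It assumes the instrumented binaries write Intel's per-run `.dyn` files into the -prof-dir directory; that is my understanding of -prof-gen, so check the compiler documentation for your version.

```python
import glob
import os


def count_profile_files(prof_dir, pattern="*.dyn"):
    """Count profile files that -prof-gen runs left in prof_dir.

    The "*.dyn" pattern is an assumption about icc's output naming.
    """
    return len(glob.glob(os.path.join(prof_dir, pattern)))


if __name__ == "__main__":
    n = count_profile_files("PGO")
    if n == 0:
        print("No profile data in PGO/; run make test (and perf.py) first")
    else:
        print("%d profile files found; ready for the -prof-use stage" % n)
```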

Next, do a `make clean` and reconfigure.
This time, add "-ipo" to CFLAGS to enable inter-procedural optimisation, and replace "-prof-gen" with "-prof-use":-
CFLAGS="-O3 -fomit-frame-pointer -ipo -shared-intel -fpic -prof-use -prof-dir $PWD/PGO -fp-model strict -no-prec-div -xHost" \
        ./configure --with-libm="-limf" --with-libc="-lirc" --with-signal-module --with-cxx-main="icpc" --without-gcc --build=x86_64-linux-intel
Then, of course, `make -j9 && make test`.
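The whole two-stage sequence can be sketched as a small Python 3 driver. It prints the steps by default and only runs them with dry_run=False. The helper names are hypothetical, and `mkdir -p` replaces plain `mkdir` so that re-runs don't fail. The extra perf.py training run is left out. The CFLAGS strings are passed in, for example the two stage-specific strings from the configure commands above.

```python
import os
import subprocess

# configure arguments used for both stages (shell quotes already removed)
CONFIGURE = ["./configure", "--with-libm=-limf", "--with-libc=-lirc",
             "--with-signal-module", "--with-cxx-main=icpc", "--without-gcc",
             "--build=x86_64-linux-intel"]


def pgo_plan(gen_cflags, use_cflags):
    """Return the build steps in order as (cflags, argv, must_succeed)."""
    return [
        (None, ["mkdir", "-p", "PGO"], True),
        # Stage 1: instrumented build; the test suite doubles as a training run.
        (gen_cflags, CONFIGURE, True),
        (None, ["make", "-j9"], True),
        (None, ["make", "test"], False),  # a few tests fail; keep going
        # Stage 2: throw away the instrumented objects, rebuild from the profile.
        (None, ["make", "clean"], True),
        (use_cflags, CONFIGURE, True),
        (None, ["make", "-j9"], True),
        (None, ["make", "test"], False),
    ]


def run_plan(plan, dry_run=True):
    """Print every step; execute the steps only when dry_run is False."""
    for cflags, argv, must_succeed in plan:
        prefix = "CFLAGS=%r " % cflags if cflags else ""
        print(prefix + " ".join(argv))
        if not dry_run:
            env = dict(os.environ)
            if cflags:
                env["CFLAGS"] = cflags
            subprocess.run(argv, env=env, check=must_succeed)


run_plan(pgo_plan("<stage 1 CFLAGS>", "<stage 2 CFLAGS>"))
```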

At this point, I produced the above benchmark results.



** Failed test summary **

I'm happy with most of them, except that I don't understand what the test_gdb failure is about.
I should probably add --enable-curses to the configure command, and I wouldn't mind getting the network and audio modules to build,
but I can't see any relevant configure options or find any missing dependencies. Any suggestions would be appreciated.

349 tests OK.
2 tests failed:
    test_cmath test_gdb
1 test altered the execution environment:
    test_distutils
37 tests skipped:
    test_aepack test_al test_applesingle test_bsddb test_bsddb185
    test_bsddb3 test_cd test_cl test_codecmaps_cn test_codecmaps_hk
    test_codecmaps_jp test_codecmaps_kr test_codecmaps_tw test_curses
    test_dl test_gl test_imageop test_imgfile test_kqueue
    test_linuxaudiodev test_macos test_macostools test_msilib
    test_ossaudiodev test_scriptpackages test_smtpnet
    test_socketserver test_startfile test_sunaudiodev test_timeout
    test_tk test_ttk_guionly test_urllib2net test_urllibnet
    test_winreg test_winsound test_zipfile64
2 skips unexpected on linux2:
    test_bsddb test_bsddb3
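To compare this run with later ones, here is a minimal Python 3 sketch that turns regrtest's end-of-run summary into a dict of test names per category. The parser is mine and only handles summary lines like the ones shown here.

```python
import re


def parse_regrtest_summary(text):
    """Parse regrtest's end-of-run summary into a dict.

    "N tests OK." becomes {"ok": N}; every "N tests <what>:" or
    "N skips <what>:" header maps <what> to the test names indented below it.
    """
    result = {}
    current = None
    for line in text.splitlines():
        ok = re.match(r"(\d+) tests? OK\.$", line)
        header = re.match(r"(\d+) (?:tests?|skips?) (.+):$", line)
        if ok:
            result["ok"] = int(ok.group(1))
            current = None
        elif header:
            current = header.group(2)
            result[current] = []
        elif current is not None and line.startswith(" "):
            result[current].extend(line.split())
        else:
            current = None  # blank or unrelated line ends the current list
    return result


summary = """\
349 tests OK.
2 tests failed:
    test_cmath test_gdb
2 skips unexpected on linux2:
    test_bsddb test_bsddb3
"""
print(parse_regrtest_summary(summary))
```

Parsing the summaries of two runs and taking a set difference of their "skipped" lists shows which tests a change brought back.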

test test_cmath failed -- Traceback (most recent call last):
  File "/usr/local/src/pysrc/Python-2.7.3/Lib/test/test_cmath.py", line 352, in test_specific_values
    msg=error_message)
  File "/usr/local/src/pysrc/Python-2.7.3/Lib/test/test_cmath.py", line 94, in rAssertAlmostEqual
    'got {!r}'.format(a, b))
AssertionError: acos0000: acos(complex(0.0, 0.0))
Expected: complex(1.5707963267948966, -0.0)
Received: complex(1.5707963267948966, 0.0)
Received value insufficiently close to expected value.
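The test_cmath failure is about the sign of a zero. The expected and received values differ only in the imaginary part (-0.0 versus 0.0), and those two compare equal with ==, so the test has to check the sign separately. Below is a minimal Python 3 sketch of a sign-aware comparison using math.copysign. It only illustrates the check and is not test_cmath's own code. My unverified guess is that one of the optimisation flags changes how that zero's sign comes out.

```python
import cmath
import math


def same_float(a, b):
    """True if a and b are equal *including* the sign of zero."""
    if a == 0.0 and b == 0.0:
        return math.copysign(1.0, a) == math.copysign(1.0, b)
    return a == b


def same_complex(z, w):
    """Compare real and imaginary parts with same_float."""
    return same_float(z.real, w.real) and same_float(z.imag, w.imag)


expected = complex(1.5707963267948966, -0.0)
received = cmath.acos(complex(0.0, 0.0))
print(received, same_complex(received, expected))
```

Run with the interpreter under test, the second value printed should be True on a build that handles signed zeros the way test_cmath expects.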

test test_gdb failed -- Traceback (most recent call last):
  File "/usr/local/src/pysrc/Python-2.7.3/Lib/test/test_gdb.py", line 639, in test_up_at_top
    cmds_after_breakpoint=['py-up'] * 4)
  File "/usr/local/src/pysrc/Python-2.7.3/Lib/test/test_gdb.py", line 146, in get_stack_trace
    self.assertEqual(err, '')
AssertionError: 'Traceback (most recent call last):\n  File "/usr/local/src/pysrc/Python-2.7.3/python-gdb.py", line 1367, in invoke\n    move_in_stack(move_up=True)\n  File "/usr/local/src/pysrc/Python-2.7.3/python-gdb.py", line 1347, in move_in_stack\n    iter_frame.print_summary()\n  File "/usr/local/src/pysrc/Python-2.7.3/python-gdb.py", line 1255, in print_summary\n    line = pyop.current_line()\nAttributeError: \'PyIntObjectPtr\' object has no attribute \'current_line\'\nError occurred in Python command: \'PyIntObjectPtr\' object has no attribute \'current_line\'\nTraceback (most recent call last):\n  File "/usr/local/src/pysrc/Python-2.7.3/python-gdb.py", line 1367, in invoke\n    move_in_stack(move_up=True)\n  File "/usr/local/src/pysrc/Python-2.7.3/python-gdb.py", line 1347, in move_in_stack\n    iter_frame.print_summary()\n  File "/usr/local/src/pysrc/Python-2.7.3/python-gdb.py", line 1255, in print_summary\n    line = pyop.current_line()\nAttributeError: \'PyIntObjectPtr\' object has no attribute \'current_line\'\nError occurred in Python command: \'PyIntObjectPtr\' object has no attribute \'current_line\'\nTraceback (most recent call last):\n  File "/usr/local/src/pysrc/Python-2.7.3/python-gdb.py", line 1367, in invoke\n    move_in_stack(move_up=True)\n  File "/usr/local/src/pysrc/Python-2.7.3/python-gdb.py", line 1347, in move_in_stack\n    iter_frame.print_summary()\n  File "/usr/local/src/pysrc/Python-2.7.3/python-gdb.py", line 1255, in print_summary\n    line = pyop.current_line()\nAttributeError: \'PyIntObjectPtr\' object has no attribute \'current_line\'\nError occurred in Python command: \'PyIntObjectPtr\' object has no attribute \'current_line\'\n' != ''



********

Next attempt:-
I'm going to try with --enable-curses, --enable-audio, --enable-network, --enable-ipv6 and --enable-gui. May as well do that now...
I added the above switches to the configure command.
I also switched -shared-intel to -static-intel, to compare benchmark times. This seems to have hardly any impact on performance or file size...

CFLAGS="-O3 -fomit-frame-pointer -ipo -static-intel -fpic -prof-use -prof-dir $PWD/PGO -fp-model strict -no-prec-div -xHost" \
        ./configure --with-libm="-limf" --with-libc="-lirc" --with-signal-module --with-cxx-main="icpc" --without-gcc --enable-curses --enable-ipv6 --enable-network --enable-audio --enable-gui --build=x86_64-linux-intel
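To confirm which variant a build actually is, one option is to check which Intel runtime libraries the interpreter links dynamically. With -shared-intel they should show up in `ldd ./python`; with -static-intel they should not, assuming -static-intel also covers the -limf/-lirc passed through configure, which I haven't verified. Below is a minimal Python 3 sketch. The library names in INTEL_LIBS are my assumption of the usual ones, not an exhaustive list.

```python
import os
import re
import shutil
import subprocess

# Assumed names of Intel runtime libraries; adjust for your compiler version.
INTEL_LIBS = ("libimf", "libsvml", "libintlc", "libirc", "libirng")


def intel_runtime_libs(ldd_output):
    """Return the Intel runtime libraries that appear in `ldd` output."""
    found = set()
    for line in ldd_output.splitlines():
        m = re.match(r"\s*(lib[\w+-]+)\.so", line)
        if m and m.group(1) in INTEL_LIBS:
            found.add(m.group(1))
    return sorted(found)


if __name__ == "__main__" and os.path.exists("./python") and shutil.which("ldd"):
    out = subprocess.check_output(["ldd", "./python"], universal_newlines=True)
    print(intel_runtime_libs(out) or "no Intel runtime libraries linked dynamically")
```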



** Test results **

This time I ran regrtest.py manually with -uall, which enables all of regrtest's optional test resources (network, audio, curses, GUI and so on), so that the networking and audio tests in particular would run:-
metabuntu:Python-2.7.3> ./python Lib/test/regrtest.py -uall

test_linuxaudiodev just hung, even after I killed the processes (pulseaudio) that were using /dev/dsp, so I added 'test_linuxaudiodev' to NOTTESTS in Lib/test/regrtest.py.

361 tests OK.
3 tests failed:
    test_cmath test_gdb test_ossaudiodev
1 test altered the execution environment:
    test_distutils
23 tests skipped:
    test_aepack test_al test_applesingle test_bsddb test_bsddb185
    test_bsddb3 test_cd test_cl test_dl test_gl test_imageop
    test_imgfile test_kqueue test_macos test_macostools test_msilib
    test_py3kwarn test_scriptpackages test_startfile test_sunaudiodev
    test_winreg test_winsound test_zipfile64
2 skips unexpected on linux2:
    test_bsddb test_bsddb3

I use ALSA rather than OSS, so I'm not really concerned about the OSS test failing.
Run manually, test_linuxaudiodev hangs in test_play_sound_file. I tested with pulseaudio both running and not running; either way, the test can only be terminated with a KILL signal.
By adding some print statements to the test script, I found that it hangs on `self.dev.write(data)`. I'm not sure why, though.
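A hanging test can be contained by running it under a time limit instead of killing it by hand. Below is a minimal Python 3 sketch using subprocess.run's timeout. The helper name and the 60-second limit are arbitrary choices of mine.

```python
import os
import subprocess


def run_with_timeout(argv, seconds):
    """Run argv; kill it if it is still running after `seconds` seconds.

    On timeout, subprocess.run() kills the child (SIGKILL on POSIX) and
    then raises TimeoutExpired, so a hung test is cleaned up automatically.
    """
    try:
        proc = subprocess.run(argv, timeout=seconds)
    except subprocess.TimeoutExpired:
        return "timeout"
    if proc.returncode == 0:
        return "ok"
    return "failed (%d)" % proc.returncode


if __name__ == "__main__" and os.path.exists("./python"):
    # Run from the build directory; the 60-second limit is arbitrary.
    print(run_with_timeout(
        ["./python", "Lib/test/regrtest.py", "-v", "test_linuxaudiodev"], 60))
```

For the full suite, regrtest's -x option treats the test names given on the command line as tests to exclude. For example, `./python Lib/test/regrtest.py -uall -x test_linuxaudiodev` skips the hanging test without editing NOTTESTS.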



** New benchmark **

metabuntu:benchmarks> python perf.py -r -b apps /usr/bin/python ../Python-2.7.3/python

Report on Linux metabuntu 3.0.0-19-server #32-Ubuntu SMP Thu Apr 5 20:05:13 UTC 2012 x86_64 x86_64
Total CPU cores: 12

### 2to3 ###
Min: 6.524408 -> 6.316394: 1.03x faster
Avg: 6.611613 -> 6.392400: 1.03x faster
Significant (t=5.05)
Stddev: 0.06477 -> 0.07228: 1.1159x larger
Timeline: http://tinyurl.com/bub35l9

### html5lib ###
Min: 7.916494 -> 7.212451: 1.10x faster
Avg: 8.025302 -> 7.304856: 1.10x faster
Significant (t=17.53)
Stddev: 0.07606 -> 0.10539: 1.3856x larger
Timeline: http://tinyurl.com/dy7296k

### rietveld ###
Min: 0.291469 -> 0.272601: 1.07x faster
Avg: 0.302746 -> 0.280126: 1.08x faster
Significant (t=15.86)
Stddev: 0.01126 -> 0.00874: 1.2885x smaller
Timeline: http://tinyurl.com/c5ys4bt

### spambayes ###
Min: 0.145370 -> 0.138528: 1.05x faster
Avg: 0.146689 -> 0.141168: 1.04x faster
Significant (t=11.27)
Stddev: 0.00147 -> 0.00468: 3.1885x larger
Timeline: http://tinyurl.com/d8rrp6g



** Relevant environment variables (there may be more or fewer) **

N.B. I have all the Intel software installed in /usr/intel; the default location is /opt/intel.

LIBRARY_PATH=/usr/intel/composer_xe_2011_sp1.9.293/tbb/lib/intel64//cc4.1.0_libc2.4_kernel2.6.16.21:/usr/intel/composer_xe_2011_sp1.9.293/compiler/lib/intel64:/usr/intel/composer_xe_2011_sp1.9.293/ipp/../compiler/lib/intel64:/usr/intel/composer_xe_2011_sp1.9.293/ipp/lib/intel64:/usr/intel/composer_xe_2011_sp1.9.293/compiler/lib/intel64:/usr/intel/composer_xe_2011_sp1.9.293/mkl/lib/intel64:/usr/intel/composer_xe_2011_sp1.9.293/tbb/lib/intel64//cc4.1.0_libc2.4_kernel2.6.16.21:/usr/intel/composer_xe_2011_sp1.9.293/tbb/lib/intel64//cc4.1.0_libc2.4_kernel2.6.16.21:/usr/intel/composer_xe_2011_sp1.9.293/compiler/lib/intel64:/usr/intel/composer_xe_2011_sp1.9.293/ipp/../compiler/lib/intel64:/usr/intel/composer_xe_2011_sp1.9.293/ipp/lib/intel64:/usr/intel/composer_xe_2011_sp1.9.293/compiler/lib/intel64:/usr/intel/composer_xe_2011_sp1.9.293/mkl/lib/intel64:/usr/intel/composer_xe_2011_sp1.9.293/tbb/lib/intel64//cc4.1.0_libc2.4_kernel2.6.16.21
LD_LIBRARY_PATH=/usr/intel/composer_xe_2011_sp1.9.293/tbb/lib/intel64//cc4.1.0_libc2.4_kernel2.6.16.21:/usr/intel/impi/4.0.3.008/ia32/lib:/usr/intel/impi/4.0.3.008/intel64/lib:/usr/intel/composer_xe_2011_sp1.9.293/compiler/lib/intel64:/usr/intel/composer_xe_2011_sp1.9.293/ipp/../compiler/lib/intel64:/usr/intel/composer_xe_2011_sp1.9.293/ipp/lib/intel64:/usr/intel/composer_xe_2011_sp1.9.293/compiler/lib/intel64:/usr/intel/composer_xe_2011_sp1.9.293/mkl/lib/intel64:/usr/intel/composer_xe_2011_sp1.9.293/tbb/lib/intel64//cc4.1.0_libc2.4_kernel2.6.16.21:/usr/local/lib/boost:/biol/arb/lib:/lib64:/usr/lib64:/usr/local/lib:/usr/local/cuda/lib64:/usr/local/cuda/lib:/usr/intel/composer_xe_2011_sp1.9.293/debugger/lib/intel64:/usr/intel/composer_xe_2011_sp1.9.293/mpirt/lib/intel64
CPATH=/usr/intel/composer_xe_2011_sp1.9.293/tbb/include:/usr/intel/composer_xe_2011_sp1.9.293/mkl/include:/usr/intel/composer_xe_2011_sp1.9.293/tbb/include:/usr/intel/composer_xe_2011_sp1.9.293/tbb/include:/usr/intel/composer_xe_2011_sp1.9.293/mkl/include:/usr/intel/composer_xe_2011_sp1.9.293/tbb/include
CPP=icc -E
PATH=/usr/intel/impi/4.0.3.008/ia32/bin:/usr/intel/impi/4.0.3.008/intel64/bin:/usr/intel/composer_xe_2011_sp1.9.293/bin/intel64:/usr/intel/impi/4.0.3.008/ia32/bin:/usr/intel/composer_xe_2011_sp1.9.293/bin/intel64:/home/albl500/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/intel/bin:/usr/intel/composer_xe_2011_sp1.9.293/mpirt/bin/intel64:/home/albl500/SDKs/android-sdk-linux/tools:/biol/bin:/biol/arb/bin:/usr/local/cuda/bin:/home/albl500/bin:/usr/intel/composer_xe_2011_sp1.9.293/mpirt/bin/intel64
LD=xild
CXX=icpc
CC=icc
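Several entries in LIBRARY_PATH, LD_LIBRARY_PATH, CPATH and PATH above are repeated, apparently from sourcing the Intel environment scripts more than once. Below is a minimal Python 3 sketch that drops the repeats while keeping the first occurrence of each entry. Since a search stops at the first match, later duplicates never change which file is found. The helper name is mine. Entries are compared after os.path.normpath, so `intel64//cc4.1.0…` and `intel64/cc4.1.0…` count as one.

```python
import os


def dedupe_path(value, sep=os.pathsep):
    """Drop repeated entries from a PATH-style variable, keeping first order."""
    seen = set()
    kept = []
    for entry in value.split(sep):
        key = os.path.normpath(entry)  # "a//b" and "a/b" compare equal
        if key not in seen:
            seen.add(key)
            kept.append(entry)  # keep the entry as originally written
    return sep.join(kept)


if __name__ == "__main__":
    for name in ("PATH", "LIBRARY_PATH", "LD_LIBRARY_PATH", "CPATH"):
        if name in os.environ:
            print("%s=%s" % (name, dedupe_path(os.environ[name])))
```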


** Summary **

All seems okay to me, although using -no-prec-div (rather than fully IEEE-compliant division) is slightly concerning, as I occasionally need very accurate floating-point arithmetic.
I try to use numpy for math operations as much as possible, though, so maybe it's not a concern at all. I think "-no-prec-div" and "-fp-model strict" are required to pass various math tests;
without them, I got errors about extremely small floats (<10^-300) being unequal to 0.
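As I understand Intel's documentation, -no-prec-div allows division results slightly less precise than full IEEE division. A quick way to check a built interpreter is to compare its float division against a correctly rounded reference, and to confirm that tiny results are not flushed to zero. Below is a minimal sketch of both checks, written for Python 3 (I have not checked it under 2.7). float(Fraction(x) / Fraction(y)) divides exact rationals and rounds once, so on an IEEE-compliant build every comparison should match. Whether any particular icc flag enables flush-to-zero is an assumption here, not something I've confirmed.

```python
import random
import sys
from fractions import Fraction


def division_mismatches(trials=10000, seed=0):
    """Count x / y results that differ from the correctly rounded quotient."""
    rng = random.Random(seed)
    bad = 0
    for _ in range(trials):
        x = rng.uniform(-1e6, 1e6)
        y = rng.uniform(1e-3, 1e3)
        # Reference: exact rational division, rounded once to a float.
        if x / y != float(Fraction(x) / Fraction(y)):
            bad += 1
    return bad


def subnormals_preserved():
    """False if results below the smallest normal float are flushed to zero."""
    return sys.float_info.min / 2 > 0.0


if __name__ == "__main__":
    print("division mismatches:", division_mismatches())
    print("subnormals preserved:", subnormals_preserved())
```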

I hope this proves useful for anyone else trying to compile an optimised Python for an Intel system.

Cheers,
Alex

-- 
Alex Leach BSc. MRes.
Department of Biology
University of York
York YO10 5DD
United Kingdom
www: http://bioltfws1.york.ac.uk/~albl500
EMAIL DISCLAIMER: http://www.york.ac.uk/docs/disclaimer/email.htm

