Re: [Python-Dev] Compiling Python on Linux with Intel's icc

Alex Leach <albl500@york.ac.uk> wrote:
Can you translate Intel's suggestion into a patch for ffi64?
Well probably, but this really belongs on the bug tracker. Also, as I said, there are many issues with higher priority. Stefan Krah

Thought I'd tie this thread up with a successful method, as I've just compiled Python-2.7.3 and have got the benchmarks to run slightly faster than the system Python :D

** First benchmark **

metabuntu:benchmarks> python perf.py -r -b apps /usr/bin/python ../Python-2.7.3/python
Running 2to3...
INFO:root:Running ../Python-2.7.3/python lib/2to3/2to3 -f all lib/2to3_data
INFO:root:Running `['../Python-2.7.3/python', 'lib/2to3/2to3', '-f', 'all', 'lib/2to3_data']` 5 times
INFO:root:Running /usr/bin/python lib/2to3/2to3 -f all lib/2to3_data
INFO:root:Running `['/usr/bin/python', 'lib/2to3/2to3', '-f', 'all', 'lib/2to3_data']` 5 times
Running html5lib...
INFO:root:Running ../Python-2.7.3/python performance/bm_html5lib.py -n 1
INFO:root:Running `['../Python-2.7.3/python', 'performance/bm_html5lib.py', '-n', '1']` 10 times
INFO:root:Running /usr/bin/python performance/bm_html5lib.py -n 1
INFO:root:Running `['/usr/bin/python', 'performance/bm_html5lib.py', '-n', '1']` 10 times
Running rietveld...
INFO:root:Running ../Python-2.7.3/python performance/bm_rietveld.py -n 100
INFO:root:Running /usr/bin/python performance/bm_rietveld.py -n 100
Running spambayes...
INFO:root:Running ../Python-2.7.3/python performance/bm_spambayes.py -n 100
INFO:root:Running /usr/bin/python performance/bm_spambayes.py -n 100

Report on Linux metabuntu 3.0.0-19-server #32-Ubuntu SMP Thu Apr 5 20:05:13 UTC 2012 x86_64 x86_64
Total CPU cores: 12

### html5lib ###
Min: 8.132508 -> 7.316457: 1.11x faster
Avg: 8.297318 -> 7.460066: 1.11x faster
Significant (t=11.15)
Stddev: 0.21605 -> 0.09843: 2.1950x smaller
Timeline: http://tinyurl.com/bqql4oa

### rietveld ###
Min: 0.297604 -> 0.276587: 1.08x faster
Avg: 0.302667 -> 0.279202: 1.08x faster
Significant (t=37.06)
Stddev: 0.00529 -> 0.00348: 1.5188x smaller
Timeline: http://tinyurl.com/brb3dk5

### spambayes ###
Min: 0.152264 -> 0.143518: 1.06x faster
Avg: 0.156512 -> 0.146559: 1.07x faster
Significant (t=6.66)
Stddev: 0.00847 -> 0.01232: 1.4547x larger
Timeline: http://tinyurl.com/d2dzz6k

The following not significant results are hidden, use -v to show them:
2to3.

( I just noticed the date's wrong in the above report... But I did run that just now, being April 14th 2012, ~1300GMT )

** Required patch **

The only file that breaks compilation is Modules/_ctypes/libffi/src/x86/ffi64.c
I uploaded a patch to http://bugs.python.org/issue4130 that corrects the __int128_t issue.

** Compilation method **

I used a two-step compilation process, with Profile-Guided Optimisation. Relevant environment variables are at the bottom.

In the build directory, make a separate directory for the PGO files.

mkdir PGO

Then, the configure command:-

CFLAGS="-O3 -fomit-frame-pointer -shared-intel -fpic -prof-gen -prof-dir $PWD/PGO -fp-model strict -no-prec-div -xHost -fomit-frame-pointer" \
./configure --with-libm="-limf" --with-libc="-lirc" --with-signal-module --with-cxx-main="icpc" --without-gcc --build=x86_64-linux-intel

Then I ran `make -j9` and `make test`. Running the tests ensures that (almost) every module is run at least once.
As the -prof-gen option was used, PGO information is written to files in -prof-dir while the binaries are running. To give the code even more rigorous usage, I also ran the benchmark suite, which generates even more PGO information. The benchmark results from this instrumented build are useless though.

Then a `make clean` is needed, followed by a reconfigure. This time, add "-ipo" to CFLAGS, enabling inter-procedural optimisation, and change "-prof-gen" to "-prof-use":-

CFLAGS="-O3 -fomit-frame-pointer -ipo -shared-intel -fpic -prof-use -prof-dir $PWD/PGO -fp-model strict -no-prec-div -xHost -fomit-frame-pointer" \
./configure --with-libm="-limf" --with-libc="-lirc" --with-signal-module --with-cxx-main="icpc" --without-gcc --build=x86_64-linux-intel

Then, of course:

make -j9 && make test

At this point, I produced the above benchmark results.

** Failed test summary **

I'm happy with most of them, except I don't get what the test_gdb failure is on about..? I should probably add --enable-curses to the configure command, and I wouldn't mind getting the network and audio modules to build, but I can't see any relevant configure options nor find any missing dependencies. Any suggestions would be appreciated.

349 tests OK.
2 tests failed:
    test_cmath test_gdb
1 test altered the execution environment:
    test_distutils
37 tests skipped:
    test_aepack test_al test_applesingle test_bsddb test_bsddb185
    test_bsddb3 test_cd test_cl test_codecmaps_cn test_codecmaps_hk
    test_codecmaps_jp test_codecmaps_kr test_codecmaps_tw test_curses
    test_dl test_gl test_imageop test_imgfile test_kqueue
    test_linuxaudiodev test_macos test_macostools test_msilib
    test_ossaudiodev test_scriptpackages test_smtpnet
    test_socketserver test_startfile test_sunaudiodev test_timeout
    test_tk test_ttk_guionly test_urllib2net test_urllibnet
    test_winreg test_winsound test_zipfile64
2 skips unexpected on linux2:
    test_bsddb test_bsddb3

test test_cmath failed -- Traceback (most recent call last):
  File "/usr/local/src/pysrc/Python-2.7.3/Lib/test/test_cmath.py", line 352, in test_specific_values
    msg=error_message)
  File "/usr/local/src/pysrc/Python-2.7.3/Lib/test/test_cmath.py", line 94, in rAssertAlmostEqual
    'got {!r}'.format(a, b))
AssertionError: acos0000: acos(complex(0.0, 0.0))
Expected: complex(1.5707963267948966, -0.0)
Received: complex(1.5707963267948966, 0.0)
Received value insufficiently close to expected value.
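
The expected and received values in the test_cmath failure differ only in the sign of the zero imaginary part. The following is a minimal sketch of why an ordinary comparison cannot see that difference while a sign-aware one can. It is not the test suite's own code, and the helper name `same_float` is made up for illustration.

```python
import math


def same_float(a, b):
    """Return True if a and b are equal and, when both are zero, carry the same sign.

    Hypothetical helper for illustration only; not part of test_cmath.
    """
    if a == 0.0 and b == 0.0:
        # copysign() transfers the sign bit of its second argument onto 1.0,
        # so it tells 0.0 and -0.0 apart even though they compare equal.
        return math.copysign(1.0, a) == math.copysign(1.0, b)
    return a == b


# The two values quoted in the failure message.
expected = complex(1.5707963267948966, -0.0)
received = complex(1.5707963267948966, 0.0)

print(expected == received)                      # True: == ignores the sign of zero
print(same_float(expected.imag, received.imag))  # False: the signs differ
```

So the build is returning the right magnitude here; what the test objects to is the sign bit on a zero.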
test test_gdb failed -- Traceback (most recent call last):
  File "/usr/local/src/pysrc/Python-2.7.3/Lib/test/test_gdb.py", line 639, in test_up_at_top
    cmds_after_breakpoint=['py-up'] * 4)
  File "/usr/local/src/pysrc/Python-2.7.3/Lib/test/test_gdb.py", line 146, in get_stack_trace
    self.assertEqual(err, '')
AssertionError: 'Traceback (most recent call last):\n File "/usr/local/src/pysrc/Python-2.7.3/python-gdb.py", line 1367, in invoke\n move_in_stack(move_up=True)\n File "/usr/local/src/pysrc/Python-2.7.3/python-gdb.py", line 1347, in move_in_stack\n iter_frame.print_summary()\n File "/usr/local/src/pysrc/Python-2.7.3/python-gdb.py", line 1255, in print_summary\n line = pyop.current_line()\nAttributeError: \'PyIntObjectPtr\' object has no attribute \'current_line\'\nError occurred in Python command: \'PyIntObjectPtr\' object has no attribute \'current_line\'\nTraceback (most recent call last):\n File "/usr/local/src/pysrc/Python-2.7.3/python-gdb.py", line 1367, in invoke\n move_in_stack(move_up=True)\n File "/usr/local/src/pysrc/Python-2.7.3/python-gdb.py", line 1347, in move_in_stack\n iter_frame.print_summary()\n File "/usr/local/src/pysrc/Python-2.7.3/python-gdb.py", line 1255, in print_summary\n line = pyop.current_line()\nAttributeError: \'PyIntObjectPtr\' object has no attribute \'current_line\'\nError occurred in Python command: \'PyIntObjectPtr\' object has no attribute \'current_line\'\nTraceback (most recent call last):\n File "/usr/local/src/pysrc/Python-2.7.3/python-gdb.py", line 1367, in invoke\n move_in_stack(move_up=True)\n File "/usr/local/src/pysrc/Python-2.7.3/python-gdb.py", line 1347, in move_in_stack\n iter_frame.print_summary()\n File "/usr/local/src/pysrc/Python-2.7.3/python-gdb.py", line 1255, in print_summary\n line = pyop.current_line()\nAttributeError: \'PyIntObjectPtr\' object has no attribute \'current_line\'\nError occurred in Python command: \'PyIntObjectPtr\' object has no attribute \'current_line\'\n' != ''

******** Next attempt:-
Gonna try with: --enable-curses, --enable-audio, --enable-network and --enable-ipv6. May as well do that now... added above switches to configure command. Also, switched -shared-intel for -static-intel, to compare benchmark times. This seems to hardly impact performance or file size...

CFLAGS="-O3 -fomit-frame-pointer -ipo -static-intel -fpic -prof-use -prof-dir $PWD/PGO -fp-model strict -no-prec-div -xHost -fomit-frame-pointer" \
./configure --with-libm="-limf" --with-libc="-lirc" --with-signal-module --with-cxx-main="icpc" --without-gcc --enable-curses --enable-ipv6 --enable-network --enable-audio --enable-gui --build=x86_64-linux-intel

** Test results **

This time I ran regrtest.py manually, to enable the networking and audio tests in particular:-

metabuntu:Python-2.7.3> ./python Lib/test/regrtest.py -uall

test_linuxaudiodev just hung, even after killing processes (pulseaudio) which were using /dev/dsp, so I added 'test_linuxaudiodev' to NOTTESTS in Lib/test/regrtest.py

361 tests OK.
3 tests failed:
    test_cmath test_gdb test_ossaudiodev
1 test altered the execution environment:
    test_distutils
23 tests skipped:
    test_aepack test_al test_applesingle test_bsddb test_bsddb185
    test_bsddb3 test_cd test_cl test_dl test_gl test_imageop
    test_imgfile test_kqueue test_macos test_macostools test_msilib
    test_py3kwarn test_scriptpackages test_startfile test_sunaudiodev
    test_winreg test_winsound test_zipfile64
2 skips unexpected on linux2:
    test_bsddb test_bsddb3

I use ALSA over OSS, so I'm not really concerned about the OSS test failing.

Running test_linuxaudiodev manually hangs on test_play_sound_file. Tested with pulseaudio running, and not running. The test can only be terminated with a KILL signal. Adding some print statements to the test script, I found the test hangs on `self.dev.write(data)`. Not sure why though..?
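
Because test_linuxaudiodev hangs rather than fails, one way to keep a run moving is to launch a suspect test in a child process with a time limit. What follows is only a rough sketch under stated assumptions: it needs Python 3 as the driver (the `timeout` argument does not exist in the 2.7 subprocess module), the helper name `run_with_timeout` is made up, and the one-second limit is arbitrary.

```python
import subprocess
import sys


def run_with_timeout(cmd, seconds):
    """Run cmd and return its exit code, or None if it exceeded the time limit.

    Hypothetical helper for illustration. When the timeout expires,
    subprocess.run() kills the child and then raises TimeoutExpired.
    """
    try:
        return subprocess.run(cmd, timeout=seconds).returncode
    except subprocess.TimeoutExpired:
        return None


if __name__ == "__main__":
    # Stand-in for a hanging test: a child that sleeps far longer than the limit.
    hung = run_with_timeout([sys.executable, "-c", "import time; time.sleep(60)"], 1)
    print("timed out" if hung is None else "exit code %d" % hung)
```

For the case described here, the command list would presumably be something like `['./python', 'Lib/test/regrtest.py', '-uall', 'test_linuxaudiodev']`, with a much longer limit.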
** New benchmark **

metabuntu:benchmarks> python perf.py -r -b apps /usr/bin/python ../Python-2.7.3/python

Report on Linux metabuntu 3.0.0-19-server #32-Ubuntu SMP Thu Apr 5 20:05:13 UTC 2012 x86_64 x86_64
Total CPU cores: 12

### 2to3 ###
Min: 6.524408 -> 6.316394: 1.03x faster
Avg: 6.611613 -> 6.392400: 1.03x faster
Significant (t=5.05)
Stddev: 0.06477 -> 0.07228: 1.1159x larger
Timeline: http://tinyurl.com/bub35l9

### html5lib ###
Min: 7.916494 -> 7.212451: 1.10x faster
Avg: 8.025302 -> 7.304856: 1.10x faster
Significant (t=17.53)
Stddev: 0.07606 -> 0.10539: 1.3856x larger
Timeline: http://tinyurl.com/dy7296k

### rietveld ###
Min: 0.291469 -> 0.272601: 1.07x faster
Avg: 0.302746 -> 0.280126: 1.08x faster
Significant (t=15.86)
Stddev: 0.01126 -> 0.00874: 1.2885x smaller
Timeline: http://tinyurl.com/c5ys4bt

### spambayes ###
Min: 0.145370 -> 0.138528: 1.05x faster
Avg: 0.146689 -> 0.141168: 1.04x faster
Significant (t=11.27)
Stddev: 0.00147 -> 0.00468: 3.1885x larger
Timeline: http://tinyurl.com/d8rrp6g

** Relevant Environment Variables. (Maybe there's more, maybe less) **

N.B. I have all Intel stuff installed in /usr/intel. The default is /opt/intel though.
LIBRARY_PATH=/usr/intel/composer_xe_2011_sp1.9.293/tbb/lib/intel64//cc4.1.0_libc2.4_kernel2.6.16.21:/usr/intel/composer_xe_2011_sp1.9.293/compiler/lib/intel64:/usr/intel/composer_xe_2011_sp1.9.293/ipp/../compiler/lib/intel64:/usr/intel/composer_xe_2011_sp1.9.293/ipp/lib/intel64:/usr/intel/composer_xe_2011_sp1.9.293/compiler/lib/intel64:/usr/intel/composer_xe_2011_sp1.9.293/mkl/lib/intel64:/usr/intel/composer_xe_2011_sp1.9.293/tbb/lib/intel64//cc4.1.0_libc2.4_kernel2.6.16.21:/usr/intel/composer_xe_2011_sp1.9.293/tbb/lib/intel64//cc4.1.0_libc2.4_kernel2.6.16.21:/usr/intel/composer_xe_2011_sp1.9.293/compiler/lib/intel64:/usr/intel/composer_xe_2011_sp1.9.293/ipp/../compiler/lib/intel64:/usr/intel/composer_xe_2011_sp1.9.293/ipp/lib/intel64:/usr/intel/composer_xe_2011_sp1.9.293/compiler/lib/intel64:/usr/intel/composer_xe_2011_sp1.9.293/mkl/lib/intel64:/usr/intel/composer_xe_2011_sp1.9.293/tbb/lib/intel64//cc4.1.0_libc2.4_kernel2.6.16.21

LD_LIBRARY_PATH=/usr/intel/composer_xe_2011_sp1.9.293/tbb/lib/intel64//cc4.1.0_libc2.4_kernel2.6.16.21:/usr/intel/impi/4.0.3.008/ia32/lib:/usr/intel/impi/4.0.3.008/intel64/lib:/usr/intel/composer_xe_2011_sp1.9.293/compiler/lib/intel64:/usr/intel/composer_xe_2011_sp1.9.293/ipp/../compiler/lib/intel64:/usr/intel/composer_xe_2011_sp1.9.293/ipp/lib/intel64:/usr/intel/composer_xe_2011_sp1.9.293/compiler/lib/intel64:/usr/intel/composer_xe_2011_sp1.9.293/mkl/lib/intel64:/usr/intel/composer_xe_2011_sp1.9.293/tbb/lib/intel64//cc4.1.0_libc2.4_kernel2.6.16.21:/usr/local/lib/boost:/biol/arb/lib:/lib64:/usr/lib64:/usr/local/lib:/usr/local/cuda/lib64:/usr/local/cuda/lib:/usr/intel/composer_xe_2011_sp1.9.293/debugger/lib/intel64:/usr/intel/composer_xe_2011_sp1.9.293/mpirt/lib/intel64
CPATH=/usr/intel/composer_xe_2011_sp1.9.293/tbb/include:/usr/intel/composer_xe_2011_sp1.9.293/mkl/include:/usr/intel/composer_xe_2011_sp1.9.293/tbb/include:/usr/intel/composer_xe_2011_sp1.9.293/tbb/include:/usr/intel/composer_xe_2011_sp1.9.293/mkl/include:/usr/intel/composer_xe_2011_sp1.9.293/tbb/include

CPP=icc -E

PATH=/usr/intel/impi/4.0.3.008/ia32/bin:/usr/intel/impi/4.0.3.008/intel64/bin:/usr/intel/composer_xe_2011_sp1.9.293/bin/intel64:/usr/intel/impi/4.0.3.008/ia32/bin:/usr/intel/composer_xe_2011_sp1.9.293/bin/intel64:/home/albl500/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/intel/bin:/usr/intel/composer_xe_2011_sp1.9.293/mpirt/bin/intel64:/home/albl500/SDKs/android-sdk-linux/tools:/biol/bin:/biol/arb/bin:/usr/local/cuda/bin:/home/albl500/bin:/usr/intel/composer_xe_2011_sp1.9.293/mpirt/bin/intel64

LD=xild
CXX=icpc
CC=icc

** Summary **

All seems okay to me, although using -no-prec-div (over IEEE floating point arithmetic) is slightly concerning, as I occasionally require very accurate floating point arithmetic. I try to use numpy as much as possible for math operations though, so maybe it's not concerning at all... "-no-prec-div" and "-fp-model strict" I think are required to pass various math tests though. Otherwise, I got errors about extremely small floats (<10^-300) being unequal to 0.

I hope this proves useful for anyone else trying to compile an optimised Python for an Intel system.

Cheers,
Alex

--
Alex Leach BSc. MRes.
Department of Biology
University of York
York YO10 5DD
United Kingdom
www: http://bioltfws1.york.ac.uk/~albl500
EMAIL DISCLAIMER: http://www.york.ac.uk/docs/disclaimer/email.htm
