Compiling numpy using icc gets missing library error
Hello,

I've been trying out Intel's compiler with Python, and so far things have been going pretty well. I managed to compile Python 2.7.1 with icc (with help from http://software.intel.com/en-us/forums/showthread.php?t=56652), so distutils automatically defaults to the icc compiler. A number of packages build and work great, so the overall system seems fine. However, I'm running into a few problems compiling numpy. I'm using the latest development build from git (rev ac3cba).

The first problem is that the long double representation isn't being detected -- building numpy breaks with the following error (I'll also post a bug report on this if requested):

Traceback (most recent call last):
  File "setup.py", line 201, in <module>
    setup_package()
  File "setup.py", line 194, in setup_package
    configuration=configuration )
  File "/home/hoytak/sysroot/src/numpy/numpy/distutils/core.py", line 186, in setup
    return old_setup(**new_attr)
  File "/home/hoytak/sysroot/lib/python2.7/distutils/core.py", line 152, in setup
    dist.run_commands()
  File "/home/hoytak/sysroot/lib/python2.7/distutils/dist.py", line 953, in run_commands
    self.run_command(cmd)
  File "/home/hoytak/sysroot/lib/python2.7/distutils/dist.py", line 972, in run_command
    cmd_obj.run()
  File "/home/hoytak/sysroot/src/numpy/numpy/distutils/command/build.py", line 37, in run
    old_build.run(self)
  File "/home/hoytak/sysroot/lib/python2.7/distutils/command/build.py", line 127, in run
    self.run_command(cmd_name)
  File "/home/hoytak/sysroot/lib/python2.7/distutils/cmd.py", line 326, in run_command
    self.distribution.run_command(command)
  File "/home/hoytak/sysroot/lib/python2.7/distutils/dist.py", line 972, in run_command
    cmd_obj.run()
  File "/home/hoytak/sysroot/src/numpy/numpy/distutils/command/build_src.py", line 152, in run
    self.build_sources()
  File "/home/hoytak/sysroot/src/numpy/numpy/distutils/command/build_src.py", line 169, in build_sources
    self.build_extension_sources(ext)
  File "/home/hoytak/sysroot/src/numpy/numpy/distutils/command/build_src.py", line 328, in build_extension_sources
    sources = self.generate_sources(sources, ext)
  File "/home/hoytak/sysroot/src/numpy/numpy/distutils/command/build_src.py", line 385, in generate_sources
    source = func(extension, build_dir)
  File "numpy/core/setup.py", line 442, in generate_config_h
    rep = check_long_double_representation(config_cmd)
  File "numpy/core/setup_common.py", line 136, in check_long_double_representation
    type = long_double_representation(pyod(object))
  File "numpy/core/setup_common.py", line 280, in long_double_representation
    raise ValueError("Could not lock sequences (%s)" % saw)
ValueError: Could not lock sequences (None)

This happens regardless of whether I use the --compiler=intel option. I'm using one of the new Sandy Bridge processors (an i7-2600K), which may be part of the reason. Looking at the code, the failure comes from not recognizing the type info in a particular string; I've attached a dump of the lines variable passed to the long_double_representation function. As a workaround, I set it to return 'INTEL_EXTENDED_12_BYTES_LE' manually, with the idea that I could check this against the unit tests later on. With that change, everything else compiled fine and numpy installed.
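For reference, the workaround amounted to short-circuiting the detection. A minimal sketch of the edit (a hypothetical reconstruction -- the real check_long_double_representation in numpy/core/setup_common.py compiles a C snippet and inspects the resulting object file, which is what fails under icc):

def check_long_double_representation(cmd):
    # WORKAROUND for icc: skip the object-file inspection entirely and
    # assume Intel's 80-bit extended format stored in 12 bytes,
    # little-endian.
    return 'INTEL_EXTENDED_12_BYTES_LE'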
However, attempting to import numpy gives me this:

>>> import numpy
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/hoytak/sysroot/lib/python2.7/site-packages/numpy/__init__.py", line 137, in <module>
    import add_newdocs
  File "/home/hoytak/sysroot/lib/python2.7/site-packages/numpy/add_newdocs.py", line 9, in <module>
    from numpy.lib import add_newdoc
  File "/home/hoytak/sysroot/lib/python2.7/site-packages/numpy/lib/__init__.py", line 4, in <module>
    from type_check import *
  File "/home/hoytak/sysroot/lib/python2.7/site-packages/numpy/lib/type_check.py", line 8, in <module>
    import numpy.core.numeric as _nx
  File "/home/hoytak/sysroot/lib/python2.7/site-packages/numpy/core/__init__.py", line 5, in <module>
    import multiarray
ImportError: /home/hoytak/sysroot/lib/python2.7/site-packages/numpy/core/multiarray.so: undefined symbol: npy_half_to_float

In case it's relevant:

$ ldd /home/hoytak/sysroot/lib/python2.7/site-packages/numpy/core/multiarray.so
    linux-vdso.so.1 => (0x00007fff60fff000)
    libpython2.7.so.1.0 => /home/hoytak/sysroot/lib/libpython2.7.so.1.0 (0x00007f5916dc3000)
    libimf.so => /usr/intel/composerxe-2011.2.137/composerxe-2011.2.137/compiler/lib/intel64/libimf.so (0x00007f59169df000)
    libsvml.so => /usr/intel/composerxe-2011.2.137/composerxe-2011.2.137/compiler/lib/intel64/libsvml.so (0x00007f5916337000)
    libm.so.6 => /lib/libm.so.6 (0x00007f5916093000)
    libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x00007f5915e7c000)
    libintlc.so.5 => /usr/intel/composerxe-2011.2.137/composerxe-2011.2.137/compiler/lib/intel64/libintlc.so.5 (0x00007f5915d2d000)
    libpthread.so.0 => /lib/libpthread.so.0 (0x00007f5915b10000)
    libc.so.6 => /lib/libc.so.6 (0x00007f591578c000)
    libdl.so.2 => /lib/libdl.so.2 (0x00007f5915588000)
    libutil.so.1 => /lib/libutil.so.1 (0x00007f5915385000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f591770c000)

I also attached the build log (which is identical to the first error's log up to the crash). Any suggestions on where to look or what could have gone wrong?

Thanks!
-- Hoyt

++++++++++++++++++++++++++++++++++++++++++++++++
+ Hoyt Koepke
+ University of Washington Department of Statistics
+ http://www.stat.washington.edu/~hoytak/
+ hoytak@gmail.com
++++++++++++++++++++++++++++++++++++++++++
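P.S. A quick way to reproduce the unresolved symbol without going through numpy's import machinery is to dlopen the extension directly with ctypes (a sketch; the path is just my install location):

import ctypes

# Loading the extension as an ordinary shared library forces symbol
# resolution, so this should fail with the same "undefined symbol"
# message the import produced.
try:
    ctypes.CDLL("/home/hoytak/sysroot/lib/python2.7/"
                "site-packages/numpy/core/multiarray.so")
except OSError as e:
    print(e)  # expect: ... undefined symbol: npy_half_to_float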
Okay, so here's a follow-up on my progress. Apologies in advance for the long email, but I'd like to be thorough about this before I forget.

For the sake of completeness, here's my setup: Python 2.7.1 compiled from source with icc, running on Ubuntu 10.10 on one of Intel's new processors (an i7-2600). The goal is to compile numpy and scipy with both Intel's compiler and Intel's MKL.

I finally got numpy to compile with icc/ifort with pretty much all of the tests passing. It was a bit of work, partly because I was trying to be an optimization junky, but I thought I'd share my discoveries. Scipy also compiles, but with some errors (which are likely due to me not configuring f2py correctly).

First, I wanted to compile with Intel's interprocedural optimization (-ipo) enabled, and that works, but only if -O2 is used for the compile stage and -O1 for the link stage. If -O3 is given at the compile stage, the einsum test goes into some sort of infinite loop and hangs. If -O2 or -O3 is given to the linker, there are random segfaults elsewhere (I forget where). With these optimization levels, though, things are stable. Also, if I turn off -ipo, then -O3 works fine for compiling. I'm not sure whether this reflects bugs in the flags I'm passing to the Intel compiler or in icc/ifort itself.

Second, to use -ipo, it's critical that xiar is used instead of ar to create object archives. This needed to be changed in fcompiler/intel.py and intelccompiler.py. I've attached a diff of these files that gives working options for me (a fuller sketch of what this looks like is below, just before the site.cfg). I don't know if these options are set in the correct place or not, but perhaps they'll be helpful. The essence of it is the following (from intelccompiler.py):

linker_flags = '-O1 -ipo -openmp -lpthread -fno-alias -xHOST -fPIC '
compiler_opt_flags = '-static -ipo -xHOST -O2 -fPIC -DMKL_LP64 -mkl -wd188 -g -fno-alias '
icc_run_string = 'icc ' + compiler_opt_flags
icpc_run_string = 'icpc ' + compiler_opt_flags
linker_run_string = 'icc ' + linker_flags + ' -shared '

with the rest of the diff setting these options. Here the -openmp and -lpthread are required for linking against the threaded layer of the MKL; that could possibly be ripped out. The -fno-alias is critical for the C compiler -- random segfaults and memory corruption occur without it. The -DMKL_LP64 ensures linking against the LP64 (32-bit indices) part of MKL rather than ILP64 (64-bit indices); the latter isn't supported by the lapack_lite module -- things compile but don't work. -mkl may or may not help.

For the Fortran side, this was the compiler string:

compiler_opt_flags = '-static -ipo -xHOST -fPIC -DMKL_LP64 -mkl -wd188 -g -fno-alias -O3'

Here you don't need -fno-alias, and -O3 seems to work.

Third, it was a bit of a pain to figure out how to get the linking/detection done correctly: order matters, and it was easy to get undefined symbols, runtime errors, etc. Very annoying.
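As promised above, here is the essence of those intelccompiler.py changes as a Python sketch. Treat it as illustrative rather than a drop-in patch -- the class layout just mirrors how distutils' UnixCCompiler subclasses wire in their commands, and the attribute names may differ from the file you have:

from distutils.unixccompiler import UnixCCompiler

linker_flags = '-O1 -ipo -openmp -lpthread -fno-alias -xHOST -fPIC'
compiler_opt_flags = ('-static -ipo -xHOST -O2 -fPIC -DMKL_LP64 '
                      '-mkl -wd188 -g -fno-alias')

class IntelCCompiler(UnixCCompiler):
    # Sketch: route the flag strings above into the compile, link,
    # and archive commands.
    compiler_type = 'intel'

    def __init__(self, verbose=0, dry_run=0, force=0):
        UnixCCompiler.__init__(self, verbose, dry_run, force)
        self.set_executables(
            compiler='icc ' + compiler_opt_flags,
            compiler_so='icc ' + compiler_opt_flags,
            compiler_cxx='icpc ' + compiler_opt_flags,
            linker_so='icc ' + linker_flags + ' -shared',
            # -ipo intermediate objects require Intel's archiver,
            # not GNU ar:
            archiver='xiar cr',
        )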
In the end, my site.cfg file looked like this:

[DEFAULT]
library_dirs = /usr/intel/current/mkl/lib/intel64
include_dirs = /usr/intel/current/mkl/include
mkl_libs = mkl_rt, mkl_core, mkl_intel_thread, mkl_intel_lp64
blas_libs = mkl_blas95_lp64
lapack_libs = mkl_lapack95_lp64

[lapack_opt]
library_dirs = /usr/intel/current/mkl/lib/intel64
include_dirs = /usr/intel/current/mkl/include/intel64/lp64
libraries = mkl_lapack95_lp64

[blas_opt]
library_dirs = /usr/intel/current/mkl/lib/intel64
include_dirs = /usr/intel/current/mkl/include/intel64/lp64
libraries = mkl_blas95_lp64

where /usr/intel/current/ points to my Intel install location. It's critical that the mkl_libs are given in that order; I didn't find any other combination that worked.

Finally, I've attached my bash setup script for environment variables. I don't know how much of a role they play, but they were in place when things started working, so I'm including them here.

Now, on to scipy. With all these options in place, scipy compiles fine. However, there are two problems, and they don't go away at any optimization level. I'm looking for suggestions; I'm guessing it's some sort of configuration error.

1) The CloughTocher2DInterpolator segfaults every time it's called to interpolate values. I couldn't manage to track it down -- it's in the Cython code somewhere -- but I can give more details next time. I've disabled it for now.

2) f2py isn't getting the interfaces right. When I run the test suite, I get about 250 errors, all of the form:

ValueError: failed to create intent(cache|hide)|optional array-- must have defined dimensions but got (5,5,)

and so on, with different tuples at the end.

Other than these errors, everything seems to work great. What might I be doing wrong?

Thanks!
-- Hoyt
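P.S. One sanity check that saved me some head-scratching: numpy records its build-time BLAS/LAPACK configuration, so after installing you can confirm that the MKL libraries from site.cfg were actually picked up (standard numpy API):

import numpy

# Prints the library names and paths baked in at build time; the
# mkl_* entries from site.cfg should appear here if detection worked.
numpy.show_config()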
Okay, last update: I've finally gotten everything to work. It turns out the problems I had earlier with f2py were due to Intel's -ipo flag. The only place that flag works is with the C++ code, not Fortran or C.

Also, I forgot to mention: the qhull_a.h header has a workaround for some aspect of Intel's compiler that is no longer needed and in fact causes an error. It's for a macro that simply suppresses unused-variable warnings. In my opinion it could be removed; it's only used in two places, and scipy spits out enough warnings that this is hardly an issue. My change was around line 102 in qhull_a.h: replace

#if defined(__INTEL_COMPILER) && !defined(QHULL_OS_WIN)
template <typename T>
inline void qhullUnused(T &x) { (void)x; }
#  define QHULL_UNUSED(x) qhullUnused(x);
#else
#  define QHULL_UNUSED(x) (void)x;
#endif

with

#define QHULL_UNUSED(x)

Also, I still could not stop the CloughTocher2DInterpolator from segfaulting, so I had to disable it by raising an exception in its init method. With this in place, everything compiles and the unit tests pretty much all run, with 5 failures (mostly numerical-accuracy issues) and 9 errors due to the disabled interpolator.

In summary, my final environment variables giving the compile flags are:

export FLAGS='-xHOST -static -fPIC -g -fltconsistency'
export CFLAGS="$FLAGS -O2 -fno-alias"
export CPPFLAGS="$FLAGS -fno-alias -ipo -O3"
export CXXFLAGS="$CPPFLAGS"
export FFLAGS="$FLAGS -O3"
export F77FLAGS="$FFLAGS"
export F90FLAGS="$FFLAGS"
export LDFLAGS="-xHOST -O1 -openmp -lpthread -fPIC"

And the arguments given to the Fortran compiler in fcompiler/intel.py are:

compiler_opt_flags = '-static -xHOST -fPIC -DMKL_LP64 -mkl -g -O3'

I'd be happy to answer any more questions about the process as needed. Now, back to my real work.

-- Hoyt
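P.S. The interpolator disable was nothing clever -- just an early raise in the constructor. A hypothetical sketch of the idea (the real class lives in scipy's compiled interpolation module, so the actual edit was made there, not in pure Python):

class CloughTocher2DInterpolator(object):
    # Stand-in for the constructor edit: bail out before any of the
    # (segfaulting) compiled code can run.
    def __init__(self, *args, **kwargs):
        raise NotImplementedError(
            "CloughTocher2DInterpolator disabled: segfaults when "
            "built with icc at these optimization levels")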
On Fri, 25 Mar 2011 13:46:41 -0700, Hoyt Koepke wrote: [clip]
Also, I could still not get the CloughTocher2DInterpolator to not segfault.
A backtrace would be useful here. It's probably best to recompile with "-O0" and some debug flags enabled in the compiler to get something reasonable out.

-- Pauli Virtanen
Okay, even better news. After a bit of probing around, it turns out that some of the errors -- notably the one in the interpolator -- were due to Intel's handling of floating-point values at higher optimization levels. So the two rather important flags, if you wish to avoid difficult-to-explain segfaults, are:

-fp-model strict  (disables unsafe floating-point optimizations)
-fno-alias        (C and C++, not Fortran)

Now, here's a summary of the other errors:

-ipo -- okay for the linker and C++, but not elsewhere. For C, it causes the long_double_representation() function in setup_common.py to fail (I'll note this on the corresponding ticket), and it causes f2py to generate incorrect wrappers.

-O3 -- okay for C++ and Fortran, but not C. For C, it causes einsum to hang.

Thus the highest optimization levels I found at which everything compiled and ran were:

Fortran: -static -DMKL_LP64 -mkl -xHOST -O3 -fPIC -fp-model strict
C:       -static -DMKL_LP64 -mkl -xHOST -O2 -fno-alias -fPIC -fp-model strict
C++:     -static -DMKL_LP64 -mkl -xHOST -O3 -fno-alias -fPIC -fp-model strict
Linker:  -DMKL_LP64 -mkl -xHOST -O2 -fno-alias -fPIC -fp-model strict -openmp -lpthread

Enjoy,
--Hoyt

On Fri, Mar 25, 2011 at 2:30 PM, Pauli Virtanen <pav@iki.fi> wrote:
On Fri, 25 Mar 2011 13:46:41 -0700, Hoyt Koepke wrote: [clip]
Also, I could still not get the CloughTocher2DInterpolator to not segfault.
Backtrace would be useful here. It's probably best to recompile with "-O0" and some debug flags enabled in the compiler to get something reasonable out.
-- Pauli Virtanen
Hi Hoyt,

Thanks for the thorough description of getting everything to work.

On Sat, Mar 26, 2011 at 12:14 AM, Hoyt Koepke <hoytak@stat.washington.edu> wrote:
Okay, even better news. After a little bit of probing around, it turns out that some of the errors, namely the one in the interpolator, were due to intel's handling of floating point values at higher optimization levels. Thus the two rather important flags, if you wish to avoid difficult-to-explain segfaults, are
-fp-model strict  (disables unsafe floating-point optimizations)
-fno-alias        (C and C++, not Fortran)
Now here's a summary of the other errors:
-ipo -- okay for the linker and C++, but not elsewhere. For C, it causes the long_double_representation() function in setup_common.py to fail (I'll note this on the corresponding ticket), and it causes f2py to generate incorrect wrappers.
Would it be enough to put a comment in the intelccompiler.py code not to use the -ipo flag, or do you think there's something to fix here? This doesn't seem to be a new problem, by the way: http://cens.ioc.ee/pipermail/f2py-users/2006-January/001229.html
-O3 -- okay for C++ / fortran, but not C. For C, it causes einsum to hang.
-O3 is the default optimization level, so this is a bug I guess. There's also another report in #1378 that -O3 doesn't work with numpy 1.4.0 (which does not have einsum). Should it be lowered to -O2 by default?
Thus the highest optimization levels I could find in which everything compiled and ran were:
Fortran: -static -DMKL_LP64 -mkl -xHOST -O3 -fPIC -fp-model strict
C:       -static -DMKL_LP64 -mkl -xHOST -O2 -fno-alias -fPIC -fp-model strict
C++:     -static -DMKL_LP64 -mkl -xHOST -O3 -fno-alias -fPIC -fp-model strict
Linker:  -DMKL_LP64 -mkl -xHOST -O2 -fno-alias -fPIC -fp-model strict -openmp -lpthread
I'm not sure which of those flags would be appropriate as a default in distutils -- perhaps only -fp-model strict? If you could help put together a patch for numpy.distutils, that would be very helpful I think. The rest of your description could be put at http://scipy.org/Installing_SciPy/Linux.

Ralf
Thanks for the thorough description of getting everything to work.
You're welcome. I'm glad people find it helpful :-).
-ipo -- okay for the linker and C++, but not elsewhere. For C, it causes the long_double_representation() function in setup_common.py to fail (I'll note this on the corresponding ticket), and it causes f2py to generate incorrect wrappers.
Would it be enough to put a comment in the intelccompiler.py code not to use the -ipo flag, or do you think there's something to fix here? Doesn't seem to be a new problem by the way: http://cens.ioc.ee/pipermail/f2py-users/2006-January/001229.html
Well, to be honest, it's debatable whether and how this should be fixed. The issue is that with -ipo, the intermediate object files and archives are in Intel's own format -- hence the need for xiar instead of ar -- and I suspect that's what's breaking f2py and the long_double_representation() detection (I don't actually know how either works internally, so I could be wrong here). Reverse-engineering those formats doesn't make sense. There may be a better way of determining long_double_representation(), but I don't know one; IMO it's not worth it.
-O3 -- okay for C++ / fortran, but not C. For C, it causes einsum to hang.
-O3 is the default optimization level, so this is a bug I guess. There's also another report in #1378 that -O3 doesn't work with numpy 1.4.0 (which does not have einsum). Should it be lowered to -O2 by default?
I'm not sure what's going on here. It's likely a bug in icc brought out by the really low-level code in einsum. However, the documentation says the extra optimizations enabled by -O3 are unlikely to help except in large, predictable loops where more aggressive loop transformations are doable. Thus my vote is to use -O3 for C++ and Fortran, but -O2 for C (because of this bug). If the einsum bug gets fixed, though -- nice! (BTW, it may also be possible to change per-file optimization settings with Intel-specific pragmas, but I don't know for sure. If so, that could be a workaround; a Python-side alternative is sketched just below.)
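To illustrate that Python-side alternative: distutils already supports per-extension flags, so the extension containing einsum could be dialed back to -O2 while everything else keeps -O3. A sketch with placeholder names -- not something I've tested against the numpy build itself:

from distutils.core import setup, Extension

# Hypothetical: a trailing -O2 in extra_compile_args lands after the
# global -O3 on the icc command line, and the last -O flag wins.
ext = Extension(
    'einsum_demo',              # placeholder module name
    sources=['einsum_demo.c'],  # placeholder source file
    extra_compile_args=['-O2'],
)

setup(name='einsum_demo', ext_modules=[ext])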
Fortran: -static -DMKL_LP64 -mkl -xHOST -O3 -fPIC -fp-model strict
C:       -static -DMKL_LP64 -mkl -xHOST -O2 -fno-alias -fPIC -fp-model strict
C++:     -static -DMKL_LP64 -mkl -xHOST -O3 -fno-alias -fPIC -fp-model strict
Linker:  -DMKL_LP64 -mkl -xHOST -O2 -fno-alias -fPIC -fp-model strict -openmp -lpthread
I'm not sure which of those flags would be appropriate as a default in distutils -- perhaps only -fp-model strict? If you could help put together a patch for numpy.distutils, that would be very helpful I think. The rest of your description could be put at http://scipy.org/Installing_SciPy/Linux.
-fp-model strict and -fno-alias, since the latter is analogous to the -fno-strict-aliasing flag gcc requires for Python. The -static flag is probably optional. -xHOST is the equivalent of gcc's -march=native, so given the nature of the distutils system it probably isn't appropriate as a default. I'll try to get a patch together for the distutils; however, I'd want someone to review it, since I'm not that confident in my knowledge of the distutils code. I can also try to turn this into a more complete description for the wiki.

-- Hoyt
participants (3)

- Hoyt Koepke
- Pauli Virtanen
- Ralf Gommers