Proposed fix for MKL and dynamic loading

(Apologies if this has been fixed in trunk; I base this on 1.4.0 and no related comments of MKL on the mailing list)
I finally got the latest version of MKL working. What appears to have changed is that the MKL shared libraries will themselves dynamically load different other libraries, depending on the detected CPU.
This is in some ways great news for me, because it means I can avoid worrying about miscompiles when compiling one single version of NumPy/SciPy to use for our heterogeneous cluster. So I'd rather *not* link statically [1].
Anyway, after modifying site.cfg [2], things almost work, but not quite. The problem is that Python by default imports shared libs using RTLD_LOCAL. With this patch to NumPy it does:
Change in numpy/linalg/linalg.py:
    from numpy.linalg import lapack_lite
to:
    try:
        import sys
        import ctypes
        _old_rtld = sys.getdlopenflags()
        sys.setdlopenflags(_old_rtld|ctypes.RTLD_GLOBAL)
        from numpy.linalg import lapack_lite
    finally:
        sys.setdlopenflags(_old_rtld)
        del sys; del ctypes; del _old_rtld
Questions:
a) Should I submit a patch?
b) Negative consequences? Perhaps another Python module can now not load a different BLAS implementation? (That still seems better than not being able to use MKL IMO).
c) Should this only be enabled by a flag somewhere? Where? Or can one just do it regardless of BLAS?
d) Do I need a "if hasattr" for Windows, or will Windows just ignore it, or does this apply to Windows too?
[1] BTW, I could not figure out how to link statically if I wanted -- is "search_static_first = 1" supposed to work? Perhaps MKL will insist on loading some parts dynamically even then *shrug*.
Dag Sverre
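The patch above can be sketched as a reusable context manager. This is only an illustration, not code from NumPy: the helper name `global_dlopen_flags` is hypothetical, and it assumes Python 3.3 or later, where `os.RTLD_GLOBAL` exists on Unix-like systems. On platforms without `sys.setdlopenflags` it deliberately does nothing.

```python
import contextlib
import os
import sys


@contextlib.contextmanager
def global_dlopen_flags():
    """Temporarily add RTLD_GLOBAL to the flags Python passes to dlopen().

    Hypothetical helper for illustration; not part of NumPy.
    """
    if not (hasattr(sys, "setdlopenflags") and hasattr(os, "RTLD_GLOBAL")):
        # No dlopen-style flags on this platform: leave everything alone.
        yield
        return
    old_flags = sys.getdlopenflags()
    sys.setdlopenflags(old_flags | os.RTLD_GLOBAL)
    try:
        yield
    finally:
        # Restore the previous flags even if the import inside the block fails.
        sys.setdlopenflags(old_flags)
```

Usage would look like `with global_dlopen_flags(): from numpy.linalg import lapack_lite`. Unlike the inline version, the flags are read before the `try` block starts, so the cleanup never refers to a name that was not assigned.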

Dag Sverre Seljebotn wrote:
(Apologies if this has been fixed in trunk; I base this on 1.4.0 and no related comments of MKL on the mailing list)
I finally got the latest version of MKL working. What appears to have changed is that the MKL shared libraries will themselves dynamically load different other libraries, depending on the detected CPU.
This is in some ways great news for me, because it means I can avoid worrying about miscompiles when compiling one single version of NumPy/SciPy to use for our heterogeneous cluster. So I'd rather *not* link statically [1].
Anyway, after modifying site.cfg [2], things almost work, but not quite. The problem is that Python by default imports shared libs using RTLD_LOCAL. With this patch to NumPy it does:
Change in numpy/linalg/linalg.py:
    from numpy.linalg import lapack_lite
to:
    try:
        import sys
        import ctypes
        _old_rtld = sys.getdlopenflags()
        sys.setdlopenflags(_old_rtld|ctypes.RTLD_GLOBAL)
        from numpy.linalg import lapack_lite
    finally:
        sys.setdlopenflags(_old_rtld)
        del sys; del ctypes; del _old_rtld
Questions:
a) Should I submit a patch?
b) Negative consequences? Perhaps another Python module can now not load a different BLAS implementation? (That still seems better than not being able to use MKL IMO).
c) Should this only be enabled by a flag somewhere? Where? Or can one just do it regardless of BLAS?
d) Do I need a "if hasattr" for Windows, or will Windows just ignore it, or does this apply to Windows too?
[1] BTW, I could not figure out how to link statically if I wanted -- is "search_static_first = 1" supposed to work? Perhaps MKL will insist on loading some parts dynamically even then *shrug*.
Forgot this:
[2] Here's my site.cfg:
    [mkl]
    library_dirs=/mn/corcaroli/d1/dagss/intel/mkl/10.2.3.029/lib/em64t
    include_dirs = /mn/corcaroli/d1/dagss/intel/mkl/10.2.3.029/include
    lapack_libs = mkl_lapack
    mkl_libs = mkl_intel_lp64, mkl_intel_thread, mkl_core, iomp5
Then I need to set LD_LIBRARY_PATH as well prior to running (which I'm quite OK with).
Dag Sverre
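Since the site.cfg above only works together with a matching LD_LIBRARY_PATH, a small check can help spot a missing directory. The following is a rough sketch, not part of numpy.distutils: the helper names are hypothetical, and it assumes the Linux convention that a library listed as `name` is shipped as `libname.so`.

```python
import configparser
import os

# The [mkl] section quoted in the message above.
SITE_CFG = """
[mkl]
library_dirs = /mn/corcaroli/d1/dagss/intel/mkl/10.2.3.029/lib/em64t
include_dirs = /mn/corcaroli/d1/dagss/intel/mkl/10.2.3.029/include
lapack_libs = mkl_lapack
mkl_libs = mkl_intel_lp64, mkl_intel_thread, mkl_core, iomp5
"""


def expected_shared_libs(cfg_text, section="mkl"):
    """Return the shared-library file names implied by the config (assumed lib<name>.so)."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    names = []
    for key in ("lapack_libs", "mkl_libs"):
        value = parser.get(section, key, fallback="")
        names += [n.strip() for n in value.split(",") if n.strip()]
    return ["lib%s.so" % n for n in names]


def missing_from_path(filenames, search_path):
    """Return the file names not found in any directory of a path-style string."""
    dirs = [d for d in search_path.split(os.pathsep) if d]
    return [
        f for f in filenames
        if not any(os.path.exists(os.path.join(d, f)) for d in dirs)
    ]
```

Calling `missing_from_path(expected_shared_libs(SITE_CFG), os.environ.get("LD_LIBRARY_PATH", ""))` lists the libraries the dynamic loader would not find through that variable alone. Libraries found through other means (rpath, ld.so.conf) would still be reported as missing, so treat the result as a hint.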

    try:
        import sys
        import ctypes
        _old_rtld = sys.getdlopenflags()
        sys.setdlopenflags(_old_rtld|ctypes.RTLD_GLOBAL)
        from numpy.linalg import lapack_lite
    finally:
        sys.setdlopenflags(_old_rtld)
        del sys; del ctypes; del _old_rtld
This also applies to scipy code that relies on BLAS. Lisandro Dalcin gave me a tip that is close to this one some months ago (http://matt.eifelle.com/2008/11/03/i-used-the-latest-mkl-with-numpy-and.../). The best official solution is to statically link against the MKL with Python.
Matthieu
--
Information System Engineer, Ph.D.
Blog: http://matt.eifelle.com
LinkedIn: http://www.linkedin.com/in/matthieubrucher

Matthieu Brucher wrote:
This also applies to scipy code that relies on BLAS. Lisandro Dalcin gave me a tip that is close to this one some months ago (http://matt.eifelle.com/2008/11/03/i-used-the-latest-mkl-with-numpy-and.../). The best official solution is to statically link against the MKL with Python.
IIUC, it should be enough to load the .so-s in GLOBAL mode once. So it is probably enough to ensure NumPy is patched in a way so that SciPy loads NumPy which loads the .so-s in GLOBAL mode, so that a separate patch for SciPy is not necessary. (Remains to be tried, I'm moving on to building SciPy now.)
As for static linking, do you mean linking MKL into the Python interpreter itself? Or statically linking with NumPy?
In the former case... well, even if the above solution is a not-officially-supported hack, I'd prefer that to messing with the Python build as long as it actually works, which it seems to. Requiring custom Python builds for MKL support is not something one should do if one can avoid it. (I build my own Python anyway, but I suppose many potential NumPy/MKL users don't.)
Dag Sverre
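The idea that loading the .so-s in GLOBAL mode once is enough can also be tried without touching NumPy, by preloading the libraries explicitly before the first import. This is a different technique from the `sys.setdlopenflags` patch in this thread, and whether it is sufficient for MKL is untested here. The helper name `preload_global` is hypothetical; `ctypes.CDLL` and its `mode` argument are standard.

```python
import ctypes


def preload_global(library_names):
    """Try to load each shared library with RTLD_GLOBAL.

    Returns the names that loaded; libraries that cannot be found or
    loaded are skipped. Hypothetical helper for illustration only.
    """
    loaded = []
    for name in library_names:
        try:
            # mode=RTLD_GLOBAL makes the library's symbols visible to
            # libraries that are loaded afterwards.
            ctypes.CDLL(name, mode=ctypes.RTLD_GLOBAL)
        except OSError:
            continue
        loaded.append(name)
    return loaded
```

One might call, for example, `preload_global(["libmkl_core.so"])` before `import numpy`. The file name is an assumption derived from the site.cfg quoted earlier, and the library must be reachable through LD_LIBRARY_PATH.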

2010/1/21 Dag Sverre Seljebotn <dagss@student.matnat.uio.no>:
Matthieu Brucher wrote:
This also applies to scipy code that relies on BLAS. Lisandro Dalcin gave me a tip that is close to this one some months ago (http://matt.eifelle.com/2008/11/03/i-used-the-latest-mkl-with-numpy-and.../). The best official solution is to statically link against the MKL with Python.
IIUC, it should be enough to load the .so-s in GLOBAL mode once. So it is probably enough to ensure NumPy is patched in a way so that SciPy loads NumPy which loads the .so-s in GLOBAL mode, so that a separate patch for SciPy is not necessary. (Remains to be tried, I'm moving on to building SciPy now.)
Indeed, it should be enough.
As for static linking, do you mean linking MKL into the Python interpreter itself? Or statically linking with NumPy?
statically linking with numpy. This is what was advised to me by Intel.
Matthieu

Matthieu Brucher wrote:
2010/1/21 Dag Sverre Seljebotn <dagss@student.matnat.uio.no>:
Matthieu Brucher wrote:
This also applies to scipy code that relies on BLAS. Lisandro Dalcin gave me a tip that is close to this one some months ago (http://matt.eifelle.com/2008/11/03/i-used-the-latest-mkl-with-numpy-and.../). The best official solution is to statically link against the MKL with Python.
IIUC, it should be enough to load the .so-s in GLOBAL mode once. So it is probably enough to ensure NumPy is patched in a way so that SciPy loads NumPy which loads the .so-s in GLOBAL mode, so that a separate patch for SciPy is not necessary. (Remains to be tried, I'm moving on to building SciPy now.)
Indeed, it should be enough.
As for static linking, do you mean linking MKL into the Python interpreter itself? Or statically linking with NumPy?
statically linking with numpy. This is what was advised to me by Intel.
Somehow I didn't manage to do that.
a) search_static_first does not seem to work for me
b) moving the .so's out of the way does manage something, but mkl_lapack only exists in .so form. Moving only that back in still didn't work.
In the end I stopped playing, all the more since RTLD_GLOBAL seems a superior solution, even if Intel isn't willing to directly support it...
Dag Sverre

Dag Sverre Seljebotn wrote:
Questions:
a) Should I submit a patch? b) Negative consequences? Perhaps another Python module can now not load a different BLAS implementation? (That still seems better than not being able to use MKL IMO).
Besides the problem of ctypes not always being available, I am very wary of those library-specific hacks. Worse, it is version dependent, because it depends on the MKL.
d) Do I need a "if hasattr" for Windows, or will Windows just ignore it, or does this apply to Windows too?
Windows does not have dlopen, and has totally different semantics for dynamic loading. Besides, this is not needed on Windows. So it should not be executed at all.
[1] BTW, I could not figure out how to link statically if I wanted -- is "search_static_first = 1" supposed to work? Perhaps MKL will insist on loading some parts dynamically even then *shrug*.
search_static_first is inherently fragile - using the linker to do this is much better (with -Wl,-Bshared/-Wl,-Bstatic flags).
cheers,
David

David Cournapeau wrote:
Dag Sverre Seljebotn wrote:
Questions:
a) Should I submit a patch? b) Negative consequences? Perhaps another Python module can now not load a different BLAS implementation? (That still seems better than not being able to use MKL IMO).
Besides the problem of ctypes not always being available, I am very wary of those library-specific hacks. Worse, it is version dependent, because it depends on the MKL.
I was thinking that this was perhaps a general problem -- that *if* ATLAS started implementing support for dynamically switchable kernels at load time (which is a feature I certainly wish for), it would suffer the same problems. But I don't really know that.
DLFCN can be used instead of ctypes. I think it is not always available either, but "except ImportError: pass" should be fine in this kind of situation -- if you need the workaround you'd typically have it. The only real issue I can see is if it has a significant impact on import times for non-MKL users.
But I won't put up a big fight for this kind of patch -- I can work around it for my own purposes. I just thought it might be nice to make things easier/more transparent for NumPy/MKL users.
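The "except ImportError: pass" idea can be sketched as a lookup that tries several sources for the RTLD_GLOBAL constant and gives up quietly if none is present. The function name `find_rtld_global` is hypothetical. `os.RTLD_GLOBAL` is an assumption about the running interpreter (Python 3.3 or later on Unix), and `DLFCN` only exists on some older Python 2 builds.

```python
def find_rtld_global():
    """Return an RTLD_GLOBAL value from the first module that provides one.

    Returns None if no candidate module is importable or defines the
    constant. Hypothetical helper for illustration only.
    """
    for module_name in ("os", "ctypes", "DLFCN"):
        try:
            module = __import__(module_name)
        except ImportError:
            # Module missing on this build: try the next candidate.
            continue
        flag = getattr(module, "RTLD_GLOBAL", None)
        if flag is not None:
            return flag
    return None
```

A caller would skip the workaround entirely when the result is `None`. Note that `ctypes.RTLD_GLOBAL` is defined as 0 on Windows, so a separate `hasattr(sys, "setdlopenflags")` check is still needed there.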
[1] BTW, I could not figure out how to link statically if I wanted -- is "search_static_first = 1" supposed to work? Perhaps MKL will insist on loading some parts dynamically even then *shrug*.
search_static_first is inherently fragile - using the linker to do this is much better (with -Wl,-Bshared/-Wl,-Bstatic flags).
Thanks! (I'll do that if I get any problems, but I have 3-4 other libs depending on BLAS loaded as well, so shared is better in principle.)
Dag Sverre

[1] BTW, I could not figure out how to link statically if I wanted -- is "search_static_first = 1" supposed to work? Perhaps MKL will insist on loading some parts dynamically even then *shrug*.
search_static_first is inherently fragile - using the linker to do this is much better (with -Wl,-Bshared/-Wl,-Bstatic flags).
How do you write the site.cfg accordingly?
Matthieu
participants (3)
- Dag Sverre Seljebotn
- David Cournapeau
- Matthieu Brucher