Re: [Numpy-discussion] building NumPy with Intel CC & MKL (solved!)
Christian Marquardt <christian@marquardt.sc> [2007-01-24 11:09]:
I'll try to explain... I hope it's not too basic.
Christian, at this point you could explain that shoes are not interchangeable -- that they are built to be worn on the left foot or the right foot -- and I'd be grateful for the explanation. I've left much detail in what follows in the hope that the details may help someone who is also having trouble using the Intel MKL.
Python searches for its modules along the PYTHONPATH, i.e. a list of directories where it expects to find whatever it needs. This is the same idea as the Unix shell (or the DOS command.com) looking in the PATH to find programs or shell/batch scripts, or the dynamic loader using LD_LIBRARY_PATH to find shared libraries.
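The effect is easy to see from inside Python: PYTHONPATH entries are spliced into sys.path ahead of the system site-packages, and the first match wins. A minimal sketch (the shadowdemo module name and the temporary directory are invented for illustration; inserting into sys.path by hand does the same thing PYTHONPATH does for a new interpreter):

```python
import os
import sys
import tempfile

# Create a throwaway module in a temporary directory...
tmpdir = tempfile.mkdtemp()
f = open(os.path.join(tmpdir, 'shadowdemo.py'), 'w')
f.write("WHERE = 'local'\n")
f.close()

# ...and put that directory at the front of the search path, which is
# what listing it in PYTHONPATH does for every interpreter you start.
sys.path.insert(0, tmpdir)

import shadowdemo
print(shadowdemo.__file__)   # resolves into tmpdir, not site-packages
```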
>>> import numpy
>>> print numpy
<module 'numpy' from '/usr/lib/python2.5/site-packages/numpy/__init__.pyc'>
What am I to make of this? Is it the rpm numpy or is it the numpy I built using the Intel compiler and MKL?
This tells you which directory your Python installation actually loaded numpy from: it used the numpy installed in the directory
/usr/lib/python2.5/site-packages/numpy
By *convention* (as someone already pointed out before), /usr/lib/python2.5/site-packages is the directory where the original system versions of python packages should be installed. In particular, the rpm version will very likely install its stuff there.
It did.
When installing additional python modules or packages via a command like
python setup.py install
the new packages will also be installed in that system directory. So if you have installed your Intel version of numpy with the above command, you might have overwritten the rpm stuff. There is a way to install in a different place; more on that below.
I'm 95% sure that command put numpy in /usr/local/lib/python2.5/site-packages. It's possible I used --prefix=<something>, but I don't recall doing so.
You now probably want to find out if the numpy version in /usr/lib/... is the Intel one or the original rpm one. To do this, you can check if the MKL and Intel libraries are actually loaded by the shared libraries within the numpy installation. You can use the command ldd which shows which shared libraries are loaded by executables or other shared libraries. For example, in my installation, the command
ldd <wherever>/python2.5/site-packages/numpy/linalg/lapack_lite.so
gives the following output:
MEDEA /opt/apps/lib/python2.5/site-packages/numpy> ldd ./linalg/lapack_lite.so
        linux-gate.so.1 =>  (0xffffe000)
        libmkl_lapack32.so => /opt/intel/mkl/8.1/lib/32/libmkl_lapack32.so (0x40124000)
        libmkl_lapack64.so => /opt/intel/mkl/8.1/lib/32/libmkl_lapack64.so (0x403c8000)
        libmkl.so => /opt/intel/mkl/8.1/lib/32/libmkl.so (0x40692000)
        libvml.so => /opt/intel/mkl/8.1/lib/32/libvml.so (0x406f3000)
        libguide.so => /opt/intel/mkl/8.1/lib/32/libguide.so (0x4072c000)
        libpthread.so.0 => /lib/tls/libpthread.so.0 (0x40785000)
        libimf.so => /opt/intel/fc/9.1/lib/libimf.so (0x40797000)
        libm.so.6 => /lib/tls/libm.so.6 (0x409d5000)
        libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x409f8000)
        libirc.so => /opt/intel/fc/9.1/lib/libirc.so (0x40a00000)
        libc.so.6 => /lib/tls/libc.so.6 (0x40a41000)
        libdl.so.2 => /lib/libdl.so.2 (0x40b5b000)
        /lib/ld-linux.so.2 (0x80000000)
Note that the MKL libraries are referenced at the beginning - just look at the path names! If the output for your lapack_lite.so also contains references to the MKL libs, you've got the Intel version in /usr/lib/python2.5/.... (and have probably overwritten the rpm version). If you do not get any reference to the MKL stuff, it's still the rpm version which does not use the MKL.
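That check can be scripted; a small sketch (the SITE path is just an example and should point at whichever installation you want to inspect):

```shell
# Report whether a given numpy build links against the MKL.
SITE=/usr/lib/python2.5/site-packages
if ldd "$SITE/numpy/linalg/lapack_lite.so" 2>/dev/null | grep -q 'libmkl'; then
    echo "MKL build"
else
    echo "plain build (no MKL references)"
fi
```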
Now, let's assume that you have the rpm version in /usr/lib/python2.5/.... Maybe you'll want to reinstall the rpm to be sure that this is the case.
You now want to a) install your Intel version in some well-defined place, and b) make sure that your Python picks that version up when importing numpy.
To achieve a) one way is to reinstall numpy from the source as before, BUT with
python setup.py install --prefix=<somewhere>
<somewhere> is the path to some directory, e.g.
python setup.py install --prefix=$HOME
The latter would install numpy into the directory
$HOME/lib/python2.5/site-packages/numpy
Do an ls afterwards to check if numpy really arrived there. Instead of using the environment variable HOME, you can of course also use any other directory you like. I'll stick to HOME in the following.
For b), we have to tell python that modules are waiting to be picked up in $HOME/lib/python2.5/site-packages. You do that by setting the environment variable PYTHONPATH, as was also mentioned in this thread. In our example, you would do (for a bash or ksh)
export PYTHONPATH=$HOME/lib/python2.5/site-packages
As long as this variable is set and exported (i.e., visible in the environment of every program you start), the next instance of Python you'll start will now begin searching for modules in PYTHONPATH whenever you do an import, and only fall back to the ones in the system wide installation if it doesn't find the required module in PYTHONPATH.
So, after having set PYTHONPATH in your environment, start up python and import numpy. Do the 'print numpy' within python again and look at the output. Does it point to the installation directory of your Intel version? Great; you're done. If not, this means that something went wrong. It might be that you had a typo in the export command or the directory name; it might mean that you didn't export the PYTHONPATH before running python; it might be that the installation had failed for some reason. You just have to play around a bit and see what's going on... but it's not difficult.
It is when one cannot recall what one did yesterday. :( That's an overstatement, but my recall is becoming unreliable.
Now that you have two versions of numpy, you can (kind of) switch between them by making use of the PYTHONPATH. If you unset it ('unset PYTHONPATH'), the next python session you start in the same shell/window will use the original system version. Setting PYTHONPATH again and having it point to your local site-packages directory activates the stuff you've installed there. You cannot switch between the two numpy versions in the same session; if you want to try the other, you'll have to start a new python and make sure that the PYTHONPATH is set up appropriately for what you want.
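In shell terms, the switching described above looks like this (a sketch using the paths from this thread; the actual numpy check is shown as a comment since it only makes sense on a machine with both builds installed):

```shell
# Pick up the locally installed build for the next python you start:
export PYTHONPATH=$HOME/lib/python2.5/site-packages

# Verify which copy a new interpreter imports, e.g. with:
#   python -c "import numpy; print(numpy.__file__)"

# Revert to the system-wide rpm build for the python after that:
unset PYTHONPATH
```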
In the long run, and if you have decided which version to use, you can export PYTHONPATH in your $HOME/.profile and don't have to do that manually each time (which becomes quite cumbersome after a while, of course).
Common practice is probably that you install your favourite versions or builds of python modules in one place (i.e. using $HOME as --prefix), and set PYTHONPATH accordingly. It's not a good idea to overwrite the system wide installations, but again - that's purely a convention, nothing more.
Hope this helps a bit... Good luck!
Thank you for taking the time to write such a detailed explanation. If only the documentation were so detailed...

I looked in /usr/lib/python2.5/site-packages/numpy and it was not obvious whether the rpm version is there or the version I compiled. So I did a 'find' from /:

find . -name "ctypeslib*"

One of the results was:

./usr/local/lib/python2.5/site-packages/numpy/ctypeslib.py

So the python setup.py command defaulted to /usr/local/... (a Good Thing, IMHO). I did:

export PYTHONPATH=/usr/local/lib/python2.5/site-packages
python
Python 2.5 (r25:51908, Nov 27 2006, 19:14:46)
[GCC 4.1.2 20061115 (prerelease) (SUSE Linux)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.5/site-packages/numpy/__init__.py", line 36, in <module>
    import core
  File "/usr/local/lib/python2.5/site-packages/numpy/core/__init__.py", line 5, in <module>
    import multiarray
ImportError: libsvml.so: cannot open shared object file: No such file or directory
So I checked:

~> ldd /usr/lib/python2.5/site-packages/numpy/linalg/lapack_lite.so
        linux-gate.so.1 =>  (0xffffe000)
        libpthread.so.0 => /lib/libpthread.so.0 (0xb7cc4000)
        libc.so.6 => /lib/libc.so.6 (0xb7b96000)
        /lib/ld-linux.so.2 (0x80000000)

~> ldd /usr/local/lib/python2.5/site-packages/numpy/linalg/lapack_lite.so
        linux-gate.so.1 =>  (0xffffe000)
        libmkl_lapack32.so => /opt/intel/mkl/8.1/lib/32/libmkl_lapack32.so (0xb7bd1000)
        libmkl_lapack64.so => /opt/intel/mkl/8.1/lib/32/libmkl_lapack64.so (0xb7907000)
        libmkl.so => /opt/intel/mkl/8.1/lib/32/libmkl.so (0xb78a6000)
        libvml.so => /opt/intel/mkl/8.1/lib/32/libvml.so (0xb786d000)
        libpthread.so.0 => /lib/libpthread.so.0 (0xb7830000)
        libsvml.so => not found
        libimf.so => not found
        libm.so.6 => /lib/libm.so.6 (0xb780a000)
        libgcc_s.so.1 => /lib/libgcc_s.so.1 (0xb77fe000)
        libirc.so => not found
        libc.so.6 => /lib/libc.so.6 (0xb76cf000)
        libdl.so.2 => /lib/libdl.so.2 (0xb76cb000)
        libguide.so => /opt/intel/mkl/8.1/lib/32/libguide.so (0xb7696000)
        /lib/ld-linux.so.2 (0x80000000)

At this point my fading brain managed to recall that a 'source' command had to be issued to use the Intel compiler (icc):

~> source /opt/intel/cc/9.1.042/bin/iccvars.sh
~> python
Python 2.5 (r25:51908, Nov 27 2006, 19:14:46)
[GCC 4.1.2 20061115 (prerelease) (SUSE Linux)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy
>>> print numpy
<module 'numpy' from '/usr/local/lib/python2.5/site-packages/numpy/__init__.pyc'>
Ah, FINALLY! So I did a speed test with a little Monte Carlo program I wrote:

'''
A program that uses Monte Carlo to estimate how often the number of rare
events with a Poisson distribution will differ by a given amount.
'''
import numpy as n
from numpy.random import poisson
from time import time

lam = 4.0           # mu & var for Poisson distributed rands (they are equal in Poisson)
N = 10              # number of times to run the program
maxNumEvents = 20   # events larger than this are ignored
numPois = 100000    # number of pairs of outcomes to generate
freqA = 2           # number of times event A occurred
freqB = 6           # number of times event B occurred

print "#rands fraction [freqA,freqB] fraction [lam,lam] largest% total[mean,mean]"
t0 = time()
for g in range(1):
    for h in range(N):
        suma = n.zeros((maxNumEvents+1, maxNumEvents+1), int)  # possible outcomes array
        count = poisson(lam, size=(numPois, 2))  # generate array of pairs of Poissons
        for i in range(numPois):
            #if count[i,0] > maxNumEvents: continue
            #if count[i,1] > maxNumEvents: continue
            suma[count[i,0], count[i,1]] += 1
        d = n.sum(suma)
        print d, float(suma[freqA,freqB])/d, float(suma[lam,lam])/d, suma.max(), suma[lam,lam]
print 'time', time()-t0

Using the SUSE rpm:

python relative_risk.py
#rands fraction [2,6] fraction [lam,lam] largest% total[mean,mean]
100000 0.01539 0.03869 3869 3869
100000 0.01534 0.03766 3907 3766
100000 0.01553 0.03841 3859 3841
100000 0.01496 0.03943 3943 3943
100000 0.01513 0.03829 3856 3829
100000 0.01485 0.03825 3993 3825
100000 0.01545 0.03716 3859 3716
100000 0.01526 0.03909 3919 3909
100000 0.01491 0.03826 3913 3826
100000 0.01478 0.03771 3782 3771
time 2.38847184181

Using the MKL version:

python relative_risk.py
#rands fraction [2,6] fraction [lam,lam] largest% total[mean,mean]
100000 0.01502 0.03764 3895 3764
100000 0.01513 0.03841 3841 3841
100000 0.01511 0.03753 3810 3753
100000 0.01577 0.03766 3873 3766
100000 0.01541 0.0373 3963 3730
100000 0.01586 0.03862 3912 3862
100000 0.01552 0.03785 3870 3785
100000 0.01502 0.03854 3896 3854
100000 0.015 0.03803 3880 3803
100000 0.01515 0.03749 3855 3749
time 2.0455300808

So the rpm version only takes ~17% longer to run this program. I'm surprised that there isn't a larger difference. Perhaps there will be in a different type of program. BTW, the cpu is an Intel e6600 Core 2 Duo overclocked to 3.06 GHz (it will run reliably at 3.24 GHz).

I've added these lines to .bashrc:

source /opt/intel/cc/9.1.042/bin/iccvars.sh
export PYTHONPATH=/usr/local/lib/python2.5/site-packages:/usr/lib/python2.5
export INCLUDE=/opt/intel/mkl/8.1/include:$INCLUDE
export LD_LIBRARY_PATH=/usr/local/lib:/opt/intel/mkl/8.1/lib/32:$LD_LIBRARY_PATH

I don't understand why the 'site-packages' must be included, but without it, numpy is loaded from /usr/lib/python/site-packages. Why does it look in the subdirectories in one case, but not in the other? Oh, well it works.

Thanks much for the detailed explanation. It's greatly appreciated. :)

Regards,

-rex
--
I know so little, but i once knew it fluently...
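As an aside, nearly all of the runtime in the program above is the per-pair Python loop, which nothing in the MKL accelerates. A vectorized tally (a sketch using numpy.histogram2d in place of the original loop; the variable names follow the program above, and pairs with more than maxNumEvents events simply fall outside the bins) does the same counting in C:

```python
import numpy as np

lam = 4.0
numPois = 100000
maxNumEvents = 20

# Generate all pairs at once, as in the original program.
count = np.random.poisson(lam, size=(numPois, 2))

# Tally every pair in one call instead of a 100000-iteration Python loop.
edges = np.arange(maxNumEvents + 2)   # bin k catches exactly k events
suma, _, _ = np.histogram2d(count[:, 0], count[:, 1], bins=(edges, edges))
suma = suma.astype(int)               # same dtype as the original tally array

print(suma.sum())   # close to numPois; overflow pairs are dropped
```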
rex wrote:
I've added these lines to .bashrc:

source /opt/intel/cc/9.1.042/bin/iccvars.sh
export PYTHONPATH=/usr/local/lib/python2.5/site-packages:/usr/lib/python2.5
export INCLUDE=/opt/intel/mkl/8.1/include:$INCLUDE
export LD_LIBRARY_PATH=/usr/local/lib:/opt/intel/mkl/8.1/lib/32:$LD_LIBRARY_PATH
I don't understand why the 'site-packages' must be included, but without it, numpy is loaded from /usr/lib/python/site-packages. Why does it look in the subdirectories in one case, but not in the other? Oh, well it works.
Because SuSE did not configure their Python installation to look in /usr/local/lib/python2.5/site-packages/. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
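A quick way to see which directories a particular interpreter was configured to search by default (a diagnostic sketch, not specific to SuSE):

```python
import sys

# Every import walks this list in order; if a directory such as
# /usr/local/lib/python2.5/site-packages is absent here, packages
# installed there are invisible unless PYTHONPATH adds the directory.
for entry in sys.path:
    print(entry)
```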
Hi Rex,
Thank you for taking the time to write such a detailed explanation. If only the documentation were so detailed...
Now that you've gone through your odyssey trying to get numpy/scipy working w/ this particular combo (SuSE/MKL/IntelCC), now would be a great time to whip up a wiki page ... you know .. for the documentation ;-)
So the rpm version only takes ~17% longer to run this program. I'm surprised that there isn't a larger difference. Perhaps there will be in a different type of program. BTW, the cpu is an Intel e6600 Core 2 Duo overclocked to 3.06 GHz (it will run reliably at 3.24 GHz).
That's not so bad, though, is it? I'd also be interested in seeing some more benchmarks though .. I wonder if there is a standard benchmarking suite somewhere .. Congrats on completing the gauntlet, -steve
Steve Lianoglou wrote:
Hi Rex,
Thank you for taking the time to write such a detailed explanation. If only the documentation were so detailed...
Now that you've gone through your odyssey trying to get numpy/scipy working w/ this particular combo (SuSE/MKL/IntelCC), now would be a great time to whip up a wiki page ... you know .. for the documentation ;-)
So the rpm version only takes ~17% longer to run this program. I'm surprised that there isn't a larger difference. Perhaps there will be in a different type of program. BTW, the cpu is an Intel e6600 Core 2 Duo overclocked to 3.06 GHz (it will run reliably at 3.24 GHz).
That's not so bad, though, is it? I'd also be interested in seeing some more benchmarks though .. I wonder if there is a standard benchmarking suite somewhere ..
The code used for this benchmark uses only two functions, poisson and sum, and I wouldn't be surprised if a lot of the time is spent in python (vs in the core C functions), where the intel compiler doesn't make a big difference. Does this code use the MKL at all? The MKL gives an optimized fft and BLAS/LAPACK, right? David
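Right — a benchmark that would actually exercise the MKL needs to hit BLAS/LAPACK or the fft. A minimal sketch (the matrix size is arbitrary; on an MKL-linked numpy the product is dispatched to MKL's dgemm):

```python
import numpy as np
from time import time

n = 500
a = np.random.rand(n, n)
b = np.random.rand(n, n)

t0 = time()
c = np.dot(a, b)   # routed to whichever BLAS numpy was linked against
print('dot(%dx%d): %.3f s' % (n, n, time() - t0))
```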
Steve Lianoglou <lists.steve@arachnedesign.net> [2007-01-24 20:06]:
Now that you've gone through your odyssey trying to get numpy/scipy working w/ this particular combo (SuSE/MKL/IntelCC), now would be a great time to whip up a wiki page ... you know .. for the documentation ;-)
Yes, I should do that, but I want to optimize the compiler flags first, and try to get SciPy to build.
So the rpm version only takes ~17% longer to run this program. I'm surprised that there isn't a larger difference. Perhaps there will be in a different type of program. BTW, the cpu is an Intel e6600 Core 2 Duo overclocked to 3.06 GHz (it will run reliably at 3.24 GHz).
That's not so bad, though, is it? I'd also be interested in seeing some more benchmarks though .. I wonder if there is a standard benchmarking suite somewhere ..
I think it should do much better. A few minutes ago I compiled a C math benchmark with:

icc -o3 -parallel -xT

and it ran 2.8x as fast as it did when compiled with gcc -o3. In fact, it ran at a little over a gigaflop, which is a higher speed than anyone has reported for this benchmark.
Congrats on completing the gauntlet,
Thanks. It's the 2nd time; I eventually succeeded with an earlier version as well, thanks to Travis. -rex
rex wrote:
I think it should do much better. A few minutes ago I compiled a C math benchmark with :
icc -o3 -parallel -xT
and it ran 2.8x as fast as it did when compiled with gcc -o3. In fact, it ran at a little over a gigaflop, which is a higher speed than anyone has reported for this benchmark.
Without seeing the benchmark, it would be quite hard to know what's happening. Also, when you are using numpy, you are using python, and in some cases it can be really easy to slow things down because you are doing something wrong (an example is using non-contiguous arrays without knowing it; I got caught often when translating some matlab code to numpy); also the numeric code in numpy *may* be written in a way that icc cannot optimize as well as pure C code. All this is pure speculation, without seeing and running/profiling the actual code. David
On 25/01/07, David Cournapeau <david@ar.media.kyoto-u.ac.jp> wrote:
rex wrote:
I think it should do much better. A few minutes ago I compiled a C math benchmark with :
icc -o3 -parallel -xT
and it ran 2.8x as fast as it did when compiled with gcc -o3. In fact, it ran at a little over a gigaflop, which is a higher speed than anyone has reported for this benchmark.
Without seeing the benchmark, it would be quite hard to know what's happening. Also, when you are using numpy, you are using python, and for
Perhaps compiling python itself with icc might give a useful speedup. Apparently somebody managed this for python 2.3 in 2003: http://mail.python.org/pipermail/c++-sig/2003-October/005824.html --George Nurser.
participants (5)
- David Cournapeau
- George Nurser
- rex
- Robert Kern
- Steve Lianoglou