Performance solving a system of equations in numpy and MATLAB

I recently did a conceptual experiment to estimate the computational time required to solve an exact expression in contrast to an approximate solution (Helmholtz vs. Helmholtz-Kirchhoff integrals). The exact solution requires a matrix inversion, and in my case the matrix would contain ~15000 rows. On my machine MATLAB seems to perform this matrix inversion with random matrices about 9x faster (20 sec vs. 3 min). I thought the performance would be roughly the same, because I presume both rely on the same LAPACK solvers.

I will not actually need to solve this problem (even at 20 sec it is prohibitive for broadband simulation), but if I needed to I would reluctantly choose MATLAB. I am simply wondering why there is this performance gap, and if there is a better way to solve this problem in numpy?

Thank you,
Ned

# Python version
import numpy as np
testA = np.random.randn(15000, 15000)
testb = np.random.randn(15000)
%time testx = np.linalg.solve(testA, testb)

% MATLAB version
testA = randn(15000);
testb = randn(15000, 1);
tic(); testx = testA \ testb; toc();
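A large part of a gap like this usually comes down to which BLAS/LAPACK library the NumPy build is linked against. A quick way to check is the following sketch (the exact fields printed vary by NumPy version and build):

```python
import numpy as np

# Print the BLAS/LAPACK build configuration this NumPy was compiled with.
# The output names the backing library, e.g. mkl, openblas, or atlas.
np.show_config()
```

If the output mentions a reference BLAS or ATLAS rather than MKL or OpenBLAS, a large slowdown relative to MATLAB's bundled MKL is expected.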

Hi,

Probably MATLAB is shipping with Intel MKL enabled, which is probably the fastest LAPACK implementation out there. NumPy supports linking with MKL, and actually Anaconda does that by default, so switching to Anaconda would be a good option for you.

Here is what I am getting with Anaconda's NumPy on a machine with 8 cores:

In [1]: import numpy as np
In [2]: testA = np.random.randn(15000, 15000)
In [3]: testb = np.random.randn(15000)
In [4]: %time testx = np.linalg.solve(testA, testb)
CPU times: user 5min 36s, sys: 4.94 s, total: 5min 41s
Wall time: 46.1 s

This is not 20 sec, but it is not 3 min either (though of course that depends on your machine).

Francesc

2015-12-16 18:34 GMT+01:00 Edward Richards <edwardlrichards@gmail.com>:
-- Francesc Alted

Sorry, I have to correct myself; as per http://docs.continuum.io/mkl-optimizations/index it seems that Anaconda is not linking with MKL by default (I thought that was the case before?). After installing MKL (conda install mkl), I am getting:

In [1]: import numpy as np
Vendor: Continuum Analytics, Inc.
Package: mkl
Message: trial mode expires in 30 days
In [2]: testA = np.random.randn(15000, 15000)
In [3]: testb = np.random.randn(15000)
In [4]: %time testx = np.linalg.solve(testA, testb)
CPU times: user 1min, sys: 468 ms, total: 1min 1s
Wall time: 15.3 s

So it looks like you will need to buy an MKL license separately (which makes sense for a commercial product). Sorry for the confusion.

Francesc

2015-12-16 18:59 GMT+01:00 Francesc Alted <faltet@gmail.com>:
-- Francesc Alted

Some time ago I saw this: https://software.intel.com/sites/campaigns/nest/ I don't know if the "community" license applies in your case, though. It is worth taking a look at.

On Wed, Dec 16, 2015 at 4:30 PM, Francesc Alted <faltet@gmail.com> wrote:

Continuum provides MKL for free now; you just need to have a free anaconda.org account to get the license: http://docs.continuum.io/mkl-optimizations/index

HTH,
Michael

On Wed, Dec 16, 2015 at 12:35 PM Edison Gustavo Muenz <edisongustavo@gmail.com> wrote:

Hi, On Wed, Dec 16, 2015 at 6:34 PM, Edison Gustavo Muenz <edisongustavo@gmail.com> wrote:
If you're on a recent Mac, I would guess that the default Accelerate-linked numpy / scipy will be in the same performance range as those linked to MKL, but I am happy to be corrected.

Cheers,
Matthew

On 16 Dec 2015, at 8:22 PM, Matthew Brett <matthew.brett@gmail.com> wrote:
Getting around 30 s wall time here on a not so recent 4-core iMac, so that would seem to fit (iirc Accelerate should actually largely be using the same machine code as MKL). Cheers, Derek

On 16/12/15 20:47, Derek Homeier wrote:
Getting around 30 s wall time here on a not so recent 4-core iMac, so that would seem to fit (iirc Accelerate should actually largely be using the same machine code as MKL).
Yes, the same kernels, but not the same threadpool. Accelerate uses GCD; MKL uses Intel TBB and Intel OpenMP (both of them). GCD scales better than TBB, even in Intel's own benchmarks. However, GCD uses a kernel threadpool (accessible via kqueue) which is not fork-safe, whereas MKL's threadpool is fork-safe (but will leak memory on fork).

Sturla

On 17/12/15 12:06, Francesc Alted wrote:
Pretty good. I did not know that OpenBLAS was so close in performance to MKL.
MKL, OpenBLAS and Accelerate are very close in performance, except for level-1 BLAS where Accelerate and MKL are better than OpenBLAS. MKL requires the number of threads to be a multiple of four to achieve good performance; OpenBLAS and Accelerate do not. This matters, for example, if you have an online data acquisition and DSP system and want to dedicate one processor to I/O tasks. In that case OpenBLAS and Accelerate are likely to perform better than MKL.

Sturla
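The thread-count sensitivity described above can be explored by pinning the pool size before NumPy loads its BLAS. A minimal sketch follows; MKL_NUM_THREADS and OMP_NUM_THREADS are conventional environment variables honored by MKL and OpenMP-based builds respectively, and they must be set before the first numpy import to take effect:

```python
import os

# These must be set before NumPy (and the BLAS it wraps) is imported.
os.environ["MKL_NUM_THREADS"] = "4"   # honored by MKL builds
os.environ["OMP_NUM_THREADS"] = "4"   # honored by OpenMP-based builds

import numpy as np

# A small solve to confirm the solver works with the pinned thread count;
# the residual of a successful LU solve should be tiny.
A = np.random.randn(2000, 2000)
b = np.random.randn(2000)
x = np.linalg.solve(A, b)
print(float(np.linalg.norm(A @ x - b)))
```

Varying the thread count this way (e.g. 3 vs. 4 threads on MKL) is one way to observe the multiple-of-four effect mentioned above.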

Hi, I just ran both on the same hardware and got a slightly faster computation with numpy:

Matlab R2012a: 16.78 s (best of 3)
numpy (python 3.4, numpy 1.10.1, anaconda accelerate (MKL)): 14.8 s (best of 3)

The difference could be because my Matlab version is a few years old, so its MKL would be less up to date.

Greg

On Thu, Dec 17, 2015 at 9:29 AM, Andy Ray Terrel <andy.terrel@gmail.com> wrote:
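For anyone wanting to reproduce best-of-3 numbers like these on their own machine, the standard-library timeit module can do it directly. This is a sketch using a smaller matrix so it finishes quickly:

```python
import timeit
import numpy as np

A = np.random.randn(1000, 1000)
b = np.random.randn(1000)

# repeat() returns one total time per run; the minimum is the "best of 3".
times = timeit.repeat(lambda: np.linalg.solve(A, b), repeat=3, number=1)
print(f"best of 3: {min(times):.4f} s")
```

Taking the minimum rather than the mean is the usual convention for benchmarks like this, since the fastest run is the least perturbed by other system activity.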

What operating system are you on and how did you install numpy? From a package manager, from source, by downloading from somewhere...? On Dec 16, 2015 9:34 AM, "Edward Richards" <edwardlrichards@gmail.com> wrote:

participants (11)
- Andy Ray Terrel
- Daπid
- Derek Homeier
- Edison Gustavo Muenz
- Edward Richards
- Francesc Alted
- Gregory Lee
- Matthew Brett
- Michael Sarahan
- Nathaniel Smith
- Sturla Molden