Well, here is the question that started this all. In the slow environment, BLAS seems to be present and working well, but numpy doesn't use it!

In [1]: import time, numpy, scipy

In [2]: from scipy import linalg

In [3]: n=1000

In [4]: A = numpy.random.rand(n,n)

In [5]: B = numpy.random.rand(n,n)

In [6]: then = time.time(); C=scipy.dot(A,B); print time.time()-then
7.62005901337

In [7]: begin = time.time(); C=linalg.blas.dgemm(1.0,A,B); print time.time() - begin
0.325305938721

In [8]: begin = time.time(); C=linalg.blas.ddot(A,B); print time.time() - begin
0.0363020896912

On Sat, Jun 20, 2015 at 4:09 AM, Sebastian Berg <sebastian@sipsolutions.net> wrote:
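(For reference, here is a minimal Python 3 sketch of the same comparison, not part of the original session: it times numpy.dot against scipy.linalg.blas.dgemm on the same inputs. If numpy is linked against the same optimized BLAS as scipy, the two timings should be close; a large ratio suggests numpy.dot is falling back to a slow path.)

```python
import time

import numpy as np
from scipy.linalg import blas

n = 1000
rng = np.random.default_rng(0)
A = rng.random((n, n))
B = rng.random((n, n))

# Time numpy's dot.
t0 = time.perf_counter()
C_dot = np.dot(A, B)
t_dot = time.perf_counter() - t0

# Time scipy's direct BLAS dgemm wrapper on the same product.
t0 = time.perf_counter()
C_gemm = blas.dgemm(1.0, A, B)
t_gemm = time.perf_counter() - t0

# Both compute the same matrix product; a large t_dot / t_gemm ratio
# suggests numpy.dot is not using the BLAS that scipy is using.
print("dot: %.3fs  dgemm: %.3fs" % (t_dot, t_gemm))
assert np.allclose(C_dot, C_gemm)
```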

On Fr, 2015-06-19 at 16:19 -0500, Elliot Hallmark wrote:

Debian Sid, 64-bit. I was trying to fix the problem of np.dot running very slow.

I ended up uninstalling numpy, installing libatlas3-base through apt-get and re-installing numpy. The performance of dot is greatly improved! But I can't tell from any other method whether numpy is set up correctly. Consider comparing the faster one to another in a virtual env that is still slow:

Not that I really know this stuff, but one thing worth checking is probably `ldd /usr/lib/python2.7/dist-packages/numpy/core/_dotblas.so`, which shows which shared libraries numpy's dot is actually linked against at runtime. That is probably silly (I really never cared to learn this stuff), but I think it can't go wrong....
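(A small helper along those lines, assuming a Linux system: newer numpy versions no longer ship a `_dotblas.so`, so the exact filename varies by version. This sketch just lists the compiled extension modules inside the installed numpy, which are the candidates to inspect with `ldd`.)

```python
import glob
import os

import numpy

# Locate the compiled extension modules inside the installed numpy;
# on Linux, these .so files are what `ldd` can inspect for BLAS linkage.
numpy_dir = os.path.dirname(numpy.__file__)
shared_objects = glob.glob(os.path.join(numpy_dir, "**", "*.so"), recursive=True)
for so in shared_objects:
    print(so)
```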

About the other difference. Aside from CPU, etc. differences, I expect you got a newer numpy version than the other user. Not sure which part got much faster, but there were for example quite a few speedups in the code converting to array, so I expect it is very likely that this is the reason.

- Sebastian

### fast one ###

In [1]: import time, numpy

In [2]: n=1000

In [3]: A = numpy.random.rand(n,n)

In [4]: B = numpy.random.rand(n,n)

In [5]: then = time.time(); C=numpy.dot(A,B); print time.time()-then
0.306427001953

In [6]: numpy.show_config()
blas_info:
    libraries = ['blas']
    library_dirs = ['/usr/lib']
    language = f77
lapack_info:
    libraries = ['lapack']
    library_dirs = ['/usr/lib']
    language = f77
atlas_threads_info:
  NOT AVAILABLE
blas_opt_info:
    libraries = ['blas']
    library_dirs = ['/usr/lib']
    language = f77
    define_macros = [('NO_ATLAS_INFO', 1)]
atlas_blas_threads_info:
  NOT AVAILABLE
openblas_info:
  NOT AVAILABLE
lapack_opt_info:
    libraries = ['lapack', 'blas']
    library_dirs = ['/usr/lib']
    language = f77
    define_macros = [('NO_ATLAS_INFO', 1)]
atlas_info:
  NOT AVAILABLE
lapack_mkl_info:
  NOT AVAILABLE
blas_mkl_info:
  NOT AVAILABLE
atlas_blas_info:
  NOT AVAILABLE
mkl_info:
  NOT AVAILABLE

### slow one ###

In [1]: import time, numpy

In [2]: n=1000

In [3]: A = numpy.random.rand(n,n)

In [4]: B = numpy.random.rand(n,n)

In [5]: then = time.time(); C=numpy.dot(A,B); print time.time()-then
7.88430500031

In [6]: numpy.show_config()
blas_info:
    libraries = ['blas']
    library_dirs = ['/usr/lib']
    language = f77
lapack_info:
    libraries = ['lapack']
    library_dirs = ['/usr/lib']
    language = f77
atlas_threads_info:
  NOT AVAILABLE
blas_opt_info:
    libraries = ['blas']
    library_dirs = ['/usr/lib']
    language = f77
    define_macros = [('NO_ATLAS_INFO', 1)]
atlas_blas_threads_info:
  NOT AVAILABLE
openblas_info:
  NOT AVAILABLE
lapack_opt_info:
    libraries = ['lapack', 'blas']
    library_dirs = ['/usr/lib']
    language = f77
    define_macros = [('NO_ATLAS_INFO', 1)]
atlas_info:
  NOT AVAILABLE
lapack_mkl_info:
  NOT AVAILABLE
blas_mkl_info:
  NOT AVAILABLE
atlas_blas_info:
  NOT AVAILABLE
mkl_info:
  NOT AVAILABLE
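(Both configs above report plain 'blas' with NO_ATLAS_INFO, i.e. no ATLAS/OpenBLAS/MKL was detected at build time, so show_config alone can't distinguish the two environments. A sketch of a runtime check instead: compute the achieved GFLOP/s of a matrix product, using the fact that a dense n x n product costs roughly 2*n**3 floating-point operations. From the timings in this thread, ~7.9 s at n=1000 is about 0.25 GFLOP/s while ~0.3 s is about 6.5 GFLOP/s.)

```python
import time

import numpy as np

def matmul_gflops(n=1000, repeats=3):
    # A dense n x n matrix product costs roughly 2*n**3 flops;
    # take the best of a few runs to reduce timing noise.
    rng = np.random.default_rng(0)
    A = rng.random((n, n))
    B = rng.random((n, n))
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        np.dot(A, B)
        best = min(best, time.perf_counter() - t0)
    return 2.0 * n**3 / best / 1e9

# Reference BLAS tends to land well under 1 GFLOP/s on this problem size;
# an optimized BLAS is typically several GFLOP/s or more.
print("achieved: %.2f GFLOP/s" % matmul_gflops())
```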

#####

Further, in the following comparison between pure CPython and converting to a numpy array for a single operation, I get CPython being faster by about the same margin in both environments. But another user got numpy being faster.

In [1]: import numpy as np

In [2]: pts = range(100,1000)

In [3]: pts[100] = 0

In [4]: %timeit pts_arr = np.array(pts); mini = np.argmin(pts_arr)
10000 loops, best of 3: 129 µs per loop

In [5]: %timeit mini = sorted(enumerate(pts))[0][1]
10000 loops, best of 3: 89.2 µs per loop

The other user got

In [29]: %timeit pts_arr = np.array(pts); mini = np.argmin(pts_arr)
10000 loops, best of 3: 37.7 µs per loop

In [30]: %timeit mini = sorted(enumerate(pts))[0][1]
10000 loops, best of 3: 69.2 µs per loop
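(An aside on the benchmark itself, not from the thread: `sorted(enumerate(pts))` sorts (index, value) tuples by index, so it is O(n log n) and `[0][1]` actually returns pts[0], which here is 100 and only coincidentally matches np.argmin's answer of index 100. A sketch of the two computations, with an O(n) pure-Python equivalent of argmin for a fairer comparison:)

```python
import numpy as np

pts = list(range(100, 1000))
pts[100] = 0  # minimum value 0 now sits at index 100

# numpy version: convert once, then take argmin.
mini_np = int(np.argmin(np.array(pts)))

# O(n) pure-Python equivalent of argmin; no sorting needed.
mini_py = min(range(len(pts)), key=pts.__getitem__)

# The session's sorted(enumerate(pts))[0][1] sorts by index, so it
# returns pts[0] (the value 100), not the position of the minimum --
# it only happens to equal 100 here because pts starts at 100.
assert mini_np == mini_py == 100
print(mini_np, mini_py)
```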

And I can't help but wonder if there is further configuration I need to make numpy faster, or if this is just a difference between our machines. In the future, should I ignore show_config() and just do this dot-product test?

Any guidance would be appreciated.

Thanks,

Elliot
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
