I can't tell if Numpy is configured properly with show_config()
Debian Sid, 64-bit. I was trying to fix the problem of np.dot running very slowly.

I ended up uninstalling numpy, installing libatlas3-base through apt-get, and re-installing numpy. The performance of dot is greatly improved! But I can't tell from any other method whether numpy is set up correctly. Consider comparing the faster one to another in a virtual env that is still slow:

### fast one ###

In [1]: import time, numpy
In [2]: n=1000
In [3]: A = numpy.random.rand(n,n)
In [4]: B = numpy.random.rand(n,n)
In [5]: then = time.time(); C=numpy.dot(A,B); print time.time()-then
0.306427001953
In [6]: numpy.show_config()
blas_info:
    libraries = ['blas']
    library_dirs = ['/usr/lib']
    language = f77
lapack_info:
    libraries = ['lapack']
    library_dirs = ['/usr/lib']
    language = f77
atlas_threads_info:
  NOT AVAILABLE
blas_opt_info:
    libraries = ['blas']
    library_dirs = ['/usr/lib']
    language = f77
    define_macros = [('NO_ATLAS_INFO', 1)]
atlas_blas_threads_info:
  NOT AVAILABLE
openblas_info:
  NOT AVAILABLE
lapack_opt_info:
    libraries = ['lapack', 'blas']
    library_dirs = ['/usr/lib']
    language = f77
    define_macros = [('NO_ATLAS_INFO', 1)]
atlas_info:
  NOT AVAILABLE
lapack_mkl_info:
  NOT AVAILABLE
blas_mkl_info:
  NOT AVAILABLE
atlas_blas_info:
  NOT AVAILABLE
mkl_info:
  NOT AVAILABLE

### slow one ###

In [1]: import time, numpy
In [2]: n=1000
In [3]: A = numpy.random.rand(n,n)
In [4]: B = numpy.random.rand(n,n)
In [5]: then = time.time(); C=numpy.dot(A,B); print time.time()-then
7.88430500031
In [6]: numpy.show_config()
blas_info:
    libraries = ['blas']
    library_dirs = ['/usr/lib']
    language = f77
lapack_info:
    libraries = ['lapack']
    library_dirs = ['/usr/lib']
    language = f77
atlas_threads_info:
  NOT AVAILABLE
blas_opt_info:
    libraries = ['blas']
    library_dirs = ['/usr/lib']
    language = f77
    define_macros = [('NO_ATLAS_INFO', 1)]
atlas_blas_threads_info:
  NOT AVAILABLE
openblas_info:
  NOT AVAILABLE
lapack_opt_info:
    libraries = ['lapack', 'blas']
    library_dirs = ['/usr/lib']
    language = f77
    define_macros = [('NO_ATLAS_INFO', 1)]
atlas_info:
  NOT AVAILABLE
lapack_mkl_info:
  NOT AVAILABLE
blas_mkl_info:
  NOT AVAILABLE
atlas_blas_info:
  NOT AVAILABLE
mkl_info:
  NOT AVAILABLE

#####

Further, in the following comparison between CPython and converting to a numpy array for one operation, I get CPython being faster by the same amount in both environments. But another user got numpy being faster.

In [1]: import numpy as np
In [2]: pts = range(100,1000)
In [3]: pts[100] = 0
In [4]: %timeit pts_arr = np.array(pts); mini = np.argmin(pts_arr)
10000 loops, best of 3: 129 µs per loop
In [5]: %timeit mini = sorted(enumerate(pts))[0][1]
10000 loops, best of 3: 89.2 µs per loop

The other user got

In [29]: %timeit pts_arr = np.array(pts); mini = np.argmin(pts_arr)
10000 loops, best of 3: 37.7 µs per loop
In [30]: %timeit mini = sorted(enumerate(pts))[0][1]
10000 loops, best of 3: 69.2 µs per loop

And I can't help but wonder if there is further configuration I need to make numpy faster, or if this is just a difference between our machines.

In the future, should I ignore show_config() and just do this dot product test?

Any guidance would be appreciated.

Thanks,
Elliot
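The dot product test in the message above can be wrapped in a small helper so that both environments are measured the same way. This is only a sketch: it is written for Python 3 (the sessions in the thread use Python 2), the names `best_of` and `dot_benchmark` are made up for illustration, and taking the best of a few runs is an assumption borrowed from what `%timeit` reports.

```python
import time


def best_of(fn, repeats=3):
    """Return the shortest wall-clock time (in seconds) over `repeats` calls of fn()."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return min(timings)


def dot_benchmark(n=1000, repeats=3):
    """Time an n x n matrix product with numpy.dot.

    Returns None when NumPy cannot be imported in the current environment.
    """
    try:
        import numpy as np
    except ImportError:
        return None
    a = np.random.rand(n, n)
    b = np.random.rand(n, n)
    return best_of(lambda: np.dot(a, b), repeats)
```

Calling `dot_benchmark()` in each environment gives one number per environment to compare. The gap reported in the thread (about 0.3 s versus about 7.9 s for n=1000) is large enough that this coarse measurement would show it.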
Elliot Hallmark <Permafacture@gmail.com> wrote:
And I can't help but wonder if there is further configuration I need to make numpy faster, or if this is just a difference between our machines
Try to build NumPy with Intel MKL or OpenBLAS instead. ATLAS is only efficient on the host computer on which it is built, and even there it is not very fast (but far better than the reference BLAS).

Sturla
On Fr, 2015-06-19 at 16:19 -0500, Elliot Hallmark wrote:
Debian Sid, 64-bit. I was trying to fix the problem of np.dot running very slow.
I ended up uninstalling numpy, installing libatlas3-base through apt-get and re-installing numpy. The performance of dot is greatly improved! But I can't tell from any other method whether numpy is set up correctly. Consider comparing the faster one to another in a virtual env that is still slow:
Not that I really know this stuff, but one way to be sure is probably to check `ldd /usr/lib/python2.7/dist-packages/numpy/core/_dotblas.so`. That is probably silly (I really never cared to learn this stuff), but I think it can't go wrong....

About the other difference: aside from CPU, etc. differences, I expect you got a newer numpy version than the other user. Not sure which part got much faster, but there were, for example, quite a few speedups in the code converting to array, so I expect it is very likely that this is the reason.

- Sebastian
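The `ldd` check suggested above can also be scripted. The sketch below assumes a Linux system with `ldd` on the PATH and Python 3.7 or later. The names `blas_lines` and `linked_blas` and the list of substrings in `BLAS_HINTS` are illustrative choices, not part of NumPy.

```python
import subprocess

# Substrings that suggest a BLAS/LAPACK implementation ("openblas" also matches "blas").
BLAS_HINTS = ("blas", "lapack", "atlas", "mkl")


def blas_lines(ldd_output):
    """Keep only the lines of ldd output that mention a BLAS/LAPACK-like library."""
    return [
        line.strip()
        for line in ldd_output.splitlines()
        if any(hint in line.lower() for hint in BLAS_HINTS)
    ]


def linked_blas(shared_object):
    """Run ldd on a shared object and return its BLAS-related lines (Linux only)."""
    completed = subprocess.run(
        ["ldd", shared_object],
        capture_output=True,
        text=True,
        check=True,
    )
    return blas_lines(completed.stdout)
```

For the path named in the message this would be `linked_blas("/usr/lib/python2.7/dist-packages/numpy/core/_dotblas.so")`. An empty list would suggest the extension is not linked against any of the libraries in `BLAS_HINTS`.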
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
Well, here is the question that started this all. In the slow environment, blas seems to be there and work well, but numpy doesn't use it!

In [1]: import time, numpy, scipy
In [2]: from scipy import linalg
In [3]: n=1000
In [4]: A = numpy.random.rand(n,n)
In [5]: B = numpy.random.rand(n,n)
In [6]: then = time.time(); C=scipy.dot(A,B); print time.time()-then
7.62005901337
In [7]: begin = time.time(); C=linalg.blas.dgemm(1.0,A,B);print time.time() - begin
0.325305938721
In [8]: begin = time.time(); C=linalg.blas.ddot(A,B);print time.time() - begin
0.0363020896912
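The dgemm comparison in Elliot's follow-up can be made a little more robust by also checking that both paths produce the same matrix. This is a sketch in Python 3 with a made-up function name, `compare_dot_with_dgemm`. It leaves out `ddot`, which is the BLAS level-1 vector dot product, so its timing is not comparable to a matrix product.

```python
import time


def compare_dot_with_dgemm(n=1000):
    """Time numpy.dot against SciPy's dgemm wrapper on the same n x n inputs.

    Returns None when NumPy or SciPy cannot be imported.
    """
    try:
        import numpy as np
        from scipy.linalg import blas
    except ImportError:
        return None

    a = np.random.rand(n, n)
    b = np.random.rand(n, n)

    start = time.perf_counter()
    via_dot = np.dot(a, b)
    dot_seconds = time.perf_counter() - start

    start = time.perf_counter()
    via_dgemm = blas.dgemm(1.0, a, b)  # computes 1.0 * a @ b through BLAS
    dgemm_seconds = time.perf_counter() - start

    return {
        "dot_seconds": dot_seconds,
        "dgemm_seconds": dgemm_seconds,
        "same_result": bool(np.allclose(via_dot, via_dgemm)),
    }
```

If `same_result` is true and `dgemm_seconds` is far smaller than `dot_seconds`, that matches the situation described in the thread: SciPy reaches a fast BLAS while `numpy.dot` in that environment apparently does not.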
Sebastian, in the slow virtual-env, _dotblas.so isn't there. I only have _dummy.so.
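The check for `_dotblas.so` reported in this part of the thread can be done from Python instead of browsing the directory by hand. The helper below is a generic sketch with a hypothetical name, `compiled_modules`. It only lists files and makes no claim about why a given NumPy build does or does not ship `_dotblas`.

```python
import glob
import os


def compiled_modules(package_dir, stem):
    """List compiled extension files in package_dir whose names start with `stem`."""
    pattern = os.path.join(package_dir, stem + "*")
    return sorted(
        os.path.basename(path)
        for path in glob.glob(pattern)
        if path.endswith((".so", ".pyd"))  # shared objects on Linux, .pyd on Windows
    )
```

For the NumPy layout discussed in the thread, the call would be `compiled_modules(os.path.dirname(numpy.core.__file__), "_dotblas")`. An empty list in the slow environment would agree with what Elliot reports.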
On Sat, Jun 20, 2015 at 2:08 PM, Elliot Hallmark <Permafacture@gmail.com> wrote:
Sebastian, in the slow virtual-env, _dotblas.so isn't there. I only have _dummy.so
What numpy version?

<snip>

Chuck
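To answer the version question for each environment, it helps to print both the version and the location the package is imported from, since a virtual env may pick up a different copy than the system Python. A minimal sketch, with a made-up helper name `describe_package`:

```python
import importlib


def describe_package(name):
    """Return the version and on-disk location of an importable package, or None."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    return {
        "name": name,
        "version": getattr(module, "__version__", "unknown"),
        "file": getattr(module, "__file__", None),
    }
```

Running `print(describe_package("numpy"))` in the fast and the slow environment shows whether the two are actually using the same NumPy build.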
participants (4)
- Charles R Harris
- Elliot Hallmark
- Sebastian Berg
- Sturla Molden