[Numpy-discussion] Numpy matrix multiplication slow even though ATLAS linked
Jan-Willem van de Meent
vandemeent at damtp.cam.ac.uk
Fri Oct 31 13:40:21 EDT 2008
On Friday 31 October 2008 13:45:56 Pauli Virtanen wrote:
> Thu, 30 Oct 2008 22:19:01 +0000, Jan-Willem van de Meent wrote:
> > On Thursday 30 October 2008 18:41:51 Charles R Harris wrote:
> >> On Thu, Oct 30, 2008 at 5:19 AM, Jan-Willem van de Meent <vandemeent at damtp.cam.ac.uk> wrote:
> >> > Dear all,
> >> >
> >> > This is my first post to this list. I am having performance issues
> >> > with numpy/atlas. Doing dot(a,a) for a 2000x2000 matrix takes
> >> > about 1m40s, even though numpy appears to link to my atlas
> >> > libraries:
>
> Can you try to benchmark your ATLAS library using a simple C or Fortran
> program to check whether the problem is in Numpy or in ATLAS itself?
>
> For comparison,
>
> gfortran -o test test.f90 -lblas
>
> time ./test # ATLAS
> -> 0.55 s
>
> LD_PRELOAD=/usr/lib/libblas.so.3.0 time ./test # reference BLAS
> -> 5.6 s
>
>
> test.f90
> --------
> program main
> integer, parameter :: n = 1000
> double precision, dimension(n,n) :: a, b, c
> integer :: i, j
>
> do i = 1, n
> do j = 1,n
> a(i,j) = i+j
> b(i,j) = i-j
> end do
> end do
>
> call dgemm('N', 'N', n, n, n, 1d0, a, n, b, n, 0d0, c, n)
> end program main
>
I must admit I have no experience calling ATLAS routines from either C or
Fortran and am a bit clumsy with compilers. However, I got your test case to
compile by doing:
fortran -o test_atlas test_atlas.f90 -lptf77blas -latlas
which gives
time ./test_atlas
-> 0.85 s
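For a direct comparison with numpy, the same product can be timed from Python. The following is only a minimal sketch, assuming numpy is importable; `build_inputs` and `time_dot` are hypothetical helper names, and the matrices are filled the same way as in test.f90 (a(i,j) = i+j, b(i,j) = i-j, 1-based indices).

```python
import time

import numpy as np


def build_inputs(n):
    """Fill a and b as test.f90 does: a(i,j) = i+j, b(i,j) = i-j (1-based)."""
    i, j = np.indices((n, n), dtype=np.float64) + 1.0
    return i + j, i - j


def time_dot(n=1000):
    """Return (elapsed seconds, result) for a single n x n np.dot call."""
    a, b = build_inputs(n)
    start = time.perf_counter()
    c = np.dot(a, b)
    return time.perf_counter() - start, c


if __name__ == "__main__":
    elapsed, _ = time_dot(1000)
    print("np.dot, n=1000: %.2f s" % elapsed)
```

If np.dot on these inputs takes far longer than the compiled test program, that would suggest numpy is not reaching the same ATLAS routines, although a single run is only a rough indication.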
I don't understand what the LD_PRELOAD directive is supposed to do, but timing
it gives
time LD_PRELOAD=/usr/lib/libblas.so.3.0.3 ./test_atlas
-> 0.86 s
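For context, LD_PRELOAD is an environment variable read by the dynamic linker: the shared libraries listed in it are loaded before any others, so their symbols take precedence. It has no effect on routines that were linked statically into the executable, which is one possible (unconfirmed) reason for the two timings above being nearly identical. One way to see which BLAS-like libraries a Python process has actually loaded is to look at /proc/self/maps after importing numpy. This is a Linux-only sketch; `blas_like_libraries` and its keyword list are hypothetical, not part of numpy.

```python
import os


def blas_like_libraries(maps_text, keywords=("blas", "atlas", "lapack")):
    """Return sorted, de-duplicated library paths from a /proc/<pid>/maps
    dump whose file name contains one of the keywords."""
    found = set()
    for line in maps_text.splitlines():
        # Fields: address, perms, offset, dev, inode, [pathname]
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping without a path
        path = fields[5].strip()
        name = os.path.basename(path).lower()
        if any(k in name for k in keywords):
            found.add(path)
    return sorted(found)


if __name__ == "__main__":
    import numpy  # noqa: F401  (imported so its shared libraries get mapped)

    with open("/proc/self/maps") as f:
        for path in blas_like_libraries(f.read()):
            print(path)
```

Running the script with and without LD_PRELOAD set shows whether the preloaded library is actually mapped into the process.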
For reference, here are the results of xatlbench and xdmmtst_big (generated at
compile time by ATLAS). As far as I can tell from comparison with results
posted online, these should be pretty normal.
./xatlbench
Clock rate=1667Mhz
             single precision      double precision
           ********************  ********************
                real    complex       real    complex
Benchmark    % Clock    % Clock    % Clock    % Clock
=========  =========  =========  =========  =========
kSelMM         264.6      264.6       86.1       84.6
kGenMM          86.7       89.3       84.7       84.6
kMM_NT          78.8       77.6       75.5       75.9
kMM_TN          87.2       84.2       77.7       82.1
BIG_MM         261.4      261.9       85.2       85.7
kMV_N           27.2       91.4       50.7       78.3
kMV_T           75.2       76.9       53.9       63.1
kGER            49.0       84.7       23.6       46.5
./xdmmtst_big
TEST  TA  TB     M     N     K  alpha   beta    Time   Mflop  SpUp  PASS
====  ==  ==  ====  ====  ====  =====  =====  ======  ======  ====  ====
   1   N   N   100   100   100    1.0    1.0    0.00   600.1  1.00   ---
   1   N   N   100   100   100    1.0    1.0    0.00   600.1  1.00   YES
   2   N   N   200   200   200    1.0    1.0    0.01  1600.0  1.00   ---
   2   N   N   200   200   200    1.0    1.0    0.01  1600.2  1.00   YES
   3   N   N   300   300   300    1.0    1.0    0.04  1350.1  1.00   ---
   3   N   N   300   300   300    1.0    1.0    0.04  1350.1  1.00   YES
   4   N   N   400   400   400    1.0    1.0    0.09  1371.5  1.00   ---
   4   N   N   400   400   400    1.0    1.0    0.09  1422.3  1.04   YES
   5   N   N   500   500   500    1.0    1.0    0.18  1389.0  1.00   ---
   5   N   N   500   500   500    1.0    1.0    0.18  1389.0  1.00   YES
   6   N   N   600   600   600    1.0    1.0    0.31  1408.8  1.00   ---
   6   N   N   600   600   600    1.0    1.0    0.31  1408.8  1.00   YES
   7   N   N   700   700   700    1.0    1.0    0.49  1409.7  1.00   ---
   7   N   N   700   700   700    1.0    1.0    0.49  1409.7  1.00   YES
   8   N   N   800   800   800    1.0    1.0    0.73  1409.3  1.00   ---
   8   N   N   800   800   800    1.0    1.0    0.73  1409.3  1.00   YES
   9   N   N   900   900   900    1.0    1.0    1.03  1411.1  1.00   ---
   9   N   N   900   900   900    1.0    1.0    1.03  1411.1  1.00   YES
  10   N   N  1000  1000  1000    1.0    1.0    1.41  1418.5  1.00   ---
  10   N   N  1000  1000  1000    1.0    1.0    1.42  1408.5  0.99   YES
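The Mflop column above is consistent with counting 2*M*N*K floating-point operations per matrix product and dividing by the elapsed time. Under that assumption (the usual GEMM flop count, not something stated in the ATLAS output itself), the same arithmetic can be applied to the numpy timing from the original post, about 1m40s for a 2000x2000 dot. `gemm_mflops` is a hypothetical helper name.

```python
def gemm_mflops(m, n, k, seconds):
    """Mflop/s for an (m x k) times (k x n) product, assuming 2*m*n*k flops."""
    return 2.0 * m * n * k / seconds / 1e6


if __name__ == "__main__":
    # Row 5 of the xdmmtst_big table: 500x500x500 in 0.18 s
    print("xdmmtst_big, n=500: %.0f Mflop/s" % gemm_mflops(500, 500, 500, 0.18))
    # prints "xdmmtst_big, n=500: 1389 Mflop/s"

    # numpy dot(a,a) for 2000x2000 in roughly 100 s (1m40s)
    print("numpy dot, n=2000: %.0f Mflop/s" % gemm_mflops(2000, 2000, 2000, 100.0))
    # prints "numpy dot, n=2000: 160 Mflop/s"
```

By this estimate the numpy call runs at roughly 160 Mflop/s, about a ninth of the ~1400 Mflop/s that ATLAS reaches in its own test.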