I'm curious to see if anyone has looked at the implementation of
dot/matrixmultiply recently. I was doing a benchmark lapack-based matrix
inversion and was surprised to find that the accuracy check was taking
longer than the inversion. The bottleneck was the matrix multiply. It
appears that a blas dgemm implementation might give a large improvement.
Here is a sample benchmark using PyLapack's dblas.dgemm for comparison.
Results for a 1000 x 1000 matrix inversion:
time elapsed for inverse (sec) 10.050768
time elapsed for matrixmultiply (sec) 24.142714
time elapsed for dgemm-matrixmultiply (sec) 5.997188
max error is: 3.30398486348e-10
Given the "dot" code doesn't look too bad, so I might be able to add dgemm
support myself, at least for some cases.
-chris
p.s. I'll include the test code below
#####################################
import time
from Numeric import *
from LinearAlgebra import *
import dblas
for N in [5,100, 1000]:
a = reshape(arange(float(N*N)), (N,N)) + identity(N)
# print "typecode of array is: ", a.typecode()
start = time.time()
inv_a = inverse(a)
stop = time.time()
# print inv_a
startmult = time.time()
b = matrixmultiply(a, inv_a)
stopmult = time.time()
residual = b - identity(N)
#DGEMM(TRANSA, TRANSB, M, N, K, ALPHA,A, LDA, B, LDB, BETA,C,LDC)
C = zeros(a.shape, a.typecode())
alpha = array([1.0])
beta = array([1.0])
startdgemm = time.time()
dblas.dgemm('N','N', N,N,N, alpha,
a,N, # A, LDA
inv_a, N, # B, LDB
beta, # BETA
C, N)
stopdgemm = time.time()
# print C
print "%d x %d matrix inversion" % (N,N)
print "max error is:",
print maximum.reduce(fabs(ravel(residual)))
print "time elapsed for inverse (sec) %f" % (stop-start)
print "time elapsed for matrixmultiply (sec) %f" % (stopmult-startmult)
print "time elapsed for dgemm-matrixmultiply (sec) %f" % (stopdgemm-startdgemm)
print