[Numpy-discussion] inverting and calculating eigenvalues for many small matrices

Tue Jul 12 11:19:07 EDT 2011

Den 11.07.2011 23:01, skrev Daniel Wheeler:
> The above uses "map" to fake a vector solution, but this is heinously
> slow. Are there any better ways to do this without resorting to cython
> or weave (would it even be faster (or possible) to use "np.linalg.eig"
> and "np.linalg.inv" within cython)?

I see two problems here:

    def inv(A):
     """
     Inverts N MxM matrices, A.shape = (M, M, N), inv(A).shape = (M, M, N).
     """
     return np.array(map(np.linalg.inv, A.transpose(2, 0, 1))).transpose(1, 2, 0)

At least get rid of the transpositions and non-contiguous memory access.

Use shape (N,M,M) in C order or (M,M,N) in Fortran order to make memory 
access more contiguous and cache friendly. Statements like 
A.transpose(2,0,1) are evil: they just burn the CPU and flood the memory 
bus; and cache, prefetching, etc. will do no good as everything is out 
of order.

LAPACK is written in Fortran, at least with scipy.linalg I would use 
shape (M,M,N) in Fortran order to avoid redundant transpositions by 
f2py. I am not sure what NumPy's lapack_lite does, though.

You will have a lot of Python overhead by using np.linalg.inv on each of 
the N matrices. Therefore, you don't gain much by using map instead of a 
for loop. Using map will save a few attribute lookups per loop, but 
there are dozens of them.

To make the loop over N matrices fast, there is nothing that beats a 
loop in C or Fortran (or Cython) if you have a 3D array. And that brings 
us to the second issue, which is that it would be nice if the solvers in 
numpy.linalg (and scipy.linalg) were vectorized for 3D arrays.

Calling np.linalg.inv from Cython will not help though, as you incur the 
same overhead as calling np.linalg.inv with map from Python.

Another question is if you really need to compute the inverse. A matrix 
inversion and subsequent matrix multiplication can be replaced by 
solving a linear system, which only takes half the amount of computation.

Sturla