inverting and calculating eigenvalues for many small matrices
Hi, I am trying to find the eigenvalues and eigenvectors as well as the inverse for a large number of small matrices. The matrix size (MxM) will typically range from 2x2 to 8x8 at most. The number of matrices (N) can be from 100 up to a million or more. My current solution is to define "eig" and "inv" to be,

    def inv(A):
        """
        Inverts N MxM matrices, A.shape = (M, M, N), inv(A).shape = (M, M, N).
        """
        return np.array(map(np.linalg.inv, A.transpose(2, 0, 1))).transpose(1, 2, 0)

    def eig(A):
        """
        Calculate the eigenvalues and eigenvectors of N MxM matrices,
        A.shape = (M, M, N), eig(A)[0].shape = (M, N), eig(A)[1].shape = (M, M, N).
        """
        tmp = zip(*map(np.linalg.eig, A.transpose(2, 0, 1)))
        return (np.array(tmp[0]).swapaxes(0, 1), np.array(tmp[1]).transpose(1, 2, 0))

The above uses "map" to fake a vector solution, but this is heinously slow. Are there any better ways to do this without resorting to cython or weave (would it even be faster (or possible) to use "np.linalg.eig" and "np.linalg.inv" within cython)? I could write specialized versions when M=2 or M=3, which could be fully vectorized, but I'd rather have a general solution. Are there better algorithms than the ones used in "np.linalg.inv" and "np.linalg.eig" when M < 9, say, that I could hand code using numpy in a fully vectorized way?

My end goal is to implement a Riemann flux calculation as part of a finite volume solver. This requires calculating "Abar = R E inv(R)", where R holds the right eigenvectors of a matrix A and E is the diagonal matrix of the absolute values of the eigenvalues of A (both R and E sorted in ascending order of the corresponding eigenvalues). As well as "eig" and "inv" defined above, matrix multiplication and sorted eigenvalues and eigenvectors are also required. Fortunately, these can be vectorized as follows,

    def sum(a, axis=0):
        """
        Faster than using np.sum.
        """
        return np.tensordot(np.ones(a.shape[axis], 'l'), a, (0, axis))

    def mul(A, B):
        """
        Matrix multiply N MxM matrices,
        A.shape = B.shape = (M, M, N), mul(A, B).shape = (M, M, N).
        """
        return sum(A.swapaxes(0, 1)[:, :, np.newaxis] * B[:, np.newaxis], 0)

    def sortedeig(A):
        """
        Calculates the sorted eigenvalues and eigenvectors of N MxM matrices,
        A.shape = (M, M, N), sortedeig(A)[0].shape = (M, N),
        sortedeig(A)[1].shape = (M, M, N).
        """
        N = A.shape[-1]
        eigenvalues, R = eig(A)
        order = eigenvalues.argsort(0).swapaxes(0, 1)
        Nlist = [[i] for i in xrange(N)]
        return (eigenvalues[order, Nlist].swapaxes(0, 1),
                R[:, order, Nlist].swapaxes(1, 2))

The above two functions take very little time compared with "eig" and "inv". Using "sum" helps too. Given A, calculating "Abar = R E inv(R)" can then simply be written,

    eigenvalues, R = sortedeig(A)
    E = abs(eigenvalues) * np.identity(eigenvalues.shape[0])[..., np.newaxis]
    Abar = mul(mul(R, E), inv(R))

Maybe there is a better way to calculate "Abar" than explicitly calculating the eigenvalues and eigenvectors of every matrix. Any help is much appreciated.

-- Daniel Wheeler
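For concreteness, here is what one of the fully vectorized special cases mentioned above might look like for M = 2. This inv2 helper is an editorial sketch using the 2x2 adjugate formula, not code from the thread:

    import numpy as np

    def inv2(A):
        """
        Fully vectorized inverse of N 2x2 matrices, A.shape = (2, 2, N).
        """
        det = A[0, 0] * A[1, 1] - A[0, 1] * A[1, 0]   # shape (N,)
        B = np.empty_like(A)
        B[0, 0] = A[1, 1]
        B[0, 1] = -A[0, 1]
        B[1, 0] = -A[1, 0]
        B[1, 1] = A[0, 0]
        return B / det    # (2, 2, N) / (N,) broadcasts per matrix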
On 07/11/2011 11:01 PM, Daniel Wheeler wrote:
Hi, I am trying to find the eigenvalues and eigenvectors as well as the inverse for a large number of small matrices. The matrix size (MxM) will typically range from 2x2 to 8x8 at most. The number of matrices (N) can be from 100 up to a million or more. My current solution is to define "eig" and "inv" to be,
    def inv(A):
        """
        Inverts N MxM matrices, A.shape = (M, M, N), inv(A).shape = (M, M, N).
        """
        return np.array(map(np.linalg.inv, A.transpose(2, 0, 1))).transpose(1, 2, 0)
    def eig(A):
        """
        Calculate the eigenvalues and eigenvectors of N MxM matrices,
        A.shape = (M, M, N), eig(A)[0].shape = (M, N), eig(A)[1].shape = (M, M, N).
        """
        tmp = zip(*map(np.linalg.eig, A.transpose(2, 0, 1)))
        return (np.array(tmp[0]).swapaxes(0, 1), np.array(tmp[1]).transpose(1, 2, 0))
The above uses "map" to fake a vector solution, but this is heinously slow. Are there any better ways to do this without resorting to cython or weave (would it even be faster (or possible) to use "np.linalg.eig" and "np.linalg.inv" within cython)? I could write specialized versions
If you want to go the Cython route, here's a start:

http://www.vetta.org/2009/09/tokyo-a-cython-blas-wrapper-for-fast-matrix-mat...

Dag Sverre
On Tue, Jul 12, 2011 at 3:51 AM, Dag Sverre Seljebotn <d.s.seljebotn@astro.uio.no> wrote:
On 07/11/2011 11:01 PM, Daniel Wheeler wrote:
Hi, I am trying to find the eigenvalues and eigenvectors as well as the inverse for a large number of small matrices. The matrix size
If you want to go the Cython route, here's a start:
http://www.vetta.org/2009/09/tokyo-a-cython-blas-wrapper-for-fast-matrix-mat...
Thanks for the heads up. Looks like an option. Presumably, it would still have to use "map" even with more direct access to BLAS (still going C <-> python for every matrix)? Also, adding extra non-standard dependencies is a problem, as this code is part of a production code that's passed on to others.

-- Daniel Wheeler
On 07/12/2011 04:10 PM, Daniel Wheeler wrote:
On Tue, Jul 12, 2011 at 3:51 AM, Dag Sverre Seljebotn <d.s.seljebotn@astro.uio.no> wrote:
On 07/11/2011 11:01 PM, Daniel Wheeler wrote:
Hi, I am trying to find the eigenvalues and eigenvectors as well as the inverse for a large number of small matrices. The matrix size
If you want to go the Cython route, here's a start:
http://www.vetta.org/2009/09/tokyo-a-cython-blas-wrapper-for-fast-matrix-mat...
Thanks for the heads up. Looks like an option. Presumably, it would still have to use "map" even with more direct access to BLAS (still going C <-> python for every matrix)? Also, adding extra non-standard dependencies is a problem, as this code is part of a production code that's passed on to others.
I was thinking you'd use it as a starting point, and actually write low-level for-loops indexing the buffer data pointers in Cython. If you make sure the input has been through np.ascontiguousarray or np.asfortranarray, you can do

    cimport numpy as np
    np.import_array()

    ...

    def func(np.ndarray[double, ndim=3, mode='fortran'] arr):
        cdef double *buf = <double*>np.PyArray_DATA(arr)
        # low-level C-like code to get slices of buf and pass to BLAS

Dag Sverre
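To flesh out Dag's pattern, here is an editorial sketch of such a loop. It assumes the dgetrf/dgetri bindings from scipy.linalg.cython_lapack, a convenience that postdates this thread; Tokyo or hand-declared LAPACK externs would play the same role (error handling via info is omitted):

    # Editorial sketch, not the thread's actual smt.pyx.
    cimport numpy as cnp
    import numpy as np
    from scipy.linalg.cython_lapack cimport dgetrf, dgetri

    cnp.import_array()

    def inv_stack(cnp.ndarray[double, ndim=3, mode='fortran'] A):
        """Invert N MxM matrices in place, A.shape = (M, M, N), Fortran order.

        In Fortran order each A[:, :, k] is a contiguous column-major MxM
        block, so its address goes straight to LAPACK with no per-matrix copy.
        """
        cdef int M = A.shape[0]
        cdef int N = A.shape[2]
        cdef int lwork = M * M
        cdef int info, k
        cdef cnp.ndarray[int, ndim=1] ipiv = np.empty(M, dtype=np.intc)
        cdef cnp.ndarray[double, ndim=1] work = np.empty(lwork)
        for k in range(N):
            dgetrf(&M, &M, &A[0, 0, k], &M, &ipiv[0], &info)  # LU factorization
            dgetri(&M, &A[0, 0, k], &M, &ipiv[0], &work[0], &lwork, &info)  # inverse from the LU factors
        return A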
On Tue, Jul 12, 2011 at 10:52 AM, Dag Sverre Seljebotn <d.s.seljebotn@astro.uio.no> wrote:
On 07/12/2011 04:10 PM, Daniel Wheeler wrote:
On Tue, Jul 12, 2011 at 3:51 AM, Dag Sverre Seljebotn wrote:

Thanks for the heads up. Looks like an option. Presumably, it would still have to use "map" even with more direct access to BLAS (still going C <-> python for every matrix)? Also, adding extra non-standard dependencies is a problem, as this code is part of a production code that's passed on to others.
I was thinking you'd use it as a starting point, and actually write low-level for-loops indexing the buffer data pointers in Cython.
I realized that as soon as I'd hit the send button. Thanks.

-- Daniel Wheeler
Hi, I put together a set of tools for inverting, multiplying and finding eigenvalues for many small matrices (arrays of shape (N, M, M), where MxM is the size of each matrix). Thanks to the poster who suggested using the Tokyo package. Although not used directly, it helped with figuring out the correct arguments to pass to the lapack routines and getting started with cython. I put the code up here if anyone happens to be interested:

<http://matforge.org/fipy/browser/sandbox/smallMatrixTools/smt>

It consists of three files: smallMatrixTools.py, smt.pyx and smt.pxd. The speed tests comparing with numpy came out something like this for random matrices:

    N, M, M: 10000, 2, 2    mulinv speed up: 65.9    eigvec speed up: 11.2
    N, M, M: 10000, 3, 3    mulinv speed up: 32.3    eigvec speed up:  7.2
    N, M, M: 10000, 4, 4    mulinv speed up: 24.1    eigvec speed up:  5.9
    N, M, M: 10000, 5, 5    mulinv speed up: 17.0    eigvec speed up:  5.2

Not bad speed ups, but not out of this world. I'm new to cython, so there may be some costly mistakes in the implementation. I profiled, and it seems that most of the time is now being spent in the lapack routines, but I'm still not completely convinced by the profiling results.

One thing that I know I'm doing wrong is reassigning every sub-matrix to a new array. This may not be that costly, but it seems fairly ugly. I wasn't sure how to pass the address of the submatrix to the lapack routines, so I'm assigning to a new array and passing that instead.

I tested with <http://matforge.org/fipy/browser/sandbox/smallMatrixTools/smt/test.py> and speed tests were done using <http://matforge.org/fipy/browser/sandbox/smallMatrixTools/smt/speedTest.py>.

Cheers

On Tue, Jul 12, 2011 at 3:51 AM, Dag Sverre Seljebotn <d.s.seljebotn@astro.uio.no> wrote:
On 07/11/2011 11:01 PM, Daniel Wheeler wrote:
Hi, I am trying to find the eigenvalues and eigenvectors as well as the inverse for a large number of small matrices. The matrix size (MxM) will typically range from 2x2 to 8x8 at most. The number of matrices (N) can be from 100 up to a million or more. My current solution is to define "eig" and "inv" to be,
    def inv(A):
        """
        Inverts N MxM matrices, A.shape = (M, M, N), inv(A).shape = (M, M, N).
        """
        return np.array(map(np.linalg.inv, A.transpose(2, 0, 1))).transpose(1, 2, 0)
    def eig(A):
        """
        Calculate the eigenvalues and eigenvectors of N MxM matrices,
        A.shape = (M, M, N), eig(A)[0].shape = (M, N), eig(A)[1].shape = (M, M, N).
        """
        tmp = zip(*map(np.linalg.eig, A.transpose(2, 0, 1)))
        return (np.array(tmp[0]).swapaxes(0, 1), np.array(tmp[1]).transpose(1, 2, 0))
The above uses "map" to fake a vector solution, but this is heinously slow. Are there any better ways to do this without resorting to cython or weave (would it even be faster (or possible) to use "np.linalg.eig" and "np.linalg.inv" within cython)? I could write specialized versions
If you want to go the Cython route, here's a start:
http://www.vetta.org/2009/09/tokyo-a-cython-blas-wrapper-for-fast-matrix-mat...
Dag Sverre
-- Daniel Wheeler
On 2011-08-15, at 4:11 PM, Daniel Wheeler wrote:
One thing that I know I'm doing wrong is reassigning every sub-matrix to a new array. This may not be that costly, but it seems fairly ugly. I wasn't sure how to pass the address of the submatrix to the lapack routines so I'm assigning to a new array and passing that instead.
It looks like the arrays you're passing are C contiguous. Am I right about this? (I ask because I was under the impression that BLAS/LAPACK routines typically want Fortran-ordered input arrays.)

If your 3D array is also C-contiguous, you should be able to do pointer arithmetic with A.data and B.data. foo.strides[0] will tell you how many bytes you need to move to get to the next element along that axis.

If the 3D array is anything but C contiguous, then I believe the copy is necessary. You should check for that in your Python-visible "solve" wrapper, and make a copy that is C contiguous if necessary (check foo.flags.c_contiguous), as this will likely be faster than copying to the same buffer each time in the loop.

David
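As a concrete illustration of the strides arithmetic David describes (editorial, in plain Python; the cast in the final comment is the intended Cython use):

    import numpy as np

    A = np.zeros((10000, 4, 4))          # C-contiguous stack of 4x4 matrices
    assert A.flags.c_contiguous
    step = A.strides[0]                  # bytes from one 4x4 matrix to the next
    assert step == 4 * 4 * A.itemsize    # the blocks are packed back to back
    # In Cython, <double*>(arr.data + k * step) is then the k-th matrix.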
On Tue, Aug 16, 2011 at 5:53 AM, David Warde-Farley <wardefar@iro.umontreal.ca> wrote:
On 2011-08-15, at 4:11 PM, Daniel Wheeler wrote:
One thing that I know I'm doing wrong is reassigning every sub-matrix to a new array. This may not be that costly, but it seems fairly ugly. I wasn't sure how to pass the address of the submatrix to the lapack routines so I'm assigning to a new array and passing that instead.
It looks like the arrays you're passing are C contiguous. Am I right about this? (I ask because I was under the impression that BLAS/LAPACK routines typically want Fortran-ordered input arrays).
Are you saying that fortran ordered arrays should be passed? The tests pass when compared against the numpy equivalents, so I don't believe it's currently broken (maybe suboptimal). There is a transpose and copy here <http://matforge.org/fipy/browser/sandbox/smallMatrixTools/smt/smt.pyx#L65> and here <http://matforge.org/fipy/browser/sandbox/smallMatrixTools/smt/smt.pyx#L80>. I believe that reorders correctly. Maybe I should cast the arrays to explicit fortran ordering rather than doing that (not sure how)? However, the transpose and copy don't seem to be expensive compared with the actual lapack routines.
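For the "not sure how": an editorial aside on casting to explicit Fortran ordering:

    import numpy as np

    A = np.random.rand(100, 3, 3)   # C-ordered stack
    AF = np.asfortranarray(A)       # copies only if A is not already Fortran-ordered
    assert AF.flags.f_contiguous
    AF2 = A.copy(order='F')         # always makes a fresh Fortran-ordered copy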
If your 3D array is also C-contiguous, you should be able to do pointer arithmetic with A.data and B.data. foo.strides[0] will tell you how many bytes you need to move to get to the next element along that axis.
Sounds complicated, but I'll try and figure it out. Thanks for the idea.
If the 3D array is anything but C contiguous, then I believe the copy is necessary. You should check for that in your Python-visible "solve" wrapper, and make a copy of it that is C contiguous if necessary (check foo.flags.c_contiguous), as this will be likely faster than copying to the same buffer each time in the loop.
The copy is required after the transpose (which is required for fortran ordering). I'll look into the pointer arithmetic stuff and see if that helps any. Thanks.

-- Daniel Wheeler
On 11.07.2011 23:01, Daniel Wheeler wrote:
The above uses "map" to fake a vector solution, but this is heinously slow. Are there any better ways to do this without resorting to cython or weave (would it even be faster (or possible) to use "np.linalg.eig" and "np.linalg.inv" within cython)?
I see two problems here:

    def inv(A):
        """
        Inverts N MxM matrices, A.shape = (M, M, N), inv(A).shape = (M, M, N).
        """
        return np.array(map(np.linalg.inv, A.transpose(2, 0, 1))).transpose(1, 2, 0)

At least get rid of the transpositions and non-contiguous memory access. Use shape (N, M, M) in C order or (M, M, N) in Fortran order to make memory access more contiguous and cache friendly. Statements like A.transpose(2, 0, 1) are evil: they just burn the CPU and flood the memory bus, and cache, prefetching, etc. will do no good as everything is out of order. LAPACK is written in Fortran; at least with scipy.linalg I would use shape (M, M, N) in Fortran order to avoid redundant transpositions by f2py. I am not sure what NumPy's lapack_lite does, though.

You will also have a lot of Python overhead from calling np.linalg.inv on each of the N matrices, so you don't gain much by using map instead of a for loop. Using map saves a few attribute lookups per iteration, but there are dozens of them. To make the loop over N matrices fast, there is nothing that beats a loop in C or Fortran (or Cython) if you have a 3D array. And that brings us to the second issue, which is that it would be nice if the solvers in numpy.linalg (and scipy.linalg) were vectorized for 3D arrays. Calling np.linalg.inv from Cython will not help, though, as you incur the same overhead as calling np.linalg.inv with map from Python.

Another question is whether you really need to compute the inverse. A matrix inversion and subsequent matrix multiplication can be replaced by solving a linear system, which only takes half the amount of computation.

Sturla
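To make the inverse-free route concrete, an editorial sketch follows. It leans on the stacked-array support that numpy.linalg gained in later releases (inputs of shape (N, M, M)), which is exactly the 3D vectorization missing at the time of this thread:

    import numpy as np

    def abar(A):
        """Abar = R E inv(R) for a stack A of shape (N, M, M), without inv(R).

        X = R E inv(R) satisfies R^T X^T = (R E)^T, so X is recovered with
        one solve per matrix instead of an inverse plus a multiply.
        """
        w, R = np.linalg.eig(A)            # w: (N, M), R: (N, M, M)
        RE = R * np.abs(w)[:, None, :]     # scale column j of R by |w_j|, i.e. R E
        Rt = R.transpose(0, 2, 1)
        X = np.linalg.solve(Rt, RE.transpose(0, 2, 1)).transpose(0, 2, 1)
        # eig may return a complex dtype; for matrices with a real
        # eigendecomposition (as for these flux Jacobians) use X.real.
        return X

Note that sorting the eigenpairs, as sortedeig does above, is not needed for this product: permuting the columns of R together with the corresponding entries of E leaves R E inv(R) unchanged.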
On Tue, Jul 12, 2011 at 11:19 AM, Sturla Molden <sturla@molden.no> wrote:
On 11.07.2011 23:01, Daniel Wheeler wrote: [...]

To make the loop over N matrices fast, there is nothing that beats a loop in C or Fortran (or Cython) if you have a 3D array. And that brings us to the second issue, which is that it would be nice if the solvers in numpy.linalg (and scipy.linalg) were vectorized for 3D arrays.
Amen.
Another question is if you really need to compute the inverse. A matrix inversion and subsequent matrix multiplication can be replaced by solving a linear system, which only takes half the amount of computation.
Good idea. One possibility for avoiding cython for the inverse is to populate a sparse matrix in pysparse (or scipy) with the small matrices and then do a linear solve, as I don't need the explicit inverse. However, that doesn't help with the eigenvalues.

-- Daniel Wheeler
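A sketch of that block-diagonal idea (editorial; scipy.sparse.block_diag is assumed here, and pysparse would be populated analogously):

    import numpy as np
    from scipy.sparse import block_diag
    from scipy.sparse.linalg import spsolve

    N, M = 1000, 3
    A = np.random.rand(N, M, M) + 3 * np.eye(M)   # keep the blocks well conditioned
    b = np.random.rand(N * M)
    K = block_diag([A[k] for k in range(N)], format='csr')  # (N*M, N*M) block diagonal
    x = spsolve(K, b)   # one call solves all N small MxM systems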
On Mon, Jul 11, 2011 at 05:01:07PM -0400, Daniel Wheeler wrote:
Hi, I am trying to find the eigenvalues and eigenvectors as well as the inverse for a large number of small matrices. The matrix size (MxM) will typically range from 2x2 to 8x8 at most.
If you really care about speed, for matrices of this size you shouldn't call a linear algebra pack, but simply precompute the closed-form solution. Here is sympy code that I used a while ago to generate fast code to do the inverse of SPD matrices.

G
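The sympy code itself did not survive in this archive; the following is an editorial sketch of the codegen approach Gael describes, not his actual code:

    import sympy as sp

    M = 3
    # symmetric (SPD) symbolic matrix: a_ij == a_ji
    A = sp.Matrix(M, M, lambda i, j: sp.Symbol('a%d%d' % (min(i, j), max(i, j))))
    det = A.det()
    adj = A.adjugate()              # inv(A) = adjugate(A) / det(A)
    print('det = %s' % det)
    for i in range(M):
        for j in range(i, M):       # the inverse is symmetric too: upper triangle suffices
            print('Ainv[%d, %d] = (%s) / det' % (i, j, adj[i, j]))

Because each printed entry is plain arithmetic on the matrix entries, pasting it into numpy code and substituting shape-(N,) arrays for the scalars vectorizes the inverse over all N matrices at once.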
On Tue, Jul 12, 2011 at 11:30 AM, Gael Varoquaux <gael.varoquaux@normalesup.org> wrote:
On Mon, Jul 11, 2011 at 05:01:07PM -0400, Daniel Wheeler wrote:
Hi, I am trying to find the eigenvalues and eigenvectors as well as the inverse for a large number of small matrices. The matrix size (MxM) will typically range from 2x2 to 8x8 at most.
If you really care about speed, for matrices of this size you shouldn't call a linear algebra pack, but simply precompute the closed-form solution. Here is sympy code that I used a while ago to generate fast code to do the inverse of SPD matrices.
G
This has been a very timely discussion for me since I'm looking to do the same thing in a different application. My interest is in diagonalizing lots of small matrices (not inverting them). I believe the OP was also interested in eigenvalues.

Gael, your code addresses inverses, but I take it something similar for eigenvalues of a matrix bigger than 5x5 doesn't exist, since a closed-form solution doesn't exist for finding polynomial roots for order > 5?

The ones I'm looking at now happen to be 3x3, so I was thinking of using http://en.wikipedia.org/wiki/Eigenvalue_algorithm#Eigenvalues_of_a_Symmetric... but I might have anywhere from 2 to 10 at some point. (To add another spin to this, I recently acquired an NVIDIA Tesla card and am thinking of using it for this problem.)

Thanks, Greg
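For the 3x3 symmetric case, an editorial sketch of the trigonometric algorithm from the Wikipedia page cited above, vectorized over the thread's (3, 3, N) layout:

    import numpy as np

    def eigvals3x3sym(A):
        """Closed-form eigenvalues of N symmetric 3x3 matrices, A.shape = (3, 3, N).

        Returns shape (3, N) in ascending order. Assumes no matrix in the
        stack is already diagonal (guard p == 0 in real use).
        """
        q = np.trace(A) / 3.0                          # (N,): mean of the diagonal
        p1 = A[0, 1]**2 + A[0, 2]**2 + A[1, 2]**2
        p2 = (A[0, 0] - q)**2 + (A[1, 1] - q)**2 + (A[2, 2] - q)**2 + 2.0 * p1
        p = np.sqrt(p2 / 6.0)
        B = (A - q * np.eye(3)[..., np.newaxis]) / p   # B = (A - q I) / p
        detB = (B[0, 0] * (B[1, 1] * B[2, 2] - B[1, 2] * B[2, 1])
                - B[0, 1] * (B[1, 0] * B[2, 2] - B[1, 2] * B[2, 0])
                + B[0, 2] * (B[1, 0] * B[2, 1] - B[1, 1] * B[2, 0]))
        phi = np.arccos(np.clip(detB / 2.0, -1.0, 1.0)) / 3.0
        e1 = q + 2.0 * p * np.cos(phi)                       # largest eigenvalue
        e3 = q + 2.0 * p * np.cos(phi + 2.0 * np.pi / 3.0)   # smallest
        e2 = 3.0 * q - e1 - e3                               # middle one, via the trace identity
        return np.array([e3, e2, e1])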
On Tue, Jul 12, 2011 at 12:16:29PM -0400, greg whittier wrote:
Gael, your code addresses inverses, but I take it something similar for eigenvalues of a matrix bigger than 5x5 doesn't exist, since a closed-form solution doesn't exist for finding polynomial roots for order > 5?
I guess so :).
The ones I'm looking at now happen to be 3x3, so I was thinking of using http://en.wikipedia.org/wiki/Eigenvalue_algorithm#Eigenvalues_of_a_Symmetric... but I might have anywhere from 2 to 10 at some point.
I am afraid that this is beyond my skills. Sorry ;$. G
participants (6)

- Dag Sverre Seljebotn
- Daniel Wheeler
- David Warde-Farley
- Gael Varoquaux
- greg whittier
- Sturla Molden