[Numpy-discussion] very large matrices.

Anne Archibald peridot.faceted at gmail.com
Sun May 13 00:45:24 EDT 2007


On 12/05/07, Dave P. Novakovic <davidnovakovic at gmail.com> wrote:

> core 2 duo with 4gb RAM.
>
> I've heard about iterative svd functions. I actually need a complete
> svd, with all eigenvalues (not LSI). I'm actually more interested in
> the individual eigenvectors.
>
> As an example, a single row could probably have about 3000 non zero elements.

I think you need to think hard about whether your problem can be done
in another way.

First of all, the singular values (as returned from the svd) are not
eigenvalues - eigenvalue decomposition is a much harder problem,
numerically.

Second, your full non-sparse matrix will be 8*75000*75000 bytes, or
about 42 gibibytes. Put another way, the representation of your data
alone is ten times the size of the RAM on the machine you're using.

Third, your matrix has 225 000 000 nonzero entries; assuming a perfect
sparse representation with no extra bytes (at least two bytes per
entry is typical, usually more), that's 1.7 GiB.

Recall that basically any matrix operation is at least O(N^3), so you
can expect order 10^14 floating-point operations to be required. This
is actually the *least* significant constraint; pushing stuff into and
out of disk caches will be taking most of your time.

Even if you can represent your matrix sparsely (using only a couple of
gibibytes), you've said you want the full set of eigenvectors, which
is not likely to be a sparse matrix - so your result is back up to 42
GiB. And you should expect an eigenvalue algorithm, if it even
survives massive roundoff problems, to require something like that
much working space; thus your problem probably has a working size of
something like 84 GiB.

SVD is a little easier, if that's what you want, but the full solution
is twice as large, though if you discard entries corresponding to
small values it might be quite reasonable. You'll still need some
fairly specialized code, though. Which form are you looking for?

Solving your problem in a reasonable amount of time, as described and
on the hardware you specify, is going to require some very specialized
algorithms; you could try looking for an out-of-core eigenvalue
package, but I'd first look to see if there's any way you can simplify
your problem - getting just one eigenvector, maybe.

Anne



More information about the NumPy-Discussion mailing list