Hi!
I have a very large matrix that I am using with the scipy.optimize.nnls function; however, the matrix is so large that it takes 30 GB of memory to load in Python!
I was thinking about using a sparse matrix, since it is relatively sparse, but the problem is that even if I were to use sparse matrices, the nnls function only accepts an ndarray, not a sparse matrix. (When I try to throw a sparse matrix at it, I get an error.)
So of course I'd need to convert it to a dense array before passing it into nnls, but that would defeat the whole point of a sparse array, because Python would still have to allocate the full dense matrix.
Does anyone have an idea about how to efficiently store a matrix and pass it to nnls without blowing up the memory usage?
Thank you,
Calvin
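To put rough numbers on the memory concern in the question, here is a small sketch comparing the footprint of a dense float64 array with the three arrays that back a SciPy CSR matrix. The shape, the 1% density, and the helper names `dense_nbytes` and `csr_nbytes` are made up for illustration; they are not taken from Calvin's data.

```python
import numpy as np
import scipy.sparse as sp


def dense_nbytes(shape, dtype=np.float64):
    """Bytes needed to hold a dense array of the given shape."""
    rows, cols = shape
    return rows * cols * np.dtype(dtype).itemsize


def csr_nbytes(matrix):
    """Bytes held by the three arrays that back a CSR matrix."""
    return matrix.data.nbytes + matrix.indices.nbytes + matrix.indptr.nbytes


# Hypothetical example: a 2000 x 1000 matrix with 1% of its entries nonzero.
A_example = sp.random(2000, 1000, density=0.01, format="csr", random_state=0)

print("dense bytes:", dense_nbytes(A_example.shape))
print("CSR bytes:  ", csr_nbytes(A_example))
```

The saving only survives if the solver can work on the sparse form directly; converting back with `toarray()` allocates the full dense array again, which is exactly the problem described above.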
Hey Calvin,
I was just looking into this same issue recently. Here's the solution someone recommended on Stack Overflow:
http://stackoverflow.com/questions/1053928/python-numpy-very-large-matrices
I haven't actually used it yet, but it seems PyTables is the way to go.
Good luck,
Mike
On Thu, Mar 28, 2013 at 12:46 PM, Calvin Morrison <mutantturkey@gmail.com> wrote: [clip]
_______________________________________________ SciPy-User mailing list SciPy-User@scipy.org http://mail.scipy.org/mailman/listinfo/scipy-user
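Mike's suggestion is about keeping the matrix on disk rather than in RAM. As a rough illustration of that idea, the sketch below uses `numpy.memmap` as a stand-in for PyTables (which is what the linked answer recommends) and computes the least-squares gradient A.T @ (A @ x - b) one block of rows at a time. The file name, the shapes, and the helper `blockwise_gradient` are assumptions made for the example. Note that this only addresses storage: scipy.optimize.nnls would still need the whole array in memory.

```python
import os
import tempfile

import numpy as np


def blockwise_gradient(A, x, b, block_rows=1024):
    """Return A.T @ (A @ x - b), reading A a block of rows at a time."""
    grad = np.zeros(A.shape[1])
    for start in range(0, A.shape[0], block_rows):
        stop = min(start + block_rows, A.shape[0])
        block = np.asarray(A[start:stop])  # only this slice is pulled into RAM
        grad += block.T @ (block @ x - b[start:stop])
    return grad


# Hypothetical data: write a small matrix to disk, then reopen it read-only.
shape = (5000, 50)
path = os.path.join(tempfile.mkdtemp(), "A.dat")
rng_disk = np.random.default_rng(0)

A_write = np.memmap(path, dtype=np.float64, mode="w+", shape=shape)
A_write[:] = rng_disk.random(shape)
A_write.flush()
del A_write

A_disk = np.memmap(path, dtype=np.float64, mode="r", shape=shape)
x_disk = rng_disk.random(shape[1])
b_disk = rng_disk.random(shape[0])

grad_disk = blockwise_gradient(A_disk, x_disk, b_disk)
print(grad_disk.shape)
```

A gradient computed this way is the building block of the matrix-free approaches discussed later in the thread.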
28.03.2013 21:46, Calvin Morrison wrote: [clip]
I was thinking about using a sparse matrix, since it is relatively sparse, but the problem is that even if I were to use sparse matrices, the nnls function only accepts an ndarray, not a sparse matrix. (When I try to throw a sparse matrix at it, I get an error.)
The nnls algorithm in SciPy relies on dense matrix algebra and is, moreover, written in Fortran, so there is no way to tell it to use sparse matrices.
You'll need to find an implementation of the NNLS algorithm that either is matrix-free or works for sparse problems. If you find such code, be sure to reply to this list; it might be useful to include it in SciPy, provided the license is compatible.
-- Pauli Virtanen
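For readers on a newer SciPy: `scipy.optimize.lsq_linear` was added in SciPy 0.17, after this thread, and accepts a sparse matrix together with bounds, so a lower bound of zero gives a non-negative least-squares fit without densifying. The sketch below is a minimal illustration on made-up data, not a drop-in answer to the original problem. It uses a different algorithm from nnls (a trust-region reflective method with an LSMR inner solver), so results agree only to within solver tolerance. The dense nnls call is included purely as a sanity check on this small example.

```python
import numpy as np
import scipy.sparse as sp
from scipy.optimize import lsq_linear, nnls

# Hypothetical small problem: sparse A, non-negative ground truth, noisy b.
rng_lsq = np.random.default_rng(0)
A_lsq = sp.random(300, 40, density=0.1, format="csr", random_state=0)
x_true = np.maximum(rng_lsq.standard_normal(40), 0.0)
b_lsq = A_lsq @ x_true + 0.01 * rng_lsq.standard_normal(300)

# Sparse solve: bounds=(0, inf) imposes x >= 0; "lsmr" keeps A sparse.
result = lsq_linear(A_lsq, b_lsq, bounds=(0, np.inf),
                    method="trf", lsq_solver="lsmr")
x_sparse = result.x
residual_sparse = np.linalg.norm(A_lsq @ x_sparse - b_lsq)

# Dense reference, feasible here only because the example is tiny.
x_dense, residual_dense = nnls(A_lsq.toarray(), b_lsq)

print("sparse residual:", residual_sparse)
print("dense residual: ", residual_dense)
```

Whether this scales to a matrix that is 30 GB when dense depends on how sparse it really is and how well conditioned the problem is; that is not something this example can show.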
Pauli,
It seems nobody wants to touch the nnls algorithm, because the only implementation floating around is the one from the original publication, or automatic conversions of it.
Calvin
On Mar 28, 2013 4:48 PM, "Pauli Virtanen" <pav@iki.fi> wrote: [clip]
On Mar 28, 2013, at 5:33 PM, Calvin Morrison wrote:
It seems nobody wants to touch the nnls algorithm because the only implementation that is floating around is the one from the original publication or automatic conversions of it.
For whatever it's worth, the second Google hit for "nnls sparse" is
http://www.michaelpiatek.com/papers/tsnnls.pdf
"tsnnls: A solver for large sparse least squares problems with non-negative variables
The solution of large, sparse constrained least-squares problems is a staple in scientific and engineering applications. However, currently available codes for such problems are proprietary or based on MATLAB. We announce a freely available C implementation of the fast block pivoting algorithm of Portugal, Judice, and Vicente. Our version is several times faster than Matstoms’ MATLAB implementation of the same algorithm. Further, our code matches the accuracy of MATLAB’s built-in lsqnonneg function."
All links to the code seem to be dead, but it's probably worth contacting the authors.
Unfortunately, tsnnls might have been fast in 2001, but trying it on a moderately sized dataset is beyond slow.
Calvin
On Apr 1, 2013 8:57 AM, "Jonathan Guyer" <guyer@nist.gov> wrote: [clip]
Hey Calvin,
On Mon, Apr 1, 2013 at 6:07 AM, Calvin Morrison <mutantturkey@gmail.com> wrote:
Unfortunately, tsnnls might have been fast in 2001, but trying it on a moderately sized dataset is beyond slow.
I've found stochastic gradient descent to be very useful for this kind of thing. Here's an implementation, adapted from a colleague's MATLAB implementation:
https://gist.github.com/arokem/5389417
HTH,
Ariel
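To give a flavour of the gradient-based route, here is a minimal sketch of projected gradient descent for NNLS on a sparse matrix. It is not the code from Ariel's gist, and it is deterministic (full gradient each step) rather than stochastic; a stochastic variant would estimate the gradient from a random subset of rows. The function name `projected_gradient_nnls`, the iteration counts, and the test data are assumptions made for the example. It only ever touches A through the products A @ v and A.T @ v, so A can stay sparse.

```python
import numpy as np
import scipy.sparse as sp
from scipy.optimize import nnls


def projected_gradient_nnls(A, b, n_iter=5000, n_power_iter=50):
    """Minimise ||A x - b||^2 subject to x >= 0 using only matrix-vector products."""
    n_cols = A.shape[1]

    # Estimate the largest eigenvalue of A.T @ A by power iteration;
    # it bounds how large a gradient step can safely be.
    v = np.ones(n_cols) / np.sqrt(n_cols)
    for _ in range(n_power_iter):
        w = A.T @ (A @ v)
        largest_eig = np.linalg.norm(w)
        v = w / largest_eig
    step = 0.9 / largest_eig  # slightly conservative step size

    x = np.zeros(n_cols)
    for _ in range(n_iter):
        grad = A.T @ (A @ x - b)              # gradient of 0.5 * ||A x - b||^2
        x = np.maximum(x - step * grad, 0.0)  # project back onto x >= 0
    return x


# Hypothetical small problem, checked against the dense solver.
rng_pg = np.random.default_rng(1)
A_pg = sp.random(300, 40, density=0.1, format="csr", random_state=1)
b_pg = A_pg @ np.maximum(rng_pg.standard_normal(40), 0.0)
b_pg = b_pg + 0.01 * rng_pg.standard_normal(300)

x_pg = projected_gradient_nnls(A_pg, b_pg)
x_ref, _ = nnls(A_pg.toarray(), b_pg)
print("max difference from dense nnls:", np.max(np.abs(x_pg - x_ref)))
```

Plain projected gradient can need many iterations on badly conditioned problems, so the fixed iteration count here is only adequate for small, well-behaved examples like this one.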
Ariel,
Thank you! I will be sure to take a look at it!
Calvin
On 15 April 2013 12:39, Ariel Rokem <arokem@gmail.com> wrote: [clip]
participants (5)
- Ariel Rokem
- Calvin Morrison
- Jonathan Guyer
- Michael Morrison
- Pauli Virtanen