Hi all,

I've created a new SVN branch with some new functionality and changes to sparse matrices. This post describes what's new and what's different versus the main branch. If you have an existing SVN tree, you can pull in my branch with the command "svn switch http://svn.scipy.org/svn/scipy/branches/ejs".

One of the conclusions of Jonathan Guyer's comparison between scipy.sparse and PySparse in November (http://article.gmane.org/gmane.comp.python.scientific.devel/3187/match=scipy... and http://old.scipy.org/wikis/featurerequests/SparseSolvers) was that SciPy's support for efficient construction of sparse matrices is weak. My patch adds a new data type, lil_matrix, that stores non-zeros as a list of sorted Python lists. For the simple benchmark of creating a new 10^3 x 10^5 matrix with 10^4 non-zero elements in random locations and converting to CSR and CSC matrices, the lil_matrix format is slightly more than twice as fast as dok_matrix. It's a row-wise format, so conversion to CSR is very fast, whereas conversion to CSC goes through CSR internally. Index lookups use binary searches, so they take log time. I think the implementation is already complete enough for most uses -- so please try it out and tell me how you go!

Another of Jonathan's observations was that SciPy had no support for slicing matrices with A[i, :] like in PySparse. My patch adds support for slice notation and NumPy-style fancy indexing to dok_matrix and lil_matrix objects. With this it's possible to build sparse matrices quickly -- for example:

    a = lil_matrix((3,5))
    a[1,[2,3,4]] = range(3)
    a[2,:] = 3 * a[1,:]

A third new feature is a .sum() method, which takes a single axis argument like in NumPy.

My branch also changes one aspect of the existing behaviour: the todense() method now returns a dense (NumPy) matrix, rather than a dense array. Converting to a dense array is now available under a toarray() method (or .A attribute) instead. The rationale behind this change is to emphasize that sparse matrices are closer to dense matrices than to dense arrays in their attributes (e.g. .T for transpose) and behaviour (e.g. * means inner product). I've also been careful to make multiplication between sparse matrices and dense row or column vectors (matrix objects with unit length or height) return matrices of the correct dimensions, rather than arrays. Several unit tests relied on the old behaviour, and I've changed these accordingly. Most of these test changes are just simplifications -- for example

    assert_array_equal((a*c).todense(), a.todense() * c)

instead of

    assert_array_equal((a*c).todense(), dot(a.todense(), c))

-- but I'd appreciate some criticism and feedback on which behaviour people prefer.

These changes have highlighted a problem present in both the main trunk and my branch: that multiplying a dense matrix 'a' by a sparse matrix 'b' is not possible using the syntax 'a*b'. I'll follow this up with a proposal to numpy-discussion on how we can solve this.

-- Ed
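The idea behind lil_matrix (one sorted list of column indices per row, binary-search lookups, and cheap row-wise conversion to CSR) can be sketched in pure Python. The ToyLIL class below is a hypothetical illustration of the data structure described above, not the actual SciPy code:

```python
from bisect import bisect_left

class ToyLIL:
    """Minimal sketch of a list-of-lists sparse matrix (not the real lil_matrix)."""

    def __init__(self, shape):
        self.shape = shape
        self.rows = [[] for _ in range(shape[0])]  # sorted column indices, per row
        self.data = [[] for _ in range(shape[0])]  # values aligned with self.rows

    def __setitem__(self, key, value):
        i, j = key
        cols, vals = self.rows[i], self.data[i]
        pos = bisect_left(cols, j)                 # binary search: O(log nnz_row)
        if pos < len(cols) and cols[pos] == j:
            vals[pos] = value                      # overwrite an existing entry
        else:
            cols.insert(pos, j)                    # keep the column list sorted
            vals.insert(pos, value)

    def __getitem__(self, key):
        i, j = key
        cols = self.rows[i]
        pos = bisect_left(cols, j)
        if pos < len(cols) and cols[pos] == j:
            return self.data[i][pos]
        return 0.0                                 # unstored entries are zero

    def sum(self, axis=None):
        """Like the new .sum() method: total, row sums, or column sums."""
        if axis is None:
            return sum(v for vals in self.data for v in vals)
        if axis == 1:
            return [sum(vals) for vals in self.data]
        col_totals = [0.0] * self.shape[1]
        for cols, vals in zip(self.rows, self.data):
            for j, v in zip(cols, vals):
                col_totals[j] += v
        return col_totals

    def tocsr_arrays(self):
        """Row-wise storage makes CSR conversion a simple concatenation."""
        indptr, indices, values = [0], [], []
        for cols, vals in zip(self.rows, self.data):
            indices.extend(cols)
            values.extend(vals)
            indptr.append(len(indices))
        return indptr, indices, values
```

Because each row is already sorted, tocsr_arrays only concatenates the per-row lists, which is why conversion to CSR is very fast while CSC has to go through CSR first.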
_______________________________________________
SciPy-user mailing list
SciPy-user@scipy.net
http://www.scipy.net/mailman/listinfo/scipy-user
Hi Ed,

That sounds very interesting! How do I install your branch? And can I use the current implementation of sparse once I have installed your branch?

Nils
Nils Wagner wrote:
Hi Ed,
That sounds very interesting! How do I install your branch? And can I use the current implementation of sparse once I have installed your branch?
You can switch to it using

    svn switch http://svn.scipy.org/svn/scipy/branches/ejs

from your current SVN directory. Then type "python setup.py install" as normal. To revert to the mainline, use

    svn switch http://svn.scipy.org/svn/scipy/trunk

and reinstall.

-- Ed
On Mon, 27 Feb 2006 11:00:05 +0100 Ed Schofield <schofield@ftw.at> wrote:
Hi Ed,

Just curious: is it planned to integrate it into the main tree soon?

Nils
Hi Ed, Ed Schofield wrote:
My branch also changes one aspect of the existing behaviour: the todense() method now returns a dense (NumPy) matrix, rather than a dense array. Converting to a dense array is now available under a toarray() method (or .A attribute) instead. The rationale behind this change is to emphasize that sparse matrices are closer to dense matrices than to dense arrays in their attributes (e.g. .T for transpose) and behaviour (e.g. * means inner product). I've also been careful to make multiplication between sparse matrices and dense row or column vectors (matrix objects with unit length or height) return matrices of the correct dimensions, rather than arrays. Several unit tests relied on the old behaviour, and I've changed these accordingly. Most of these test changes are just simplifications -- for example
assert_array_equal((a*c).todense(), a.todense() * c)
instead of
assert_array_equal((a*c).todense(), dot(a.todense(), c)) -- but I'd appreciate some criticism and feedback on which behaviour people prefer.
Well done! I think it's better your way - if a vector (i.e. a 2D array) is put in, it is expected that a vector goes out as well.

Do you also plan to add the C-based linked-list matrix as in PySparse (ll_mat.c there)? This could be even faster than using Python lists (IMHO...).
These changes have highlighted a problem present in both the main trunk and my branch: that multiplying a dense matrix 'a' by a sparse matrix 'b' is not possible using the syntax 'a*b'. I'll follow this up with a proposal to numpy-discussion on how we can solve this.
I am very curious how to do this. Surely NumPy will have to be aware of the existence of sparse matrix objects, right?

r.
Hi Robert, On 27/02/2006, at 11:23 AM, Robert Cimrman wrote:
well done! I think it's better your way - if a vector (i.e. 2D array) is put in, it is expected that a vector goes out as well.
Okay, great. This was probably the only change that could be controversial, so if you're happy (and nobody else objects) I'll merge the whole patch.
Do you also plan to add the c-based linked-list matrix as in PySparse (ll_mat.c there)? This could be even faster than using the Python lists (IMHO...).
Well, I guess it would be nice to have, and the code's already written, but I don't know how we'd make it derive from the spmatrix base class, which is written in Python. Travis mentioned back in October that this is possible but not easy. So it would require some work. I don't need the extra speed personally -- the new class seems to be fast enough for my needs (the bottleneck for my work is now elsewhere :)
These changes have highlighted a problem present in both the main trunk and my branch: that multiplying a dense matrix 'a' by a sparse matrix 'b' is not possible using the syntax 'a*b'. I'll follow this up with a proposal to numpy-discussion on how we can solve this.
I am very curious how to do this. Surely NumPy will have to be aware of existence of sparse matrix objects, right?
An update: I've changed the matrix.__mul__ function in NumPy SVN to return NotImplemented if the right operand defines __rmul__ and isn't a NumPy-compatible type. This seems to work fine for * now. Functions like numpy.dot() still won't work on sparse matrices, but I don't really have a problem with this ;) -- Ed
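The delegation Ed describes relies on Python's binary-operator protocol: when a type's __mul__ returns NotImplemented, Python falls back to the right operand's __rmul__. A minimal sketch with two hypothetical stand-in classes (Dense playing the role of NumPy's matrix, Sparse playing a sparse type; neither is the real code):

```python
class Sparse:
    """Stand-in for a sparse matrix type (hypothetical, for illustration)."""
    def __init__(self, scale):
        self.scale = scale

    def __rmul__(self, other):
        # Invoked by Python after Dense.__mul__ returned NotImplemented.
        return Dense(other.value * self.scale)

class Dense:
    """Stand-in for a dense matrix; __mul__ mimics the delegation described."""
    def __init__(self, value):
        self.value = value

    def __mul__(self, other):
        if isinstance(other, (Dense, int, float)):
            v = other.value if isinstance(other, Dense) else other
            return Dense(self.value * v)
        if hasattr(other, "__rmul__"):
            # Unknown type that knows how to be right-multiplied:
            # step aside so Python calls other.__rmul__(self).
            return NotImplemented
        raise TypeError(f"cannot multiply Dense by {type(other).__name__}")
```

With this in place, Dense(3) * Sparse(2) dispatches to Sparse.__rmul__ and succeeds, while Dense * Dense keeps its ordinary meaning.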
Do you also plan to add the c-based linked-list matrix as in PySparse (ll_mat.c there)? This could be even faster than using the Python lists (IMHO...).
Well, I guess it would be nice to have, and the code's already written, but I don't know how we'd make it derive from the spmatrix base class, which is written in Python. Travis mentioned back in October that this is possible but not easy. So it would require some
Well, we might eventually write an llmat class in Python that will 'borrow' (and appreciate, of course ;-)) the relevant ll_mat functions - we do not need the ll_mat Python object. I can live without it for now, though :-)
work. I don't need the extra speed personally -- the new class seems to be fast enough for my needs (the bottleneck for my work is now elsewhere :)
OK, I see... Can you tell me where? Just curious :-)
An update: I've changed the matrix.__mul__ function in NumPy SVN to return NotImplemented if the right operand defines __rmul__ and isn't a NumPy-compatible type. This seems to work fine for * now. Functions like numpy.dot() still won't work on sparse matrices, but I don't really have a problem with this ;)
Fine with me...

In the meantime, I have added rudimentary UMFPACK support to the sparse module - it is used by 'solve' when present (and can be switched off). I have also fixed the umfpack module in the sandbox for complex matrices. (At least I hope so :))

Still, umfpack must be installed separately, by doing the classical 'python setup.py install' in its sandbox home, because I am still struggling with a proper system_info class to detect the UMFPACK libraries on the system. Any help/ideas would be appreciated.

r.
participants (3)
- Ed Schofield
- Nils Wagner
- Robert Cimrman