[Numpy-discussion] sparse vectors / matrices / tensors

Yannick Versley yversley at gmail.com
Tue Sep 20 10:42:18 EDT 2011

Hi all,

I've been working quite a lot with sparse vectors and sparse matrices
(as feature vectors in the context of machine learning), and have
noticed that they crop up in a lot of places (e.g. the CVXOPT library,
in scikits, ...) and that people tend to either reinvent the wheel
(i.e. implement a complete sparse matrix library) or pretend that no
separate data structure is needed (i.e. always pass along pairs of
coordinate and data arrays).
I do think there would be some benefit to having sparse
vectors/matrices/tensors (in parallel to numpy's arrays, which can be
vectors or arbitrary-order tensors) with a standardized interface, so
that different packages (e.g. eigenvalue/SVD, least-squares and other
QPs, but possibly also things like numpy.bincount as well as
computations that are more domain-specific) can be more interoperable
than they are now.
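
To make the "pairs of coordinate and data arrays" convention concrete,
here is a minimal sketch in Python. The function names sparse_dot and
to_dense and the sample data are hypothetical, chosen only for
illustration; the only library calls used are numpy.dot and
numpy.bincount.

```python
import numpy as np

# A sparse vector passed around as a bare pair of arrays (illustrative data).
indices = np.array([1, 4, 4, 7])            # coordinates; duplicates allowed
values = np.array([0.5, 2.0, 1.0, -3.0])    # one value per coordinate


def sparse_dot(indices, values, dense):
    """Dot product of an (indices, values) sparse vector with a dense vector."""
    return float(np.dot(values, dense[indices]))


def to_dense(indices, values, length):
    """Scatter-add the pair into a dense vector; duplicate indices are summed."""
    return np.bincount(indices, weights=values, minlength=length)


dense = np.arange(10, dtype=float)
result = sparse_dot(indices, values, dense)
expanded = to_dense(indices, values, 10)
```

Every function that consumes such a pair has to agree separately on
details like whether duplicate indices are allowed and what the logical
length is, which is the kind of thing a shared data structure or
interface would pin down once.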

One problem that I see is that people doing PDE solving usually want
banded matrices, whereas other people (including me) do most of their
work with coordinate-list or CSR matrices, which normally means some
variation in the actual implementations for different domains. It's
also possible that the most convenient interface for a sparse matrix is
not the most convenient one for a dense matrix (and vice versa). Still,
I think it would be nice if there were some kind of standardized data
structure, or maybe just a standardized vtable-based interface (similar
to Python's buffer interface), that would allow all sparse matrix
packages to interoperate with each other in some meaningful (even if
not most-efficient) way.
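
One way such an interface could look is sketched below in Python. This
is an assumption for illustration, not an existing NumPy or SciPy API:
the names SparseMatrixLike, CooMatrix, CsrMatrix, to_coo and matvec are
all hypothetical. The idea is that each storage format only has to
export itself as coordinate triples, and generic code works on that
common denominator.

```python
from typing import Protocol, Tuple

import numpy as np


class SparseMatrixLike(Protocol):
    """Hypothetical minimal interface: a shape plus an export to COO triples."""

    shape: Tuple[int, int]

    def to_coo(self) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
        ...


class CooMatrix:
    """Coordinate-list storage: one (row, col, value) triple per entry."""

    def __init__(self, rows, cols, data, shape):
        self.rows = np.asarray(rows, dtype=np.intp)
        self.cols = np.asarray(cols, dtype=np.intp)
        self.data = np.asarray(data, dtype=float)
        self.shape = shape

    def to_coo(self):
        return self.rows, self.cols, self.data


class CsrMatrix:
    """Compressed sparse row storage: row pointers, column indices, values."""

    def __init__(self, indptr, indices, data, shape):
        self.indptr = np.asarray(indptr, dtype=np.intp)
        self.indices = np.asarray(indices, dtype=np.intp)
        self.data = np.asarray(data, dtype=float)
        self.shape = shape

    def to_coo(self):
        # Expand the row pointer into one row index per stored entry.
        counts = np.diff(self.indptr)
        rows = np.repeat(np.arange(self.shape[0]), counts)
        return rows, self.indices, self.data


def matvec(a: SparseMatrixLike, x: np.ndarray) -> np.ndarray:
    """Generic matrix-vector product that only relies on the interface."""
    rows, cols, data = a.to_coo()
    return np.bincount(rows, weights=data * x[cols], minlength=a.shape[0])


# The same 3x3 matrix [[1, 0, 2], [0, 0, 0], [0, 3, 0]] in both formats.
coo = CooMatrix([0, 0, 2], [0, 2, 1], [1.0, 2.0, 3.0], (3, 3))
csr = CsrMatrix([0, 2, 2, 3], [0, 2, 1], [1.0, 2.0, 3.0], (3, 3))
x = np.array([1.0, 2.0, 3.0])
y_coo = matvec(coo, x)
y_csr = matvec(csr, x)
```

Going through COO is not the fastest path for CSR data, but it matches
the "meaningful even if not most-efficient" goal: a package can always
add a specialized fast path for formats it knows natively.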

I'd be willing to adapt the code that I have (C++- and Cython-based) to
this kind of interface to provide some kind of 'reference'
implementation, but before inventing the N+1th solution to the problem,
I'd be curious what other people's opinions are.
