
Another alternative: https://github.com/perimosocordiae/sparray I tried to follow the numpy API as closely as I could, to see whether something like this could work. The code isn't particularly optimized, but it does beat the CSR/CSC spmatrix formats on a few benchmarks. My plan going forward is to have swappable (opaque) backends to allow for speed/memory tradeoffs in access, construction, and math operations. Apologies for the lack of documentation! The only good listing of current capabilities at the moment is the test suite (https://github.com/perimosocordiae/sparray/tree/master/sparray/tests).
On Mon, Mar 28, 2016 at 11:29 AM, Evgeni Burovski <evgeny.burovskiy@gmail.com> wrote:
For completeness, I'd mention an alternative: https://github.com/scipy/scipy/pull/6004 This is a completely independent effort, framed to fit more closely within the scipy.sparse framework.
On Mon, Mar 28, 2016 at 11:19 AM, Evgeni Burovski <evgeny.burovskiy@gmail.com> wrote:
Hi,
TL;DR: Here's a draft implementation of a 'sparse array', suitable for incremental assembly. Comments welcome! https://github.com/ev-br/sparr
Scipy sparse matrices are generally great. One area where they are not so great is incremental assembly (especially the LIL and DOK formats). This has been discussed for a while; see, for instance, https://github.com/scipy/scipy/pull/2908#discussion_r16955546
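To make the use case concrete, here is the kind of assembly loop the DOK/LIL formats are meant for; the size and stamping pattern below are made up for illustration only:

    import scipy.sparse as sp

    # A typical incremental-assembly loop (stamping a 1-D Laplacian);
    # the size and pattern here are only for illustration.
    n = 1000
    A = sp.dok_matrix((n, n))
    for i in range(n - 1):
        A[i, i] += 2.0       # every access goes through Python-level
        A[i, i + 1] -= 1.0   # dict machinery and index validation,
        A[i + 1, i] -= 1.0   # which is where the overhead comes from
    A = A.tocsr()            # convert once assembly is finished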
I spent some time seeing how hard it would be to build a dictionary-of-keys data structure without the Python overhead. It turns out: not very hard.
Also, when constructing a new sparse data structure today, it seems to make sense to make it a sparse *array* rather than a *matrix*, so that it supports both elementwise and matrix multiplication.
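For contrast, here is how the two conventions differ today with scipy.sparse matrices versus numpy arrays; a sparse *array* would follow the ndarray convention:

    import numpy as np
    import scipy.sparse as sp

    A = sp.csr_matrix(np.eye(3))
    B = sp.csr_matrix(np.arange(9.0).reshape(3, 3))

    # spmatrix follows the np.matrix convention: * is matrix multiplication
    print((A * B).toarray())
    # elementwise multiplication needs an explicit method
    print(A.multiply(B).toarray())

    # ndarray semantics, which a sparse *array* would follow instead:
    x, y = np.eye(3), np.arange(9.0).reshape(3, 3)
    print(x * y)   # elementwise
    print(x @ y)   # matrix product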
A draft implementation is available at https://github.com/ev-br/sparr and I'd be grateful for any comments, large or small.
Some usage examples are in the README file, which GitHub renders on the repository front page.
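To give a flavor of the intended usage, here is a minimal sketch. The MapArray name is taken from the repo, but the constructor signature and the conversion call shown are assumptions for illustration; please defer to the README and the test suite for the real API.

    # A minimal sketch, not the authoritative API: MapArray is the class
    # name used in the sparr repo, but the constructor signature and the
    # todense() call below are assumptions for illustration.
    from sparr import MapArray

    m = MapArray(shape=(3, 3))   # hypothetical constructor
    m[0, 1] = 2.0                # incremental, dict-of-keys style assignment
    m[2, 2] += 1.0               # missing entries read as zero
    print(m.todense())           # assumed dense conversion for inspection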
First and foremost, I'd like to gauge interest in the community ;-). Does it actually make sense? Would you use such a data structure? What is missing in the current version?
Short to medium term, some issues I see are:
* 32-bit vs 64-bit indices. Scipy sparse matrices switch between index types. I wonder if this is purely for backwards compatibility. Naively, it seems to me that a new class could just always use 64-bit indices, but maybe that is too naive?
* Data types and casting rules. For now, I basically piggy-back on numpy's rules (see the promotion sketch after this list). There are several slightly different rule sets out there (numba has one?), and there might be an opportunity to simplify them. OTOH, inventing one more subtly different set of rules might be a bad idea.
* "Object" dtype. So far, there isn't one. I wonder if it's needed or having only numeric types would be enough.
* Interoperation with numpy arrays and other sparse matrices. I guess __numpy_ufunc__ would be *the* solution here, when available. For now, I do something simple based on special-casing and __array_priority__ (see the sketch after this list). Sparse matrices almost work, but there are glitches.
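For reference, the numpy promotion rules the draft currently piggy-backs on can be queried directly; the calls and outputs below are just standard numpy behavior:

    import numpy as np

    # numpy's type-promotion rules, queried directly
    print(np.result_type(np.int32, np.float32))    # float64
    print(np.result_type(np.int8, np.uint8))       # int16
    print(np.promote_types(np.float32, np.int64))  # float64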
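And here is a toy illustration of the __array_priority__ mechanism mentioned above; the Toy class is made up, but the deferral behavior is standard numpy:

    import numpy as np

    class Toy:
        # A made-up container. A priority above ndarray's default (0.0)
        # makes numpy defer binary ops to this class's reflected methods.
        __array_priority__ = 15.0

        def __init__(self, value):
            self.value = value

        def __mul__(self, other):
            return Toy(self.value * np.asarray(other))

        __rmul__ = __mul__

    a = np.arange(3.0)
    t = Toy(2.0)
    print((a * t).value)   # Toy.__rmul__ is called, not ndarray.__mul__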
This is just a non-exhaustive list of things I'm thinking about. If there's something else (and I'm sure there is plenty!), I'm all ears --- either in this email thread or in the issue tracker. If someone feels like commenting on implementation details, sure: none of those is cast in stone, and the plumbing is definitely in a bit of flux.
Cheers,
Evgeni
SciPy-Dev mailing list SciPy-Dev@scipy.org https://mail.scipy.org/mailman/listinfo/scipy-dev