On Sat, Oct 27, 2018 at 11:10 AM Stefan van der Walt <stefanv@berkeley.edu> wrote:
On Sat, 27 Oct 2018 10:27:49 +1300, Ralf Gommers wrote:
> Just to make sure we're talking about the same things here: Stefan, I think
> with "sparray" you mean "an n-D sparse array implementation that lives in
> SciPy", nothing more specific? In that case pydata/sparse is the one
> implementation, and including it in scipy.sparse would make it "sparray".
> I'm currently indeed leaning towards depending on pydata/sparse rather than
> including it in scipy.

I want to double check: when we last spoke, it seemed as though certain
refactorings inside of SciPy (specifically, sparray was mentioned) would
simplify the life of pydata/sparse devs.  That no longer seems to be the
case?

There's no such thing as `sparray` anywhere in SciPy. There are two inactive projects to create an n-D sparse array implementation, one of which is called sparray (https://github.com/perimosocordiae/sparray). And there's one very active project doing the same thing: https://github.com/pydata/sparse


If our recommended route is to tell users to use pydata/sparse instead
of SciPy (for the sparse array object), we probably want to get rid of
our own internal implementation, and deprecate spmatrix

Doc-deprecate I think; the sparse matrix classes in SciPy are very heavily used, so it doesn't make sense to start emitting deprecation warnings for them. But at some point we'll want to point users to pydata/sparse for new code.
 
(or, build
spmatrix on top of pydata/sparse)?

It's the matrix vs. array semantics that are the issue, so not sure that building one on top of the other would be useful.
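
To make the semantics difference concrete, here is a minimal sketch, assuming the current behaviour of the scipy.sparse matrix classes and of plain NumPy arrays (the variable names are just for illustration):

```python
import numpy as np
from scipy import sparse

dense = np.array([[1, 2], [3, 4]])
sp = sparse.csr_matrix(dense)

# Matrix semantics (spmatrix, like np.matrix): `*` is matrix multiplication.
matmul_result = (sp * sp).toarray()

# Array semantics (ndarray): `*` is elementwise multiplication.
elementwise_result = dense * dense

# With spmatrix, elementwise multiplication needs an explicit method call.
sp_elementwise = sp.multiply(sp).toarray()

# Indexing differs too: a row of an spmatrix stays 2-D,
# while a row of an ndarray drops to 1-D.
sp_row_shape = sp[0].shape
dense_row_shape = dense[0].shape
```

Code written against one set of semantics silently computes something different under the other, which is why layering one on top of the other doesn't obviously help.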


Once we can define a clear API for sparse arrays, we can include some
algorithms that ingest those objects in SciPy.  But, I'm not sure we
have an API in place that will allow handover of such objects to the
existing C/FORTRAN-level code.

I don't think the constructors for sparse matrix/array care about C/F order. pydata/sparse is pure Python (and uses Numba). For reusing scipy.sparse.linalg and scipy.sparse.csgraph, I think you're right that it will need some careful design work. Not sure anyone has thought about that in a lot of detail yet.

There are probably interesting API questions, such as how to treat explicit zeros (that debate still isn't settled for the matrix classes IIRC). And there's an interesting transition puzzle to figure out (which also includes np.matrix). At the moment the discussion on that is spread out over many mailing list threads and GitHub issues; at some point we'll need to summarize that. Probably around the time that the CSR/CSC replacement that Hameer mentioned is finished.
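
For anyone not familiar with the explicit zeros question, a small sketch of what it looks like with the existing matrix classes, assuming the (data, indices, indptr) form of the csr_matrix constructor; the values are made up for illustration:

```python
import numpy as np
from scipy import sparse

# CSR built directly from its three arrays; the second stored value is an explicit zero.
data = np.array([1.0, 0.0, 2.0])
indices = np.array([0, 1, 1])
indptr = np.array([0, 2, 3])
m = sparse.csr_matrix((data, indices, indptr), shape=(2, 2))

stored_before = m.nnz                # number of stored entries, explicit zero included
nonzero_before = m.count_nonzero()   # number of entries whose value is actually nonzero

m.eliminate_zeros()                  # drops explicit zeros in place
stored_after = m.nnz
```

Whether a stored zero is part of the array's value or just a storage detail is the kind of thing a sparse array API would have to pin down.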

Cheers,
Ralf