One difference I see is that you seem to be using a flat index. This
likely does not play too well with assembly, which is what I tried to
take into account.

 
Yeah, I took the easy route of using a single, sorted, flat index. This makes a lot of operations easy, especially with the help of np.ravel_multi_index and friends. You're correct, though, that it makes assembly and other modifications to the sparsity structure difficult! For an example, see: https://github.com/perimosocordiae/sparray/blob/master/sparray/base.py#L335
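
To make the trade-off concrete, here is a minimal sketch of the idea. It is not the actual sparray code: the `lookup` and `set_entry` helpers and the sample data are made up for illustration. Only np.ravel_multi_index, np.argsort, np.searchsorted and np.insert are real NumPy calls. Lookups are a binary search, but adding a new nonzero has to copy both arrays, which is roughly why assembly gets expensive.

```python
import numpy as np

shape = (3, 4)
# Hypothetical sample data: coordinates and values of the nonzero entries.
rows = np.array([2, 0, 1])
cols = np.array([1, 3, 0])
vals = np.array([5.0, 7.0, 9.0])

# Collapse each (row, col) pair into one flat position, then sort by it.
flat = np.ravel_multi_index((rows, cols), shape)
order = np.argsort(flat)
flat, vals = flat[order], vals[order]


def lookup(flat, vals, idx, shape):
    """Return the stored value at idx, or 0.0 if idx is not stored."""
    key = np.ravel_multi_index(idx, shape)
    pos = np.searchsorted(flat, key)  # binary search in the sorted index
    if pos < len(flat) and flat[pos] == key:
        return vals[pos]
    return 0.0


def set_entry(flat, vals, idx, value, shape):
    """Return new (flat, vals) arrays with idx set to value."""
    key = np.ravel_multi_index(idx, shape)
    pos = np.searchsorted(flat, key)
    if pos < len(flat) and flat[pos] == key:
        # Entry already stored: the sparsity structure is unchanged.
        new_vals = vals.copy()
        new_vals[pos] = value
        return flat, new_vals
    # New nonzero: np.insert copies both arrays, so this costs O(nnz).
    return np.insert(flat, pos, key), np.insert(vals, pos, value)


flat2, vals2 = set_entry(flat, vals, (1, 2), 1.0, shape)
```

Calling `set_entry` once per element while assembling a matrix would therefore cost O(nnz) each time, which is the part a different backend would need to avoid.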
 
Would you be interested in joining forces?


I've been meaning to add multiple backends to work around that problem, and I expect that merging our efforts will make that possible. Let me know how you want to proceed.

-CJ