[SciPy-User] Sparse vector

Wed Apr 28 14:04:33 EDT 2010

On Wednesday 28 April 2010 06:36:48 Felix Schlesinger wrote:
> I use pytables a lot (thanks for the great work) and compression can help
>  with some sparse data tasks, but in my case (and probably many others)
>  random access is important. And maybe performing some operation only on
>  those entries that are not 0.
> 
> > In [21]: timeit -r1 -n1 v0[np.random.randint(0, N, 10000)]
> > 1 loops, best of 1: 165 ms per loop
> > In [24]: timeit -r1 -n1 a0[np.random.randint(0, N, 10000)]
> > 1 loops, best of 1: 1.39 ms per loop
> 
> This difference is pretty dramatic and the scipy.sparse matrices
>  (compressed row or dictionary of keys) are much closer to numpy
>  performance. In some use cases this matters, for others the very good
>  performance on expression evaluation with large on-disk arrays of pytables
>  works fine.
> 
> I think there is room for a general sparse vector class in numpy (maybe
>  based on something in scipy.sparse). But there are many trade-offs in the
>  design.

True.  I, for one, have lately been pondering a container that would use
compression specific to sparse matrices, with an emphasis on fast random
access.  I'd say that a combination of Blosc and Cython would be really good
at this, and the implementation would not be too difficult.
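
To make the random-access requirement concrete, here is a minimal sketch in
plain NumPy.  It is only an illustration under stated assumptions: the
SparseVector class and its method names are hypothetical, it uses neither
Blosc nor Cython, and it applies no compression at all.  It only shows a
sorted-index layout in which a batch of random lookups becomes one binary
search, and in which an operation can be applied to the stored (non-zero)
entries alone.

```python
import numpy as np


class SparseVector:
    """Hypothetical sparse vector: sorted non-zero indices plus their values."""

    def __init__(self, size, indices, values):
        self.size = size
        self.indices = np.asarray(indices)  # assumed sorted and unique
        self.values = np.asarray(values)

    @classmethod
    def from_dense(cls, dense):
        dense = np.asarray(dense)
        idx = np.flatnonzero(dense)  # positions of non-zeros, already sorted
        return cls(dense.shape[0], idx, dense[idx])

    def __getitem__(self, idx):
        # Vectorized random access; idx is assumed to lie in [0, size).
        idx = np.atleast_1d(idx)
        out = np.zeros(idx.shape, dtype=self.values.dtype)
        if len(self.indices) == 0:
            return out
        # Binary search for each requested position among the stored indices.
        pos = np.searchsorted(self.indices, idx)
        # Clip so that lookups past the last stored index stay in bounds.
        pos = np.minimum(pos, len(self.indices) - 1)
        hit = self.indices[pos] == idx
        out[hit] = self.values[pos[hit]]
        return out

    def map_nonzero(self, func):
        # Apply func to the stored entries only; implicit zeros are untouched.
        return SparseVector(self.size, self.indices, func(self.values))


dense = np.array([0.0, 3.0, 0.0, 0.0, 5.0, 0.0])
v = SparseVector.from_dense(dense)
picked = v[[1, 2, 4, 5]]
doubled = v.map_nonzero(lambda x: 2 * x)
```

Each lookup costs a binary search over the stored indices, so this trades
some speed against a dense array for memory proportional to the number of
non-zero entries.  A compressed variant would have to keep that lookup cheap.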

The hairy thing is how to provide mathematical functionality for this sparse
container (making numexpr understand these containers is appealing, but
would require a non-negligible amount of time).  Mmh, food for thought.

-- 
Francesc Alted