[SciPy-User] Sparse vector
Francesc Alted
faltet at pytables.org
Wed Apr 28 14:04:33 EDT 2010
On Wednesday 28 April 2010 06:36:48 Felix Schlesinger wrote:
> I use pytables a lot (thanks for the great work) and compression can help
> with some sparse data tasks, but in my case (and probably many others)
> random access is important. And maybe performing some operation only on
> those entries that are not 0.
>
> > In [21]: timeit -r1 -n1 v0[np.random.randint(0, N, 10000)]
> > 1 loops, best of 1: 165 ms per loop
> > In [24]: timeit -r1 -n1 a0[np.random.randint(0, N, 10000)]
> > 1 loops, best of 1: 1.39 ms per loop
>
> This difference is pretty dramatic and the scipy.sparse matrices
> (compressed row or dictionary of keys) are much closer to numpy
> performance. In some use cases this matters, for others the very good
> performance on expression evaluation with large on-disk arrays of pytables
> works fine.
>
> I think there is room for a general sparse vector class in numpy (maybe
> based on something in scipy.sparse). But there are many trade-offs in the
> design.
True. I, for one, have lately been pondering a container that would use
compression specific to sparse matrices, with an emphasis on fast random access.
I'd say that a combination of Blosc and Cython would be really good at this,
and the implementation would not be too difficult.
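
To make the random-access part concrete, here is a rough sketch in plain
NumPy. It is not the Blosc/Cython container mentioned above and it does no
compression at all; the SparseVector class and its take() method are
hypothetical names used only for illustration. It simply stores the sorted
positions of the nonzero entries and answers lookups with a binary search:

```python
import numpy as np


class SparseVector:
    """Hypothetical sparse vector: sorted indices plus the matching values."""

    def __init__(self, dense):
        dense = np.asarray(dense)
        self.size = dense.shape[0]
        self.dtype = dense.dtype
        self.indices = np.flatnonzero(dense)   # sorted positions of the nonzeros
        self.values = dense[self.indices]      # the nonzero entries themselves

    def take(self, idx):
        """Random access: entries at positions idx, zero where nothing is stored."""
        idx = np.asarray(idx)
        out = np.zeros(idx.shape, dtype=self.dtype)
        if self.indices.size == 0:
            return out
        # Binary search for each requested position among the stored ones.
        pos = np.searchsorted(self.indices, idx)
        pos = np.minimum(pos, self.indices.size - 1)   # keep pos in bounds
        hit = self.indices[pos] == idx                 # True where idx is stored
        out[hit] = self.values[pos[hit]]
        return out


# Assumed test data: a vector with 1% nonzero entries.
N = 1_000_000
rng = np.random.default_rng(0)
dense = np.zeros(N)
nz = rng.choice(N, size=N // 100, replace=False)
dense[nz] = rng.random(nz.size) + 1.0   # strictly nonzero values

sv = SparseVector(dense)
queries = rng.integers(0, N, 10000)
result = sv.take(queries)   # same answer as dense[queries]
```

Each lookup costs O(log nnz), so 10000 random reads stay cheap without
decompressing anything. A real container would presumably compress
indices and values in blocks and search within a block.
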
The hairy thing is how to provide mathematical functionality for this sparse
container (making numexpr understand these containers is appealing, but
would require a non-negligible amount of time). Mmh, food for thought.
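
Just to illustrate what "mathematical functionality" would mean in the
simplest case, here is a small NumPy-only sketch (no numexpr involved). It
assumes a sparse vector is just a pair (indices, values) with sorted, unique
indices; to_sparse(), map_nonzero(), add() and to_dense() are hypothetical
helpers, not functions from any existing library:

```python
import numpy as np


def to_sparse(dense):
    """Return (indices, values) for the nonzero entries of a dense vector."""
    dense = np.asarray(dense)
    idx = np.flatnonzero(dense)
    return idx, dense[idx]


def to_dense(sp, size):
    """Expand (indices, values) back into a dense vector of the given size."""
    idx, vals = sp
    out = np.zeros(size, dtype=vals.dtype)
    out[idx] = vals
    return out


def map_nonzero(sp, func):
    """Apply func to the stored entries only; only valid when func(0) == 0."""
    idx, vals = sp
    return idx, func(vals)


def add(a, b):
    """Add two sparse vectors by merging their sorted index sets."""
    ai, av = a
    bi, bv = b
    idx = np.union1d(ai, bi)                       # sorted union of positions
    out = np.zeros(idx.size, dtype=np.result_type(av, bv))
    out[np.searchsorted(idx, ai)] += av            # ai is unique, so += is safe
    out[np.searchsorted(idx, bi)] += bv
    keep = out != 0                                # drop entries that cancelled
    return idx[keep], out[keep]


x = to_sparse([0, 3, 0, 0, -1, 0])
y = to_sparse([0, -3, 2, 0, 0, 0])

squared = map_nonzero(x, np.square)   # touches only the two stored entries
total = add(x, y)                     # the entries at position 1 cancel out
```

Operations with func(0) == 0 only need the stored values, which is the easy
part. Anything else (adding a constant, say) densifies the result, and that
is where the design trade-offs start.
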
--
Francesc Alted