[Numpy-discussion] sparse array data

Nathaniel Smith njs at pobox.com
Wed May 2 17:20:19 EDT 2012


On Wed, May 2, 2012 at 9:53 PM, Francesc Alted <francesc at continuum.io> wrote:
> On 5/2/12 11:16 AM, Wolfgang Kerzendorf wrote:
>> Hi all,
>>
>> I'm currently writing a code that needs three dimensional data (for the physicists it's dimensions are atom, ion, level). The problem is that not all combinations do exist (a sparse array). Sparse matrices in scipy only deal with two dimensions. The operations that I need to do on those are running functions like exp(item/constant) on all of the items. I also want to sum them up in the last dimension. What's the best way to make a class that takes this kind of data and does the required operations fast. Maybe some phycisists have implemented these things already. Any thoughts?
>
> Curiously enough, I have recently been discussing with Travis O. about
> how to represent sparse matrices with complete generality.  One of the
> possibilities is to use what Travis call "synthetic dimensions".  The
> idea behind it is easy: use a table with as many columns as dimensions,
> and add another one for the actual values of the array.  For a 3-D
> sparse array, this looks like:
>
> dim0 | dim1 | dim2 | value
> ==========================
>    0 |   0  |   0  | val0
>    0 |  10  | 100  | val1
>   20 |   5  | 202  | val2

This coordinate format is also what's used by the MATLAB Tensor
Toolbox. They have a paper justifying this choice and describing some
tricks for how to work with them:
  http://epubs.siam.org/sisc/resource/1/sjoce3/v30/i1/p205_s1
(Spoiler: you use a lot of sort operations. Conveniently, timsort
appears to be perfectly adapted for their algorithmic requirements.)

I'm not sure why one would make up a new term like "synthetic
dimensions" though, it's just standard coordinate format...

Though, for the original poster, depending on their exact problem,
they might be better off just using a list or object ndarray of
scipy.sparse matrices. Or some coordinate arrays like above, plus
add.reduceat for the sums they mentioned.

-- Nathaniel



More information about the NumPy-Discussion mailing list