[Numpy-discussion] sparse array data

Nathaniel Smith njs at pobox.com
Wed May 2 19:12:29 EDT 2012


On Wed, May 2, 2012 at 11:26 PM, Francesc Alted <francesc at continuum.io> wrote:
> On 5/2/12 4:20 PM, Nathaniel Smith wrote:
>> On Wed, May 2, 2012 at 9:53 PM, Francesc Alted<francesc at continuum.io>  wrote:
>>> On 5/2/12 11:16 AM, Wolfgang Kerzendorf wrote:
>>>> Hi all,
>>>>
>>>> I'm currently writing a code that needs three dimensional data (for the physicists it's dimensions are atom, ion, level). The problem is that not all combinations do exist (a sparse array). Sparse matrices in scipy only deal with two dimensions. The operations that I need to do on those are running functions like exp(item/constant) on all of the items. I also want to sum them up in the last dimension. What's the best way to make a class that takes this kind of data and does the required operations fast. Maybe some phycisists have implemented these things already. Any thoughts?
>>> Curiously enough, I have recently been discussing with Travis O. about
>>> how to represent sparse matrices with complete generality.  One of the
>>> possibilities is to use what Travis call "synthetic dimensions".  The
>>> idea behind it is easy: use a table with as many columns as dimensions,
>>> and add another one for the actual values of the array.  For a 3-D
>>> sparse array, this looks like:
>>>
>>> dim0 | dim1 | dim2 | value
>>> ==========================
>>>     0 |   0  |   0  | val0
>>>     0 |  10  | 100  | val1
>>>    20 |   5  | 202  | val2
>> This coordinate format is also what's used by the MATLAB Tensor
>> Toolbox. They have a paper justifying this choice and describing some
>> tricks for how to work with them:
>>    http://epubs.siam.org/sisc/resource/1/sjoce3/v30/i1/p205_s1
>> (Spoiler: you use a lot of sort operations. Conveniently, timsort
>> appears to be perfectly adapted for their algorithmic requirements.)
>
> Uh, I do not have access to the article, but yes, sorting makes a lot of
> sense for these scenarios (this is why I was suggesting using indexes,
> which is sort of sorting too).

Doh, sorry. Sticking the title into google scholar gives me this:
  http://csmr.ca.sandia.gov/~tgkolda/pubs/bibtgkfiles/SIAM-67648.pdf

- N



More information about the NumPy-Discussion mailing list