[Numpy-discussion] sparse array data

Charles R Harris charlesr.harris at gmail.com
Wed May 2 23:44:35 EDT 2012

```On Wed, May 2, 2012 at 3:20 PM, Nathaniel Smith <njs at pobox.com> wrote:

> On Wed, May 2, 2012 at 9:53 PM, Francesc Alted <francesc at continuum.io>
> wrote:
> > On 5/2/12 11:16 AM, Wolfgang Kerzendorf wrote:
> >> Hi all,
> >>
> >> I'm currently writing a code that needs three dimensional data (for the
> physicists it's dimensions are atom, ion, level). The problem is that not
> all combinations do exist (a sparse array). Sparse matrices in scipy only
> deal with two dimensions. The operations that I need to do on those are
> running functions like exp(item/constant) on all of the items. I also want
> to sum them up in the last dimension. What's the best way to make a class
> that takes this kind of data and does the required operations fast. Maybe
> some phycisists have implemented these things already. Any thoughts?
> >
> > Curiously enough, I have recently been discussing with Travis O. about
> > how to represent sparse matrices with complete generality.  One of the
> > possibilities is to use what Travis call "synthetic dimensions".  The
> > idea behind it is easy: use a table with as many columns as dimensions,
> > and add another one for the actual values of the array.  For a 3-D
> > sparse array, this looks like:
> >
> > dim0 | dim1 | dim2 | value
> > ==========================
> >    0 |   0  |   0  | val0
> >    0 |  10  | 100  | val1
> >   20 |   5  | 202  | val2
>
> This coordinate format is also what's used by the MATLAB Tensor
> Toolbox. They have a paper justifying this choice and describing some
> tricks for how to work with them:
>  http://epubs.siam.org/sisc/resource/1/sjoce3/v30/i1/p205_s1
> (Spoiler: you use a lot of sort operations. Conveniently, timsort
> appears to be perfectly adapted for their algorithmic requirements.)
>
>
Timsort probably isn't the best choice here, it is optimized for python
lists of python objects where there is at least one level of indirection
and compares are expensive, even more expensive for compound objects. If
the coordinates are stored in numpy structured arrays an ordinary sort is
likely to be faster. Lexsort might even be faster as it could access
aligned integer data and not have to move lists of indexes around.

I'm not sure why one would make up a new term like "synthetic
> dimensions" though, it's just standard coordinate format...
>
> Though, for the original poster, depending on their exact problem,
> they might be better off just using a list or object ndarray of
> scipy.sparse matrices. Or some coordinate arrays like above, plus
> add.reduceat for the sums they mentioned.
>
>
Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20120502/7b6d25b9/attachment.html>
```