[Numpy-discussion] sparse array data

Dag Sverre Seljebotn d.s.seljebotn at astro.uio.no
Thu May 3 04:23:45 EDT 2012


On 05/03/2012 06:27 AM, Travis Oliphant wrote:
>
> On May 2, 2012, at 10:03 PM, Stéfan van der Walt wrote:
>
>> On Wed, May 2, 2012 at 6:25 PM, Travis Oliphant<travis at continuum.io>  wrote:
>>> The only new principle (which is not strictly new --- but new to NumPy's world-view) is using one (or more) fields of a structured array as "synthetic dimensions" which replace one or more of the raw table dimensions.
>>
>> Ah, thanks--that's the detail I was missing.  I wonder if the
>> contiguity requirement will hamper us here, though.  E.g., I could
>> imagine that some tree structure might be more suitable for storing and
>> organizing indices, and for large arrays we wouldn't like to make a
>> copy for each operation.  I guess we can't wait for discontiguous
>> arrays to come along, though :)
>
> Actually, it's better to keep the actual data together as much as possible, I think, and simulate a tree structure with a layer on top --- i.e. an index.
>
> Different algorithms will prefer different orderings of the underlying data just as today different algorithms prefer different striding patterns on the standard, strided view of a dense array.
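
To make the quoted "synthetic dimensions" idea concrete, here is a
minimal sketch of my own (purely illustrative, not a proposed API): a
structured array whose 'i' and 'j' fields stand in for two raw
dimensions of a sparse 2-D array, with a sort plus binary search
playing the role of the tree-like index layered on top of contiguous
data:

import numpy as np

# One record per stored element; 'i' and 'j' are the synthetic
# dimensions, 'val' is the payload (essentially a COO layout).
entries = np.array(
    [(0, 2, 1.5), (3, 1, -2.0), (3, 4, 0.5), (7, 0, 4.0)],
    dtype=[('i', np.int64), ('j', np.int64), ('val', np.float64)],
)

# The "index layer on top": keep the records in one contiguous buffer,
# sort once by the synthetic dimensions, and use binary search instead
# of a tree of pointers.
entries = entries[np.argsort(entries, order=('i', 'j'))]

# All stored values in row i == 3, via searchsorted on the sorted
# 'i' field:
lo = np.searchsorted(entries['i'], 3, side='left')
hi = np.searchsorted(entries['i'], 3, side='right')
print(entries[lo:hi])

Whether to sort by ('i', 'j') or ('j', 'i') is itself one of those
ordering choices that different algorithms would disagree about.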

Two examples:

1) If you know the distribution of your indices in N dimensions ahead of 
time (e.g., particles spread roughly evenly in 3D space), it is better to 
fill that volume with a space-filling (fractal) curve and assign indices 
along that curve, so that you get at least some spatial locality in all 
N dimensions. (This is standard procedure in parallelization codes.)
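
A minimal sketch of (1), assuming integer grid coordinates and a
Morton (Z-order) key -- one common space-filling curve; the helper
morton3d below is hypothetical, just for illustration:

import numpy as np

def morton3d(ix, iy, iz, bits=10):
    # Interleave the low `bits` bits of unsigned integer grid
    # coordinates into a Morton (Z-order) key.
    key = np.zeros_like(ix, dtype=np.uint64)
    for b in range(bits):
        key |= ((ix >> b) & 1) << np.uint64(3 * b)
        key |= ((iy >> b) & 1) << np.uint64(3 * b + 1)
        key |= ((iz >> b) & 1) << np.uint64(3 * b + 2)
    return key

# Particles roughly uniform in the unit cube, quantized to a 1024^3 grid.
rng = np.random.default_rng(0)
pos = rng.random((1000, 3))
grid = (pos * 1024).astype(np.uint64)

# Sorting by the Morton key gives spatial locality in all three
# dimensions: neighbours along the curve are (mostly) neighbours in space.
pos = pos[np.argsort(morton3d(grid[:, 0], grid[:, 1], grid[:, 2]))]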

2) For standard dense linear algebra code, it is better to store the 
data in blocks than in either contiguous ordering (C or Fortran) -- to 
the point that optimized BLAS/LAPACK implementations repack operands 
into a blocked ordering internally on each operation.
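
A rough sketch of that repacking in NumPy terms (illustrative only;
real BLAS kernels do this in C with carefully tuned block sizes):

import numpy as np

def to_blocked(a, b):
    # Repack a row-major (n, m) matrix into an (n//b, m//b, b, b)
    # array of contiguous b-by-b blocks; this sketch assumes the
    # shape divides evenly.
    n, m = a.shape
    assert n % b == 0 and m % b == 0
    return np.ascontiguousarray(
        a.reshape(n // b, b, m // b, b).swapaxes(1, 2)
    )

a = np.arange(64, dtype=np.float64).reshape(8, 8)
blocked = to_blocked(a, 4)
print(blocked[1, 0])   # the lower-left 4x4 block of a

A kernel that works block by block then touches one small,
cache-resident, contiguous chunk at a time.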

Dag


