[Numpy-discussion] sparse array data
Dag Sverre Seljebotn
d.s.seljebotn at astro.uio.no
Thu May 3 04:23:45 EDT 2012
On 05/03/2012 06:27 AM, Travis Oliphant wrote:
> On May 2, 2012, at 10:03 PM, Stéfan van der Walt wrote:
>> On Wed, May 2, 2012 at 6:25 PM, Travis Oliphant<travis at continuum.io> wrote:
>>> The only new principle (which is not strictly new --- but new to NumPy's world-view) is using one (or more) fields of a structured array as "synthetic dimensions" which replace 1 or more of the raw table dimensions.
>> Ah, thanks--that's the detail I was missing. I wonder if the
>> contiguity requirement will hamper us here, though. E.g., I could
>> imagine that some tree structure might be more suitable to storing and
>> organizing indices, and for large arrays we wouldn't like to make a
>> copy for each operation. I guess we can't wait for discontiguous
>> arrays to come along, though :)
> Actually, it's better to keep the actual data together as much as possible, I think, and simulate a tree structure with a layer on top --- i.e. an index.
> Different algorithms will prefer different orderings of the underlying data just as today different algorithms prefer different striding patterns on the standard, strided view of a dense array.
1) If you know the distribution of your indices in N dimensions ahead of
time (e.g., roughly evenly particles in 3D space), you better fill that
volume using a fractal, and use indices along that fractal, so that you
get at least some spatial locality in all N dimensions. (This is
standard procedure in parallelization codes)
2) For standard dense linear algebra code, it is better to store the
data blocked than either contiguous ordering -- to the point that
LAPACK/BLAS repacks to a blocked ordering internally on each operation.
More information about the NumPy-Discussion