[Numpy-discussion] sparse array data
Dag Sverre Seljebotn
d.s.seljebotn at astro.uio.no
Thu May 3 04:39:35 EDT 2012
On 05/03/2012 03:25 AM, Travis Oliphant wrote:
>
> On May 2, 2012, at 5:28 PM, Stéfan van der Walt wrote:
>
>> On Wed, May 2, 2012 at 3:20 PM, Francesc Alted<francesc at continuum.io> wrote:
>>> On 5/2/12 4:07 PM, Stéfan van der Walt wrote:
>>> Well, as the OP said, coo_matrix does not support dimensions larger than
>>> 2, right?
>>
>> That's just an implementation detail, I would imagine--I'm trying to
>> figure out if there is a new principle behind "synthetic dimensions"?
>> By the way, David Cournapeau mentioned using b-trees for sparse ops a
>> while ago; did you ever talk to him about those ideas?
>
> The only new principle (which is not strictly new --- but new to NumPy's world-view) is using one (or more) fields of a structured array as "synthetic dimensions" which replace 1 or more of the raw table dimensions. Thus, you could create a "view" of a NumPy array (or a group of NumPy arrays) where 1 or more dimensions is replaced with these "sparse dimensions". This is a fully-general way to handle a mixture of sparse and dense structures in one general array interface.
>
> However, you lose the O(1) lookup as now you must search for the non-zero items in order to implement algorithms (indexes are critical and Francesc has some nice indexes in PyTables).
>
> A group-by operation can be replaced by an operation on "a sparse dimension" where you have mapped attributes to 1 or more dimensions in the underlying array.
>
> coo_matrix is just a special case of this more general idea. If you add the ability to compress attributes, then you get csr, csc, and various other forms of matrices as well.
>
> More to come.... If you are interested in this sort of thing please let me know....
I am very interested in such developments, since a NumPy axis very often
doesn't mean anything to itself to me, but I'll do stuff like
arr[l*l + l + m - lmin * lmin] = f(l, m)
Something like a function you could register that maps virtual
dimensions to actual dimensions would be great:
void lm_to_idx(Py_ssize_t *virtual_indices, Py_ssize_t *indices,
void *ctx) {
Py_ssize_t l, m, lmin;
lmin = ((my_context_t*)ctx)->lmin;
for (Py_ssize_t i = 0; i != NPY_INDEX_TRANSLATION_BLOCKSIZE; ++i) {
l = virtual_indices[2 * i];
m = virtual_indiecs[2 * i + 1];
indices[i] = l * l + l + m - lmin * lmin;
}
}
Then register the function with an array (somehow), and then you can do
arr[l, m] = f(l, m)
:-)
But, doing it manually isn't *that* much of a pain, I won't be putting
in any work in this direction myself.
Dag
More information about the NumPy-Discussion
mailing list