[Numpy-discussion] sparse array data

Travis Oliphant travis at continuum.io
Wed May 2 21:25:02 EDT 2012

On May 2, 2012, at 5:28 PM, Stéfan van der Walt wrote:

> On Wed, May 2, 2012 at 3:20 PM, Francesc Alted <francesc at continuum.io> wrote:
>> On 5/2/12 4:07 PM, Stéfan van der Walt wrote:
>> Well, as the OP said, coo_matrix does not support dimensions larger than
>> 2, right?
> That's just an implementation detail, I would imagine--I'm trying to
> figure out if there is a new principle behind "synthetic dimensions"?
> By the way, David Cournapeau mentioned using b-trees for sparse ops a
> while ago; did you ever talk to him about those ideas?

The only new principle (not strictly new, but new to NumPy's world-view) is using one or more fields of a structured array as "synthetic dimensions" which replace one or more of the raw table dimensions. Thus, you could create a "view" of a NumPy array (or a group of NumPy arrays) in which one or more dimensions are replaced with these "sparse dimensions". This is a fully general way to handle a mixture of sparse and dense structures in one array interface.
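To make the idea concrete, here is a rough sketch using plain NumPy. The field names ("row", "col", "value") and the helper to_dense are made up for illustration; this is not an existing or proposed NumPy API, and a real "view" would presumably avoid materializing the dense array at all.

```python
import numpy as np

# Hypothetical record layout: two integer fields play the role of
# "synthetic dimensions", the third field holds the stored value.
rec = np.array(
    [(0, 1, 3.0), (2, 0, 4.5), (2, 3, 1.5)],
    dtype=[("row", np.int64), ("col", np.int64), ("value", np.float64)],
)

def to_dense(records, dims, value_field, shape):
    """Materialize a dense array, using the named fields as coordinates."""
    dense = np.zeros(shape, dtype=records[value_field].dtype)
    index = tuple(records[name] for name in dims)
    dense[index] = records[value_field]
    return dense

# The 1-d table of three records is seen as a 3 x 4 array.
dense = to_dense(rec, ("row", "col"), "value", shape=(3, 4))
```

Here the one-dimensional table is reinterpreted as a two-dimensional array whose coordinates come from the fields themselves.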

However, you lose the O(1) lookup, as you must now search for the non-zero items in order to implement algorithms (indexes are critical, and Francesc has some nice indexes in PyTables).
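As a minimal sketch of why an index matters: with the coordinates stored as data, finding element (i, j) means searching. Sorting a linearized key once and then using binary search gives O(log nnz) lookups instead of a full scan. This is only an illustration with NumPy's searchsorted; it says nothing about how the PyTables indexes work, and the lookup function is a hypothetical helper.

```python
import numpy as np

rows = np.array([2, 0, 2])
cols = np.array([3, 1, 0])
vals = np.array([1.5, 3.0, 4.5])
shape = (3, 4)

# Linearize (row, col) into a single key and sort once; the sorted
# key array acts as a very simple index.
keys = np.ravel_multi_index((rows, cols), shape)
order = np.argsort(keys)
sorted_keys = keys[order]
sorted_vals = vals[order]

def lookup(i, j):
    """Return the stored value at (i, j), or 0.0 if nothing is stored."""
    key = np.ravel_multi_index((i, j), shape)
    pos = np.searchsorted(sorted_keys, key)
    if pos < sorted_keys.size and sorted_keys[pos] == key:
        return float(sorted_vals[pos])
    return 0.0
```

A dense array answers the same question with a single offset computation, which is the O(1) behaviour that is given up.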

A group-by operation can be replaced by an operation on a "sparse dimension" where you have mapped attributes to one or more dimensions in the underlying array.
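One way to read this, sketched with plain NumPy: map the grouping attribute to integer coordinates along a new axis, and the group-by becomes a reduction along that axis. The table layout and field names below are invented for the example.

```python
import numpy as np

# Hypothetical table with a grouping attribute and a value.
table = np.array(
    [("a", 1.0), ("b", 2.0), ("a", 3.0), ("c", 4.0), ("b", 5.0)],
    dtype=[("key", "U1"), ("value", np.float64)],
)

# Map the "key" attribute to coordinates 0..n_groups-1 along a
# synthetic dimension.
labels, coord = np.unique(table["key"], return_inverse=True)

# "GROUP BY key, SUM(value)" is then a sum over all records that
# share a coordinate on that dimension.
sums = np.bincount(coord, weights=table["value"], minlength=labels.size)
```

After this, sums[k] holds the total for labels[k], so the result is itself a small dense array indexed by the synthetic dimension.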

coo_matrix is just a special case of this more general idea. If you add the ability to compress attributes, then you get CSR, CSC, and various other sparse matrix formats as well.
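A small sketch of what "compressing" an attribute could mean, assuming it refers to replacing a sorted, repeated coordinate column with per-row offsets. Starting from COO-style row/col/value columns, compressing the row column yields the usual CSR layout (compressing the column attribute instead would give CSC). Only NumPy is used here; the variable names follow common CSR terminology.

```python
import numpy as np

# COO-style columns: one (row, col, value) triple per stored entry.
rows = np.array([2, 0, 2])
cols = np.array([3, 1, 0])
vals = np.array([1.5, 3.0, 4.5])
n_rows = 3

# Sort by row, then by column (lexsort treats the last key as primary).
order = np.lexsort((cols, rows))
rows, cols, vals = rows[order], cols[order], vals[order]

# Compress the row attribute: keep only where each row starts.
counts = np.bincount(rows, minlength=n_rows)
indptr = np.concatenate(([0], np.cumsum(counts)))

# Entries of row r are cols[indptr[r]:indptr[r + 1]] and the matching
# slice of vals.
row2_cols = cols[indptr[2]:indptr[3]]
row2_vals = vals[indptr[2]:indptr[3]]
```

The row column of length nnz has been replaced by an offset array of length n_rows + 1, and an empty row shows up as two equal consecutive offsets.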

More to come... If you are interested in this sort of thing, please let me know.

