Mon Nov 21 20:18:36 EST 2011

Hi folks,

I'm working on a "ragged array" class -- an array that can store and
work with what is essentially tabular data, but with rows of
different lengths:


A "ragged" array class -- built on numpy

The idea is to be able to store data that is essentially 2-d, but each
row is an arbitrary length, like:

1   2   3
4   5   6   7   8   9
10 11
12 13  14  15  16  17  18
19 20  21

At the moment, my implementation (see enclosed) stores the data in a 1-d
numpy array as an attribute, along with an index array that stores the
offset at which each row starts in that flat array. This is working fine.
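For concreteness, here is one way the two arrays could be built from a list of rows. This assumes (as the __getitem__ shown further down implies) that the index array holds each row's start offset plus one final end offset; build_ragged is a hypothetical helper, not the enclosed implementation.

```python
import numpy as np


def build_ragged(rows):
    """Hypothetical helper: flatten rows into (data, offsets)."""
    rows = [np.asarray(r) for r in rows]
    lengths = np.array([len(r) for r in rows], dtype=np.intp)
    # offsets has one more entry than there are rows:
    # row i occupies data[offsets[i]:offsets[i + 1]].
    offsets = np.zeros(len(rows) + 1, dtype=np.intp)
    np.cumsum(lengths, out=offsets[1:])
    data = np.concatenate(rows) if rows else np.empty(0)
    return data, offsets


data, offsets = build_ragged([(1, 2, 3), (4, 5, 6, 7, 8, 9), (10, 11),
                              (12, 13, 14, 15, 16, 17, 18), (19, 20, 21)])
third_row = data[offsets[2]:offsets[3]]   # a view into data, not a copy
```

For the example table above, offsets comes out as [0, 3, 9, 11, 18, 21], and each row is just a slice (view) of the flat array.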

However, I'd like to have it support any of the usual numpy operations 
that make sense for a ragged array:

arr *= a_scalar
arr * a_scalar

etc, etc, etc.
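With the data kept as an attribute, scalar operations like these can be forwarded to the flat array, since they act elementwise and never change the row structure. A minimal sketch, assuming the index array holds each row's start offset plus a final end offset; the class name RaggedArray and the helper _with_data are hypothetical.

```python
import numpy as np


class RaggedArray(object):
    """Minimal sketch: rows kept as one flat array plus row offsets."""

    def __init__(self, rows):
        rows = [np.asarray(r) for r in rows]
        lengths = np.array([len(r) for r in rows], dtype=np.intp)
        # _index_array[i]:_index_array[i + 1] brackets row i in the flat data.
        self._index_array = np.concatenate(([0], np.cumsum(lengths)))
        self._data_array = np.concatenate(rows)

    def __len__(self):
        return len(self._index_array) - 1

    def __getitem__(self, index):
        if index < 0:          # allow ra[-1] like a normal sequence
            index += len(self)
        start, stop = self._index_array[index], self._index_array[index + 1]
        return self._data_array[start:stop]

    def _with_data(self, data):
        # New instance with the same row layout but different flat data.
        new = object.__new__(type(self))
        new._index_array = self._index_array.copy()
        new._data_array = data
        return new

    # Scalar arithmetic only touches the flat data array.
    def __mul__(self, scalar):
        return self._with_data(self._data_array * scalar)

    __rmul__ = __mul__

    def __imul__(self, scalar):
        self._data_array *= scalar   # in place, no copy
        return self


ra = RaggedArray([(1, 2), (3, 4, 5), (6, 7)])
doubled = ra * 2   # new object; ra is unchanged
ra *= 10           # modifies ra's flat data in place
```

The catch is that every operator has to be written out by hand (or generated in a loop), which is the motivation for trying a subclass instead.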

So I thought maybe I'd do a subclass, instead of having the data array
be an attribute of the class. But I can't figure out how to solve the
indexing problem:

I want to re-map indexing, so that:

arr[i] returns the ith "row":

In [2]: ra = ragged_array([(1,2), (3,4,5), (6,7)])

In [4]: print ra
ragged array:
[1 2]
[3 4 5]
[6 7]

In [5]: ra[1]
Out[5]: array([3, 4, 5])

I'm currently doing (error checking removed):

def __getitem__(self, index):
    """Return a numpy array holding one row."""
    start = self._index_array[index]
    stop = self._index_array[index + 1]
    return self._data_array[start:stop]

But if I subclass ndarray, then self._data_array becomes just plain
"self", and I've overloaded indexing (and slicing), so I don't know how
I could index into the "flat" array to get the subset of the array I need.
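One possible way around this, sketched under assumptions rather than tested against the enclosed class: inside the subclass, self.view(np.ndarray) gives a plain-ndarray view of the same memory, and slicing that view uses ndarray's normal indexing instead of the overridden row lookup. The attribute name _index_array mirrors the code above; _flat is a hypothetical helper.

```python
import numpy as np


class RaggedArray(np.ndarray):
    """Sketch of the subclass approach: self *is* the flat 1-d data."""

    def __new__(cls, rows):
        rows = [np.asarray(r) for r in rows]
        lengths = np.array([len(r) for r in rows], dtype=np.intp)
        obj = np.concatenate(rows).view(cls)
        # Row i occupies flat positions _index_array[i]:_index_array[i + 1].
        obj._index_array = np.concatenate(([0], np.cumsum(lengths)))
        return obj

    def __array_finalize__(self, obj):
        # Runs for views/slices of an existing instance; copy the offsets over.
        self._index_array = getattr(obj, "_index_array", None)

    def _flat(self):
        # Plain-ndarray view of the same memory: indexing it uses
        # ndarray's normal rules, not the row lookup defined below.
        return self.view(np.ndarray)

    def __getitem__(self, index):
        start = self._index_array[index]
        stop = self._index_array[index + 1]
        return self._flat()[start:stop]


ra = RaggedArray([(1, 2), (3, 4, 5), (6, 7)])
middle = ra[1]   # plain ndarray sharing memory with ra
```

Calling np.ndarray.__getitem__(self, slice(start, stop)) directly would also bypass the override, but it returns the subclass type rather than a plain array. Numpy's own printing indexes the array, so an overridden __getitem__ may also interfere with repr/print of the subclass; and whether _index_array survives arithmetic like ra * 2 depends on how numpy builds ufunc results for subclasses, which would need checking.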

Any ideas?

Other comments about the class would be great, too.


