I'm working on a "ragged array" class -- an array that can store and work with what can be considered tabular data, with the rows of different lengths:
A "ragged" array class -- built on numpy
The idea is to be able to store data that is essentially 2-d, but each row is an arbitrary length, like:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 ...
At the moment, my implementation (see enclosed) stores the data in a flat 1-d numpy array as an attribute, along with an index array that stores the position in that data array at which each row starts. This is working fine.
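To make the layout concrete, here is a minimal sketch of the two arrays for a small example. The names `rows`, `data_array`, `index_array` and `get_row` are made up for illustration and are not the names in the enclosed class; the extra final entry in the index array is an assumption inferred from the `index+1` lookup in `__getitem__`.

```python
import numpy as np

# Hypothetical example data: three rows of different lengths.
rows = [(1, 2), (3, 4, 5), (6, 7)]

# All values packed end to end in one flat 1-d array.
data_array = np.concatenate([np.asarray(r) for r in rows])

# index_array[i] is the position where row i starts. One extra entry at
# the end marks where the data stops, so row i is always the slice
# index_array[i]:index_array[i + 1].
lengths = [len(r) for r in rows]
index_array = np.concatenate(([0], np.cumsum(lengths)))


def get_row(i):
    """Return row i as a view into the flat data (no error checking)."""
    return data_array[index_array[i]:index_array[i + 1]]
```

With this layout, `get_row(1)` picks out the slice `data_array[2:5]`.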
However, I'd like to have it support any of the usual numpy operations that make sense for a ragged array:
    arr.sum()
    arr *= a_scalar
    arr * a_scalar
etc, etc, etc.
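With the data array held as an attribute, each of these has to be forwarded to the flat array by hand. A rough sketch of what that looks like, assuming the layout described above; the class name `RaggedSketch` and its constructor are hypothetical and stand in for the real class:

```python
import numpy as np


class RaggedSketch(object):
    """Hypothetical stand-in for the attribute-based ragged array."""

    def __init__(self, rows):
        self._data_array = np.concatenate([np.asarray(r) for r in rows])
        lengths = [len(r) for r in rows]
        self._index_array = np.concatenate(([0], np.cumsum(lengths)))

    def __getitem__(self, index):
        # Row lookup, no error checking.
        start = self._index_array[index]
        stop = self._index_array[index + 1]
        return self._data_array[start:stop]

    def sum(self):
        # Row boundaries do not matter for a total, so use the flat data.
        return self._data_array.sum()

    def __imul__(self, scalar):
        # In place: numpy will refuse if the result cannot be stored in
        # the existing dtype (e.g. an int array times a float).
        self._data_array *= scalar
        return self

    def __mul__(self, scalar):
        # New object with scaled data and the same row boundaries.
        new = RaggedSketch.__new__(RaggedSketch)
        new._data_array = self._data_array * scalar
        new._index_array = self._index_array.copy()
        return new
```

Every additional operation needs another method like these, which is what makes the attribute approach tedious.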
So I thought maybe I'd do a subclass, instead of having the data array an attribute of the class. But I can't figure out how to solve the indexing problem:
I want to re-map indexing, so that:
arr[i] returns the ith "row":
    In : ra = ragged_array([(1,2), (3,4,5), (6,7)])

    In : print ra
    ragged array: [1 2] [3 4 5] [6 7]

    In : ra[1]
    Out: array([3, 4, 5])
I'm currently doing (error checking removed):
    def __getitem__(self, index):
        """
        returns a numpy array of one row.
        """
        row = self._data_array[self._index_array[index]:self._index_array[index+1]]
        return row
But if I subclass ndarray, then self._data_array becomes just plain "self", and I've overloaded indexing (and slicing), so I don't know how I could index into the "flat" array to get the subset of the array I need.
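One possibility, as far as I can tell, is to take a base-class view of self inside `__getitem__`: `self.view(np.ndarray)` gives a plain ndarray on the same data, and indexing that view does not go through the overloaded method (calling `np.ndarray.__getitem__(self, ...)` directly should do the same). A minimal sketch under that assumption; the class name `RaggedSub` is hypothetical, and printing, iteration, slicing and negative indexes are not handled:

```python
import numpy as np


class RaggedSub(np.ndarray):
    """Hypothetical ndarray subclass whose own data is the flat array."""

    def __new__(cls, rows):
        flat = np.concatenate([np.asarray(r) for r in rows])
        obj = flat.view(cls)
        lengths = [len(r) for r in rows]
        obj._index_array = np.concatenate(([0], np.cumsum(lengths)))
        return obj

    def __array_finalize__(self, obj):
        # Carry the row boundaries over to arrays derived from this one
        # (for example the result of arr * a_scalar).
        self._index_array = getattr(obj, "_index_array", None)

    def __getitem__(self, index):
        # A plain ndarray view of the same memory; indexing it uses
        # ndarray's own __getitem__, not this overloaded one.
        flat = self.view(np.ndarray)
        start = self._index_array[index]
        stop = self._index_array[index + 1]
        return flat[start:stop]
```

In this sketch `ra[1]` returns a plain ndarray holding the second row, and elementwise operations such as `ra * 2` act on the flat data while keeping the same row boundaries.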
Other comments about the class would be great, too.