Hi folks,
I'm working on a "ragged array" class -- an array that can store and
work with what can be considered tabular data, with the rows of
different lengths:
"""
ragged_array
A "ragged" array class -- built on numpy
The idea is to be able to store data that is essentially 2-d, but each
row is an arbitrary length, like:
1 2 3
4 5 6 7 8 9
10 11
12 13 14 15 16 17 18
19 20 21
...
At the moment, my implementation (see enclosed) stores the data in a 1-d
numpy array as an attribute, along with an index array that stores the
offset into the 1-d array at which each row starts. This is working fine.
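To make the layout concrete, here is a stripped-down sketch of that scheme. It is not the enclosed code: the class name RaggedSketch is made up for illustration, and I am assuming the index array carries one extra trailing entry (the total length) so that row i is always the slice between entries i and i+1.

```python
import numpy as np


class RaggedSketch:
    """Hypothetical stand-in for the enclosed class, for illustration only."""

    def __init__(self, rows):
        lengths = [len(r) for r in rows]
        # _index_array[i] is the offset where row i starts in the flat array;
        # the final entry is the total number of elements (assumed layout).
        self._index_array = np.concatenate(([0], np.cumsum(lengths))).astype(np.intp)
        # All rows stored back to back in one 1-d array.
        self._data_array = np.concatenate([np.asarray(r) for r in rows])

    def __len__(self):
        # Number of rows, not number of elements.
        return len(self._index_array) - 1

    def __getitem__(self, index):
        # No error checking: negative or out-of-range indexes are not handled.
        start = self._index_array[index]
        stop = self._index_array[index + 1]
        return self._data_array[start:stop]


ra = RaggedSketch([(1, 2), (3, 4, 5), (6, 7)])
row = ra[1]  # a view onto elements 2..4 of the flat array
```

Because each row comes back as a slice of the flat array, it is a view rather than a copy, so changing a row in place changes the stored data.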
However, I'd like to have it support any of the usual numpy operations
that make sense for a ragged array:
arr.sum()
arr *= a_scalar
arr * a_scalar
etc, etc, etc.
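With the data array held as an attribute, each of these has to be forwarded by hand. A rough sketch of what that looks like follows. The class name RaggedOps and its constructor are hypothetical, not taken from the enclosed code, and only three operations are shown.

```python
import numpy as np


class RaggedOps:
    """Hypothetical wrapper showing hand-written forwarding to the flat array."""

    def __init__(self, data, index):
        self._data_array = np.asarray(data)
        self._index_array = np.asarray(index)

    def sum(self):
        # Row structure does not matter for a full sum.
        return self._data_array.sum()

    def __imul__(self, scalar):
        # In-place: note that an integer array cannot be scaled in place
        # by a float scalar, numpy refuses the cast.
        self._data_array *= scalar
        return self

    def __mul__(self, scalar):
        # New object with scaled data and the same row offsets.
        return RaggedOps(self._data_array * scalar, self._index_array)


arr = RaggedOps([1, 2, 3, 4, 5, 6, 7], [0, 2, 5, 7])
total = arr.sum()
doubled = arr * 2
arr *= 3
```

Every further operation needs another method like these, which is what makes subclassing look attractive.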
So I thought maybe I'd do a subclass, instead of having the data array
an attribute of the class. But I can't figure out how to solve the
indexing problem:
I want to re-map indexing, so that:
arr[i] returns the ith "row":
In [2]: ra = ragged_array([(1,2), (3,4,5), (6,7)])
In [4]: print ra
ragged array:
[1 2]
[3 4 5]
[6 7]
In [5]: ra[1]
Out[5]: array([3, 4, 5])
I'm currently doing (error checking removed):
def __getitem__(self, index):
    """
    returns a numpy array of one row.
    """
    row = self._data_array[self._index_array[index]:self._index_array[index+1]]
    return row
But if I subclass ndarray, then self._data_array becomes just plain
"self", and I've overloaded indexing (and slicing), so I don't know how
I could index into the "flat" array to get the subset of the array I need.
Any ideas?
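One direction that might work, offered only as a sketch and not as the enclosed implementation: inside the overloaded __getitem__, take a plain ndarray view of self and slice that, so the overload is bypassed. The class name RaggedSub is made up, and I am assuming the row offsets can travel along as an attribute via __array_finalize__.

```python
import numpy as np


class RaggedSub(np.ndarray):
    """Hypothetical ndarray subclass; the flat data is the array itself."""

    def __new__(cls, rows):
        data = np.concatenate([np.asarray(r) for r in rows])
        obj = data.view(cls)
        lengths = [len(r) for r in rows]
        # Row start offsets, plus the total length as a final entry.
        obj._index_array = np.concatenate(([0], np.cumsum(lengths))).astype(np.intp)
        return obj

    def __array_finalize__(self, obj):
        # Carry the offsets over to arrays derived from this one
        # (views, results of arithmetic).
        if obj is None:
            return
        self._index_array = getattr(obj, "_index_array", None)

    def __getitem__(self, index):
        # A base-class view of the same memory uses ndarray's own indexing,
        # so slicing it does not come back through this method.
        flat = self.view(np.ndarray)
        start = self._index_array[index]
        stop = self._index_array[index + 1]
        return flat[start:stop]


ra = RaggedSub([(1, 2), (3, 4, 5), (6, 7)])
row = ra[1]
scaled = ra * 2
```

This only handles a single integer index. Slicing, negative indexes, and anything in numpy that calls __getitem__ internally and expects normal indexing (printing, for one) would still need thought.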
Other comments about the class would be great, too.
-Chris
--
Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception
Chris.Barker(a)noaa.gov