Hi all,
I've read some discussions about adding labeled axes, and even ticks, to numpy arrays (such as in Luis' dataarray).
I have recently found that the ability to label axes would be very helpful to me, but I'd like to keep the implementation as lightweight as possible.
The reason I would find this useful is that I am writing an ndarray subclass that loads image/volume file formats into numpy arrays. Some of these files may contain multiple images/volumes (I'll call them channels) and may also have an additional dimension for a vector associated with each pixel/voxel, such as color. The array would then have at most 5 dimensions.
Example: data = ndarray([1023,128,128,128,3]) might mean (channels,z,y,x,rgb) for one array. I want to keep as much of numpy's fancy indexing as I can, but I'm finding it difficult to track the axes that indexing can remove. For example, data[2,2] returns an array of shape (128,128,3), the third slice through the third volume in the dataset, but the returned array has lost the meaning associated with its axes, so saving it back out would require relabeling the axes by hand. I'd like to track the axis labels as metadata while retaining all of numpy's fancy indexing.
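For reference, the axis loss comes from integer indices: numpy removes every axis that is indexed by an integer but keeps an axis indexed by a slice, even a length-1 slice. A small stand-in array (the shape here is made up to keep memory use low) shows the difference:

```python
import numpy as np

# Small stand-in for the (channels, z, y, x, rgb) layout described above.
data = np.zeros((4, 8, 16, 16, 3), dtype=np.uint8)

# Integer indices remove the channel and z axes entirely.
print(data[2, 2].shape)      # (16, 16, 3)

# Length-1 slices select the same elements but keep both axes.
print(data[2:3, 2:3].shape)  # (1, 1, 16, 16, 3)
```

Both results are views of the same memory; only the number of axes differs.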
There are two ways I could accomplish this with minimal code on the Python side:
One would be if indexing always returned an array of the same dimensionality, i.e. data[2,2] returned an array of shape (1,1,128,128,3). I could then delete the labels of the degenerate axes from the metadata and return the array with those axes removed, giving the same output as before:
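    import numpy as np

    class Data(np.ndarray):
        def __getitem__(self, indices):
            # hypothetical keyword, as an example: keep every axis, even integer-indexed ones
            data = np.ndarray.__getitem__(self, indices, donotcompress=True)
            data.axeslabels = [label for label, dim in zip(self.axeslabels, data.shape) if dim > 1]
            return data.squeeze()  # remove the length-1 axes

        def __getslice__(self, i, j):
            # Python 2 only; a plain slice keeps every axis (trivial case)
            return self.__getitem__(slice(i, j))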
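As far as I know, ndarray.__getitem__ takes no keyword like donotcompress, but the effect of approach #1 can be emulated for basic indexing: replace each integer index with a length-1 slice, index, then squeeze only the axes that were integer-indexed. The sketch below assumes a hypothetical subclass LabeledArray and handles only ints and slices; anything else (Ellipsis, np.newaxis, index arrays, masks) falls through to plain numpy and the labels are discarded.

```python
import operator

import numpy as np


class LabeledArray(np.ndarray):
    """Hypothetical ndarray subclass that keeps one label per axis."""

    def __new__(cls, input_array, axeslabels):
        obj = np.asarray(input_array).view(cls)
        if len(axeslabels) != obj.ndim:
            raise ValueError("need exactly one label per axis")
        obj.axeslabels = list(axeslabels)
        return obj

    def __array_finalize__(self, obj):
        # Views inherit the parent's labels; __getitem__ corrects them below.
        self.axeslabels = getattr(obj, "axeslabels", None)

    def __getitem__(self, indices):
        index = indices if isinstance(indices, tuple) else (indices,)
        basic = len(index) <= self.ndim and all(
            isinstance(i, (slice, int, np.integer))
            and not isinstance(i, (bool, np.bool_))
            for i in index
        )
        if not basic:
            # Ellipsis, np.newaxis, boolean or integer arrays: numpy handles
            # them, but this sketch does not track labels for them.
            result = super().__getitem__(indices)
            if isinstance(result, LabeledArray):
                result.axeslabels = None
            return result
        if len(index) == self.ndim and not any(isinstance(i, slice) for i in index):
            return super().__getitem__(indices)  # all integers: numpy scalar

        # Emulate "donotcompress": turn each integer into a length-1 slice so
        # numpy keeps the axis, and remember which axes to squeeze afterwards.
        expanded, dropped = [], []
        for axis, i in enumerate(index):
            if isinstance(i, slice):
                expanded.append(i)
                continue
            n, i = self.shape[axis], operator.index(i)
            if not -n <= i < n:
                raise IndexError(f"index {i} is out of bounds for axis {axis} with size {n}")
            i %= n  # make negative indices positive so i:i+1 is never empty
            expanded.append(slice(i, i + 1))
            dropped.append(axis)

        full = super().__getitem__(tuple(expanded))  # same ndim as self
        # Squeeze only the integer-indexed axes, not every axis of length 1.
        result = full.squeeze(axis=tuple(dropped)) if dropped else full
        labels = self.axeslabels if self.axeslabels is not None else [None] * self.ndim
        result.axeslabels = [lab for ax, lab in enumerate(labels) if ax not in dropped]
        return result
```

Squeezing only the recorded axes also avoids a pitfall of filtering on dim > 1: an axis that genuinely has length 1 keeps its label. For example, indexing a (4, 1, 5) array labeled (channel, z, x) with [0] gives shape (1, 5) labeled (z, x).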
Another approach would be if there were some function in the numpy internals that I could call to get the needed information before calling ndarray's __getitem__:
    class Data(np.ndarray):
        def __getitem__(self, indices):
            # hypothetical function: which axes survive this index?
            unique = np.uniqueIndicesPerDimension(indices)
            data = np.ndarray.__getitem__(self, indices)
            data.axeslabels = [label for label, dim in zip(self.axeslabels, unique) if dim > 1]
            return data
Finally, I could implement my own parser for the passed indices and work this out myself. That would be a poor choice, since I'd have to recreate much of the index-handling code that already exists inside numpy, and the result would likely be slower and more error-prone:
    class Data(np.ndarray):
        def __getitem__(self, indices):
            # parse separately so the original index is still used for the lookup
            unique = self.uniqueDimensionIndices(indices)
            data = np.ndarray.__getitem__(self, indices)
            data.axeslabels = [label for label, dim in zip(self.axeslabels, unique) if dim > 1]
            return data

        def uniqueDimensionIndices(self, indices):
            if isinstance(indices, int):
                indices = (indices,)
            if isinstance(indices, tuple):
                ...
            elif isinstance(indices, list):
                ...
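For comparison, approach #3 is fairly small if it is restricted to basic indexing. The function below, labels_after_basic_index, is a hypothetical helper and only a sketch: it walks the index once, dropping the label for each integer, keeping it for each slice, inserting None for np.newaxis, and expanding Ellipsis over the untouched axes. It returns None for advanced (fancy) indexing, which it does not attempt, because numpy's rules for where advanced-index dimensions end up (they can move to the front when separated by a slice) are more involved.

```python
import numpy as np


def labels_after_basic_index(labels, index):
    """Hypothetical helper: return the axis labels that survive `index`.

    Supports only basic indexing (ints, slices, Ellipsis, np.newaxis) and
    returns None for anything else, e.g. integer or boolean index arrays.
    """
    if not isinstance(index, tuple):
        index = (index,)
    n_ellipsis = sum(i is Ellipsis for i in index)
    if n_ellipsis > 1:
        raise IndexError("an index can only have a single ellipsis ('...')")
    # Entries that consume an existing axis (everything except None and ...).
    consuming = sum(1 for i in index if i is not None and i is not Ellipsis)
    if consuming > len(labels):
        raise IndexError("too many indices")
    if n_ellipsis == 0:
        index = index + (Ellipsis,)  # missing trailing axes behave like ':'

    out, axis = [], 0
    for i in index:
        if i is None:  # np.newaxis adds a new, unlabeled axis
            out.append(None)
        elif i is Ellipsis:  # covers all axes not consumed elsewhere
            n_skip = len(labels) - consuming
            out.extend(labels[axis:axis + n_skip])
            axis += n_skip
        elif isinstance(i, slice):  # slices keep the axis
            out.append(labels[axis])
            axis += 1
        elif isinstance(i, (int, np.integer)) and not isinstance(i, (bool, np.bool_)):
            axis += 1  # integers remove the axis
        else:
            return None  # fancy indexing: not handled by this sketch
    return out
```

With labels (channel, z, y, x, rgb), the index (2, 2) yields (y, x, rgb), and (Ellipsis, 0) yields (channel, z, y, x).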
Is there anything already in the numpy internals that would allow me to do #1 or #2? I don't think #3 is a very good option.
Thanks!