[Numpy-discussion] labeled axes

Wes McKinney wesmckinn at gmail.com
Thu May 26 07:53:24 EDT 2011


On Tue, May 24, 2011 at 6:39 PM, Craig Yoshioka <craigyk at me.com> wrote:
> Hi all,
> I've read some discussions about adding labeled axes, and even ticks, to
> numpy arrays  (such as in Luis' dataarray).
> I have recently found that the ability to label axes would be very helpful
> to me, but I'd like to keep the implementation as lightweight as possible.
> The reason I would find this useful is that I am writing an ndarray
> subclass that loads image/volume file formats into numpy arrays.  Some of
> these files might have multiple images/volumes, I'll call them channels, and
> also may have an additional dimension for vectors associated with each
> pixel/voxel, like color.  The max dims of the array would then be 5.
> Example: data = ndarray([1023,128,128,128,3]) might mean
> (channels,z,y,x,rgb) for one array.  Now I want to keep as much of the fancy
> indexing capabilities of numpy as I can, but I am finding it difficult to
> track the removal of axes that can occur from indexing.  For example
> data[2,2] would return an array of shape (128,128,3), or the third slice
> through the third volume in the dataset, but the returned array has lost the
> meaning associated with its axes, so saving it back out would require manual
> relabeling of the axes.   I'd like to be able to track the axes as metadata
> and retain all the fancy numpy indexing.
> There are two ways I could accomplish this with minimal code on the python
> side:
>  One would be if indexing of the array always returned an array of the same
> dimensionality, that is data[2,2] returned an array of shape
> (1,1,128,128,3).  I could then delete the degenerate axes labels from the
> metadata, and return the compressed array, resulting in the same output:
>
> class Data(np.ndarray):
>     def __getitem__(self, indices):
>         # as an example
>         data = np.ndarray.__getitem__(self, indices, donotcompress=True)
>         data.axeslabels = [label for label, dim in
>                            zip(self.axeslabels, data.shape) if dim > 1]
>         return data.compress()
>     def __getslice__(self, s1, s2, step):
>         # trivial case
>
> Another approach would be if there is some function in the numpy internals
> that I could use to get the needed information before calling the ndarray's
> __getitem__ function:
>
> class Data(np.ndarray):
>     def __getitem__(self, indices):
>         unique = np.uniqueIndicesPerDimension(indices)
>         data = np.ndarray.__getitem__(self, indices)
>         data.axeslabels = [label for label, dim in
>                            zip(self.axeslabels, unique) if dim > 1]
>         return data
>
> Finally, I could implement my own parser for the passed indices to figure
> this out myself.  This would be bad since I'd have to recreate a lot of the
> same code that must go on inside numpy, and it would be slower, error-prone,
> etc. :
>
> class Data(np.ndarray):
>     def __getitem__(self, indices):
>         indices = self.uniqueDimensionIndices(indices)
>         data = np.ndarray.__getitem__(self, indices)
>         data.axeslabels = [label for label, dim in
>                            zip(self.axeslabels, indices) if dim > 1]
>         return data
>     def uniqueDimensionIndices(self, indices):
>         if isinstance(indices, int):
>             indices = (indices,)
>         if isinstance(indices, tuple):
>             ....
>         elif isinstance(indices, list):
>             ...
>
> Is there anything in the numpy internals already that would allow me to do
> #1 or #2? I don't think #3 is a very good option.
> Thanks!

I would recommend joining or at least following the datarray project--
I believe it will do what you want, and we are working actively to
build out this functionality. Here are some links:

Datarray
Github: https://github.com/fperez/datarray
docs: http://fperez.github.com/datarray-doc/

Podcast we recorded about a recent meeting on datarray:
http://inscight.org/2011/05/18/episode_13/

Other projects to consider using: larry and pandas-- these support
data alignment, which you may not care about. In pandas I'm only
concerned with data with ndim <= 3, which is a bit specific to
statistics/econometrics/finance applications.
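
For reference, here is a rough sketch of what the third option from the
question (parsing the index yourself) might look like when restricted to
basic indexing, i.e. integers, slices, Ellipsis and None. It is not how
datarray, larry or pandas do it. The names LabeledArray and
surviving_labels are made up for this illustration, and fancy indexing
(lists, integer or boolean arrays) is deliberately not handled: the labels
are simply reset to None in that case. Labels are also not adjusted by
reductions or transposes.

```python
import numpy as np


def surviving_labels(labels, index, ndim):
    """Return the labels left after basic indexing, or None if unsupported.

    Hypothetical helper: handles ints, slices, Ellipsis and None only.
    """
    if not isinstance(index, tuple):
        index = (index,)
    # Number of index items that consume one axis of the original array.
    n_consumed = sum(1 for i in index if i is not None and i is not Ellipsis)
    out = []
    axis = 0
    for item in index:
        if item is None:
            out.append(None)            # np.newaxis adds an unlabeled axis
        elif item is Ellipsis:
            n_skip = ndim - n_consumed  # axes covered by the Ellipsis
            out.extend(labels[axis:axis + n_skip])
            axis += n_skip
        elif isinstance(item, slice):
            out.append(labels[axis])    # a slice keeps its axis
            axis += 1
        elif isinstance(item, (int, np.integer)) and not isinstance(item, bool):
            axis += 1                   # an integer removes its axis
        else:
            return None                 # fancy indexing: not handled here
    out.extend(labels[axis:])           # trailing axes are kept whole
    return tuple(out)


class LabeledArray(np.ndarray):
    """ndarray subclass carrying one label per axis (illustrative only)."""

    def __new__(cls, data, axeslabels):
        obj = np.asarray(data).view(cls)
        if len(axeslabels) != obj.ndim:
            raise ValueError("need exactly one label per axis")
        obj.axeslabels = tuple(axeslabels)
        return obj

    def __array_finalize__(self, obj):
        # New views start with the parent's labels (if it has any).
        self.axeslabels = getattr(obj, "axeslabels", None)

    def __getitem__(self, index):
        result = super().__getitem__(index)
        if not isinstance(result, LabeledArray):
            return result               # a scalar has no axes to label
        if self.axeslabels is None:
            result.axeslabels = None
        else:
            result.axeslabels = surviving_labels(
                self.axeslabels, index, self.ndim)
        return result


# A small stand-in for the (channels, z, y, x, rgb) array in the question.
data = LabeledArray(np.zeros((4, 3, 5, 6, 3)),
                    ["channels", "z", "y", "x", "rgb"])
sub = data[2, 2]
print(sub.shape, sub.axeslabels)
```

With this, data[2, 2] comes back with shape (5, 6, 3) and the labels
('y', 'x', 'rgb'), so the array can be written back out without manual
relabeling, at least for the basic-indexing cases covered here.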

- Wes


