
Hi all,

I've read some discussions about adding labeled axes, and even ticks, to numpy arrays (such as in Luis' datarray). I have recently found that the ability to label axes would be very helpful to me, but I'd like to keep the implementation as lightweight as possible.

The reason I would find this useful is because I am writing a ndarray subclass that loads image/volume file formats into numpy arrays. Some of these files might have multiple images/volumes, I'll call them channels, and also may have an additional dimension for vectors associated with each pixel/voxel, like color. The max dims of the array would then be 5.

Example: data = ndarray([1023,128,128,128,3]) might mean (channels,z,y,x,rgb) for one array.

Now I want to keep as much of the fancy indexing capabilities of numpy as I can, but I am finding it difficult to track the removal of axes that can occur from indexing. For example data[2,2] would return an array of shape (128,128,3), or the third slice through the third volume in the dataset, but the returned array has lost the meaning associated with its axes, so saving it back out would require manual relabeling of the axes. I'd like to be able to track the axes as metadata and retain all the fancy numpy indexing.

There are two ways I could accomplish this with minimal code on the python side:

One would be if indexing of the array always returned an array of the same dimensionality, that is data[2,2] returned an array of shape (1,1,128,128,3). I could then delete the degenerate axes labels from the metadata, and return the compressed array, resulting in the same output:

    class Data(np.ndarray):
        def __getitem__(self,indices):
            data = np.ndarray.__getitem__(self,indices,donotcompress=True)  # as an example
            data.axeslabels = [label for label,dim in zip(self.axeslabels,data.shape) if dim > 1]
            return data.compress()
        def __getslice__(self,s1,s2,step):
            # trivial case

Another approach would be if there is some function in the numpy internals that I could use to get the needed information before calling the ndarray's __getitem__ function:

    class Data(np.ndarray):
        def __getitem__(self,indices):
            unique = np.uniqueIndicesPerDimension(indices)
            data = np.ndarray.__getitem__(self,indices)
            data.axeslabels = [label for label,dim in zip(self.axeslabels, unique) if dim > 1]
            return data

Finally, I could implement my own parser for the passed indices to figure this out myself. This would be bad since I'd have to recreate a lot of the same code that must go on inside numpy, and it would be slower, error-prone, etc.:

    class Data(np.ndarray):
        def __getitem__(self,indices):
            indices = self.uniqueDimensionIndices(indices)
            data = np.ndarray.__getitem__(self,indices)
            data.axeslabels = [label for label,dim in zip(self.axeslabels,indices) if dim > 1]
            return data
        def uniqueDimensionIndices(self,indices):
            if isinstance(indices,int):
                indices = (indices,)
            if isinstance(indices,tuple):
                ....
            elif isinstance(indices,list):
                ...

Is there anything in the numpy internals already that would allow me to do #1 or #2? I don't think #3 is a very good option.

Thanks!
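For reference, the third option can be sketched in a fairly small amount of Python if it is restricted to basic indexing (integers, slices, Ellipsis and None). This is only a minimal sketch under that assumption, not something numpy provides: the names LabeledArray and surviving_labels are hypothetical, and anything involving lists, arrays or booleans simply gives up and sets the labels to None. Unlike the "dim > 1" test above, it drops a label only when its axis was indexed with an integer, so axes that happen to have length 1 keep their label.

```python
import numpy as np


def surviving_labels(labels, index):
    """Return labels of the axes left after basic indexing, or None if unsupported."""
    if not isinstance(index, tuple):
        index = (index,)
    for i in index:
        # bool is a subclass of int but triggers advanced indexing, so reject it.
        if isinstance(i, (bool, np.bool_)):
            return None
        if not (i is None or i is Ellipsis
                or isinstance(i, (int, np.integer, slice))):
            return None  # lists/arrays: advanced indexing, not handled here
    if not any(i is Ellipsis for i in index):
        index = index + (Ellipsis,)  # numpy implies a trailing Ellipsis
    # Integers and slices each consume one axis; Ellipsis covers the rest.
    consumed = sum(1 for i in index if i is not None and i is not Ellipsis)
    n_fill = len(labels) - consumed
    out, axis = [], 0
    for i in index:
        if i is None:
            out.append(None)  # np.newaxis: new axis without a label
        elif i is Ellipsis:
            out.extend(labels[axis:axis + n_fill])
            axis += n_fill
        elif isinstance(i, slice):
            out.append(labels[axis])  # slice keeps the axis
            axis += 1
        else:
            axis += 1  # integer removes the axis
    return out


class LabeledArray(np.ndarray):
    """ndarray subclass carrying one label per axis (hypothetical sketch)."""

    def __new__(cls, input_array, axeslabels):
        obj = np.asarray(input_array).view(cls)
        if len(axeslabels) != obj.ndim:
            raise ValueError("need exactly one label per axis")
        obj.axeslabels = list(axeslabels)
        return obj

    def __array_finalize__(self, obj):
        # Inherit labels only when the number of axes is unchanged.
        labels = getattr(obj, "axeslabels", None)
        ok = labels is not None and len(labels) == self.ndim
        self.axeslabels = list(labels) if ok else None

    def __getitem__(self, index):
        result = super().__getitem__(index)  # numpy validates the index
        if isinstance(result, LabeledArray) and self.axeslabels is not None:
            result.axeslabels = surviving_labels(self.axeslabels, index)
        return result


# Small shape for illustration; the example above would be (1023,128,128,128,3).
data = LabeledArray(np.zeros((4, 5, 6, 7, 3)), ["channels", "z", "y", "x", "rgb"])
print(data[2, 2].shape, data[2, 2].axeslabels)  # (6, 7, 3) ['y', 'x', 'rgb']
print(data[:, 0, ..., 1].axeslabels)            # ['channels', 'y', 'x']
```

Calling ndarray's own __getitem__ first means invalid indices still raise the usual numpy errors before any label bookkeeping happens. Reductions such as sum(axis=0) change the number of axes, so in this sketch they end up with axeslabels set to None rather than with stale labels.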

On Tue, May 24, 2011 at 6:39 PM, Craig Yoshioka <craigyk@me.com> wrote:
[...]
I would recommend joining or at least following the datarray project-- I believe it will do what you want, and we are working actively to build out this functionality. Here are some links:

Datarray Github: https://github.com/fperez/datarray
docs: http://fperez.github.com/datarray-doc/

Podcast we did about recent meeting about datarray:
http://inscight.org/2011/05/18/episode_13/

Other projects to consider using: larry and pandas-- these support data alignment which you may not care about. In pandas I'm only concerned with data with ndim <= 3, a bit specific to statistics/econometrics/finance applications.

- Wes

Thanks, I will. I was just seeing if there was any intention of adding this type of support to numpy directly. It would be faster, and I'm sure it would make projects like datarray much simpler to implement (datarray does a lot more than my suggestion).

On May 26, 2011, at 4:53 AM, Wes McKinney wrote:
[...]

I'm glad datarray is still active. :)

On Thu, May 26, 2011 at 6:36 PM, Craig Yoshioka <craigyk@me.com> wrote:
[...]

On Thu, May 26, 2011 at 6:36 PM, Craig Yoshioka <craigyk@me.com> wrote:
Thanks, I will. I was just seeing if there was any intention of adding this type of support to numpy directly. It would be faster, and I'm sure it would make projects like datarray much simpler to implement (datarray does a lot more than my suggestion).

Datarray was deliberately implemented as a subclass of the numpy array so that it would be as easy as possible to eventually merge things into numpy proper. We simply wanted to experiment with the api and code in a separate project for simplicity of development, but once things are sufficiently baked out and validated by others, there's every intention of pushing this into numpy itself.

And yes, we are indeed trying to make some progress on it, we spent some time at Berkeley today on it, and a new grad student from the stats department, Jonathan Terhorst, is putting some solid work into it over the next few days. I really hope it won't be much longer before this is finally ready for production use, I know we need it badly in multiple places.

Best,

f

participants (4)
- Craig Yoshioka
- Fernando Perez
- John Salvatier
- Wes McKinney