Re: [Numpy-discussion] named ndarray axes
I'm currently working on interfacing ndarrays with a custom C-representation for n-dimensional arrays. My custom C code provides additional per-axis information (labeling, spacing between samples / range of sample positions along the axis, axis direction, cell vs.node centering, etc.) Subclassing ndarray to hold onto this info is fairly simple, but getting numpy's methods to intelligently modify that information when the array is sliced is something that I'm still trying to figure out. A robust way to attach per-axis info to a given ndarray, whether it just be a label or some more complex structure, would definitely be something I (and likely others) would find useful... That said, I'd love to know more about how the idx_axes() structure in your workaround works... - Sam On 7/13/11 12:00 PM, "numpy-discussion-request@scipy.org" <numpy-discussion-request@scipy.org> wrote:
Date: Tue, 12 Jul 2011 16:39:47 -0700 From: Craig Yoshioka <craigyk@me.com> Subject: [Numpy-discussion] named ndarray axes To: NumPy-Discussion@scipy.org Message-ID: <0FC8B43E-26CD-40ED-A6FA-59DD8D641998@me.com> Content-Type: text/plain; CHARSET=US-ASCII
I brought up a while ago about how it would be nice if numpy arrays could have their axes 'labeled'. = I got an implementation that works pretty well for me and in the process learned quite a few things, and was hoping to foster some more discussion on this topic, as I think I have found a simple/flexible solution to support this at the numpy level.
Here are *some* examples code snippets from my unit tests on 'Array':
a = Array((4,5,6))
# you can assign data to all axes by attribute: a.Axes.Label = (0:'z',1:'y',2:'x'}
# or add metadata to each individually: a.Axes[1].Vector = [0,1,0] a.Axes[2].Vector = [0,0,1]
# simple case int indexing b = a[0] assert b.shape == (5,6) assert b.Axes.Label == {0:'y',1:'x'} assert b.Axes.Vector == {0:[0,1,0],1:[0,0,1]}
# indexing with slices b = a[:,0,:] assert b.shape == (4,6) assert b.Axes.Label == {0:'z',1:'x'} assert b.Axes.Vector == {1:[0,0,1]}
# indexing with ellipsis b = a[...,0] assert b.shape == (4,5) assert b.Axes.Label == {0:'z',1:'y'}
# indexing with ellipsis, newaxis, etc. b = a[newaxis,...,2,newaxis] assert b.shape == (1,4,5,1) assert b.Axes.Label == {1:'z',2:'y'}
# indexing with lists b = a[[1,2],:,[1,2]] assert b.shape == (2,5) assert b.Axes.Label == {0:'z',1:'y'}
# most interesting examples, indexing with axes labels---------------- # I was a bit confused about how to handle indexing with mixed axes/non-axes indexes # IE: what does a['x',2:4] mean? on what axis is the 2:4 slice being applied, the first? the first after 'x'? # One option is to disallow mixing (simpler to implement, understand?) # Instead I chose to treat the axis indexing as a forced assignment of an axis to a position.
# axis indexing that transposes the first two dimensions, but otherwise does nothing b = a['y','z'] assert b.shape == (5,4,6) assert b.Axes.Label == {0:'y',1:'z',2:'x'}
# abusing slices to allow specifying indexes for axes b = a['y':0,'z'] assert b.shape == (4,6) assert b.Axes.Label == {0:'z',1:'x'}
# unfortunately that means a slice index on an axis must be written like so: b = a['y':slice(0,2),'x','z'] assert b.shape == (2,6,4) assert b.Axes.Label == {0:'y',1:'x',2:'z'}
b = a['y':[1,2,3],'x','z':slice(0:1)] # or due to the forced transposition, this is the same as: c = a['y','x','z'][[1,2,3],:,0:1]
assert b.shape == (3,6,1) assert b.Axes.Label == {0:'y',1:'x',2:'z'} assert b.shape == c.shape assert b.Axes == c.Axes
#----------------------------------------------------------------------------- -----------
To do all this I essentially had to recreate the effects of numpy indexing on axes.... This is not ideal, but so far I seem to have addressed most of the indexing I've used, at least. Here is what __getitem__ looks like:
def __getitem__(self,idxs): filtered_idxs,transposed_axes,kept_axes = self.idx_axes(idxs) array = self.view(ndarray).transpose(transposed_axes) array = array[filtered_idxs] if isinstance(array,ndarray): array = array.view(Array) array.Axes = self.Axes.keep(kept_axes) return array
As you can see idx_axes() essentially recreates a lot of ndarray indexing behavior, so that its effects can be explicitly handled.
Having done all this, I think the best way for numpy to support 'labeled' axes in the future is by having numpy itself keep track of a very simple tuple attribute, like shape, and leave more complex axis naming/labeling to subclasses on the python side. As an example, upon creating a new dimension in an array, numpy assigns that dimension a semi-unique id, and this tuple could be used in __array_finalize__.
For example my __array_finalize__ could look like:
def __array_finalize__(self,obj): if hasattr(obj,'axesdata'): for axesid in self.axes: if axesid in obj.axes: self.axesdata[axesid] = obj.axesdata[axesid]
This would cover a lot more situations and lead to much simpler code since the work required on the C side would be minimal, but still allow robust and custom tracking and propagation of axes information. Subclasses that tap into this data would react to the result of numpy operations vs. having to predict/anticipate.
For example, my __getitem__, relying on the __array_finalize__ above, could look like:
def __getitem__(self,idxs): filtered_idxs,transposed_axes= self.idx_axes(idxs) array = self.transpose(transposed_axes) return array[filtered_idxs]
Not shown is how much simpler and robust the code for idx_axes would then be. I estimate it would go from 130 loc to < 20 loc.
Sorry for the extra long e-mail, -Craig
Hey, So I'm working on interfacing numpy ndarrays with an n-dimensional array representation that exists as part of a massive custom C library. Due to the size of the library, hand-coding a c-extension for the library just was not really an option; so we wound up using gcc_xml to generate the proper ctypes code. This works great for accessing our C functions within python, but not so much for trying share memory between numpy and our custom array representations... Passing a pointer to the numpy array data to ctypes is fairly simple, but figuring out the proper way to get memory from ctypes into numpy has been problematic. I know that PEP 3118 is supposed to be superseding the numpy array interface, but PEP 3118 can only be specified on the C side, which is problematic for anybody using ctypes to wrap their C code. The legacy __array_interface__ allows for a python side specification of data buffers, but there appears to be no corresponding interface capability in the PEP 3118 protocol. On top of that add the fact that Python's own support for PEP 3118 has some major bugs (ctypes throwing invalid PEP 3118 codes - http://bugs.python.org/issue10746 :: issues with python's memoryview object - http://bugs.python.org/issue10181), and PEP 3118 seems like a nightmare to deal with. At the same time though, I don't want to simply use the legacy array interface if it's going to be completely deprecated in the near future. How long before the legacy __array_interface__ goes the way of the dodo? When that happens, are there plans to add support for a python side interface to the PEP 3118 protocol? If not, what is the proper way to interface a ctypes wrapped library with PEP 3118? Thanks, - Sam Quinan
Yup exactly. To enable this sort of tracking I needed to explicitly reverse-engineer the effects of indexing on axes. I figure overriding indexing catches most cases that modify axes, but other holes need to be plugged as well... ie: tranpose, swapaxes. This probably means most C functions that change array axes (np.mean(axis=), etc.) need to be covered as well.... that sucks. BTW, it sounds like you're trying to track very similar data. I am trying to load structural biology data formats, and I try to preserve as much of the metadata as possible, ie: I am encoding unit cell length/angle information as vectors, etc. Here is my implementation: def __getitem__(self,idxs): idxs,trans,keep = idx_axes(self,idxs) array = self.view(np.ndarray).transpose(trans) array = array[idxs] if isinstance(array,ndarray): array = array.view(self.__class__) array.axes = self.axes.transpose(keep) return array def idx_axes(array,idxs): # explicitly expand ellipsis expanded_idxs = idx_expanded(array.ndim,idxs) # determine how the axes will be rearranged as a result of axes-based indexing # and the creation of newaxes remapped_axes = idx_axes_remapped(array.ndim,array.axes,expanded_idxs) # determine numpy compatible transpose, before newaxes are created transposed_axes = idx_axes_transposed(remapped_axes) # determine numpy compatible indexes with axes-based indexing removed filtered_idxs = idx_filtered(expanded_idxs) # determine which axes will be kept after numpy indexing kept_axes = idx_axes_kept(remapped_axes,filtered_idxs) return filtered_idxs,transposed_axes,kept_axes def idx_expanded(ndim,idxs): ''' explicitly expands ellipsis taking into account newaxes ''' if not isinstance(idxs,tuple): return idx_expanded(ndim,(idxs,)) # how many dimensions we will end up having ndim = ndim + idxs.count(newaxis) filler = slice(None) def skip_ellipsis(idxs): return tuple([filler if isinstance(x,type(Ellipsis)) else x for x in idxs]) def fill_ellipsis(ndim,l,r): return (filler,)*(ndim-len(l)-len(r)) # expand first ellipsis, treat all other ellipsis as slices if Ellipsis in idxs: idx = idxs.index(Ellipsis) llist = idxs[:idx] rlist = skip_ellipsis(idxs[idx+1:]) cfill = fill_ellipsis(ndim,llist,rlist) idxs = llist + cfill + rlist return idxs def idx_filtered(idxs): ''' replace indexes on axes with normal numpy indexes ''' def axisAsIdx(idx): if isinstance(idx,str): return slice(None) elif isinstance(idx,slice): if isinstance(idx.start,str): return idx.stop return idx return tuple([axisAsIdx(x) for x in idxs]) def idx_axes_remapped(ndim,axes,idxs): ''' if given a set of array indexes that contain labeled axes, return a tuple that maps axes in the source array to the axes as they will end up in the destination array. Must take into account the spaces created by newaxis indexes. ''' # how many dims are we expecting? ndim = ndim + idxs.count(newaxis) # new unique object for marking unassigned locations in mapping unassigned = object() # by default all locations are unsassigned mapping = [unassigned] * ndim # find labels in indexes and set the dims for those locations for dim,label in enumerate(idxs): if label == newaxis: mapping[dim] = label elif isinstance(label,str): mapping[dim] = axes.dimForLabel(label) elif isinstance(label,slice): if isinstance(label.start,str): mapping[dim] = axes.dimForLabel(label.start) # find unassigned dims, in order unmapped = [d for d in range(ndim) if d not in set(mapping)] # fill in remaining unassigned locations with dims for dst,src in enumerate(mapping): if src == unassigned: mapping[dst] = unmapped.pop(0) return tuple(mapping) def idx_axes_transposed(mapping): ''' stripping out newaxes in mapping yields a tuple compatible with transpose ''' return tuple([x for x in mapping if x != newaxis]) def idx_axes_kept(mapping,idxs): ''' remove axes from mapping that will not survive the indexing (ie: ints) ''' kept = [] first_list = True for dst,src in enumerate(mapping): if dst < len(idxs): idx = idxs[dst] if isinstance(idx,int): continue elif isinstance(idx,list): if not first_list: continue first_list = False kept += [src] return tuple(kept) On Jul 13, 2011, at 3:13 PM, Sam Quinan wrote:
I'm currently working on interfacing ndarrays with a custom C-representation for n-dimensional arrays. My custom C code provides additional per-axis information (labeling, spacing between samples / range of sample positions along the axis, axis direction, cell vs.node centering, etc.) Subclassing ndarray to hold onto this info is fairly simple, but getting numpy's methods to intelligently modify that information when the array is sliced is something that I'm still trying to figure out.
A robust way to attach per-axis info to a given ndarray, whether it just be a label or some more complex structure, would definitely be something I (and likely others) would find useful...
That said, I'd love to know more about how the idx_axes() structure in your workaround works...
- Sam
On 7/13/11 12:00 PM, "numpy-discussion-request@scipy.org" <numpy-discussion-request@scipy.org> wrote:
Date: Tue, 12 Jul 2011 16:39:47 -0700 From: Craig Yoshioka <craigyk@me.com> Subject: [Numpy-discussion] named ndarray axes To: NumPy-Discussion@scipy.org Message-ID: <0FC8B43E-26CD-40ED-A6FA-59DD8D641998@me.com> Content-Type: text/plain; CHARSET=US-ASCII
I brought up a while ago about how it would be nice if numpy arrays could have their axes 'labeled'. = I got an implementation that works pretty well for me and in the process learned quite a few things, and was hoping to foster some more discussion on this topic, as I think I have found a simple/flexible solution to support this at the numpy level.
Here are *some* examples code snippets from my unit tests on 'Array':
a = Array((4,5,6))
# you can assign data to all axes by attribute: a.Axes.Label = (0:'z',1:'y',2:'x'}
# or add metadata to each individually: a.Axes[1].Vector = [0,1,0] a.Axes[2].Vector = [0,0,1]
# simple case int indexing b = a[0] assert b.shape == (5,6) assert b.Axes.Label == {0:'y',1:'x'} assert b.Axes.Vector == {0:[0,1,0],1:[0,0,1]}
# indexing with slices b = a[:,0,:] assert b.shape == (4,6) assert b.Axes.Label == {0:'z',1:'x'} assert b.Axes.Vector == {1:[0,0,1]}
# indexing with ellipsis b = a[...,0] assert b.shape == (4,5) assert b.Axes.Label == {0:'z',1:'y'}
# indexing with ellipsis, newaxis, etc. b = a[newaxis,...,2,newaxis] assert b.shape == (1,4,5,1) assert b.Axes.Label == {1:'z',2:'y'}
# indexing with lists b = a[[1,2],:,[1,2]] assert b.shape == (2,5) assert b.Axes.Label == {0:'z',1:'y'}
# most interesting examples, indexing with axes labels---------------- # I was a bit confused about how to handle indexing with mixed axes/non-axes indexes # IE: what does a['x',2:4] mean? on what axis is the 2:4 slice being applied, the first? the first after 'x'? # One option is to disallow mixing (simpler to implement, understand?) # Instead I chose to treat the axis indexing as a forced assignment of an axis to a position.
# axis indexing that transposes the first two dimensions, but otherwise does nothing b = a['y','z'] assert b.shape == (5,4,6) assert b.Axes.Label == {0:'y',1:'z',2:'x'}
# abusing slices to allow specifying indexes for axes b = a['y':0,'z'] assert b.shape == (4,6) assert b.Axes.Label == {0:'z',1:'x'}
# unfortunately that means a slice index on an axis must be written like so: b = a['y':slice(0,2),'x','z'] assert b.shape == (2,6,4) assert b.Axes.Label == {0:'y',1:'x',2:'z'}
b = a['y':[1,2,3],'x','z':slice(0:1)] # or due to the forced transposition, this is the same as: c = a['y','x','z'][[1,2,3],:,0:1]
assert b.shape == (3,6,1) assert b.Axes.Label == {0:'y',1:'x',2:'z'} assert b.shape == c.shape assert b.Axes == c.Axes
#----------------------------------------------------------------------------- -----------
To do all this I essentially had to recreate the effects of numpy indexing on axes.... This is not ideal, but so far I seem to have addressed most of the indexing I've used, at least. Here is what __getitem__ looks like:
def __getitem__(self,idxs): filtered_idxs,transposed_axes,kept_axes = self.idx_axes(idxs) array = self.view(ndarray).transpose(transposed_axes) array = array[filtered_idxs] if isinstance(array,ndarray): array = array.view(Array) array.Axes = self.Axes.keep(kept_axes) return array
As you can see idx_axes() essentially recreates a lot of ndarray indexing behavior, so that its effects can be explicitly handled.
Having done all this, I think the best way for numpy to support 'labeled' axes in the future is by having numpy itself keep track of a very simple tuple attribute, like shape, and leave more complex axis naming/labeling to subclasses on the python side. As an example, upon creating a new dimension in an array, numpy assigns that dimension a semi-unique id, and this tuple could be used in __array_finalize__.
For example my __array_finalize__ could look like:
def __array_finalize__(self,obj): if hasattr(obj,'axesdata'): for axesid in self.axes: if axesid in obj.axes: self.axesdata[axesid] = obj.axesdata[axesid]
This would cover a lot more situations and lead to much simpler code since the work required on the C side would be minimal, but still allow robust and custom tracking and propagation of axes information. Subclasses that tap into this data would react to the result of numpy operations vs. having to predict/anticipate.
For example, my __getitem__, relying on the __array_finalize__ above, could look like:
def __getitem__(self,idxs): filtered_idxs,transposed_axes= self.idx_axes(idxs) array = self.transpose(transposed_axes) return array[filtered_idxs]
Not shown is how much simpler and robust the code for idx_axes would then be. I estimate it would go from 130 loc to < 20 loc.
Sorry for the extra long e-mail, -Craig
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
On Wed, Jul 13, 2011 at 7:15 PM, Craig Yoshioka <craigyk@me.com> wrote:
Yup exactly. To enable this sort of tracking I needed to explicitly reverse-engineer the effects of indexing on axes. I figure overriding indexing catches most cases that modify axes, but other holes need to be plugged as well... ie: tranpose, swapaxes. This probably means most C functions that change array axes (np.mean(axis=), etc.) need to be covered as well.... that sucks.
BTW, it sounds like you're trying to track very similar data. I am trying to load structural biology data formats, and I try to preserve as much of the metadata as possible, ie: I am encoding unit cell length/angle information as vectors, etc.
Here is my implementation:
def __getitem__(self,idxs): idxs,trans,keep = idx_axes(self,idxs) array = self.view(np.ndarray).transpose(trans) array = array[idxs] if isinstance(array,ndarray): array = array.view(self.__class__) array.axes = self.axes.transpose(keep) return array
def idx_axes(array,idxs):
# explicitly expand ellipsis expanded_idxs = idx_expanded(array.ndim,idxs)
# determine how the axes will be rearranged as a result of axes-based indexing # and the creation of newaxes remapped_axes = idx_axes_remapped(array.ndim,array.axes,expanded_idxs)
# determine numpy compatible transpose, before newaxes are created transposed_axes = idx_axes_transposed(remapped_axes)
# determine numpy compatible indexes with axes-based indexing removed filtered_idxs = idx_filtered(expanded_idxs)
# determine which axes will be kept after numpy indexing kept_axes = idx_axes_kept(remapped_axes,filtered_idxs)
return filtered_idxs,transposed_axes,kept_axes
def idx_expanded(ndim,idxs): ''' explicitly expands ellipsis taking into account newaxes '''
if not isinstance(idxs,tuple): return idx_expanded(ndim,(idxs,))
# how many dimensions we will end up having ndim = ndim + idxs.count(newaxis)
filler = slice(None)
def skip_ellipsis(idxs): return tuple([filler if isinstance(x,type(Ellipsis)) else x for x in idxs])
def fill_ellipsis(ndim,l,r): return (filler,)*(ndim-len(l)-len(r))
# expand first ellipsis, treat all other ellipsis as slices if Ellipsis in idxs: idx = idxs.index(Ellipsis) llist = idxs[:idx] rlist = skip_ellipsis(idxs[idx+1:]) cfill = fill_ellipsis(ndim,llist,rlist) idxs = llist + cfill + rlist
return idxs
def idx_filtered(idxs): ''' replace indexes on axes with normal numpy indexes ''' def axisAsIdx(idx): if isinstance(idx,str): return slice(None) elif isinstance(idx,slice): if isinstance(idx.start,str): return idx.stop return idx
return tuple([axisAsIdx(x) for x in idxs])
def idx_axes_remapped(ndim,axes,idxs): ''' if given a set of array indexes that contain labeled axes, return a tuple that maps axes in the source array to the axes as they will end up in the destination array. Must take into account the spaces created by newaxis indexes. '''
# how many dims are we expecting? ndim = ndim + idxs.count(newaxis)
# new unique object for marking unassigned locations in mapping unassigned = object()
# by default all locations are unsassigned mapping = [unassigned] * ndim
# find labels in indexes and set the dims for those locations for dim,label in enumerate(idxs): if label == newaxis: mapping[dim] = label elif isinstance(label,str): mapping[dim] = axes.dimForLabel(label) elif isinstance(label,slice): if isinstance(label.start,str): mapping[dim] = axes.dimForLabel(label.start)
# find unassigned dims, in order unmapped = [d for d in range(ndim) if d not in set(mapping)]
# fill in remaining unassigned locations with dims for dst,src in enumerate(mapping): if src == unassigned: mapping[dst] = unmapped.pop(0)
return tuple(mapping)
def idx_axes_transposed(mapping): ''' stripping out newaxes in mapping yields a tuple compatible with transpose ''' return tuple([x for x in mapping if x != newaxis])
def idx_axes_kept(mapping,idxs): ''' remove axes from mapping that will not survive the indexing (ie: ints) ''' kept = [] first_list = True for dst,src in enumerate(mapping): if dst < len(idxs): idx = idxs[dst] if isinstance(idx,int): continue elif isinstance(idx,list): if not first_list: continue first_list = False kept += [src] return tuple(kept)
On Jul 13, 2011, at 3:13 PM, Sam Quinan wrote:
I'm currently working on interfacing ndarrays with a custom C-representation for n-dimensional arrays. My custom C code provides additional per-axis information (labeling, spacing between samples / range of sample positions along the axis, axis direction, cell vs.node centering, etc.) Subclassing ndarray to hold onto this info is fairly simple, but getting numpy's methods to intelligently modify that information when the array is sliced is something that I'm still trying to figure out.
A robust way to attach per-axis info to a given ndarray, whether it just be a label or some more complex structure, would definitely be something I (and likely others) would find useful...
That said, I'd love to know more about how the idx_axes() structure in your workaround works...
- Sam
On 7/13/11 12:00 PM, "numpy-discussion-request@scipy.org" <numpy-discussion-request@scipy.org> wrote:
Date: Tue, 12 Jul 2011 16:39:47 -0700 From: Craig Yoshioka <craigyk@me.com> Subject: [Numpy-discussion] named ndarray axes To: NumPy-Discussion@scipy.org Message-ID: <0FC8B43E-26CD-40ED-A6FA-59DD8D641998@me.com> Content-Type: text/plain; CHARSET=US-ASCII
I brought up a while ago about how it would be nice if numpy arrays could have their axes 'labeled'. = I got an implementation that works pretty well for me and in the process learned quite a few things, and was hoping to foster some more discussion on this topic, as I think I have found a simple/flexible solution to support this at the numpy level.
Here are *some* examples code snippets from my unit tests on 'Array':
a = Array((4,5,6))
# you can assign data to all axes by attribute: a.Axes.Label = (0:'z',1:'y',2:'x'}
# or add metadata to each individually: a.Axes[1].Vector = [0,1,0] a.Axes[2].Vector = [0,0,1]
# simple case int indexing b = a[0] assert b.shape == (5,6) assert b.Axes.Label == {0:'y',1:'x'} assert b.Axes.Vector == {0:[0,1,0],1:[0,0,1]}
# indexing with slices b = a[:,0,:] assert b.shape == (4,6) assert b.Axes.Label == {0:'z',1:'x'} assert b.Axes.Vector == {1:[0,0,1]}
# indexing with ellipsis b = a[...,0] assert b.shape == (4,5) assert b.Axes.Label == {0:'z',1:'y'}
# indexing with ellipsis, newaxis, etc. b = a[newaxis,...,2,newaxis] assert b.shape == (1,4,5,1) assert b.Axes.Label == {1:'z',2:'y'}
# indexing with lists b = a[[1,2],:,[1,2]] assert b.shape == (2,5) assert b.Axes.Label == {0:'z',1:'y'}
# most interesting examples, indexing with axes labels---------------- # I was a bit confused about how to handle indexing with mixed axes/non-axes indexes # IE: what does a['x',2:4] mean? on what axis is the 2:4 slice being applied, the first? the first after 'x'? # One option is to disallow mixing (simpler to implement, understand?) # Instead I chose to treat the axis indexing as a forced assignment of an axis to a position.
# axis indexing that transposes the first two dimensions, but otherwise does nothing b = a['y','z'] assert b.shape == (5,4,6) assert b.Axes.Label == {0:'y',1:'z',2:'x'}
# abusing slices to allow specifying indexes for axes b = a['y':0,'z'] assert b.shape == (4,6) assert b.Axes.Label == {0:'z',1:'x'}
# unfortunately that means a slice index on an axis must be written like so: b = a['y':slice(0,2),'x','z'] assert b.shape == (2,6,4) assert b.Axes.Label == {0:'y',1:'x',2:'z'}
b = a['y':[1,2,3],'x','z':slice(0:1)] # or due to the forced transposition, this is the same as: c = a['y','x','z'][[1,2,3],:,0:1]
assert b.shape == (3,6,1) assert b.Axes.Label == {0:'y',1:'x',2:'z'} assert b.shape == c.shape assert b.Axes == c.Axes
#----------------------------------------------------------------------------- -----------
To do all this I essentially had to recreate the effects of numpy indexing on axes.... This is not ideal, but so far I seem to have addressed most of the indexing I've used, at least. Here is what __getitem__ looks like:
def __getitem__(self,idxs): filtered_idxs,transposed_axes,kept_axes = self.idx_axes(idxs) array = self.view(ndarray).transpose(transposed_axes) array = array[filtered_idxs] if isinstance(array,ndarray): array = array.view(Array) array.Axes = self.Axes.keep(kept_axes) return array
As you can see idx_axes() essentially recreates a lot of ndarray indexing behavior, so that its effects can be explicitly handled.
Having done all this, I think the best way for numpy to support 'labeled' axes in the future is by having numpy itself keep track of a very simple tuple attribute, like shape, and leave more complex axis naming/labeling to subclasses on the python side. As an example, upon creating a new dimension in an array, numpy assigns that dimension a semi-unique id, and this tuple could be used in __array_finalize__.
For example my __array_finalize__ could look like:
def __array_finalize__(self,obj): if hasattr(obj,'axesdata'): for axesid in self.axes: if axesid in obj.axes: self.axesdata[axesid] = obj.axesdata[axesid]
This would cover a lot more situations and lead to much simpler code since the work required on the C side would be minimal, but still allow robust and custom tracking and propagation of axes information. Subclasses that tap into this data would react to the result of numpy operations vs. having to predict/anticipate.
For example, my __getitem__, relying on the __array_finalize__ above, could look like:
def __getitem__(self,idxs): filtered_idxs,transposed_axes= self.idx_axes(idxs) array = self.transpose(transposed_axes) return array[filtered_idxs]
Not shown is how much simpler and robust the code for idx_axes would then be. I estimate it would go from 130 loc to < 20 loc.
Sorry for the extra long e-mail, -Craig
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Have you guys been following the DataArray discussions or project at all? I think it provides a nearly complete implementation of what you've been describing (named axes): https://github.com/fperez/datarray (no joke: 21 forks!) some links: http://inscight.org/2011/05/18/episode_13/ https://convore.com/python-scientific-computing/data-array-in-numpy/ I'm excited that many more people seem to be excited about making this kind of functionality available in the scientific Python stack so understanding every perspective and set of requirements makes a big difference =) best, Wes
I did take a look at it. It looked way heavier than I needed or wanted, plus last time I looked it didn't support fancy indexing on axes... It does support indexing on 'ticks' though. There is a bit of wheel inventing going on, but I think that's OK, since things should be well worked out and experimented with before becoming lower level... I think my suggestion for adding an index of unique ids as a tuple, like shape, that is maintained though array manipulations is a good one though. It would make implementing any of these axes indexing attempts much easier and more robust. I'm sure the dataarray code could be greatly simplified by this addition to ndarray, just as mine would. This 'uniqueid' tuple could even be extended to make keeping track of ticks easier: On Jul 13, 2011, at 8:26 PM, Wes McKinney wrote:
Have you guys been following the DataArray discussions or project at all? I think it provides a nearly complete implementation of what you've been describing (named axes):
https://github.com/fperez/datarray
(no joke: 21 forks!)
some links: http://inscight.org/2011/05/18/episode_13/ https://convore.com/python-scientific-computing/data-array-in-numpy/
I'm excited that many more people seem to be excited about making this kind of functionality available in the scientific Python stack so understanding every perspective and set of requirements makes a big difference =)
best, Wes
participants (3)
-
Craig Yoshioka
-
Sam Quinan
-
Wes McKinney