New subject: Fate of Numpy's Array Interface

13 Jul 2011

      I'm currently working on interfacing ndarrays with a custom C-representation
for n-dimensional arrays. My custom C code provides additional per-axis
information (labeling, spacing between samples / range of sample positions
along the axis, axis direction, cell vs.node centering, etc.) Subclassing
ndarray to hold onto this info is fairly simple, but getting numpy's methods
to intelligently modify that information when the array is sliced is
something that I'm still trying to figure out.

A robust way to attach per-axis info to a given ndarray, whether it just be
a label or some more complex structure, would definitely be something I (and
likely others) would find useful...

That said, I'd love to know more about how the idx_axes() structure in your
workaround works...

- Sam

On 7/13/11 12:00 PM, "numpy-discussion-request@scipy.org"
<numpy-discussion-request@scipy.org> wrote:
...
Date: Tue, 12 Jul 2011 16:39:47 -0700
From: Craig Yoshioka <craigyk@me.com>
Subject: [Numpy-discussion] named ndarray axes
To: NumPy-Discussion@scipy.org
Message-ID: <0FC8B43E-26CD-40ED-A6FA-59DD8D641998@me.com>
Content-Type: text/plain; CHARSET=US-ASCII
I brought up a while ago about how it would be nice if numpy arrays could have
their axes 'labeled'.    = I got an implementation that works pretty well for
me and in the process learned quite a few things, and was hoping to foster
some more discussion on this topic, as I think I have found a simple/flexible
solution to support this at the numpy level.
Here are *some* examples code snippets from my unit tests on 'Array':
a = Array((4,5,6))
# you can assign data to all axes by attribute:
    a.Axes.Label = (0:'z',1:'y',2:'x'}
# or add metadata to each individually:
    a.Axes[1].Vector = [0,1,0]
    a.Axes[2].Vector = [0,0,1]
# simple case int indexing
    b = a[0]
    assert b.shape == (5,6)
    assert b.Axes.Label == {0:'y',1:'x'}
    assert b.Axes.Vector == {0:[0,1,0],1:[0,0,1]}
# indexing with slices
    b = a[:,0,:]
    assert b.shape == (4,6)
    assert b.Axes.Label == {0:'z',1:'x'}
    assert b.Axes.Vector == {1:[0,0,1]}
# indexing with ellipsis
    b = a[...,0]
    assert b.shape == (4,5)
    assert b.Axes.Label == {0:'z',1:'y'}
# indexing with ellipsis, newaxis, etc.
    b = a[newaxis,...,2,newaxis]
    assert b.shape == (1,4,5,1)
    assert b.Axes.Label == {1:'z',2:'y'}
# indexing with lists
    b = a[[1,2],:,[1,2]]
    assert b.shape == (2,5)
    assert b.Axes.Label == {0:'z',1:'y'}
# most interesting examples, indexing with axes labels----------------
    # I was a bit confused about how to handle indexing with mixed
axes/non-axes indexes
    # IE: what does a['x',2:4]  mean?  on what axis is the 2:4 slice being
applied, the first? the first after 'x'?
    #       One option is to disallow mixing (simpler to implement,
understand?)
    #       Instead I chose to treat the axis indexing as a forced assignment
of an axis to a position.
# axis indexing that transposes the first two dimensions, but otherwise
does nothing
    b = a['y','z']
    assert b.shape == (5,4,6)
    assert b.Axes.Label == {0:'y',1:'z',2:'x'}
# abusing slices to allow specifying indexes for axes
    b = a['y':0,'z']
    assert b.shape == (4,6)
    assert b.Axes.Label == {0:'z',1:'x'}
# unfortunately that means a slice index on an axis must be written like
so:
    b = a['y':slice(0,2),'x','z']
    assert b.shape == (2,6,4)
    assert b.Axes.Label == {0:'y',1:'x',2:'z'}
b = a['y':[1,2,3],'x','z':slice(0:1)]
    # or due to the forced transposition, this is the same as:
    c = a['y','x','z'][[1,2,3],:,0:1]
assert b.shape == (3,6,1)
    assert b.Axes.Label == {0:'y',1:'x',2:'z'}
    assert b.shape == c.shape
    assert b.Axes == c.Axes
#-----------------------------------------------------------------------------
-----------
To do all this I essentially had to recreate the effects of numpy indexing on
axes....  This is not ideal, but so far I seem to have addressed most of the
indexing I've used, at least. Here is what __getitem__ looks like:
def __getitem__(self,idxs):
        filtered_idxs,transposed_axes,kept_axes = self.idx_axes(idxs)
        array = self.view(ndarray).transpose(transposed_axes)
        array = array[filtered_idxs]
        if isinstance(array,ndarray):
            array = array.view(Array)
            array.Axes = self.Axes.keep(kept_axes)
        return array
As you can see idx_axes() essentially recreates a lot of ndarray indexing
behavior, so that its effects can be explicitly handled.
Having done all this, I think the best way for numpy to support 'labeled' axes
in the future is by having numpy itself keep track of a very simple tuple
attribute, like shape, and leave more complex axis naming/labeling to
subclasses on the python side.  As an example, upon creating a new dimension
in an array, numpy assigns that dimension a semi-unique id, and this tuple
could be used in __array_finalize__.
For example my __array_finalize__ could look like:
def __array_finalize__(self,obj):
    if hasattr(obj,'axesdata'):
         for axesid in self.axes:
              if axesid in obj.axes:
                   self.axesdata[axesid] = obj.axesdata[axesid]
This would cover a lot more situations and lead to much simpler code since the
work required on the C side would be minimal, but still allow robust and
custom tracking and propagation of axes information.
Subclasses that tap into this data would react to the result of numpy
operations vs. having to predict/anticipate.
For example, my __getitem__, relying on the __array_finalize__ above, could
look like:
def __getitem__(self,idxs):
        filtered_idxs,transposed_axes= self.idx_axes(idxs)
        array = self.transpose(transposed_axes)
        return array[filtered_idxs]
Not shown is how much simpler and robust the code for idx_axes would then be.
I estimate it would go from 130 loc to < 20 loc.
Sorry for the extra long e-mail,
-Craig

Re: [Numpy-discussion] named ndarray axes

Sam Quinan

Sam Quinan

Craig Yoshioka

Wes McKinney

Craig Yoshioka

tags

participants (3)