[Numpy-discussion] PEP: named axis

Thu Feb 5 23:29:42 EST 2009

On Thu, Feb 5, 2009 at 22:17, Travis Oliphant <oliphant at enthought.com> wrote:
> Gael Varoquaux wrote:
>> On Thu, Feb 05, 2009 at 05:08:49PM -0600, Travis E. Oliphant wrote:
>>
>>> I've been fairly quiet on this list for a while due to work and family
>>> schedule, but I think about how things can improve regularly. One
>>> feature that's been requested by a few people is the ability to select
>>> multiple fields from a structured array.
>>>
>>
>> Hey Travis,
>>
>> I have no opinion on the above, as I don't have this use case. However, as
>> you are talking about implementing something, I jump on the occasion to
>> suggest another gadget, slightly related: I would like named axes.
>> Suppose you have a 5D array, I would like to be able to give each axis
>> a name, e.g. (to choose an example you might be familiar with) ('Frontal',
>> 'Lateral', 'Axial', 'Time', 'Subjects'). And if this could be understood
>> by numpy operations (say ufuncs and fancy indexing) so that I could do (a
>> is my 5D array):
>>
>>
> This could be implemented but would require adding information to the
> NumPy array.

More than that, though. Every function and method that takes an axis
or reduces an axis will need to be rewritten. For that reason, I'm -1
on the proposal.
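
To make the cost concrete, here is a minimal sketch of the translation step
that each axis-accepting function would need. The helper names
(resolve_axis, named_sum) and the axis_names tuple are hypothetical and only
for illustration; nothing like them exists in NumPy.

```python
import numpy as np

def resolve_axis(axis, axis_names):
    """Translate an axis name to its integer position; pass integers through.

    Hypothetical helper: axis_names is a tuple of labels, one per dimension.
    """
    if isinstance(axis, str):
        return axis_names.index(axis)
    return axis

def named_sum(a, axis, axis_names):
    # Every function or method that takes an axis would need this same
    # lookup before doing its normal work.
    return np.sum(a, axis=resolve_axis(axis, axis_names))

a = np.arange(24).reshape(2, 3, 4)
names = ("Frontal", "Lateral", "Time")
by_name = named_sum(a, "Time", names)
by_index = named_sum(a, 2, names)
```

The same lookup would have to be repeated for mean, std, argmax, sort,
concatenate, and so on, which is where the rewriting effort comes from.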

> I've been thinking for a long time that we ought to add a
> "dictionary" attribute to the NumPy array (i.e. a new member to the
> PyArrayObject data-structure). A lot of subclasses of NumPy arrays
> just add meta-information that could be stored there.
>
> Then, it would be a trivial thing to check to see if the dictionary had
> say an "axis_mapping" keyword and if so then do the conversions found
> there.
>
> I think this has been brought up before, though. What do people think
> about adding a default dictionary to every instance of a NumPy array?
>
> The question that always arises in this context, which I don't have good
> answers for, is what to do with the dictionary on the output of
> ufuncs. One approach is to always return NULL for the dictionary and
> not try to guess. A slightly different one is to at least handle
> the case where all inputs have the same dictionary and return a new
> "shallow" copy of that.

I'm of the opinion that it should never guess. We have no idea what
semantics are being placed on the dict. Even in the case where all of
the inputs have the same dict, the operation may easily invalidate the
metadata. For example, a reduction on one of these axis-decorated
arrays would make the axis labels incorrect.
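
A small sketch of that failure mode, assuming the metadata is kept in a plain
dict next to the array. The naive_reduce function and the "axis_names" key
are hypothetical stand-ins for a "shallow copy" policy, not an existing
NumPy feature.

```python
import numpy as np

def naive_reduce(a, meta, axis):
    """Sum over one axis and blindly shallow-copy the metadata dict."""
    return a.sum(axis=axis), dict(meta)

a = np.zeros((2, 3, 4))
meta = {"axis_names": ("Frontal", "Lateral", "Time")}

reduced, reduced_meta = naive_reduce(a, meta, axis=1)

# The result has lost the 'Lateral' axis, but the copied labels still
# list three names, so they no longer describe the array.
labels_match = len(reduced_meta["axis_names"]) == reduced.ndim
```

Here reduced has two dimensions while the copied dict still carries three
labels, so the metadata is silently wrong even though every input agreed.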

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco