[Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes

Fri Jul 9 15:42:58 EDT 2010

Now, the one part I've implemented that I just made up instead of
looking to the SciPy consensus (because there was no SciPy consensus)
was how to refer to multiple labeled axes without repeating ".axis"
all over the place. My choice, which I call "magical axis attributes",
is to have arr.somelabel == arr.axis.somelabel whenever it doesn't
mean something else. This turns the call
  arr.axis.country.named['Netherlands'].axis.year[-1]
into:
  arr.country.named['Netherlands'].year[-1]

I got a message from Fernando Perez saying that he didn't like the
magical axis attributes, for the expected reason that it's
inconsistent. You shouldn't have to refer to your axis differently
just because you called it something like "mean". Another problem that
just occurred to me is that datarray-using code could break just
because DataArray, or even ndarray itself, grew a new method.
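
To make the conflict concrete, here is a minimal sketch in plain
Python. It is not the datarray code: MagicArray, Axis, and the "label"
attribute are made-up stand-ins, and the only assumption is that the
magic is a __getattr__ fallback to arr.axis.

```python
from types import SimpleNamespace


class Axis:
    """Hypothetical stand-in for one labeled axis."""

    def __init__(self, label):
        self.label = label


class MagicArray:
    """Toy array whose unknown attributes fall back to arr.axis.<name>."""

    def __init__(self, labels):
        self.axis = SimpleNamespace(**{lab: Axis(lab) for lab in labels})

    def mean(self):
        # Stands in for any real method that shares a name with an axis.
        return "the mean() method"

    def __getattr__(self, name):
        # Python calls __getattr__ only after normal lookup has failed,
        # so real attributes and methods always win over axis labels.
        axes = self.__dict__.get("axis")
        if axes is not None and name in vars(axes):
            return vars(axes)[name]
        raise AttributeError(name)


arr = MagicArray(["country", "year", "mean"])
print(arr.country is arr.axis.country)  # the shortcut reaches the axis
print(arr.mean is arr.axis.mean)        # the method shadows the axis
```

The axis labeled "mean" is still reachable as arr.axis.mean, but
arr.mean silently returns the method. Any method added to the class
later would shadow an axis of the same name in the same way.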

I like the syntax that magical attributes provide, but I'm willing to
consider other options. Here's one:

The __getattr__ only does its magic on attribute names that end in
"_index" or "_named", which should not conflict with other method
names. "arr.foo_index[3]" is the same as "arr.axis.foo[3]".
Furthermore, "arr.foo_named['bar']" is the same as
"arr.axis.foo.named['bar']". Then the above lookup becomes:
  arr.country_named['Netherlands'].year_index[-1]
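
A rough sketch of how that suffix dispatch could work, again in plain
Python rather than datarray itself. SemiMagicArray, Axis, and Named are
hypothetical, and the toy only handles a single 1-D axis, so it shows
the name resolution and not the chained multi-axis lookup.

```python
from types import SimpleNamespace


class Named:
    """Hypothetical tick-based indexer for one axis."""

    def __init__(self, axis):
        self.axis = axis

    def __getitem__(self, tick):
        # Translate a tick name into a position, then index normally.
        return self.axis[self.axis.ticks.index(tick)]


class Axis:
    """Hypothetical 1-D axis: positional indexing plus a .named indexer."""

    def __init__(self, data, ticks):
        self.data = data
        self.ticks = ticks
        self.named = Named(self)

    def __getitem__(self, i):
        return self.data[i]


class SemiMagicArray:
    def __init__(self, data, label, ticks):
        self.axis = SimpleNamespace(**{label: Axis(data, ticks)})

    def __getattr__(self, name):
        axes = vars(self.__dict__.get("axis", SimpleNamespace()))
        for suffix in ("_index", "_named"):
            label = name[: -len(suffix)]
            if name.endswith(suffix) and label in axes:
                axis = axes[label]
                return axis if suffix == "_index" else axis.named
        raise AttributeError(name)


arr = SemiMagicArray([10, 20, 30], "year", [2008, 2009, 2010])
print(arr.year_index[-1] == arr.axis.year[-1])
print(arr.year_named[2009] == arr.axis.year.named[2009])
```

Because the bare label is never resolved, an axis called "mean" would
be reached as arr.mean_index or arr.mean_named and could not collide
with a method named mean.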

I don't find this as appealing as magical attributes, but perhaps it's
more responsible. I'd like to know what other people think, so let me
summarize and name the existing proposals:

arr.axis.country.named['Netherlands'].axis.year[-1]    # the default option -- works in any case
arr[ arr.aix.country.named['Netherlands'].year[-1] ]   # the "stuple" option
arr.country.named['Netherlands'].year[-1]              # the "magical" option
arr.country_named['Netherlands'].year_index[-1]        # the "semi-magical" option

-- Rob

On Fri, Jul 9, 2010 at 1:39 AM, Rob Speer <rspeer at mit.edu> wrote:
> http://github.com/rspeer/datarray represents my best guess at the
> SciPy BOF consensus. I recently switched the method of accessing named
> ticks from .named() to .named[] based on further discussion here.
>
> My implementation is still missing the case with named ticks but
> positional axes, however. That is, you should be able to use .named
> directly on the top-level datarray without referring to any axis
> labels, to say something like arr.named['Netherlands', 2010], but you
> can't yet.
> -- Rob
>
> On Thu, Jul 8, 2010 at 11:44 PM, Keith Goodman <kwgoodman at gmail.com> wrote:
>> On Thu, Jul 8, 2010 at 1:20 PM, Fernando Perez <fperez.net at gmail.com> wrote:
>>
>>> The consensus at the BoF (not that it means it's set in stone, simply
>>> that there was a good chance for back-and-forth on the topic with many
>>> voices) was that:
>>>
>>> 1. There are valid use cases for 'integer ticks', i.e. integers that
>>> index arbitrarily into an array instead of in 0..N-1 fashion.
>>>
>>> 2. That having plain arr[0] give anything but the first element in arr
>>> would be way too confusing in practice, and likely to cause too many
>>> problems.
>>>
>>> 3. That the best solution to allow integer ticks while retaining
>>> 'normal' indexing semantics for integers would be to have
>>>
>>> arr[int] -> normal indexing
>>> arr.something[int] -> tick-based indexing, where an int can mean anything.
>>
>> Has the Scipy 2010 BOF consensus been implemented in anyone's fork? I
>> don't understand the indexing so I'd like to try it.