[Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes

Fri Jul 9 16:17:47 EDT 2010

On Fri, Jul 9, 2010 at 11:42 AM, Rob Speer <rspeer at mit.edu> wrote:
> Now, the one part I've implemented that I just made up instead of
> looking to the SciPy consensus (because there was no SciPy consensus)
> was how to refer to multiple labeled axes without repeating ".axis"
> all over the place. My choice, which I call "magical axis attributes",
> is to have arr.somelabel == arr.axis.somelabel whenever it doesn't
> mean something else. This turns the call
>  arr.axis.country.named['Netherlands'].axis.year[-1]
> into:
>  arr.country.named['Netherlands'].year[-1]
>
> I got a message from Fernando Perez saying that he didn't like the
> magical axis attributes, for the expected reason that it's
> inconsistent. You shouldn't have to refer to your axis differently
> just because you called it something like "mean". Another problem that
> just occurred to me is that datarray-using code could break just
> because DataArray, or even ndarray itself, grew a new method.
>
> I like the syntax that magical attributes provide, but I'm willing to
> consider other options. Here's one:
>
> The __getattr__ only does its magic on attribute names that end in
> "_index" or "_named", which should not conflict with other method
> names. "arr.foo_index[3]" is the same as "arr.axis.foo[3]".
> Furthermore, "arr.foo_named['bar']" is the same as
> "arr.axis.foo.named['bar']". Then the above lookup becomes:
>  arr.country_named['Netherlands'].year_index[-1]
>
> I don't find this as appealing as magical attributes, but perhaps it's
> more responsible. I'd like to know what other people think, so let me
> summarize and name the existing proposals:
>
> arr.axis.country.named['Netherlands'].axis.year[-1]    # the default option -- works in any case
> arr[ arr.axis.country.named['Netherlands'].year[-1] ]  # the "stuple" option
> arr.country.named['Netherlands'].year[-1]              # the "magical" option
> arr.country_named['Netherlands'].year_index[-1]        # the "semi-magical" option
>
> -- Rob
>
> On Fri, Jul 9, 2010 at 1:39 AM, Rob Speer <rspeer at mit.edu> wrote:
>> http://github.com/rspeer/datarray represents my best guess at the
>> SciPy BOF consensus. I recently switched the method of accessing named
>> ticks from .named() to .named[] based on further discussion here.
>>
>> My implementation is still missing the case with named ticks but
>> positional axes, however. That is, you should be able to use .named
>> directly on the top-level datarray without referring to any axis
>> labels, to say something like arr.named['Netherlands', 2010], but you
>> can't yet.
>> -- Rob
>>
>> On Thu, Jul 8, 2010 at 11:44 PM, Keith Goodman <kwgoodman at gmail.com> wrote:
>>> On Thu, Jul 8, 2010 at 1:20 PM, Fernando Perez <fperez.net at gmail.com> wrote:
>>>
>>>> The consensus at the BoF (not that it means it's set in stone, simply
>>>> that there was a good chance for back-and-forth on the topic with many
>>>> voices) was that:
>>>>
>>>> 1. There are valid use cases for 'integer ticks', i.e. integers that
>>>> index arbitrarily into an array instead of in 0..N-1 fashion.
>>>>
>>>> 2. That having plain arr[0] give anything but the first element in arr
>>>> would be way too confusing in practice, and likely to cause too many
>>>> problems.
>>>>
>>>> 3. That the best solution to allow integer ticks while retaining
>>>> 'normal' indexing semantics for integers would be to have
>>>>
>>>> arr[int] -> normal indexing
>>>> arr.something[int] -> tick-based indexing, where an int can mean anything.
>>>
>>> Has the Scipy 2010 BOF consensus been implemented in anyone's fork? I
>>> don't understand the indexing so I'd like to try it.
>>> _______________________________________________
>>> NumPy-Discussion mailing list
>>> NumPy-Discussion at scipy.org
>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>>
>>

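For anyone who, like Keith, wants to poke at the BoF indexing rule quoted
above, here is a toy sketch in plain Python. It is not the datarray
implementation: the class names and the "named" attribute on a 1-D
container are assumptions made up for illustration. The only point is that
v[int] stays positional while tick lookup goes through a separate
attribute, so an integer tick can mean anything.

```python
class _TickIndexer:
    """Looks values up by tick instead of by position (hypothetical helper)."""

    def __init__(self, owner):
        self._owner = owner

    def __getitem__(self, tick):
        # Translate the tick to a position, then index normally.
        return self._owner._values[self._owner._pos[tick]]


class TickedVector:
    """Toy 1-D container: positions 0..N-1 plus arbitrary (even integer) ticks."""

    def __init__(self, values, ticks):
        if len(values) != len(ticks):
            raise ValueError("need exactly one tick per value")
        self._values = list(values)
        self._pos = {tick: i for i, tick in enumerate(ticks)}
        self.named = _TickIndexer(self)

    def __getitem__(self, i):
        # Plain integer indexing keeps its usual positional meaning.
        return self._values[i]


# Note that the last tick is the integer 0, which is not position 0.
v = TickedVector([10.0, 20.0, 30.0], ticks=[2008, 2009, 0])
first_by_position = v[0]        # positional: the first element
value_at_tick_0 = v.named[0]    # tick-based: the element whose tick is 0
```

Here v[0] and v.named[0] return different elements, which is exactly the
confusion that point 2 of the consensus is trying to keep out of plain
arr[0].
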
I personally find the magic attributes most appealing as well. I don't
like the semi-magical option. I think what makes the magic attributes
appealing is that they're so much less verbose than the
alternatives--that is, axis.row --> row. While the semi-magical option is
conceptually like magic attributes with a decreased chance of conflicts,
in practice it seems to merely trade the "axis." prefix for a suffix--that
is, axis.row --> row_index.
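
To make the comparison concrete, here is a minimal sketch of how the two
lookups could be wired up with __getattr__. This is not Rob's code: Axis,
AxisNamespace, MagicArray and SemiMagicArray are toy classes I made up, and
only the "_index" suffix is handled (a "_named" suffix would work the same
way).

```python
class Axis:
    """Toy stand-in for a labeled axis (not datarray's real Axis)."""

    def __init__(self, label, ticks):
        self.label = label
        self.ticks = list(ticks)


class AxisNamespace:
    """Plays the role of arr.axis: arr.axis.<label> returns that Axis."""

    def __init__(self, axes):
        self._axes = {ax.label: ax for ax in axes}

    def __getattr__(self, name):
        # Reached only when normal attribute lookup fails.
        axes = self.__dict__.get("_axes", {})
        if name in axes:
            return axes[name]
        raise AttributeError(name)


class MagicArray:
    """Magical option: arr.<label> falls back to arr.axis.<label>."""

    def __init__(self, axes):
        self.axis = AxisNamespace(axes)

    def mean(self):
        return "result of the mean() method"

    def __getattr__(self, name):
        # Real attributes and methods win, because __getattr__ is a fallback.
        if name == "axis":
            raise AttributeError(name)
        return getattr(self.axis, name)


class SemiMagicArray(MagicArray):
    """Semi-magical option: only names ending in "_index" are redirected."""

    def __getattr__(self, name):
        suffix = "_index"
        if name != "axis" and name.endswith(suffix):
            return getattr(self.axis, name[: -len(suffix)])
        raise AttributeError(name)


axes = [Axis("country", ["Netherlands", "Uruguay"]), Axis("mean", ["a", "b"])]
magic = MagicArray(axes)
semi = SemiMagicArray(axes)
```

With these toys, magic.country is the same object as magic.axis.country,
but magic.mean is the method rather than the axis labeled "mean", which is
the inconsistency Fernando objected to. semi.mean_index reaches that axis
without ambiguity, at the cost of the extra suffix.
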

We'd still be able to do axis.row as it is, right? (I've been too busy
being my parents' IT guy to get my hands dirty :( ) Maybe that would
be the way to go--I mean, you have the option of the nice magic
attribute action, but if it bothers you or you want your datarray to
be more robust or whatever, you can use axis.row throughout. Maybe we
could even have an enable/disable flag? I dunno.
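
Something like the following is what I have in mind for the flag. It is
only a sketch under my own assumptions: FlaggedArray and the
magic_attributes switch are hypothetical names, and the axes are reduced to
plain tick lists hanging off a SimpleNamespace.

```python
from types import SimpleNamespace


class FlaggedArray:
    """Toy array whose magic attribute lookup can be switched off."""

    # Hypothetical switch: set to False on the class or on one instance.
    magic_attributes = True

    def __init__(self, **axes):
        # arr.axis.<label> always works, whatever the flag says.
        self.axis = SimpleNamespace(**axes)

    def __getattr__(self, name):
        # Fallback lookup: only redirect to arr.axis when the flag is on.
        if name != "axis" and self.magic_attributes:
            try:
                return getattr(self.axis, name)
            except AttributeError:
                pass
        raise AttributeError(name)


relaxed = FlaggedArray(row=["A", "B"], col=["C", "D"])
strict = FlaggedArray(row=["A", "B"], col=["C", "D"])
strict.magic_attributes = False  # opt out for this one instance
```

relaxed.row works as a shortcut, while strict.row raises AttributeError and
strict.axis.row is the only spelling left.
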

I almost feel like we should come up with some sort of hypothetical
case of a datarray that we want to do specific things with, so we can
talk about how we would do those things with a concrete example. It
should probably be at least 3d. Maybe I'll mock one up over my lunch
break.

Oh, and in case anyone missed this email:

On Thu, Jul 8, 2010 at 12:55 PM, Keith Goodman <kwgoodman at gmail.com> wrote:
> What do you think of adding a ticks parameter to DataArray? Would that
> make sense?
>
> Current behavior:
>
>>> x = DataArray([[1, 2], [3, 4]], (('row', ['A','B']), ('col', ['C', 'D'])))
>>> x.axes
> (Axis(label='row', index=0, ticks=['A', 'B']),
>  Axis(label='col', index=1, ticks=['C', 'D']))
>
> Proposed ticks as separate input parameter:
>
>>> x = DataArray([[1, 2], [3, 4]], labels=('row', 'col'), ticks=[['A', 'B'], ['C', 'D']])
>
> I think this would make it easier for new users to construct a
> DataArray with ticks just from looking at the function signature. It
> would match the function signature of Axis. My use case is to use
> ticks only and not named axes (at first), so:
>
>>> x = DataArray([[1, 2], [3, 4]], labels=None, ticks=[['A', 'B'], ['C', 'D']])
>
> instead of the current:
>
>>> x = DataArray([[1, 2], [3, 4]], ((None, ['A','B']), (None, ['C', 'D'])))
>
> It might also cause fewer typos (parentheses matching) at the command line.
>
> I've only made a few DataArrays so I don't understand the
> ramifications of what I am suggesting.

I was going to reply to it after I considered its contents but kinda
forgot until now.

Anyway: while I like how the current behavior keeps each axis's ticks
right next to its label, I find this alternative syntax easier to read,
probably because it has fewer parentheses.
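
The two spellings carry the same information, so one could be converted to
the other up front. A rough sketch, assuming the (label, ticks) pair form
from Keith's "current behavior" example; pairs_from_labels_ticks is a
hypothetical helper of mine, not part of datarray:

```python
def pairs_from_labels_ticks(labels, ticks):
    """Turn separate labels/ticks arguments into (label, ticks) pairs.

    labels=None means "ticks only", i.e. every axis label is None.
    """
    if labels is None:
        labels = [None] * len(ticks)
    if len(labels) != len(ticks):
        raise ValueError("need exactly one label per list of ticks")
    return tuple((label, list(t)) for label, t in zip(labels, ticks))


named_axes = pairs_from_labels_ticks(("row", "col"), [["A", "B"], ["C", "D"]])
unnamed_axes = pairs_from_labels_ticks(None, [["A", "B"], ["C", "D"]])
```

The results have the same shape as the second argument in the
"current behavior" calls, so a constructor could accept either form.
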

At any rate, this is definitely worth discussion imo.

--Josh