[Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes

Rob Speer rspeer at MIT.EDU
Thu Jul 8 02:25:29 EDT 2010


Glad I finally found this discussion.

I implemented some of the ideas from the SciPy BOF discussion, and
Joshua has already merged them into his datarray on GitHub (thanks,
Joshua, for being so fast on the merge button).

To introduce these changes, here are a couple of examples of how you
could index into a matrix whose rows represent countries, and whose
columns represent something that is observed every four years
(hmm...).
>>> arr.country.named('Netherlands').year.named(2010)
>>> arr.country.named('Spain').year.named(slice(1994, 2010))
>>> arr.year.named(2006).country[0:2]

First of all, a bit of terminology. Axes can have labels. Ticks (which
are particular rows, columns, etc.) can have names. Axes and ticks
also have indices (the sequential numbers they've always had). Feel
free to suggest alternate terminology, I just used what sounded the
most natural to me in the method names.
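To make the terminology concrete, here is a minimal sketch of how it maps onto the country-by-year example. The `Axis` dataclass and its `tick_name`/`tick_index` helpers are hypothetical illustrations, not datarray's actual classes: an axis has a label and an index, and each tick has a name and an index.

```python
from dataclasses import dataclass, field

# Hypothetical illustration of the terminology; not datarray's real classes.
@dataclass
class Axis:
    label: str                                  # axis label, e.g. "country"
    index: int                                  # axis position among the dimensions
    ticks: list = field(default_factory=list)   # tick names, in tick-index order

    def tick_name(self, i):
        """Name of the tick at integer index i."""
        return self.ticks[i]

    def tick_index(self, name):
        """Integer index of the tick with the given name."""
        return self.ticks.index(name)

# Rows are countries (axis 0); columns are years (axis 1).
country = Axis("country", 0, ["Netherlands", "Spain"])
year = Axis("year", 1, [1994, 1998, 2002, 2006, 2010])
```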

Addressing by indices and addressing by tick names are separate, which
allows integers to be tick names without a conflict. You use the
"named" method of an axis to address it by name, while __getitem__
only addresses it by indices. You can still take slices of names
(makes sense for things like years), but you have to spell out "slice"
because it's not inside square brackets.
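The separation between index addressing and name addressing can be sketched on a single axis. `NamedAxis` is a hypothetical stand-in, not datarray's code, and the values are placeholders. One assumption to flag: this sketch treats a slice of names like an ordinary Python slice, so the stop name is excluded; whether datarray includes the endpoint is not settled here.

```python
# Hypothetical 1-D stand-in: values along one axis, plus tick names.
class NamedAxis:
    def __init__(self, values, ticks):
        self.values = list(values)
        self.ticks = list(ticks)
        # Map each tick name to its integer index along the axis.
        self._pos = {name: i for i, name in enumerate(self.ticks)}

    def __getitem__(self, key):
        # Square brackets address by integer index (or a slice of indices) only.
        return self.values[key]

    def named(self, key):
        # Address by tick name; a slice of names becomes a slice of indices.
        # Assumption: the stop name is exclusive, as in ordinary Python slices.
        if isinstance(key, slice):
            start = None if key.start is None else self._pos[key.start]
            stop = None if key.stop is None else self._pos[key.stop]
            return self.values[start:stop:key.step]
        return self.values[self._pos[key]]

year = NamedAxis(["a", "b", "c", "d", "e"], [1994, 1998, 2002, 2006, 2010])
```

Because integer tick names only go through `named`, `year[0]` and `year.named(2006)` never compete for the same key.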

Then, at the axis level: My impression from the SciPy discussion was
that people wanted to be able to look up multiple labeled axes at once
without repeating themselves, and .aix and stuples were not
satisfying, but we didn't come up with anything else during the
discussion.

My choice was to add a bit of attribute magic: if you get an attribute
of a datarray that is (a) not a real attribute and (b) matches the
label of one of its axes, you'll get that axis. So "arr.axis.country"
can be shortened to "arr.country", for example. But if you decided to
name your axis "T", you would be stuck with "arr.axis.T", because T is
already a real attribute (the transpose).
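The attribute fallback can be sketched with Python's `__getattr__`, which is only called after normal attribute lookup fails. `LabeledArray` and `AxisBundle` are hypothetical names for illustration, and the class attribute `T` is a placeholder standing in for a real attribute such as the transpose; this is not necessarily how datarray implements it.

```python
# Hypothetical sketch of the attribute fallback; not datarray's actual code.
class Axis:
    def __init__(self, label, index):
        self.label = label
        self.index = index

class AxisBundle:
    """Every axis is reachable as arr.axis.<label>."""
    def __init__(self, labels):
        for i, label in enumerate(labels):
            setattr(self, label, Axis(label, i))

class LabeledArray:
    # Placeholder for a real attribute; on an ndarray, T is the transpose.
    T = "transpose (placeholder)"

    def __init__(self, labels):
        self.axis = AxisBundle(labels)

    def __getattr__(self, name):
        # Only reached when normal lookup fails, so real attributes
        # (such as T or axis) are never shadowed by an axis label.
        # Reading self.__dict__ directly avoids recursing into __getattr__.
        axes = self.__dict__.get("axis")
        if axes is not None and name in vars(axes):
            return vars(axes)[name]
        raise AttributeError(name)

arr = LabeledArray(["country", "T"])
```

Here `arr.country` resolves to the country axis, while `arr.T` still returns the real attribute, so the axis labeled "T" is only reachable as `arr.axis.T`.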

So this is the state of the code at http://github.com/rspeer/datarray
(and also at http://github.com/jesusabdullah/datarray now). I'll even
try to make the documentation catch up with this code if people think
the changes are good.
-- Rob



More information about the NumPy-Discussion mailing list