[Numpy-discussion] Datarray BoF, part2
Bruce Southey
bsouthey at gmail.com
Wed Jul 21 13:41:05 EDT 2010
On 07/21/2010 11:56 AM, John Salvatier wrote:
> I don't really know much about this topic, but what about a flag at
> array creation time (or whenever you define labels) that says whether
> valid indexes will be treated as labels or indexes for that array?
>
> On Wed, Jul 21, 2010 at 9:37 AM, Keith Goodman <kwgoodman at gmail.com> wrote:
>
> About a dozen people attended what was billed as a continuation of the
> SciPy 2010 datarray BoF. We met at UC Berkeley on July 19 as part of
> the py4science series.
>
> A datarray is a subclass of a Numpy array that adds the ability to
> label the axes and to label the elements along each axis.
>
> We spent most of the time discussing how to index with tick labels.
> The main issue is with integers: is an integer index a tick name or a
> position index?
>
> At the top level, datarrays always use regular Numpy indexing: an int
> is a position, never a label. So darr[0] always returns the first
> element of the datarray.
>
> The ambiguity occurs in specialized indexing methods that allow
> indexing by tick label name (because the name could be an int). To
> break the ambiguity, the proposal was to provide several tick indexing
> methods[1]:
>
> 1. Integers are always labels
> 2. Integers are never treated as labels
> 3. Try 1, then 2
>
> We also discussed allowing axis labels to be any hashable object
> (currently only strings are allowed). The main problem: integers.
> Currently if an axis is labeled, say, "time", you can do
> darr.sum(axis="time"). What happens when an axis is labeled with an
> int? What does the 2 in darr.sum(axis=2) refer to? A position or a
> label? The same problem exists for floats since a float is (currently)
> a valid axis for Numpy arrays.
>
> References:
> [1]
> http://github.com/fperez/datarray/commit/3c5151baa233675b355058eb3ba028d2629bece5
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
Allowing only strings as axis labels, as currently implemented, is the
only practical option, and I think most related languages impose the
same constraint. Otherwise we effectively break compatibility with
Python and NumPy, because darr[0] can give different answers depending
on the type of object involved, especially if you are using views and
have forgotten the actual object type.
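To make the ambiguity concrete, here is a minimal sketch using plain
NumPy rather than the datarray API. The names values, labels,
by_position and by_label are made up for illustration; they only show
how the same integer can select different elements once integers are
allowed as tick labels.

```python
import numpy as np

# Hypothetical data: three values whose tick labels happen to be integers.
values = np.array([10.0, 20.0, 30.0])
labels = [2, 0, 1]


def by_position(i):
    """Plain NumPy indexing: the integer is a position."""
    return values[i]


def by_label(label):
    """Label lookup: the integer is a tick name (illustrative only)."""
    return values[labels.index(label)]


print(by_position(0))  # first element
print(by_label(0))     # element whose tick label is 0
```

The two calls take the same argument but return different elements,
which is the kind of surprise I would like to avoid.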
We also have to avoid adding complexity that increases runtime, such as
searching for a label 2 when 2 should simply be the axis position. We
should likewise avoid designs that invite input errors, such as flag
values or extra arguments.
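As a rough sketch of what string-only axis labels buy us: the helper
below, resolve_axis, is hypothetical and not part of datarray or NumPy,
and the axis names are invented. It only illustrates that when labels
must be strings, an integer axis never needs a label search.

```python
import numpy as np


def resolve_axis(axis, axis_names):
    """Map an axis argument to a position (hypothetical helper)."""
    if isinstance(axis, str):
        # Only strings can be labels, so only strings trigger a lookup.
        return axis_names.index(axis)
    # Anything else is passed through as an ordinary NumPy axis position.
    return axis


data = np.arange(24).reshape(2, 3, 4)
names = ["run", "channel", "time"]  # assumed axis labels

total_by_name = data.sum(axis=resolve_axis("time", names))
total_by_pos = data.sum(axis=resolve_axis(2, names))
```

Both calls sum over the same axis, and no flag or extra argument is
needed to say which interpretation is wanted.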
Bruce