[Numpy-discussion] Datarray BoF, part2

Bruce Southey bsouthey at gmail.com
Wed Jul 21 13:41:05 EDT 2010


On 07/21/2010 11:56 AM, John Salvatier wrote:
> I don't really know much about this topic, but what about a flag at 
> array creation time (or whenever you define labels) that says whether 
> valid indexes will be treated as labels or indexes for that array?
>
> On Wed, Jul 21, 2010 at 9:37 AM, Keith Goodman <kwgoodman at gmail.com>
> wrote:
>
>     About a dozen people attended what was billed as a continuation of the
>     SciPy 2010 datarray BoF. We met at UC Berkeley on July 19 as part of
>     the py4science series.
>
>     A datarray is a subclass of a Numpy array that adds the ability to
>     label the axes and to label the elements along each axis.
>
>     We spent most of the time discussing how to index with tick labels.
>     The main issue is with integers: is an integer index a tick name or a
>     position index?
>
>     At the top level, datarrays always use regular Numpy indexing: an int
>     is a position, never a label. So darr[0] always returns the first
>     element of the datarray.
>
>     The ambiguity occurs in specialized indexing methods that allow
>     indexing by tick label name (because the name could be an int). To
>     break the ambiguity, the proposal was to provide several tick indexing
>     methods[1]:
>
>     1. Integers are always labels
>     2. Integers are never treated as labels
>     3. Try 1, then 2
>
>     We also discussed allowing axis labels to be any hashable object
>     (currently only strings are allowed). The main problem: integers.
>     Currently if an axis is labeled, say, "time", you can do
>     darr.sum(axis="time"). What happens when an axis is labeled with an
>     int? What does the 2 in darr.sum(axis=2) refer to? A position or a
>     label? The same problem exists for floats since a float is (currently)
>     a valid axis for Numpy arrays.
>
>     References:
>     [1]
>     http://github.com/fperez/datarray/commit/3c5151baa233675b355058eb3ba028d2629bece5
>     _______________________________________________
>     NumPy-Discussion mailing list
>     NumPy-Discussion at scipy.org
>     http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
Allowing only strings as axis labels, as currently implemented, is the
only practical option, and I think most other related languages also
impose this constraint. Otherwise we effectively break compatibility
with Python and numpy, because darr[0] can give different answers
depending on the type of object involved, especially if you are using
views and forget the actual object type.
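
To make the ambiguity concrete, here is a minimal sketch of the three
tick lookup rules from Keith's summary. It is plain Python, not the
datarray API: resolve_tick and its mode argument are hypothetical names
used only for illustration.

```python
def resolve_tick(ticks, key, mode):
    """Map `key` to a position along an axis with tick labels `ticks`.

    Hypothetical helper, not part of datarray.
      mode="label":    integers are always labels         (rule 1)
      mode="position": integers are never labels          (rule 2)
      mode="fallback": try rule 1, then fall back to 2    (rule 3)
    """
    if mode not in ("label", "position", "fallback"):
        raise ValueError(f"unknown mode: {mode!r}")
    is_int = isinstance(key, int)
    # Label lookup: always for non-integers, and for integers
    # unless the caller asked for purely positional behaviour.
    if (not is_int or mode != "position") and key in ticks:
        return ticks.index(key)
    # Positional lookup: only for integers, and never under rule 1.
    if is_int and mode != "label" and -len(ticks) <= key < len(ticks):
        return key % len(ticks)
    raise KeyError(key)


ticks = [2, 0, 1]                       # integer tick labels, shuffled
print(resolve_tick(ticks, 0, "label"))     # 1: the tick labelled 0
print(resolve_tick(ticks, 0, "position"))  # 0: the first element
print(resolve_tick(ticks, 0, "fallback"))  # 1: label wins when present
```

The same key gives different positions depending on the rule, which is
the kind of surprise that string-only labels avoid.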

I do think we have to avoid adding complexity that increases runtime,
like searching for a label 2 when axis=2 should simply mean the axis at
position 2. We also have to avoid designs that invite input errors,
like flag values or extra arguments.
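
With string-only axis labels the dispatch stays trivial. The sketch
below shows one way it could look, under the assumption that strings
are always names and integers are always positions; resolve_axis is a
hypothetical helper, not a datarray or numpy function.

```python
def resolve_axis(axis_names, axis):
    """Return the integer position of `axis`.

    Hypothetical helper: strings are looked up as axis names,
    integers are taken as positions with no label search at all.
    """
    if isinstance(axis, str):
        try:
            return axis_names.index(axis)
        except ValueError:
            raise KeyError(f"no axis named {axis!r}") from None
    if isinstance(axis, int) and -len(axis_names) <= axis < len(axis_names):
        # Normalise negative positions the way Python sequences do.
        return axis % len(axis_names)
    raise IndexError(f"invalid axis: {axis!r}")


names = ["run", "channel", "time"]
print(resolve_axis(names, "time"))  # 2
print(resolve_axis(names, 2))       # 2
print(resolve_axis(names, -1))      # 2
```

The returned position can then be handed to any call that takes an
integer axis, such as the sum(axis=...) method of a numpy array, so no
flag or extra argument is needed to say what the caller meant.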

Bruce







