[Numpy-discussion] Fancy-indexing reorders output in corner cases?

Olivier Delalleau shish at keba.be
Tue May 15 09:03:26 EDT 2012


2012/5/15 Travis Oliphant <travis at continuum.io>

>
> On May 14, 2012, at 7:07 PM, Stéfan van der Walt wrote:
>
> > Hi Zach
> >
> > On Mon, May 14, 2012 at 4:33 PM, Zachary Pincus <zachary.pincus at yale.edu> wrote:
> >> The below seems to be a bug, but perhaps it's unavoidably part of the
> >> indexing mechanism?
> >>
> >> It's easiest to show via example... note that using "[0,1]" to pull two
> >> columns out of the array gives the same shape as using ":2" in the simple
> >> case, but when there's additional slicing happening, the shapes get
> >> transposed or something.
> >
> > When fancy indexing and slicing are mixed, the resulting shape is
> > essentially unpredictable.  The "correct" way to do it is to only use
> > fancy indexing, i.e. generate the indices of the sliced dimension as
> > well.
>
> This is not quite accurate.   It is not unpredictable.  It is very
> predictable, but a bit (too) complicated in the most general case.  The
> problem occurs when you "intermingle" fancy indexing with slice notation
> (and for this purpose integer selection is considered "fancy-indexing").
> In simple cases you might think that [0,1] is equivalent to :2, but it
> is not, because fancy-indexing uses "zip-based ideas" instead of
> cross-product based ideas.
>
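
A small sketch of the zip versus cross-product distinction described above. The array `a` and the names `zipped`, `block`, and `crossed` are made up for illustration and are not from the thread; `np.ix_` is shown as one way to get cross-product behavior from index lists.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

# Two index lists are paired element-wise ("zip"): this picks a[0, 0] and a[1, 1].
zipped = a[[0, 1], [0, 1]]

# Two slices select the full cross product of rows and columns: a 2x2 block.
block = a[:2, :2]

# np.ix_ builds index arrays that broadcast against each other,
# which reproduces the cross-product selection with fancy indexing only.
crossed = a[np.ix_([0, 1], [0, 1])]

print(zipped.shape, block.shape, crossed.shape)
```

So `[0, 1]` matches `:2` only when it is the sole fancy index; once two index arrays appear together they are combined by broadcasting, not by cross product.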
> The problem in general is how to make sense of something like
>
> a[:, :, in1, in2]
>
> If you keep fancy indexing to one side of the slice notation only, then
> you get what you expect.   The shape of the output will be the first two
> dimensions of a + the broadcasted shape of in1 and in2 (where integers are
> interpreted as fancy-index arrays).
>
> So, let's say a is (10,9,8,7)  and in1 is (3,4) and in2 is (4,)
>
> The shape of the output will be (10,9,3,4) filled with essentially
> result[:,:,i,j] = a[:,:,in1[i,j], in2[j]]
>
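
A minimal check of the shape rule just stated, assuming the shapes from the example, (10,9,8,7), (3,4) and (4,). The concrete index values and the name `result` are arbitrary choices for illustration.

```python
import numpy as np

a = np.arange(10 * 9 * 8 * 7).reshape(10, 9, 8, 7)
in1 = np.arange(12).reshape(3, 4) % 8   # shape (3, 4), valid indices into axis 2
in2 = np.array([0, 2, 4, 6])            # shape (4,), valid indices into axis 3

# Fancy indices sit together on the right of the slices, so the
# broadcast (3, 4) sub-space simply replaces the last two axes.
result = a[:, :, in1, in2]
print(result.shape)

# Verify the element-wise description: each (i, j) of the sub-space
# pairs in1[i, j] with in2[j] after broadcasting.
matches = all(
    np.array_equal(result[:, :, i, j], a[:, :, in1[i, j], in2[j]])
    for i in range(3)
    for j in range(4)
)
print(matches)
```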
> What happens, though, when you have
>
> a[:, in1, :, in2]?
>
> in1 and in2 are broadcasted together to create a two-dimensional
> "sub-space" that must fit somewhere.   Where should it go?   Should it
> replace in1 or in2?    I.e. should the output be
>
> (10,3,4,8) or (10,8,3,4).
>
> To "resolve" this ambiguity, the code sends the (3,4) sub-space to the
> front of the "dimensions" and returns (3,4,10,8).   In retrospect, the
> code should raise an error, as I doubt anyone actually relies on this
> behavior, and then we could have "done the right thing" for situations like
> in1 being an integer, which actually makes some sense and should not have
> been confused with the "general case".
>
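
A sketch of the case where a slice separates the fancy indices, including the integer case that started this thread. It assumes the same shapes as above; the index values and the names `mixed`, `int_case`, and `sliced` are illustrative only.

```python
import numpy as np

a = np.arange(10 * 9 * 8 * 7).reshape(10, 9, 8, 7)
in1 = np.arange(12).reshape(3, 4) % 9   # shape (3, 4), valid indices into axis 1
in2 = np.array([0, 2, 4, 6])            # shape (4,), valid indices into axis 3

# The fancy indices are separated by a slice, so the broadcast (3, 4)
# sub-space is moved to the front of the result.
mixed = a[:, in1, :, in2]
print(mixed.shape)

# An integer counts as a fancy index here, so the same rule applies:
# the (4,) sub-space from the list goes first.
int_case = a[:, 0, :, [0, 1, 2, 3]]
print(int_case.shape)

# With a slice instead of the list, the axes keep their original order.
sliced = a[:, 0, :, :4]
print(sliced.shape)
```

The data selected by `int_case` and `sliced` is the same; only the axis order differs, which is the "transposed" effect reported at the top of the thread.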
> In this particular case you might also think that we could say the result
> should be (10,3,8,4) but there is no guarantee that the number of
> dimensions that should be appended by the "fancy-indexing" objects will be
> the same as the number of dimensions replaced.    Again, this is how
> fancy-indexing combines with other fancy-indexing objects.
>
> So, the behavior is actually quite predictable, it's just that in some
> common cases it doesn't do what you would expect --- especially if you
> think that [0,1] is "the same" as :2.   When I wrote this code to begin
> with I should have raised an error and then worked in the cases that make
> sense.    This is a good example of making the mistake of thinking that
> it's better to provide something very general rather than just raise an
> error when an obvious and clear solution is not available.
>
> There is the possibility that we could now raise an error in NumPy when
> this situation is encountered because I strongly doubt anyone is actually
> relying on the current behavior.    I would like to do this, actually, as
> soon as possible.  Comments?
>

+1 to raising an error instead of keeping an unintuitive behavior.

-=- Olivier