[Numpy-discussion] Fancy-indexing reorders output in corner cases?

Travis Oliphant travis at continuum.io
Tue May 15 00:03:13 EDT 2012


On May 14, 2012, at 7:07 PM, Stéfan van der Walt wrote:

> Hi Zach
> 
> On Mon, May 14, 2012 at 4:33 PM, Zachary Pincus <zachary.pincus at yale.edu> wrote:
>> The below seems to be a bug, but perhaps it's unavoidably part of the indexing mechanism?
>> 
>> It's easiest to show via example... note that using "[0,1]" to pull two columns out of the array gives the same shape as using ":2" in the simple case, but when there's additional slicing happening, the shapes get transposed or something.
> 
> When fancy indexing and slicing are mixed, the resulting shape is
> essentially unpredictable.  The "correct" way to do it is to only use
> fancy indexing, i.e. generate the indices of the sliced dimension as
> well.

This is not quite accurate. It is not unpredictable. It is very predictable, but a bit (too) complicated in the most general case. The problem occurs when you "intermingle" fancy indexing with slice notation (and for this purpose integer selection is considered "fancy-indexing"). While in simple cases you can think that [0,1] is equivalent to :2, it is not, because fancy-indexing uses "zip-based ideas" (the index arrays are broadcast against each other and paired element by element) instead of cross-product-based ideas.
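A minimal sketch of the zip versus cross-product distinction, using a small made-up array (the names block, pairs and crossed are only for illustration). np.ix_ is the standard NumPy helper for building cross-product index arrays:

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

# Slices select the cross product: rows 0-1 crossed with columns 0-1.
block = a[:2, :2]

# Index lists are paired element by element ("zipped"): (0, 0) and (1, 1).
pairs = a[[0, 1], [0, 1]]

# np.ix_ builds index arrays that broadcast to the cross product.
crossed = a[np.ix_([0, 1], [0, 1])]

print(block.shape)    # (2, 2)
print(pairs.shape)    # (2,)
print(pairs)          # [0 5]
print(crossed.shape)  # (2, 2)
```

So [0,1] only looks like :2 when it is the sole fancy index in the expression.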

The problem in general is how to make sense of something like

a[:, :, in1, in2]   

If you keep fancy indexing to one side of the slice notation only, then you get what you expect. The shape of the output will be the first two dimensions of a followed by the broadcast shape of in1 and in2 (where integers are interpreted as fancy-index arrays).

So, let's say a has shape (10,9,8,7), in1 has shape (3,4) and in2 has shape (4,).

The shape of the output, call it out, will be (10,9,3,4), filled with essentially out[:,:,i,j] = a[:,:,in1[i,j], in2[j]]
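This can be checked directly. The sketch below uses the shapes from the example; the index values themselves are arbitrary choices made only so that they stay within bounds:

```python
import numpy as np

a = np.arange(10 * 9 * 8 * 7).reshape(10, 9, 8, 7)

# Arbitrary in-range index values with the shapes from the example.
in1 = np.arange(12).reshape(3, 4) % 8   # indexes axis 2 (length 8)
in2 = np.array([0, 2, 4, 6])            # indexes axis 3 (length 7)

out = a[:, :, in1, in2]
print(out.shape)  # (10, 9, 3, 4)

# in2 is broadcast along the first axis of in1, so element (i, j) of the
# sub-space pairs in1[i, j] with in2[j].
for i in range(3):
    for j in range(4):
        assert np.array_equal(out[:, :, i, j], a[:, :, in1[i, j], in2[j]])
```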

What happens, though, when you have

a[:, in1, :, in2]?

in1 and in2 are broadcast together to create a two-dimensional "sub-space" that must fit somewhere. Where should it go? Should it take the place of in1 or of in2? I.e. should the output be

(10,3,4,8) or (10,8,3,4)?

To "resolve" this ambiguity, the code sends the (3,4) sub-space to the front of the "dimensions" and returns (3,4,10,8). In retrospect, the code should raise an error, as I doubt anyone actually relies on this behavior, and then we could have "done the right thing" for situations like in1 being an integer, which actually makes some sense and should not have been confused with the "general case".
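A short sketch of both situations, again with arbitrary in-range index values. The second half is an assumed reconstruction of the kind of case reported at the top of the thread, where an integer and a list are separated by a slice:

```python
import numpy as np

a = np.zeros((10, 9, 8, 7))

in1 = np.arange(12).reshape(3, 4) % 9   # indexes axis 1 (length 9)
in2 = np.array([0, 2, 4, 6])            # indexes axis 3 (length 7)

# Fancy indices separated by a slice: the broadcast (3, 4) sub-space
# is moved to the front, followed by the sliced axes.
separated = a[:, in1, :, in2]
print(separated.shape)  # (3, 4, 10, 8)

# The integer case: 0 counts as a fancy index, so replacing :2 with
# [0, 1] moves the length-2 axis to the front.
with_slice = a[0, :, :2]
with_list = a[0, :, [0, 1]]
print(with_slice.shape)  # (9, 2, 7)
print(with_list.shape)   # (2, 9, 7)
```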

In this particular case you might also think that we could say the result should be (10,3,8,4), but there is no guarantee that the number of dimensions contributed by the "fancy-indexing" objects will be the same as the number of dimensions replaced. Again, this is how fancy-indexing combines with other fancy-indexing objects.
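To illustrate why the dimension counts need not match, the sketch below replaces the same two axes with index arrays of different shapes (all-zero index arrays, chosen only for their shapes; the names fewer, same and more are illustrative):

```python
import numpy as np

a = np.zeros((10, 9, 8, 7))

def idx(*shape):
    # All-zero integer index array of the given shape (always in range).
    return np.zeros(shape, dtype=int)

# Two indexed axes collapse into one output dimension.
fewer = a[:, :, idx(5), idx(5)]
# Two indexed axes become two output dimensions.
same = a[:, :, idx(3, 4), idx(4)]
# Two indexed axes become three output dimensions.
more = a[:, :, idx(2, 3, 4), idx(4)]

print(fewer.shape)  # (10, 9, 5)
print(same.shape)   # (10, 9, 3, 4)
print(more.shape)   # (10, 9, 2, 3, 4)
```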

So, the behavior is actually quite predictable; it's just that in some common cases it doesn't do what you would expect, especially if you think that [0,1] is "the same" as :2. When I wrote this code to begin with, I should have raised an error and then worked in the cases that make sense. This is a good example of making the mistake of thinking that it's better to provide something very general rather than just raise an error when an obvious and clear solution is not available.

There is the possibility that we could now raise an error in NumPy when this situation is encountered, because I strongly doubt anyone is actually relying on the current behavior. I would like to do this, actually, as soon as possible. Comments?

-Travis




> 
> Stéfan