[Numpy-discussion] Advanced indexing: "fancy" vs. orthogonal

josef.pktd at gmail.com josef.pktd at gmail.com
Thu Apr 2 23:18:35 EDT 2015


On Thu, Apr 2, 2015 at 10:30 PM, Matthew Brett <matthew.brett at gmail.com> wrote:
> Hi,
>
> On Thu, Apr 2, 2015 at 6:09 PM,  <josef.pktd at gmail.com> wrote:
>> On Thu, Apr 2, 2015 at 8:02 PM, Eric Firing <efiring at hawaii.edu> wrote:
>>> On 2015/04/02 1:14 PM, Hanno Klemm wrote:
>>>> Well, I have written quite a bit of code that relies on fancy
>>>> indexing, and I think the question, if the behaviour of the []
>>>> operator should be changed has sailed with numpy now at version 1.9.
>>>> Given the number of packages that rely on numpy, changing this
>>>> fundamental behaviour would not be a clever move.
>>>
>>> Are you *positive* that there is no clever way to make a transition?
>>> It's not worth any further thought?
>>
>> I guess it would be similar to python 3 string versus bytes, but
>> without the overwhelming benefits.
>>
>> I don't think I would be in favor of deprecating fancy indexing even
>> if it were possible. In general, my impression is that if there is a
>> trade-off in numpy between powerful machinery versus easy to learn and
>> teach, then the design philosophy went in favor of power.
>>
>> I think numpy indexing is not too difficult and follows a consistent
>> pattern, and I completely avoid mixing slices and index arrays with
>> ndim > 2.
>
> I'm sure y'all are totally on top of this, but for myself, I would
> like to distinguish:
>
> * fancy indexing with boolean arrays - I use it all the time and don't
> get confused;
> * fancy indexing with non-boolean arrays - horrendously confusing,
> almost never use it, except on a single axis when I can't confuse it
> with orthogonal indexing:
>
> In [3]: a = np.arange(24).reshape(6, 4)
>
> In [4]: a
> Out[4]:
> array([[ 0,  1,  2,  3],
>        [ 4,  5,  6,  7],
>        [ 8,  9, 10, 11],
>        [12, 13, 14, 15],
>        [16, 17, 18, 19],
>        [20, 21, 22, 23]])
>
> In [5]: a[[1, 2, 4]]
> Out[5]:
> array([[ 4,  5,  6,  7],
>        [ 8,  9, 10, 11],
>        [16, 17, 18, 19]])
>
> I also remember a discussion with Travis O where he was also saying
> that this indexing was confusing and that it would be good if there
> was some way to transition to what he called outer product indexing (I
> think that's the same as 'orthogonal' indexing).
>
>> I think it should be DOA, except as a discussion topic for numpy 3000.
>
> I think there are two proposals here:
>
> 1) Add some syntactic sugar to allow orthogonal indexing of numpy
> arrays, no backward compatibility break.
>
> That seems like a very good idea to me - were there any big objections to that?
>
> 2) Over some long time period, move the default behavior of np.array
> non-boolean indexing from the current behavior to the orthogonal
> behavior.
>
> That is going to be very tough, because it will cause very confusing
> breakage of legacy code.
>
> On the other hand, maybe it is worth going some way towards that, like this:
>
> * implement orthogonal indexing as a method arr.sensible_index[...]
> * implement the current non-boolean fancy indexing behavior as a
> method - arr.crazy_index[...]
> * deprecate non-boolean fancy indexing as standard arr[...] indexing;
> * wait a long time;
> * remove non-boolean fancy indexing as standard arr[...] (errors are
> preferable to change in behavior)
>
> Then if we are brave we could:
>
> * wait a very long time;
> * make orthogonal indexing the default.
>
> But the not-brave steps above seem less controversial, and fairly reasonable.
>
> What about that as an approach?

I also thought the transition would have to be something like that or
a clear break point, like numpy 3.0. I would be in favor of something
like this for the axis swapping case with ndim>2.
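
To make the axis swapping case concrete, here is a small sketch (the
array and index values are arbitrary, chosen only for illustration). When
a scalar or index array is separated from another index array by a slice,
the indexed dimension moves to the front of the result:

```python
import numpy as np

x = np.arange(24).reshape(2, 3, 4)

# Scalar and index array separated by a slice: the indexed dimension
# moves to the front of the result.
swapped = x[0, :, [0, 1]]
print(swapped.shape)      # (2, 3)

# Indexing in two steps keeps the axes in their original order.
stepwise = x[0][:, [0, 1]]
print(stepwise.shape)     # (3, 2)

# Index arrays on adjacent axes stay where they were.
adjacent = x[:, [0, 1], [0, 1]]
print(adjacent.shape)     # (2, 2)
```

The two results hold the same values, only transposed.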

However, before going to that, you would still have to provide a list
of behaviors that will be deprecated, and survey various libraries to
see how much each of them is actually used.

My impression is that fancy indexing is used more often than
orthogonal indexing (beyond the trivial case x[:, idx]).
Also, many use cases for orthogonal indexing moved to using pandas, and
numpy is left with non-orthogonal indexing use cases.
And third, fancy indexing is a superset of orthogonal indexing (with
proper broadcasting), and you still need to justify why everyone
should be restricted to the subset instead of a voluntary constraint
to use code that is easier to understand.
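
A minimal sketch of the superset point, using the same 6x4 array as the
example quoted above (the row and column choices are arbitrary).
Orthogonal selection can be written as fancy indexing with broadcast
index arrays, which is what np.ix_ constructs, while pointwise selection
has no single-step orthogonal equivalent:

```python
import numpy as np

a = np.arange(24).reshape(6, 4)
rows = np.array([1, 2, 4])
cols = np.array([0, 3])

# Orthogonal selection via broadcasting: rows as a column vector
# broadcast against cols gives every (row, col) combination.
outer = a[rows[:, None], cols]
print(outer.shape)        # (3, 2)

# np.ix_ builds the same broadcastable index arrays.
via_ix = a[np.ix_(rows, cols)]

# Pointwise selection: elements (1, 0), (2, 3), (4, 1).
pairs = a[[1, 2, 4], [0, 3, 1]]
print(pairs)              # [ 4 11 17]
```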

I checked numpy.random.choice which I would have implemented with
fancy indexing, but it uses only `take`, AFAICS.
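
For the single-axis case the two spellings are interchangeable. This is
only an illustration of that equivalence, not a copy of what
numpy.random.choice does internally:

```python
import numpy as np

a = np.arange(24).reshape(6, 4)
idx = [1, 2, 4]

# Fancy indexing along the first axis.
fancy = a[idx]

# The same selection spelled with take and an explicit axis.
taken = np.take(a, idx, axis=0)

print(np.array_equal(fancy, taken))   # True
```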

Switching to using an explicit method is not really a problem for
maintained library code, but I still don't really see why we should do
this.
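
For reference, an explicit orthogonal indexer could be sketched in a few
lines of user code. The class name OrthogonalIndexer is hypothetical and
not part of numpy; this version applies one index per axis in turn and
handles only integers, slices, and 1-D index sequences (no Ellipsis or
newaxis):

```python
import numpy as np

class OrthogonalIndexer:
    """Hypothetical helper, not a numpy API: indexes each axis independently."""

    def __init__(self, arr):
        self._arr = arr

    def __getitem__(self, key):
        key = key if isinstance(key, tuple) else (key,)
        out = self._arr
        axis = 0
        for k in key:
            # Apply this index to the current axis only.
            out = out[(slice(None),) * axis + (k,)]
            # A scalar integer removes the axis; slices and index
            # sequences keep it, so move on to the next axis.
            if not isinstance(k, (int, np.integer)):
                axis += 1
        return out

a = np.arange(24).reshape(6, 4)
block = OrthogonalIndexer(a)[[1, 2, 4], [0, 3]]
print(block.shape)        # (3, 2)
```

Applying the indices one axis at a time is simple but makes an
intermediate copy per index array; np.ix_ avoids that when all indices
are sequences.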

Josef

>
> Cheers,
>
> Matthew