[Numpy-discussion] Advanced indexing: "fancy" vs. orthogonal

Thu Apr 2 23:20:06 EDT 2015

On Thu, Apr 2, 2015 at 7:30 PM, Matthew Brett <matthew.brett at gmail.com>
wrote:

> Hi,
>
> On Thu, Apr 2, 2015 at 6:09 PM,  <josef.pktd at gmail.com> wrote:
> > On Thu, Apr 2, 2015 at 8:02 PM, Eric Firing <efiring at hawaii.edu> wrote:
> >> On 2015/04/02 1:14 PM, Hanno Klemm wrote:
> >>> Well, I have written quite a bit of code that relies on fancy
> >>> indexing, and I think the ship has sailed on the question of whether
> >>> the behaviour of the [] operator should be changed, with numpy now at
> >>> version 1.9. Given the number of packages that rely on numpy, changing
> >>> this fundamental behaviour would not be a clever move.
> >>
> >> Are you *positive* that there is no clever way to make a transition?
> >> It's not worth any further thought?
> >
> > I guess it would be similar to python 3 string versus bytes, but
> > without the overwhelming benefits.
> >
> > I don't think I would be in favor of deprecating fancy indexing even
> > if it were possible. In general, my impression is that if there is a
> > trade-off in numpy between powerful machinery versus easy to learn and
> > teach, then the design philosophy went in favor of power.
> >
> > I think numpy indexing is not too difficult and follows a consistent
> > pattern, and I completely avoid mixing slices and index arrays with
> > ndim > 2.
>
> I'm sure y'all are totally on top of this, but for myself, I would
> like to distinguish:
>
> * fancy indexing with boolean arrays - I use it all the time and don't
> get confused;
> * fancy indexing with non-boolean arrays - horrendously confusing,
> almost never use it, except on a single axis when I can't confuse it
> with orthogonal indexing:
>
> In [3]: a = np.arange(24).reshape(6, 4)
>
> In [4]: a
> Out[4]:
> array([[ 0,  1,  2,  3],
>        [ 4,  5,  6,  7],
>        [ 8,  9, 10, 11],
>        [12, 13, 14, 15],
>        [16, 17, 18, 19],
>        [20, 21, 22, 23]])
>
> In [5]: a[[1, 2, 4]]
> Out[5]:
> array([[ 4,  5,  6,  7],
>        [ 8,  9, 10, 11],
>        [16, 17, 18, 19]])
>
> I also remember a discussion with Travis O where he was also saying
> that this indexing was confusing and that it would be good if there
> was some way to transition to what he called outer product indexing (I
> think that's the same as 'orthogonal' indexing).
>
> > I think it should be DOA, except as a discussion topic for numpy 3000.
>
> I think there are two proposals here:
>
> 1) Add some syntactic sugar to allow orthogonal indexing of numpy
> arrays, no backward compatibility break.
>
> That seems like a very good idea to me - were there any big objections to
> that?
>
> 2) Over some long time period, move the default behavior of np.array
> non-boolean indexing from the current behavior to the orthogonal
> behavior.
>
> That is going to be very tough, because it will cause very confusing
> breakage of legacy code.
>
> On the other hand, maybe it is worth going some way towards that, like
> this:
>
> * implement orthogonal indexing as a method arr.sensible_index[...]
> * implement the current non-boolean fancy indexing behavior as a
> method - arr.crazy_index[...]
> * deprecate non-boolean fancy indexing as standard arr[...] indexing;
> * wait a long time;
> * remove non-boolean fancy indexing as standard arr[...] (errors are
> preferable to change in behavior)
>
> Then if we are brave we could:
>
> * wait a very long time;
> * make orthogonal indexing the default.
>
> But the not-brave steps above seem less controversial, and fairly
> reasonable.
>
> What about that as an approach?
>

Your option 1 was what was being discussed before the posse was assembled
to bring fancy indexing to justice... ;-)
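
For anyone following along, here is a minimal sketch of the difference
between the two behaviours, reusing the 6x4 array from above. It assumes
np.ix_ as the way to spell orthogonal indexing today; the index lists are
made up for illustration and this is not a proposal for the new syntax.

```python
import numpy as np

a = np.arange(24).reshape(6, 4)
rows = [1, 2, 4]
cols = [0, 3]

# Orthogonal (outer product) selection: every listed row crossed with
# every listed column. np.ix_ builds index arrays that broadcast to (3, 2).
outer = a[np.ix_(rows, cols)]
print(outer.shape)

# Fancy indexing broadcasts the index arrays against each other instead,
# so two equal-length lists pick one element per (row, column) pair.
paired = a[[1, 2, 4], [0, 3, 1]]
print(paired)
```

The first result is a 3x2 block, the second a 1-D array of three elements.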

My background is in image processing, and I have used fancy indexing in all
its fanciness far more often than orthogonal or outer product indexing. I
actually have a vivid memory of the moment I fell in love with NumPy: after
seeing a code snippet that ran a huge image through a look-up table by
indexing the LUT with the image. Beautifully simple. And here
<http://stackoverflow.com/questions/12014186/fancier-fancy-indexing-in-numpy>
is a younger me, learning to ride NumPy without the training wheels.
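
A rough sketch of that look-up table trick, for illustration only. The
image is random data standing in for a real one, and the inverting LUT is
an assumption of mine, not the snippet I originally saw.

```python
import numpy as np

# Hypothetical 8-bit grayscale image; random data stands in for a real one.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(4, 6), dtype=np.uint8)

# A look-up table with one output value per possible input value.
# This one simply inverts the intensities.
lut = (255 - np.arange(256)).astype(np.uint8)

# Indexing the LUT with the image maps every pixel in one step; the
# result takes the shape of the index array, i.e. of the image.
mapped = lut[image]
print(mapped.shape)
```

Note that the index array here is 2-D, which is exactly the case that
orthogonal indexing has no natural answer for.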

Another obvious use case that you can find all over the place in
scikit-image is drawing a curve on an image from the coordinates.
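
A small sketch of the curve-drawing case, assuming the coordinates come as
two parallel arrays (the names rr and cc and the curve itself are made up
for this example). The second assignment shows what an orthogonal indexer
would do with the same arrays.

```python
import numpy as np

canvas = np.zeros((5, 5), dtype=np.uint8)

# Hypothetical curve, given as one (row, column) pair per point.
rr = np.arange(5)
cc = rr ** 2 // 4          # columns 0, 0, 1, 2, 4

# Fancy indexing pairs rr[i] with cc[i], so only the 5 points are set.
canvas[rr, cc] = 1

# Orthogonal indexing with the same arrays crosses every row with every
# column instead, filling a whole block rather than drawing the curve.
block = np.zeros((5, 5), dtype=np.uint8)
block[np.ix_(rr, cc)] = 1

print(canvas.sum(), block.sum())
```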

If there is such strong agreement on an orthogonal indexer, we might as
well go ahead and implement it. But before considering any bolder steps, we
should probably give it a couple of releases to see how many people out
there really use it.

Jaime

P.S. As an aside on the remapping of axes when arrays and slices are mixed,
there really is no better way. Once you realize that an array indexing a
dimension does not have to be 1-D, it becomes clear that the seemingly
obvious approach does not generalize. E.g.:

One may rightfully think that:

>>> a = np.arange(60).reshape(3, 4, 5)
>>> a[np.array([1])[:, None], ::2, [0, 1, 3]].shape
(1, 3, 2)

should not reorder the axes, and should return an array of shape (1, 2, 3).
But what do you do in the following case?

>>> idx0 = np.random.randint(3, size=(10, 1, 10))
>>> idx2 = np.random.randint(5, size=(1, 20, 1))
>>> a[idx0, ::2, idx2].shape
(10, 20, 10, 2)

What is the right place for that 2 now?
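
For comparison, a short sketch of the two situations side by side, using
the same 3x4x5 array and two made-up 1-D index arrays. When the index
arrays are adjacent, the broadcast dimensions can stay in place; when a
slice separates them, they are moved to the front.

```python
import numpy as np

a = np.arange(60).reshape(3, 4, 5)
i = np.array([0, 2])
j = np.array([1, 3])

# Adjacent index arrays (axes 1 and 2): the broadcast dimension of
# length 2 replaces those two axes where they were.
adjacent = a[:, i, j]
print(adjacent.shape)

# Index arrays separated by a slice (axes 0 and 2): the broadcast
# dimension goes first, followed by the sliced axis of length 4.
separated = a[i, :, j]
print(separated.shape)
```

The first shape is (3, 2) and the second is (2, 4).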

-- 
(\__/)
( O.o)
( > <) This is Rabbit. Copy Rabbit into your signature and help him with
his plans for world domination.