[Numpy-discussion] Advanced indexing: "fancy" vs. orthogonal

Thu Apr 2 10:15:17 EDT 2015

On Thu, Apr 2, 2015 at 1:29 AM, Stephan Hoyer <shoyer at gmail.com> wrote:

> On Wed, Apr 1, 2015 at 7:06 AM, Jaime Fernández del Río <
> jaime.frio at gmail.com> wrote:
>
>> Is there any other package implementing non-orthogonal indexing aside
>> from numpy?
>>
>
> I think we can safely say that NumPy's implementation of broadcasting
> indexing is unique :).
>
> The issue is that many other packages rely on numpy for implementation of
> custom array objects (e.g., scipy.sparse and scipy.io.netcdf). It's not
> immediately obvious what sort of indexing these objects represent.
>
> If the functionality is lacking, e,g, use of slices in `np.ix_`, I'm all
>> for improving that to provide the full functionality of "orthogonal
>> indexing". I just need a little more convincing that those new
>> attributes/indexers are going to ever see any real use.
>>
>
> Orthogonal indexing is close to the norm for packages that implement
> labeled data structures, both because it's easier to understand and
> implement, and because it's difficult to maintain associations with labels
> through complex broadcasting indexing.
>
> Unfortunately, the lack of a full featured implementation of orthogonal
> indexing has lead to that wheel being reinvented at least three times (in
> Iris, xray [1] and pandas). So it would be nice to have a canonical
> implementation that supports slices and integers in numpy for that reason
> alone. This could be done by building on the existing `np.ix_` function,
> but a new indexer seems more elegant: there's just much less noise with
> `arr.ix_[:1, 2, [3]]` than `arr[np.ix_(slice(1), 2, [3])]`.
>
> It's also well known that indexing with __getitem__ can be much slower
> than np.take. It seems plausible to me that a careful implementation of
> orthogonal indexing could close or eliminate this speed gap, because the
> model for orthogonal indexing is so much simpler than that for broadcasting
> indexing: each element of the key tuple can be applied separately along the
> corresponding axis.
>
> So I think there could be a real benefit to having the feature in numpy.
> In particular, if somebody is up for implementing it in C or Cython, I
> would be very pleased.
>
>  Cheers,
> Stephan
>
> [1] Here is my implementation of remapping from orthogonal to broadcasting
> indexing. It works, but it's a real mess, especially because I try to
> optimize by minimizing the number of times slices are converted into arrays:
>
> https://github.com/xray/xray/blob/0d164d848401209971ded33aea2880c1fdc892cb/xray/core/indexing.py#L68
>
>
I believe you can leave all slices unchanged if you later reshuffle your
axes. Basically all the fancy-indexed axes go in the front of the shape in
order, and the subspace follows, e.g.:

>>> a = np.arange(60).reshape(3, 4, 5)
>>> a[np.array([1])[:, None], ::2, np.array([1, 2, 3])].shape
(1, 3, 2)

So you would need to swap the second and last axes and be done. You would
not get a contiguous array without a copy, but that's a different story.
Assigning to an orthogonally indexed subarray is an entirely different
beast, not sure if there is a use case for that.

We probably need more traction on the "should this be done?" discussion
than on the "can this be done?" one, the need for a reordering of the axes
swings me slightly in favor, but I mostly don't see it yet. Nathaniel
usually has good insights on who we are, where do we come from, where are
we going to, type of questions, would be good to have him chime in.

Jaime

-- 
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20150402/22f0868a/attachment.html>