[Numpy-discussion] Advanced indexing: "fancy" vs. orthogonal

Stephan Hoyer shoyer at gmail.com
Thu Apr 2 04:29:26 EDT 2015


On Wed, Apr 1, 2015 at 7:06 AM, Jaime Fernández del Río <
jaime.frio at gmail.com> wrote:

> Is there any other package implementing non-orthogonal indexing aside from
> numpy?
>

I think we can safely say that NumPy's implementation of broadcasting
indexing is unique :).

The issue is that many other packages rely on numpy for implementation of
custom array objects (e.g., scipy.sparse and scipy.io.netcdf). It's not
immediately obvious what sort of indexing these objects represent.

> If the functionality is lacking, e.g., use of slices in `np.ix_`, I'm all
> for improving that to provide the full functionality of "orthogonal
> indexing". I just need a little more convincing that those new
> attributes/indexers are going to ever see any real use.
>

Orthogonal indexing is close to the norm for packages that implement
labeled data structures, both because it's easier to understand and
implement, and because it's difficult to maintain associations with labels
through complex broadcasting indexing.
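
For readers less familiar with the distinction, here is a small sketch of
the two behaviours on the same pair of index lists. The array and the
variable names are made up for illustration; only plain indexing and
`np.ix_` are used.

```python
import numpy as np

arr = np.arange(12).reshape(3, 4)
rows = [0, 2]
cols = [1, 3]

# Broadcasting ("fancy") indexing: the index lists are broadcast against
# each other and paired up elementwise, selecting arr[0, 1] and arr[2, 3].
broadcast_result = arr[rows, cols]

# Orthogonal (outer) indexing: each list selects independently along its
# own axis, giving the 2x2 block at rows {0, 2} and columns {1, 3}.
orthogonal_result = arr[np.ix_(rows, cols)]

print(broadcast_result.shape)   # (2,)
print(orthogonal_result.shape)  # (2, 2)
```

With labeled data, the orthogonal result keeps one label vector per axis
(`rows` and `cols` here), whereas the broadcasting result has a single axis
that no longer corresponds to either original axis.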

Unfortunately, the lack of a full-featured implementation of orthogonal
indexing has led to that wheel being reinvented at least three times (in
Iris, xray [1] and pandas). So it would be nice to have a canonical
implementation that supports slices and integers in numpy for that reason
alone. This could be done by building on the existing `np.ix_` function,
but a new indexer seems more elegant: there's just much less noise with
`arr.ix_[:1, 2, [3]]` than `arr[np.ix_(slice(1), 2, [3])]`.
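
As a rough illustration of what such an indexer might do, the sketch below
builds on `np.ix_` by converting slices and integers into 1-D index arrays
first. The name `orthogonal_getitem` is hypothetical, not an existing NumPy
function, and the sketch assumes the key has at most `arr.ndim` entries and
ignores `Ellipsis` and `np.newaxis`.

```python
import numpy as np

def orthogonal_getitem(arr, key):
    """Hypothetical helper: orthogonal indexing with slices, ints and lists."""
    if not isinstance(key, tuple):
        key = (key,)
    # Pad the key with full slices so there is one entry per axis.
    key = key + (slice(None),) * (arr.ndim - len(key))

    index_arrays = []
    int_axes = []
    for axis, (k, size) in enumerate(zip(key, arr.shape)):
        if isinstance(k, slice):
            # Expand the slice into the explicit positions it selects.
            index_arrays.append(np.arange(*k.indices(size)))
        elif isinstance(k, (int, np.integer)):
            # Treat an integer as a length-1 selection; drop the axis later.
            index_arrays.append(np.array([k]))
            int_axes.append(axis)
        else:
            index_arrays.append(np.asarray(k))

    result = arr[np.ix_(*index_arrays)]
    # Integer keys remove their axis, as in ordinary indexing.
    return result.squeeze(axis=tuple(int_axes)) if int_axes else result

arr = np.arange(30).reshape(2, 3, 5)
out = orthogonal_getitem(arr, (slice(1), 2, [3]))
print(out.shape)  # (1, 1)
```

Expanding every slice into an array is the simple but wasteful route; it
always copies, whereas a careful implementation would keep slices as slices
wherever possible.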

It's also well known that indexing with __getitem__ can be much slower than
np.take. It seems plausible to me that a careful implementation of
orthogonal indexing could close or eliminate this speed gap, because the
model for orthogonal indexing is so much simpler than that for broadcasting
indexing: each element of the key tuple can be applied separately along the
corresponding axis.
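
A minimal sketch of that model, assuming every element of the key is a 1-D
sequence of integers: apply `np.take` once per axis. The function name
`orthogonal_take` is made up for this example, and no claim is made here
about its speed relative to `__getitem__`.

```python
import numpy as np

def orthogonal_take(arr, key):
    """Apply each 1-D integer index along its own axis, one axis at a time."""
    result = arr
    for axis, indices in enumerate(key):
        # Taking with a 1-D index keeps the axis in place, so the axis
        # numbering stays valid for the following iterations.
        result = np.take(result, indices, axis=axis)
    return result

arr = np.arange(24).reshape(2, 3, 4)
key = ([1, 0], [0, 2], [3, 1, 1])

per_axis = orthogonal_take(arr, key)
via_ix = arr[np.ix_(*key)]
print(np.array_equal(per_axis, via_ix))  # True
```

The per-axis loop creates an intermediate copy for each axis, so it shows
the semantics rather than an optimized implementation.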

So I think there could be a real benefit to having the feature in numpy. In
particular, if somebody is up for implementing it in C or Cython, I would
be very pleased.

Cheers,
Stephan

[1] Here is my implementation of remapping from orthogonal to broadcasting
indexing. It works, but it's a real mess, especially because I try to
optimize by minimizing the number of times slices are converted into arrays:
https://github.com/xray/xray/blob/0d164d848401209971ded33aea2880c1fdc892cb/xray/core/indexing.py#L68