[Numpy-discussion] Advanced indexing: "fancy" vs. orthogonal

Sebastian Berg sebastian at sipsolutions.net
Thu Apr 2 05:00:37 EDT 2015


On Do, 2015-04-02 at 01:29 -0700, Stephan Hoyer wrote:
> On Wed, Apr 1, 2015 at 7:06 AM, Jaime Fernández del Río
> <jaime.frio at gmail.com> wrote:
>         Is there any other package implementing non-orthogonal
>         indexing aside from numpy?
> 
> 
> I think we can safely say that NumPy's implementation of broadcasting
> indexing is unique :).
> 
> 
> The issue is that many other packages rely on numpy for implementation
> of custom array objects (e.g., scipy.sparse and scipy.io.netcdf). It's
> not immediately obvious what sort of indexing these objects represent.
> 
> 
>         If the functionality is lacking, e,g, use of slices in
>         `np.ix_`, I'm all for improving that to provide the full
>         functionality of "orthogonal indexing". I just need a little
>         more convincing that those new attributes/indexers are going
>         to ever see any real use.
>         
> 
> 
> Orthogonal indexing is close to the norm for packages that implement
> labeled data structures, both because it's easier to understand and
> implement, and because it's difficult to maintain associations with
> labels through complex broadcasting indexing.
> 
> 
> Unfortunately, the lack of a full featured implementation of
> orthogonal indexing has lead to that wheel being reinvented at least
> three times (in Iris, xray [1] and pandas). So it would be nice to
> have a canonical implementation that supports slices and integers in
> numpy for that reason alone. This could be done by building on the
> existing `np.ix_` function, but a new indexer seems more elegant:
> there's just much less noise with `arr.ix_[:1, 2, [3]]` than
> `arr[np.ix_(slice(1), 2, [3])]`.
> 
> 
> It's also well known that indexing with __getitem__ can be much slower
> than np.take. It seems plausible to me that a careful implementation
> of orthogonal indexing could close or eliminate this speed gap,
> because the model for orthogonal indexing is so much simpler than that
> for broadcasting indexing: each element of the key tuple can be
> applied separately along the corresponding axis.
> 

Wrong (sorry, couldn't resist ;)), since 1.9. take is not typically
faster unless you have a small subspace ("subspace" are the
non-indexed/slice-indexed axes, though I guess small subspace is common
in some cases, i.e. Nx3 array), it should typically be noticeably slower
for large subspaces at the moment.

Anyway, unfortunately while orthogonal indexing may seem simpler, as you
probably noticed, mapping it fully featured to advanced indexing does
not seem like a walk in the park due to how axis remapping works when
you have a combination of slices and advanced indices.

It might be possible to basically implement a second MapIterSwapaxis in
addition to adding extra axes to the inputs (which I think would need a
post-processing step, but that is not that bad). If you do that, you can
mostly reuse the current machinery and avoid most of the really annoying
code blocks which set up the iterators for the various special cases.
Otherwise, for hacking it of course you can replace the slices by arrays
as well ;).

> 
> So I think there could be a real benefit to having the feature in
> numpy. In particular, if somebody is up for implementing it in C or
> Cython, I would be very pleased.
> 
> 
>  Cheers,
> 
> Stephan
> 
> 
> [1] Here is my implementation of remapping from orthogonal to
> broadcasting indexing. It works, but it's a real mess, especially
> because I try to optimize by minimizing the number of times slices are
> converted into arrays:
> https://github.com/xray/xray/blob/0d164d848401209971ded33aea2880c1fdc892cb/xray/core/indexing.py#L68
> 
> 
> 
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 819 bytes
Desc: This is a digitally signed message part
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20150402/7d2468a7/attachment.sig>


More information about the NumPy-Discussion mailing list