[Numpy-discussion] NEP 21: Simplified and explicit advanced indexing
jni.soma at gmail.com
Wed Jun 27 01:19:58 EDT 2018
Let me start by thanking Robert for articulating my viewpoints far
better than I could have done myself. I want to explicitly flag the
following statements for endorsement:
> *I would still reserve warnings and deprecation for the cases where
> the current behavior gives us something that no one wants. Those are
> the real traps that people need to be warned away from.*
> *In the post-NEP .oindex/.vindex order, everyone can get the behavior
> that they want. Your argument for deprecation is now just about what
> the default is, the semantics that get pride of place with the
> shortest spelling. I am sympathetic to the feeling like you wish you
> had a time machine to go fix a design with your new insight. But it
> seems to me that just changing which semantics are the default has
> relatively attenuated value while breaking compatibility for a
> fundamental feature of numpy has significant costs. Just introducing
> .oindex is the bulk of the value of this NEP. Everything else is
> window dressing.*
> *If someone is mixing slices and integer indices, that's a really good
> sign that they thought indexing behaved in a different way (e.g.
> orthogonal indexing).*
I would offer the exception of trailing slices to this statement:
In : import numpy as np
In : from skimage import data
In : astro = data.astronaut()
In : astro.shape
Out: (512, 512, 3)
In : rr, cc = np.array([1, 3, 3, 3]), np.array([1, 8, 9, 10])
In : astro[rr, cc].shape
Out: (4, 3)
In : astro[rr, cc, :].shape
Out: (4, 3)
This does exactly what I would expect.
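As a rough sketch of the same point without needing scikit-image, the snippet below uses a synthetic array with the astronaut image's shape (the names `img` and `vol` are stand-ins, not part of the original session). It checks that the trailing slice changes nothing, and contrasts that with a slice placed between two advanced indices, which is the kind of case the quoted statement is really about:

```python
import numpy as np

# Synthetic stand-in for the (512, 512, 3) astronaut image.
img = np.arange(512 * 512 * 3).reshape(512, 512, 3)
rr = np.array([1, 3, 3, 3])
cc = np.array([1, 8, 9, 10])

# Pick four (row, column) pixels; each pixel keeps its three channels.
pixels = img[rr, cc]
pixels_explicit = img[rr, cc, :]  # same thing with the trailing slice spelled out
same = np.array_equal(pixels, pixels_explicit)

# Contrast: a slice sitting *between* an integer and an array index.
# The broadcast (advanced) dimension is moved to the front, so the
# result has shape (2, 6) rather than the (6, 2) many people expect.
vol = np.zeros((5, 6, 7))
surprising_shape = vol[0, :, [0, 1]].shape

print(pixels.shape, same, surprising_shape)
```

Only the second case reorders axes; the trailing-slice spelling is just a more explicit way of writing `img[rr, cc]`.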
Going back to the motivation for the NEP, I think this bit, emphasis
mine, is crucial:
>> the existing rules for advanced indexing with multiple array indices
>> are typically confusing to both new, **and in many cases even old,**
>> users of NumPy
I think it is ok for advanced indexing to be accessible to advanced
users. I remember that it took me quite a while to grok NumPy advanced
indexing, but once I did I just loved it.
I also like that this syntax translates perfectly from integer indices
to float coordinates in `ndimage.map_coordinates`.
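A small sketch of that correspondence, assuming SciPy is installed and using a made-up 5x6 array (`image`, `rr`, `cc` are illustrative names): the same pair of coordinate arrays works for integer indexing and for `map_coordinates`, and the latter also accepts fractional positions.

```python
import numpy as np
from scipy import ndimage as ndi

image = np.arange(30.0).reshape(5, 6)
rr = np.array([1, 2, 3])
cc = np.array([1, 4, 2])

# Integer (vectorized) indexing: one value per (row, column) pair.
by_index = image[rr, cc]

# The same coordinates passed to map_coordinates; order=1 is linear
# interpolation, which reproduces the array values at integer positions.
by_coords = ndi.map_coordinates(image, [rr, cc], order=1)

# A fractional row coordinate: halfway between image[1, 2] and image[2, 2].
halfway = ndi.map_coordinates(image, [[1.5], [2.0]], order=1)

print(by_index, by_coords, halfway)
```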
> *I'll go on record as saying that array-likes should respond to `a[rr,
> cc]`, as in Juan's example, with the current behavior. And if they
> don't, they don't deserve to be operated on by skimage functions.*
(I don't think of us highly enough to use the word "deserve", but I
would say that we would hesitate to support arrays that don't use this
convention.)
> *They didn't get a new feature; they just have to run faster to stay
> in the same place.*
It is also probably true, as mentioned elsewhere, that we could go
through our entire codebase and append `.vindex` to every array indexing
op. Perhaps others on this list find this a reasonable request, but I
don't. Aside from the churn involved, it would make our codebase
significantly uglier and less readable.
I should also emphasise that NumPy is really *the* foundational project
for the entire Scientific Python ecosystem. Changing the meaning of `[]`
should only be considered if it delivers an *extreme* benefit. Robert's
statement would apply to a stupid number of projects.
> *Once we have some experience with them for a year or three, then
> let's talk about deprecating parts of the current behavior and make a
> new NEP then if we want to go that route.*
To Sebastian's comment:
> if we choose to not annoy you a little, we will
> have much less long term options which also includes such projects
> compatibility to new/current array-likes.
> So basically one point is: if we annoy scikit-image now, their code
> will work better for dask arrays in the future hopefully.
Let's get rid of the hopefully. Let NumPy implement .oindex and
.vindex. Let Dask arrays do the same. Let's have an announcement on
the scikit-image mailing list, "hey guys, if you switch all your
indexing operations to .vindex, suddenly all of your library works
with dask arrays!"
At that point, we have a value proposition on our hands. Currently, it
amounts to gambling with others' time.
To Stephan's options that were sent while I was composing this:
> Some options, in roughly descending order of severity:
I favour 4, or at the limit 3. (See use case above, which I would
argue is totally unsurprising.) I'm happy that option 1 appears to be
off the table.
> For libraries like Dask, XArray, pydata/sparse, XND, etc., it would be
> bad for them if there was continued use of “weird” indexing behaviour
> (no warnings means more code written that’s… well… not exactly the
> best design).
Again, I think libraries should support the simple/not unintuitive
vindex cases. This is not bad design.
> *We don't know which of those futures are going to be true.
> Anecdatally, you want .oindex semantics most often; I would almost
> exclusively use .vindex. I don't know which of us is more