[Numpy-discussion] NEP 21: Simplified and explicit advanced indexing

Tue Jun 26 19:32:21 EDT 2018

On Tue, Jun 26, 2018 at 3:50 AM Sebastian Berg <sebastian at sipsolutions.net>
wrote:
>
> On Tue, 2018-06-26 at 02:27 -0700, Robert Kern wrote:
> > On Tue, Jun 26, 2018 at 1:36 AM Sebastian Berg
> > <sebastian at sipsolutions.net> wrote:
> > > On Tue, 2018-06-26 at 01:21 -0700, Robert Kern wrote:
> > > > On Tue, Jun 26, 2018 at 12:58 AM Sebastian Berg
> > > > <sebastian at sipsolutions.net> wrote:
> > >
> > > <snip>
> > >
> > > > >
> > > > > Yes, that is true, but I doubt you will find a lot of code paths
> > > > > that need the current indexing as opposed to vindex here,
> > > >
> > > > That's probably true! But I think it's beside the point. I'd wager
> > > > that most code paths that will use .vindex would work perfectly well
> > > > with current indexing, too. Most of the time, people aren't getting
> > > > into the hairy corners of advanced indexing.
> > > >
> > >
> > > Right, the proposal was to have DeprecationWarnings when they differ.
> > > I had also thought that DeprecationWarnings on two advanced indexes in
> > > general would be good, because that is good for new users.
> > > I have to agree with your argument that most of the confused users
> > > should be running into broadcast errors (if they expect oindex rather
> > > than fancy indexing). So I see this as a point in favor of limiting
> > > ourselves, at least for now, to cases such as those where sudden
> > > transposing occurs.
> > >
> > > However, I would like to point out that the reason for the more broad
> > > warnings is that it could allow warping normal indexing at some point.
> > >
> >
> > I don't really understand this. You would discourage the "normal"
> > syntax in favor of these more specific named syntaxes, so you can
> > introduce different behavior for the "normal" syntax and encourage
> > everyone to use it again? Just add more named syntaxes if you want
> > new behavior! That's the beauty of the design underlying this NEP.
> >
> > > Also it decreases traps with array-likes that behave differently.
> >
> > If we were to take this seriously, then no one should use a bare []
> > ever.
> >
> > I'll go on record as saying that array-likes should respond to
> > `a[rr, cc]`, as in Juan's example, with the current behavior. And if they
> > don't, they don't deserve to be operated on by skimage functions.
> >
> > If I'm reading the NEP correctly, the main thrust of the issue with
> > array-likes is that it is difficult for some of them to implement the
> > full spectrum of indexing possibilities. This NEP does not actually
> > make it *easier* for those array-likes to implement every
> > possibility. It just offers some APIs that more naturally express
> > common use cases which can sometimes be implemented more naturally
> > than if expressed in the current indexing. For instance, you can
> > achieve the same effect as orthogonal indexing with the current
> > implementation, but you have to manipulate the indices before you
> > pass them over to __getitem__(), losing information along the way
> > that could be used to make a more efficient lookup in some
> > array-likes.
> >
> > The NEP design is essentially more of a way to give these array-likes
> > standard places to raise NotImplementedError than it is to help them
> > get rid of all of their NotImplementedErrors. More specifically, if
> > these array-likes can't implement `a[rr, cc]`, they're not going to
> > implement `a.vindex[rr, cc]`, either.
> >
> > I think most of the problems that caused these libraries to make
> > different choices in their __getitem__() implementation are due to
> > the fact that these expressive APIs didn't exist, so they had to
> > shoehorn them into __getitem__(); orthogonal indexing was too useful
> > and efficient not to implement! I think that once we have .oindex and
> > .vindex out there, they will be able to clean up their __getitem__()s
> > to consistently support whatever of the current behavior that they
> > can and raise NotImplementedError where they can't.
> >
>
> Right, it helps mostly to be clear about what an object can and cannot
> do. So h5py or whatever could error out for plain indexing and only
> support `.oindex`, and we have all options cleanly available.
>
> And yes, I agree that in itself is a big step forward.

Okay, great. Before we move on to your next point, can we agree that the
array-likes aren't a motivating factor for deprecating the current behavior
of __getitem__()?
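
To make the "standard places to raise NotImplementedError" point concrete, here is a rough sketch of what such an array-like might look like. Everything in it is hypothetical: the class `OuterOnlyArray`, the helper `_OIndexer`, and the choice of which keys to reject are invented for illustration, and `.oindex` is the attribute proposed by the NEP, not something ndarray provides. The sketch handles only slices and 1-D integer sequences.

```python
import numpy as np


def _is_array_index(k):
    # For this sketch, only lists and ndarrays count as "advanced" indices.
    return isinstance(k, (list, np.ndarray))


class _OIndexer:
    """Hypothetical helper: applies each index independently along its axis."""

    def __init__(self, data):
        self._data = data

    def __getitem__(self, key):
        key = key if isinstance(key, tuple) else (key,)
        out = self._data
        for axis, k in enumerate(key):
            if _is_array_index(k):
                # Select along one axis only; the number of dimensions is kept.
                out = np.take(out, np.asarray(k), axis=axis)
            elif isinstance(k, slice):
                out = out[(slice(None),) * axis + (k,)]
            else:
                raise NotImplementedError(
                    "sketch handles only slices and 1-D integer sequences"
                )
        return out


class OuterOnlyArray:
    """Hypothetical array-like that cannot do vectorized (fancy) indexing."""

    def __init__(self, data):
        self._data = np.asarray(data)

    @property
    def oindex(self):
        return _OIndexer(self._data)

    def __getitem__(self, key):
        items = key if isinstance(key, tuple) else (key,)
        if sum(_is_array_index(k) for k in items) > 1:
            # The one clearly defined place to refuse unsupported behavior.
            raise NotImplementedError("use .oindex for multiple array indices")
        return self._data[key]


b = OuterOnlyArray(np.arange(12).reshape(3, 4))
print(b.oindex[[0, 2], [1, 3]])  # 2x2 block: rows 0 and 2, columns 1 and 3
```

With this shape, `b[[0, 2], [1, 3]]` fails loudly instead of silently meaning something different from what ndarray would do.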

> The thing is there are also very strong opinions that the fancy
> indexing behaviour is so confusing that it would ideally not be the
> default since it breaks comparing analogy slice objects.
>
> So, personally, I would argue that if we were to start over from
> scratch, fancy indexing (multiple indexes), would not be the default
> plain indexing behaviour.
> Now, maybe the pain of a few warnings is too high, but if we wish to
> move, no matter how slowly, in such regard, we will have to swallow it
> eventually.
> The suggestion was to make that as easy as possible with adding an
> attribute indefinitely.
> Otherwise, even a possible numpy replacement might have difficulty
> choosing a different default for indexing for years to come...

So I think we've moved past the technical objections. In the post-NEP
.oindex/.vindex order, everyone can get the behavior that they want. Your
argument for deprecation is now just about what the default is, the
semantics that get pride of place with the shortest spelling. I am
sympathetic to the feeling of wishing you had a time machine to go fix a
design with your new insight. But it seems to me that just changing which
semantics are the default has relatively attenuated value while breaking
compatibility for a fundamental feature of numpy has significant costs.
Just introducing .oindex is the bulk of the value of this NEP. Everything
else is window dressing.
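
For anyone following along who has not hit the difference in practice, here is a small sketch of the two semantics using only what NumPy offers today. It assumes that `np.ix_` is a fair stand-in for what `.oindex` would return in this simple case and that plain fancy indexing is a fair stand-in for `.vindex`. The last lines show the "sudden transposing" case mentioned earlier in the thread.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
rows = np.array([0, 2])
cols = np.array([1, 3])

# Vectorized (current fancy-indexing) semantics: the index arrays are
# broadcast together and paired elementwise, giving a[0, 1] and a[2, 3].
vectorized = a[rows, cols]
print(vectorized)        # [ 1 11]

# Orthogonal (outer) semantics: each index array selects independently
# along its own axis. np.ix_ reshapes the indices so that they broadcast
# into an open mesh.
orthogonal = a[np.ix_(rows, cols)]
print(orthogonal)        # [[ 1  3]
                         #  [ 9 11]]

# The surprising case: advanced indices separated by a slice. The
# broadcast dimension is moved to the front of the result.
x = np.zeros((2, 3, 4))
print(x[0, :, [0, 1]].shape)   # (2, 3), not (3, 2)
```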

You have my sympathies, but not enough for me to consent to deprecation.
You might get more of my sympathy a year or two from now when the community
has had a chance to work with .oindex. It's entirely possible that everyone
will leap to using .oindex (and .vindex only rarely), and we will be
flooded with complaints that "I only use .oindex, but the name is so long
it messes up the readability of my lengthy expressions". But it's also
possible that it sort of fizzles: people use it, but maybe use .vindex
more, or about the same. Or just keep on happily using neither.

We don't know which of those futures is going to come true. Anecdatally, you
want .oindex semantics most often; I would almost exclusively use .vindex.
I don't know which of us is more representative. Probably neither.

I maintain that considering deprecation is premature at this time. Please
take it out of this NEP. Let us get a feel for how people actually use
.oindex/.vindex. Then we can talk about deprecation. This NEP gets my
enthusiastic approval, except for the deprecation. I will be happy to talk
about deprecation with an open mind in a few years. With some more actual
experience under our belt, rather than prediction and theory, we can be
more confident about the approach we want to take. Deprecation is not a
fundamental part of this NEP and can be decided independently at a later
time.

--
Robert Kern