[Numpy-discussion] NEP 31 — Context-local and global overrides of the NumPy API

Nathan nathan.goldbaum at gmail.com
Sun Sep 8 22:29:18 EDT 2019

On Sun, Sep 8, 2019 at 7:27 PM Nathaniel Smith <njs at pobox.com> wrote:

> On Sun, Sep 8, 2019 at 8:40 AM Ralf Gommers <ralf.gommers at gmail.com>
> wrote:
> >
> >
> >
> > On Sun, Sep 8, 2019 at 12:54 AM Nathaniel Smith <njs at pobox.com> wrote:
> >>
> >> On Fri, Sep 6, 2019 at 11:53 AM Ralf Gommers <ralf.gommers at gmail.com> wrote:
> >> > On Fri, Sep 6, 2019 at 12:53 AM Nathaniel Smith <njs at pobox.com> wrote:
> >> >> On Tue, Sep 3, 2019 at 2:04 AM Hameer Abbasi <einstein.edison at gmail.com> wrote:
> >> >> > The fact that we're having to design more and more protocols for a
> >> >> > lot of very similar things is, to me, an indicator that we do have
> >> >> > holistic problems that ought to be solved by a single protocol.
> >> >>
> >> >> But the reason we've had trouble designing these protocols is that
> >> >> they're each different :-). If it was just a matter of copying
> >> >> __array_ufunc__ we'd have been done in a few minutes...
> >> >
> >> > I don't think that argument is correct. That we now have two very
> >> > similar protocols is simply a matter of history and limited developer
> >> > time. NEP 18 discusses in several places that __array_ufunc__ should be
> >> > brought in line with __array_function__, and that we can migrate a
> >> > function from one protocol to the other. There's no technical reason
> >> > other than backwards compat and dev time why we couldn't use
> >> > __array_function__ for ufuncs also.
> >>
> >> Huh, that's interesting! Apparently we have a profoundly different
> >> understanding of what we're doing here.
> >
> >
> > That is interesting indeed. We should figure this out first - no point
> > discussing a NEP about plugging the gaps in our override system when we
> > don't have a common understanding of why we wanted/needed an override
> > system in the first place.
> >
> >> To me, __array_ufunc__ and
> >> __array_function__ are completely different. In fact I'd say
> >> __array_ufunc__ is a good idea and __array_function__ is a bad idea,
> >
> >
> > It's early days, but "customer feedback" certainly has been more
> > enthusiastic for __array_function__. Also from what I've seen so far it
> > works well. Example: at the SciPy sprints someone put together Xarray
> > plus pydata/sparse to use distributed sparse arrays for visualizing some
> > large genetic (I think) data sets. That was made to work in a single
> > day, with impressively little code.
> Yeah, it's true, and __array_function__ made a bunch of stuff that
> used to be impossible become possible, I'm not saying it didn't. My
> prediction is that the longer we live with it, the more limits we'll
> hit and the more problems we'll have with long-term maintainability. I
> don't think initial enthusiasm is a good predictor of that either way.
> >> The key difference is that __array_ufunc__ allows for *generic*
> >> implementations.
> >
> > Implementations of what?
> Generic in the sense that you can write __array_ufunc__ once and have
> it work for all ufuncs.
> >> Most duck array libraries can write a single
> >> implementation of __array_ufunc__ that works for *all* ufuncs, even
> >> new third-party ufuncs that the duck array library has never heard of,
> >
> >
> > I see where you're going with this. You are thinking of reusing the
> > ufunc implementation to do a computation. That's a minor use case
> > (imho), and I can't remember seeing it used.
> I mean, I just looked at dask and xarray, and they're both doing
> exactly what I said, right now in shipping code. What use cases are
> you targeting here if you consider dask and xarray out-of-scope? :-)
> > This is a case where knowing whether something is a ufunc helps you use
> > a property of it, so there the more specialized nature of __array_ufunc__
> > helps. Seems niche though, and could probably also be done by checking
> > whether a function is an instance of np.ufunc via __array_function__.
> Sparse arrays aren't very niche... and the isinstance trick is
> possible in some cases, but (a) it's relying on an undocumented
> implementation detail of __array_function__; according to
> __array_function__'s API contract, you could just as easily get passed
> the ufunc's __call__ method instead of the object itself, and (b) it
> doesn't work at all for ufunc methods like reduce, outer, accumulate.
> These are both show-stoppers IMO.
> > This last point, using third-party ufuncs, is the interesting one to me.
> > They have to be generated with the NumPy ufunc machinery, so the dispatch
> > mechanism is attached to them. We could do third-party functions for
> > __array_function__ too, but that would require making
> > @array_function_dispatch public, which we haven't done (yet?).
> With __array_function__ it's theoretically possible to do the dispatch
> on third-party functions, but when someone defines a new function they
> always have to go update all the duck array libraries to hard-code in
> some special knowledge of their new function. So in my example, even
> if we made @array_function_dispatch public, you still couldn't use
> your nice new numba-created gufunc unless you first convinced dask,
> xarray, and bcolz to all accept patches to support your new gufunc.
> With __array_ufunc__, it works out-of-the-box.
> > But what is that road, and what do you think the goal is? To me it's:
> > separate our API from our implementation. Yours seems to be "reuse our
> > implementations" for __array_ufunc__, but I can't see how that
> > generalizes beyond ufuncs.
> The road is to define *abstractions* for the operations we expose
> through our API, so that duck array implementors can work against a
> contract with well-defined preconditions and postconditions, so they
> can write code that works reliably even when the surrounding
> environment changes. That's the only way to keep things maintainable
> AFAICT. If the API contract is just a vague handwave at the numpy API,
> then no-one knows which details actually matter, it's impossible to
> test, implementations will inevitably end up with subtle long-standing
> bugs, and literally any change in numpy could potentially break duck
> array users, we don't know. So my motivation is that I like testing, I
> don't like bugs, and I like being able to maintain things with
> confidence :-). The principles are much more general than ufuncs;
> that's just a pertinent example.
> > I think this is an important point. GPUs are massively popular, and will
> > very likely just continue to grow in importance. So anything we do in
> > this space that says "well it works, just not for GPUs" is probably not
> > going to solve our most pressing problems.
> I'm not saying "__array_ufunc__ doesn't work for GPUs". I'm saying
> that when it comes to GPUs, there's an upper bound for how good you
> can hope to do, and __array_ufunc__ achieves that upper bound. So does
> __array_function__. So if we only care about GPUs, they're about
> equally good. But if we also care about dask and xarray and compressed
> storage and sparse storage and ... then __array_ufunc__ is strictly
> superior in those cases. So replacing __array_ufunc__ with
> __array_function__ would be a major backwards step.
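
To make the "generic implementation" point in the quoted thread concrete, here is a minimal sketch of a toy duck array whose single __array_ufunc__ method handles plain ufunc calls, ufunc methods such as reduce, and a brand-new ufunc created with np.frompyfunc, without any per-ufunc code. The class LoggedArray and the function _clamp01 are hypothetical illustrations, not code from dask, xarray, or NumPy, and the sketch deliberately skips out= and ufunc.at.

```python
import numpy as np


class LoggedArray:
    """Toy duck array (hypothetical): wraps an ndarray and logs ufunc calls."""

    def __init__(self, data):
        self.data = np.asarray(data)
        self.log = []  # (ufunc name, method name) pairs, in call order

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # In-place variants (ufunc.at, out=) are left out of this sketch.
        if method == "at" or "out" in kwargs:
            return NotImplemented
        self.log.append((ufunc.__name__, method))
        # Unwrap duck arrays, forward to whichever ufunc method NumPy asked
        # for ("__call__", "reduce", "accumulate", "outer", ...), re-wrap.
        raw = [x.data if isinstance(x, LoggedArray) else x for x in inputs]
        return LoggedArray(getattr(ufunc, method)(*raw, **kwargs))


def _clamp01(v):
    # Hypothetical third-party elementwise function.
    return min(max(v, 0.0), 1.0)


# A brand-new ufunc that LoggedArray has never heard of.
clamp01 = np.frompyfunc(_clamp01, 1, 1)

x = LoggedArray([0.5, -1.0, 2.0])
y = np.add(x, 1)          # plain ufunc call
total = np.add.reduce(x)  # ufunc method, same generic code path
z = clamp01(x)            # third-party ufunc, no changes to LoggedArray

print(y.data, total.data, z.data)
print(x.log)
```

The same few lines cover every ufunc and every ufunc method because NumPy hands the ufunc object and the method name to __array_ufunc__. By contrast, np.add.reduce is a bound method rather than an np.ufunc instance, so an isinstance(func, np.ufunc) check on a function object cannot recognize ufunc methods.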

One case that hasn’t been brought up in this thread is unit handling. For
example, unyt’s __array_ufunc__ implementation explicitly handles the ufuncs
it knows about and will bail out if someone tries to use a ufunc that unyt
doesn’t recognize. I tried to implement a completely generic solution but
ended up concluding I couldn’t do that without silently generating answers
with incorrect units.
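
The explicit-whitelist approach can be sketched roughly as follows. This is not unyt's actual code: Quantity, UNIT_RULES, and the string-based unit arithmetic are hypothetical simplifications, and treating plain numbers as dimensionless is an assumption of the sketch. Any ufunc or ufunc method without a known unit rule makes __array_ufunc__ return NotImplemented, so NumPy raises TypeError instead of returning a result with wrong units.

```python
import numpy as np


def _same_units(*units):
    # Hypothetical rule for add/subtract: all operands must share one unit.
    if len(set(units)) != 1:
        raise ValueError(f"incompatible units: {units}")
    return units[0]


# Hypothetical whitelist: ufunc -> function mapping input units to output unit.
UNIT_RULES = {
    np.add: _same_units,
    np.subtract: _same_units,
    np.multiply: lambda a, b: f"{a}*{b}",
    np.negative: lambda a: a,
}


class Quantity:
    def __init__(self, value, unit):
        self.value = np.asarray(value)
        self.unit = unit

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        rule = UNIT_RULES.get(ufunc)
        if method != "__call__" or rule is None or kwargs:
            # Unknown ufunc or method: refuse rather than guess the units.
            return NotImplemented
        # Plain numbers/arrays are treated as dimensionless (an assumption).
        units = [q.unit if isinstance(q, Quantity) else "dimensionless" for q in inputs]
        values = [q.value if isinstance(q, Quantity) else q for q in inputs]
        out_unit = rule(*units)  # check units before computing anything
        return Quantity(ufunc(*values), out_unit)


d = Quantity([1.0, 2.0], "m")
total = np.add(d, d)      # unit "m"
area = np.multiply(d, d)  # unit "m*m"

try:
    np.sqrt(d)            # no rule for sqrt, so NumPy raises TypeError
    sqrt_rejected = False
except TypeError:
    sqrt_rejected = True
```

The price of this safety is that every ufunc needs a hand-written rule, so a function that becomes a ufunc in a new NumPy release is rejected until the whitelist is updated.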

I definitely agree with your analysis that this sort of implementation is
error-prone; in fact, we just had to do a bugfix release to fix clip, which
suddenly stopped working now that it’s a ufunc in NumPy 1.17.

