On Sun, Sep 8, 2019 at 12:54 AM Nathaniel Smith <njs@pobox.com> wrote:

On Fri, Sep 6, 2019 at 11:53 AM Ralf Gommers <ralf.gommers@gmail.com> wrote:
> On Fri, Sep 6, 2019 at 12:53 AM Nathaniel Smith <njs@pobox.com> wrote:
>> On Tue, Sep 3, 2019 at 2:04 AM Hameer Abbasi <einstein.edison@gmail.com> wrote:
>> > The fact that we're having to design more and more protocols for a lot
>> > of very similar things is, to me, an indicator that we do have holistic
>> > problems that ought to be solved by a single protocol.
>>
>> But the reason we've had trouble designing these protocols is that
>> they're each different :-). If it was just a matter of copying
>> __array_ufunc__ we'd have been done in a few minutes...
>
> I don't think that argument is correct. That we now have two very similar protocols is simply a matter of history and limited developer time. NEP 18 discusses in several places that __array_ufunc__ should be brought in line with __array_function__, and that we can migrate a function from one protocol to the other. There's no technical reason other than backwards compat and dev time why we couldn't use __array_function__ for ufuncs also.

Huh, that's interesting! Apparently we have a profoundly different
understanding of what we're doing here.

That is interesting indeed. We should figure this out first - no point discussing a NEP about plugging the gaps in our override system when we don't have a common understanding of why we wanted/needed an override system in the first place.

To me, __array_ufunc__ and
__array_function__ are completely different. In fact I'd say
__array_ufunc__ is a good idea and __array_function__ is a bad idea,

It's early days, but "customer feedback" certainly has been more enthusiastic for __array_function__. Also from what I've seen so far it works well. Example: at the SciPy sprints someone put together Xarray plus pydata/sparse to use distributed sparse arrays for visualizing some large genetic (I think) data sets. That was made to work in a single day, with impressively little code.

and would definitely not be in favor of combining them together.

I'm not saying we should. But __array_ufunc__ is basically a slight specialization - knowing that the function that was called is a ufunc can be handy but is usually irrelevant.

The key difference is that __array_ufunc__ allows for *generic*
implementations.

Implementations of what?

Most duck array libraries can write a single
implementation of __array_ufunc__ that works for *all* ufuncs, even
new third-party ufuncs that the duck array library has never heard of,

I see where you're going with this. You are thinking of reusing the ufunc implementation to do a computation. That's a minor use case (imho), and I can't remember seeing it used.

The original use case was scipy.sparse matrices. The executive summary of NEP 13 talks about this. It's about calling `np.some_ufunc(other_ndarray_like)` and "handing over control" to that object rather than the numpy function starting to execute. Also note that NEP 13 states in the summary "This covers some of the same ground as Travis Oliphant’s proposal to retro-fit NumPy with multi-methods" (reminds one of uarray....).

For scipy.sparse, the layout of the data doesn't make sense to numpy. All that was desired was that the sparse matrix needs to know what function was called, so it can call its own implementation of that function instead.
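
For concreteness, here is a minimal sketch of what "handing over control" could look like. ToySparse is a made-up class for illustration (it is not scipy.sparse, and its dict-based layout is an assumption); the only real API used is the __array_ufunc__ hook from NEP 13. The object learns which ufunc was called and runs its own code instead of NumPy's loop.

```python
import numpy as np


class ToySparse:
    """Hypothetical stand-in for a sparse container (not scipy.sparse)."""

    def __init__(self, data):
        # Map of index -> nonzero value: a layout NumPy knows nothing about.
        self.data = dict(data)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # NumPy calls this hook instead of running its own loop and tells
        # us which ufunc was requested, so we can use our own implementation.
        if ufunc is np.negative and method == "__call__" and not kwargs:
            return ToySparse({i: -v for i, v in self.data.items()})
        # Everything else is simply unsupported in this sketch.
        return NotImplemented


s = ToySparse({0: 1.5, 7: -2.0})
result = np.negative(s)  # control is handed to ToySparse.__array_ufunc__
```

Any ufunc the class does not recognize ends in a TypeError raised by NumPy, since every operand returned NotImplemented.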

because ufuncs all share the same structure of a loop wrapped around a
core operation, and they can treat the core operation as a black box.
For example:

- Dask can split up the operation across its tiled sub-arrays, and
then for each tile it invokes the core operation.

Works for __array_function__ too. Note, *not* by explicitly reusing the numpy function. Dask anyway has its own functions that mirror the numpy API. Dask's __array_function__ just does the forwarding to its own functions.

Also, a Dask array could be a collection of CuPy arrays, and CuPy implements __array_ufunc__. So explicitly reusing the NumPy ufunc implementation on whatever comes in would be, well, not so nice.

- xarray can do its label-based axis matching, and then invoke the
core operation.

Could do this with __array_function__ too

- bcolz can loop over the array uncompressing one block at a time,
invoking the core operation on each.

Not sure about this one.

- sparse arrays can check the ufunc .identity attribute

This is a case where knowing that something is a ufunc lets you use one of its properties, so there the more specialized nature of __array_ufunc__ helps. Seems niche though, and could probably also be done via __array_function__ by checking whether the function is an instance of np.ufunc.

to find out
whether 0 is an identity, and if so invoke the operation directly on
the non-zero entries; otherwise, it can loop over the array and
densify it in blocks and invoke the core operation on each. (It would
be useful to have a bit more metadata on the ufunc, so e.g.
np.subtract could declare that zero is a right-identity but not a
left-identity, but that's a simple enough extension to make at some
point.)
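
A small sketch of the .identity check described in this bullet. The helper name zero_is_identity is made up for illustration; ufunc.identity itself is a real attribute. Note that np.subtract currently declares no identity at all, which is consistent with the remark that one-sided identities would need extra metadata.

```python
import numpy as np


def zero_is_identity(ufunc):
    # Hypothetical helper: True when the ufunc declares 0 as its identity.
    # `identity` is None when the ufunc declares no identity at all.
    ident = ufunc.identity
    return ident is not None and ident == 0


print(zero_is_identity(np.add))       # True: add declares identity 0
print(zero_is_identity(np.multiply))  # False: multiply declares identity 1
print(zero_is_identity(np.subtract))  # False: subtract declares no identity
```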

Result: __array_ufunc__ makes it totally possible to take a ufunc from
scipy.special or a random new one created with numba, and have it
immediately work on an xarray wrapped around dask wrapped around
bcolz, out-of-the-box. That's a clean, generic interface. [1]
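
To illustrate the "generic implementation" claim, here is a rough sketch under stated assumptions. Tiled is a hypothetical 1-D duck array (it is not Dask or bcolz), all Tiled inputs are assumed to share the same tiling, and only plain single-output calls are handled. The point is that __array_ufunc__ never inspects which ufunc it was given, so a ufunc built elsewhere (np.frompyfunc stands in for a third-party one here) works without any special-casing.

```python
import numpy as np


class Tiled:
    """Hypothetical duck array: a 1-D array stored as a list of NumPy tiles."""

    def __init__(self, tiles):
        self.tiles = [np.asarray(t) for t in tiles]

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # This sketch handles only plain calls of single-output ufuncs.
        if method != "__call__" or kwargs or ufunc.nout != 1:
            return NotImplemented
        for x in inputs:
            # Accept scalars and Tiled inputs with the same number of tiles.
            if isinstance(x, Tiled):
                if len(x.tiles) != len(self.tiles):
                    return NotImplemented
            elif not np.isscalar(x):
                return NotImplemented
        out_tiles = []
        for i in range(len(self.tiles)):
            # Pick tile i from every Tiled input; pass scalars through as-is.
            args = [x.tiles[i] if isinstance(x, Tiled) else x for x in inputs]
            # The ufunc is a black box: we never check which one it is.
            out_tiles.append(ufunc(*args))
        return Tiled(out_tiles)

    def to_numpy(self):
        return np.concatenate(self.tiles)


t = Tiled([[0.0, 1.0], [2.0, 3.0]])
doubled = np.add(t, t)    # built-in ufunc
roots = np.sqrt(t)        # another built-in, same code path

# Stand-in for a "third-party" ufunc that Tiled has never heard of.
double_plus = np.frompyfunc(lambda a, b: 2 * a + b, 2, 1)
custom = double_plus(t, 1)
```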

This last point, using third-party ufuncs, is the interesting one to me. They have to be generated with the NumPy ufunc machinery, so the dispatch mechanism is attached to them. We could do third party functions for __array_function__ too, but that would require making @array_function_dispatch public, which we haven't done (yet?).

OTOH, __array_function__ doesn't allow this kind of simplification: if
we were using __array_function__ for ufuncs, every library would have
to special-case every individual ufunc, which leads to dramatically
more work and more potential for bugs.
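
For comparison, a minimal sketch of the per-function pattern being discussed here, modeled on the registry example in NEP 18 and assuming a NumPy version where __array_function__ is enabled (1.17 or later). Wrapped, HANDLED and implements are illustrative names, not part of NumPy. Each NumPy function is forwarded to a separately registered implementation, and anything unregistered is rejected.

```python
import numpy as np

HANDLED = {}  # hypothetical registry: NumPy function -> our implementation


def implements(numpy_function):
    """Register an implementation of a NumPy function for Wrapped objects."""
    def decorator(func):
        HANDLED[numpy_function] = func
        return func
    return decorator


class Wrapped:
    """Hypothetical duck array that simply wraps an ndarray."""

    def __init__(self, values):
        self.values = np.asarray(values)

    def __array_function__(self, func, types, args, kwargs):
        # No generic fallback: each NumPy function needs its own entry.
        if func not in HANDLED:
            return NotImplemented
        return HANDLED[func](*args, **kwargs)


@implements(np.sum)
def wrapped_sum(arr, **kwargs):
    return np.sum(arr.values, **kwargs)


@implements(np.concatenate)
def wrapped_concatenate(arrays, **kwargs):
    return Wrapped(np.concatenate([a.values for a in arrays], **kwargs))


w = Wrapped([1, 2, 3])
total = np.sum(w)                # forwarded to wrapped_sum
joined = np.concatenate([w, w])  # forwarded to wrapped_concatenate
```

Calling a function with no registered implementation, such as np.mean(w), raises a TypeError from NumPy.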

This all assumes that "reusing the ufunc's implementation" is the one thing that matters. To me that's a small side benefit, which we haven't seen a whole lot of use of in the 2+ years that __array_ufunc__ has been available. I think that what (for example) CuPy does, using __array_ufunc__ to simply take over control, is both the major use case and the original motivation for introducing the protocol.

To me, the whole point of interfaces is to reduce coupling. When you
have N interacting modules, it's unmaintainable if every change
requires considering every N! combination. From this perspective,
__array_function__ isn't good, but it is still somewhat constrained:
the result of each operation is still determined by the objects
involved, nothing else. In this regard, uarray is even more extreme than
__array_function__, because arbitrary operations can be arbitrarily
changed by arbitrarily distant code. It sort of feels like the
argument for uarray is: well, designing maintainable interfaces is a
lot of work, so forget it, let's just make it easy to monkeypatch
everything and call it a day.

That said, in my replies in this thread I've been trying to stay
productive and focus on narrower concrete issues. I'm pretty sure that
__array_function__ and uarray will turn out to be bad ideas and will
fail, but that's not a proven fact, it's just an informed guess. And
the road that I favor also has lots of risks and uncertainty.

But what is that road, and what do you think the goal is? To me it's: separate our API from our implementation. Yours seems to be "reuse our implementations" for __array_ufunc__, but I can't see how that generalizes beyond ufuncs.

So I don't have a problem with trying both as experiments and learning
more! But hopefully that explains why it's not at all obvious that
uarray solves the protocol design problems we've been talking about.

-n

[1] There are also some cases that __array_ufunc__ doesn't handle as
nicely. One obvious one is that GPU/TPU libraries still need to
special-case individual ufuncs. But that's not a limitation of
__array_ufunc__, it's a limitation of GPUs

I think this is an important point. GPUs are massively popular, and will very likely just continue to grow in importance. So anything we do in this space that says "well it works, just not for GPUs" is probably not going to solve our most pressing problems.

– they can't run CPU code,
so they can't use the CPU implementation of the core operations.
Another limitation is that __array_ufunc__ is weak at handling
operations that involve mixed libraries (e.g. np.add(bcolz_array,
sparse_array)) – to work well, this might require that bcolz have
special-case handling for sparse arrays, or vice-versa, so you still
potentially have some N**2 special cases, though at least here N is
the number of duck array libraries, not the number of ufuncs. I think
this is an interesting target for future work. But in general,
__array_ufunc__ goes a long way to taming the complexity of
interacting libraries and ufuncs.

With *only* ufuncs you can't create that many interesting applications; you need the other functions too.

Cheers,

Ralf