
On Fri, Sep 6, 2019 at 11:53 AM Ralf Gommers <ralf.gommers@gmail.com> wrote:
On Fri, Sep 6, 2019 at 12:53 AM Nathaniel Smith <njs@pobox.com> wrote:
On Tue, Sep 3, 2019 at 2:04 AM Hameer Abbasi <einstein.edison@gmail.com> wrote:
The fact that we're having to design more and more protocols for a lot of very similar things is, to me, an indicator that we do have holistic problems that ought to be solved by a single protocol.
But the reason we've had trouble designing these protocols is that they're each different :-). If it was just a matter of copying __array_ufunc__ we'd have been done in a few minutes...
I don't think that argument is correct. That we now have two very similar protocols is simply a matter of history and limited developer time. NEP 18 discusses in several places that __array_function__ should be brought in line with __array_ufunc__, and that we can migrate functions from one protocol to the other. There's no technical reason, other than backwards compatibility and dev time, why we couldn't use __array_function__ for ufuncs as well.
Huh, that's interesting! Apparently we have a profoundly different understanding of what we're doing here. To me, __array_ufunc__ and __array_function__ are completely different. In fact I'd say __array_ufunc__ is a good idea and __array_function__ is a bad idea, and I would definitely not be in favor of combining them.

The key difference is that __array_ufunc__ allows for *generic* implementations. Most duck array libraries can write a single implementation of __array_ufunc__ that works for *all* ufuncs, even new third-party ufuncs that the duck array library has never heard of, because ufuncs all share the same structure of a loop wrapped around a core operation, and the library can treat the core operation as a black box. For example:

- Dask can split the operation across its tiled sub-arrays, and then invoke the core operation on each tile.
- xarray can do its label-based axis matching, and then invoke the core operation.
- bcolz can loop over the array, uncompressing one block at a time and invoking the core operation on each block.
- A sparse array library can check the ufunc's .identity attribute to find out whether 0 is an identity; if so, it can invoke the operation directly on the non-zero entries, and otherwise it can loop over the array, densifying it in blocks and invoking the core operation on each. (It would be useful to have a bit more metadata on the ufunc, so that e.g. np.subtract could declare that zero is a right-identity but not a left-identity, but that's a simple enough extension to make at some point.)

Result: __array_ufunc__ makes it totally possible to take a ufunc from scipy.special, or a brand-new one created with numba, and have it immediately work on an xarray wrapped around dask wrapped around bcolz, out of the box. That's a clean, generic interface. [1]

OTOH, __array_function__ doesn't allow this kind of simplification: if we were using __array_function__ for ufuncs, every library would have to special-case every individual ufunc, which leads to dramatically more work and more potential for bugs.
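To make the contrast concrete, here's a rough sketch of the __array_ufunc__ side. The "BlockArray" class is a made-up toy (loosely bcolz-flavored, not code from any real library); the point is that one method, written once, handles every ufunc:

    import numbers
    import numpy as np

    class BlockArray:
        """Hypothetical duck array storing data as a list of 1-d blocks
        (a toy stand-in for something like bcolz's chunked storage)."""

        def __init__(self, blocks):
            self.blocks = [np.asarray(b, dtype=float) for b in blocks]

        def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
            # One generic implementation covers *every* ufunc, including
            # third-party ufuncs this class has never heard of.
            if method != "__call__" or kwargs:
                return NotImplemented  # keep the sketch simple: no out=, no reduce, etc.
            # Defer on operand types we don't understand. This NotImplemented
            # handshake is also what gives another library's __array_ufunc__
            # a chance in mixed-library expressions.
            if not all(isinstance(x, (BlockArray, numbers.Number)) for x in inputs):
                return NotImplemented
            # Treat the core operation as a black box and invoke it per block
            # (assumes all BlockArray inputs share the same block layout).
            n = next(len(x.blocks) for x in inputs if isinstance(x, BlockArray))

            def pick(x, i):
                return x.blocks[i] if isinstance(x, BlockArray) else x

            return BlockArray([ufunc(*(pick(x, i) for x in inputs))
                               for i in range(n)])

    a = BlockArray([[1.0, 2.0], [3.0, 4.0]])
    print(np.add(a, 1).blocks)  # [array([2., 3.]), array([4., 5.])]
    print(np.sqrt(a).blocks)    # any ufunc dispatches here, e.g. scipy.special.erf too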
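And here's the __array_function__ side for comparison, sketched with the HANDLED_FUNCTIONS lookup-table pattern that NEP 18 itself suggests (again toy code, reusing BlockArray from above). Every function needs its own registration, and anything unregistered simply fails:

    HANDLED_FUNCTIONS = {}

    def implements(np_function):
        # Register a BlockArray implementation for one specific function.
        def decorator(func):
            HANDLED_FUNCTIONS[np_function] = func
            return func
        return decorator

    class BlockArrayNF(BlockArray):
        def __array_function__(self, func, types, args, kwargs):
            if func not in HANDLED_FUNCTIONS:
                # No generic fallback is possible here: NumPy raises
                # TypeError for anything we haven't explicitly registered.
                return NotImplemented
            if not all(issubclass(t, BlockArrayNF) for t in types):
                return NotImplemented
            return HANDLED_FUNCTIONS[func](*args, **kwargs)

    @implements(np.sum)   # one registration per function...
    def _sum(arr):
        return sum(float(block.sum()) for block in arr.blocks)

    @implements(np.mean)  # ...and another, and so on, for every function
    def _mean(arr):
        return np.sum(arr) / sum(block.size for block in arr.blocks)

    b = BlockArrayNF([[1.0, 2.0], [3.0, 4.0]])
    print(np.sum(b), np.mean(b))  # 10.0 2.5
    # np.ptp(b) would raise TypeError: nobody registered np.ptp.

If ufuncs were routed through this protocol, that per-function table is what every duck array library would have to maintain, for every ufunc in every library.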
To me, the whole point of interfaces is to reduce coupling. When you have N interacting modules, it's unmaintainable if every change requires considering every N! combination. From this perspective, __array_function__ isn't good, but it is still somewhat constrained: the result of each operation is still determined by the objects involved, nothing else. In this regard, uarray is even more extreme than __array_function__, because arbitrary operations can be arbitrarily changed by arbitrarily distant code. It sort of feels like the argument for uarray is: well, designing maintainable interfaces is a lot of work, so forget it, let's just make it easy to monkeypatch everything and call it a day.

That said, in my replies in this thread I've been trying to stay productive and focus on narrower concrete issues. I'm pretty sure that __array_function__ and uarray will turn out to be bad ideas and will fail, but that's not a proven fact, it's just an informed guess. And the road that I favor also has lots of risks and uncertainty. So I don't have a problem with trying both as experiments and learning more! But hopefully that explains why it's not at all obvious that uarray solves the protocol design problems we've been talking about.

-n

[1] There are also some cases that __array_ufunc__ doesn't handle as nicely. One obvious one is that GPU/TPU libraries still need to special-case individual ufuncs. But that's not a limitation of __array_ufunc__, it's a limitation of GPUs – they can't run CPU code, so they can't use the CPU implementation of the core operations. Another limitation is that __array_ufunc__ is weak at handling operations that involve mixed libraries (e.g. np.add(bcolz_array, sparse_array)): to work well, this might require that bcolz have special-case handling for sparse arrays, or vice versa, so you still potentially have some N**2 special cases, though at least here N is the number of duck array libraries rather than the number of ufuncs. (The NotImplemented handshake in the sketch above lets each library defer to the other, but someone still has to implement each cross-library case.) I think this is an interesting target for future work. But in general, __array_ufunc__ goes a long way toward taming the complexity of interacting libraries and ufuncs.

--
Nathaniel J. Smith -- https://vorpus.org