[Numpy-discussion] NEP 31 — Context-local and global overrides of the NumPy API
njs at pobox.com
Sun Sep 8 03:53:43 EDT 2019
On Fri, Sep 6, 2019 at 11:53 AM Ralf Gommers <ralf.gommers at gmail.com> wrote:
> On Fri, Sep 6, 2019 at 12:53 AM Nathaniel Smith <njs at pobox.com> wrote:
>> On Tue, Sep 3, 2019 at 2:04 AM Hameer Abbasi <einstein.edison at gmail.com> wrote:
>> > The fact that we're having to design more and more protocols for a lot
>> > of very similar things is, to me, an indicator that we do have holistic
>> > problems that ought to be solved by a single protocol.
>> But the reason we've had trouble designing these protocols is that
>> they're each different :-). If it was just a matter of copying
>> __array_ufunc__ we'd have been done in a few minutes...
> I don't think that argument is correct. That we now have two very similar protocols is simply a matter of history and limited developer time. NEP 18 discusses in several places that __array_ufunc__ should be brought in line with __array_ufunc__, and that we can migrate a function from one protocol to the other. There's no technical reason other than backwards compat and dev time why we couldn't use __array_function__ for ufuncs also.
Huh, that's interesting! Apparently we have a profoundly different
understanding of what we're doing here. To me, __array_ufunc__ and
__array_function__ are completely different. In fact I'd say
__array_ufunc__ is a good idea and __array_function__ is a bad idea,
and would definitely not be in favor of combining them together.
The key difference is that __array_ufunc__ allows for *generic*
implementations. Most duck array libraries can write a single
implementation of __array_ufunc__ that works for *all* ufuncs, even
new third-party ufuncs that the duck array library has never heard of,
because ufuncs all share the same structure of a loop wrapped around a
core operation, and they can treat the core operation as a black box.
- Dask can split up the operation across its tiled sub-arrays, and
then for each tile it invokes the core operation.
- xarray can do its label-based axis matching, and then invoke the
- bcolz can loop over the array uncompressing one block at a time,
invoking the core operation on each.
- sparse arrays can check the ufunc .identity attribute to find out
whether 0 is an identity, and if so invoke the operation directly on
the non-zero entries; otherwise, it can loop over the array and
densify it in blocks and invoke the core operation on each. (It would
be useful to have a bit more metadata on the ufunc, so e.g.
np.subtract could declare that zero is a right-identity but not a
left-identity, but that's a simple enough extension to make at some
Result: __array_ufunc__ makes it totally possible to take a ufunc from
scipy.special or a random new on created with numba, and have it
immediately work on an xarray wrapped around dask wrapped around
bcolz, out-of-the-box. That's a clean, generic interface. 
OTOH, __array_function__ doesn't allow this kind of simplification: if
we were using __array_function__ for ufuncs, every library would have
to special-case every individual ufunc, which leads to dramatically
more work and more potential for bugs.
To me, the whole point of interfaces is to reduce coupling. When you
have N interacting modules, it's unmaintainable if every change
requires considering every N! combination. From this perspective,
__array_function__ isn't good, but it is still somewhat constrained:
the result of each operation is still determined by the objects
involved, nothing else. In this regard, uarray even more extreme than
__array_function__, because arbitrary operations can be arbitrarily
changed by arbitrarily distant code. It sort of feels like the
argument for uarray is: well, designing maintainable interfaces is a
lot of work, so forget it, let's just make it easy to monkeypatch
everything and call it a day.
That said, in my replies in this thread I've been trying to stay
productive and focus on narrower concrete issues. I'm pretty sure that
__array_function__ and uarray will turn out to be bad ideas and will
fail, but that's not a proven fact, it's just an informed guess. And
the road that I favor also has lots of risks and uncertainty. So I
don't have a problem with trying both as experiments and learning
more! But hopefully that explains why it's not at all obvious that
uarray solves the protocol design problems we've been talking about.
 There are also some cases that __array_ufunc__ doesn't handle as
nicely. One obvious one is that GPU/TPU libraries still need to
special-case individual ufuncs. But that's not a limitation of
__array_ufunc__, it's a limitation of GPUs – they can't run CPU code,
so they can't use the CPU implementation of the core operations.
Another limitation is that __array_ufunc__ is weak at handling
operations that involve mixed libraries (e.g. np.add(bcolz_array,
sparse_array)) – to work well, this might require that bcolz have
special-case handling for sparse arrays, or vice-versa, so you still
potentially have some N**2 special cases, though at least here N is
the number of duck array libraries, not the number of ufuncs. I think
this is an interesting target for future work. But in general,
__array_ufunc__ goes a long way to taming the complexity of
interacting libraries and ufuncs.
Nathaniel J. Smith -- https://vorpus.org
More information about the NumPy-Discussion