[Numpy-discussion] NEP 31 — Context-local and global overrides of the NumPy API

Hameer Abbasi einstein.edison at gmail.com
Sun Sep 8 04:03:57 EDT 2019


On 08.09.19 09:53, Nathaniel Smith wrote:
> On Fri, Sep 6, 2019 at 11:53 AM Ralf Gommers <ralf.gommers at gmail.com> 
> wrote:
>> On Fri, Sep 6, 2019 at 12:53 AM Nathaniel Smith <njs at pobox.com> wrote:
>>> On Tue, Sep 3, 2019 at 2:04 AM Hameer Abbasi 
>>> <einstein.edison at gmail.com> wrote:
>>>> The fact that we're having to design more and more protocols for a lot
>>>> of very similar things is, to me, an indicator that we do have 
>>>> holistic
>>>> problems that ought to be solved by a single protocol.
>>> But the reason we've had trouble designing these protocols is that
>>> they're each different . If it was just a matter of copying
>>> __array_ufunc__ we'd have been done in a few minutes...
>> I don't think that argument is correct. That we now have two very 
>> similar protocols is simply a matter of history and limited developer 
>> time. NEP 18 discusses in several places that __array_ufunc__ should 
>> be brought in line with __array_ufunc__, and that we can migrate a 
>> function from one protocol to the other. There's no technical reason 
>> other than backwards compat and dev time why we couldn't use 
>> __array_function__ for ufuncs also.
> Huh, that's interesting! Apparently we have a profoundly different
> understanding of what we're doing here. To me, __array_ufunc__ and
> __array_function__ are completely different. In fact I'd say
> __array_ufunc__ is a good idea and __array_function__ is a bad idea,
> and would definitely not be in favor of combining them together.
>
> The key difference is that __array_ufunc__ allows for *generic*
> implementations. Most duck array libraries can write a single
> implementation of __array_ufunc__ that works for *all* ufuncs, even
> new third-party ufuncs that the duck array library has never heard of,
> because ufuncs all share the same structure of a loop wrapped around a
> core operation, and they can treat the core operation as a black box.
> For example:
>
> - Dask can split up the operation across its tiled sub-arrays, and
> then for each tile it invokes the core operation.
> - xarray can do its label-based axis matching, and then invoke the
> core operation.
> - bcolz can loop over the array uncompressing one block at a time,
> invoking the core operation on each.
> - sparse arrays can check the ufunc .identity attribute to find out
> whether 0 is an identity, and if so invoke the operation directly on
> the non-zero entries; otherwise, it can loop over the array and
> densify it in blocks and invoke the core operation on each. (It would
> be useful to have a bit more metadata on the ufunc, so e.g.
> np.subtract could declare that zero is a right-identity but not a
> left-identity, but that's a simple enough extension to make at some
> point.)
>
> Result: __array_ufunc__ makes it totally possible to take a ufunc from
> scipy.special or a random new on created with numba, and have it
> immediately work on an xarray wrapped around dask wrapped around
> bcolz, out-of-the-box. That's a clean, generic interface. [1]
>
> OTOH, __array_function__ doesn't allow this kind of simplification: if
> we were using __array_function__ for ufuncs, every library would have
> to special-case every individual ufunc, which leads to dramatically
> more work and more potential for bugs.

But uarray does allow this kind of simplification. You would do the 
following inside a uarray backend:

def __ua_function__(func, args, kwargs):
     with ua.skip_backend(self_backend):
         # Do code here, dispatches to everything but

This is possible today and is done in the dask backend inside unumpy for 
example.

>
> To me, the whole point of interfaces is to reduce coupling. When you
> have N interacting modules, it's unmaintainable if every change
> requires considering every N! combination. From this perspective,
> __array_function__ isn't good, but it is still somewhat constrained:
> the result of each operation is still determined by the objects
> involved, nothing else. In this regard, uarray even more extreme than
> __array_function__, because arbitrary operations can be arbitrarily
> changed by arbitrarily distant code. It sort of feels like the
> argument for uarray is: well, designing maintainable interfaces is a
> lot of work, so forget it, let's just make it easy to monkeypatch
> everything and call it a day.
>
> That said, in my replies in this thread I've been trying to stay
> productive and focus on narrower concrete issues. I'm pretty sure that
> __array_function__ and uarray will turn out to be bad ideas and will
> fail, but that's not a proven fact, it's just an informed guess. And
> the road that I favor also has lots of risks and uncertainty. So I
> don't have a problem with trying both as experiments and learning
> more! But hopefully that explains why it's not at all obvious that
> uarray solves the protocol design problems we've been talking about.
>
> -n
>
> [1] There are also some cases that __array_ufunc__ doesn't handle as
> nicely. One obvious one is that GPU/TPU libraries still need to
> special-case individual ufuncs. But that's not a limitation of
> __array_ufunc__, it's a limitation of GPUs – they can't run CPU code,
> so they can't use the CPU implementation of the core operations.
> Another limitation is that __array_ufunc__ is weak at handling
> operations that involve mixed libraries (e.g. np.add(bcolz_array,
> sparse_array)) – to work well, this might require that bcolz have
> special-case handling for sparse arrays, or vice-versa, so you still
> potentially have some N**2 special cases, though at least here N is
> the number of duck array libraries, not the number of ufuncs. I think
> this is an interesting target for future work. But in general,
> __array_ufunc__ goes a long way to taming the complexity of
> interacting libraries and ufuncs.
>
> -- 
> Nathaniel J. Smith -- https://vorpus.org
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20190908/169d8682/attachment-0001.html>


More information about the NumPy-Discussion mailing list