[Numpy-discussion] Adding to the non-dispatched implementation of NumPy methods

Tue Apr 30 01:24:05 EDT 2019

On Mon, Apr 29, 2019 at 5:49 AM Marten van Kerkwijk <
m.h.vankerkwijk at gmail.com> wrote:

>> The uses that I've seen so far (in CuPy and JAX), involve a handful of
>> functions that are directly re-exported from NumPy, e.g.,
>> jax.numpy.array_repr is the exact same object as numpy.array_repr:
>>
>> https://github.com/cupy/cupy/blob/c3f1be602bf6951b007beaae644a5662f910048b/cupy/__init__.py#L341-L366
>>
>> https://github.com/google/jax/blob/5edb23679f2605654949156da84e330205840695/jax/numpy/lax_numpy.py#L89-L132
>>
>>
>> I suspect this will be less common in the future if __array_function__
>> takes off, but for now it's convenient because users don't need to know
>> exactly which functions have been reimplemented. They can just use "import
>> jax.numpy as np" and everything works.
>>
>> These libraries are indeed passing CuPy or JAX arrays into NumPy
>> functions, which currently happen to have the desired behavior, thanks to
>> accidental details about how NumPy currently supports duck-typing and/or
>> coercions.
>>
>> To this end, it would be really nice to have an alias that *is*
>> guaranteed to work exactly as if __array_function__ didn't exist, and not
>> only for numpy.ndarray arrays.
>>
>
> Just to be clear: for this purpose, being able to call the implementation
> is still mostly a convenient crutch, correct? For classes that define
> __array_function__, would you expect more than the guarantee I wrote above,
> that the wrapped version will continue to work as advertised for ndarray
> input only?
>

I'm not sure I agree -- what would be the more principled alternative here?

Modules that emulate NumPy's public API for a new array type are both
pretty common (cupy, jax.numpy, autograd, dask.array, pydata/sparse, etc.)
and also the best early candidates for adopting NEP-18, because they don't
need to do much extra work to write a __array_function__ method. I want to
make it as easy as possible for these early adopters, because their success
will make or break the entire __array_function__ protocol.
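
As a rough illustration of how little code an __array_function__ method
needs, here is a minimal sketch following the pattern described in NEP-18.
The names DuckArray, HANDLED_FUNCTIONS, implements and duck_sum are
hypothetical, chosen only for this example, and NumPy 1.17 or later is
assumed (so that dispatch is enabled by default).

```python
import numpy as np

# Hypothetical registry mapping NumPy functions to duck-array versions.
HANDLED_FUNCTIONS = {}


def implements(numpy_function):
    """Register a duck-array implementation of a NumPy function."""
    def decorator(func):
        HANDLED_FUNCTIONS[numpy_function] = func
        return func
    return decorator


class DuckArray:
    """Hypothetical array wrapper, used only for illustration."""

    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_function__(self, func, types, args, kwargs):
        # Decline anything we have not explicitly implemented.
        if func not in HANDLED_FUNCTIONS:
            return NotImplemented
        return HANDLED_FUNCTIONS[func](*args, **kwargs)


@implements(np.sum)
def duck_sum(arr, *args, **kwargs):
    # Delegate to NumPy on the wrapped data, then re-wrap the result.
    return DuckArray(np.sum(arr.data, *args, **kwargs))


# The public NumPy function now dispatches to duck_sum.
result = np.sum(DuckArray([1, 2, 3]))
```

Functions that are not registered (np.mean in this sketch) raise TypeError
when called on a DuckArray, which is why a numpy-like namespace is still
convenient for users who do not know which functions are covered.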

In the long term, I agree that the importance of these numpy-like
namespaces will diminish, because it will be possible to use the original
NumPy namespace instead. Possibly new projects will decide that they don't
need to bother with them at all. But there are still lots of plausible
reasons for keeping them around even for a project that implements
__array_function__, e.g.,
(a) to avoid the overhead of NumPy's dispatching
(b) to access functions like np.ones that return a different array type
(c) to make use of optional duck-array specific arguments, e.g., the
split_every argument to dask.array.sum()
(d) if they care about supporting versions of NumPy older than 1.17

In practice, I suspect we'll see these modules continue to exist for a long
time. And they really do rely upon the exact behavior of NumPy today,
whatever that happens to be (e.g., the undocumented fact that
np.result_type supports duck-typing with the .dtype attribute rather than
coercing arguments to NumPy arrays).

> In particular, suppose we change an implementation to use different other
> numpy functions inside (which are of course overridden using
> __array_function__). I could imagine situations where that would work fine
> for everything that does not define __array_ufunc__, but where it would not
> for classes that do define it. Is that then a problem for numpy or for the
> project that has a class that defines __array_function__?
>

If we change an existing NumPy function to start calling ufuncs directly on
input arguments, rather than calling np.asarray() on its inputs, that will
already (potentially) be a breaking change. We lost the ability to do these
sorts of refactors without breaking backwards compatibility when we added
__array_ufunc__. So I think it's already our problem, unless we're willing
to risk breaking __array_ufunc__ users.
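
To make the breakage concrete, here is a small sketch. The class Tagged and
the two functions double_coercing and double_ufunc are hypothetical: they
stand in for a duck array and for the "before" and "after" versions of a
refactored NumPy function, and are not taken from NumPy itself.

```python
import numpy as np


class Tagged:
    """Hypothetical duck array defining __array__ and __array_ufunc__."""

    def __init__(self, data):
        self.data = np.asarray(data)

    def __array__(self, dtype=None, copy=None):
        # Used by np.asarray() to coerce this object to an ndarray.
        return self.data if dtype is None else self.data.astype(dtype)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Unwrap Tagged inputs, apply the ufunc, and re-wrap the result.
        unwrapped = [x.data if isinstance(x, Tagged) else x for x in inputs]
        return Tagged(getattr(ufunc, method)(*unwrapped, **kwargs))


def double_coercing(x):
    # "Before": coerce with np.asarray(), so the result is an ndarray.
    return np.asarray(x) * 2


def double_ufunc(x):
    # "After": call the ufunc directly, so __array_ufunc__ takes over.
    return np.multiply(x, 2)


before = double_coercing(Tagged([1, 2]))  # plain numpy.ndarray
after = double_ufunc(Tagged([1, 2]))      # Tagged instance
```

The numbers are the same in both cases, but the return type changes from
ndarray to Tagged, which is exactly the kind of difference downstream code
can come to depend on.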

That said, I doubt this would actually be a major issue in practice. The
projects for which __array_function__ makes the most sense are "full duck
arrays," and all these projects are going to implement __array_ufunc__,
too, in a mostly compatible way.

I'm a little puzzled by why you are concerned about retaining this
flexibility to reuse the attribute I'm asking for here for a function that
works differently. What I want is a special attribute that is guaranteed to
work like the public version of a NumPy function, but without checking for
an __array_function__ attribute.
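
A toy sketch of the semantics being asked for, written in pure Python. This
is not NumPy's dispatch code: the decorator with_overrides, the function
total and the class Loud are hypothetical, and __skip_array_overrides__ is
only the name proposed in this thread, not an attribute NumPy is known to
provide.

```python
import functools

import numpy as np


def with_overrides(implementation):
    """Toy stand-in for an override-checking wrapper (not NumPy's code)."""

    @functools.wraps(implementation)
    def public_api(*args, **kwargs):
        # Collect non-ndarray arguments that define __array_function__.
        overriders = [
            arg for arg in args
            if not isinstance(arg, np.ndarray)
            and hasattr(type(arg), "__array_function__")
        ]
        for arg in overriders:
            result = arg.__array_function__(
                public_api, (type(arg),), args, kwargs)
            if result is not NotImplemented:
                return result
        if overriders:
            raise TypeError("no implementation found")
        return implementation(*args, **kwargs)

    # Same behavior as the public function, minus the override check.
    public_api.__skip_array_overrides__ = implementation
    return public_api


@with_overrides
def total(x):
    # Coerces its input, like many existing NumPy functions.
    return np.asarray(x).sum()


class Loud:
    """Hypothetical duck array that overrides every function."""

    def __init__(self, data):
        self.data = data

    def __array__(self, dtype=None, copy=None):
        return np.asarray(self.data, dtype=dtype)

    def __array_function__(self, func, types, args, kwargs):
        return "overridden"


via_override = total(Loud([1, 2, 3]))
via_skip = total.__skip_array_overrides__(Loud([1, 2, 3]))
```

Here via_override is the string returned by Loud.__array_function__, while
via_skip is the ordinary coerced sum, whatever coercion rules the
implementation happens to use.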

If we later decide we want to expose an attribute that provides a
non-coercing function that calls ufuncs directly instead of np.asarray,
what do we lose by giving it a new name so users don't need to worry about
changed behavior? There is plenty of room for special attributes on NumPy
functions. We can have both np.something.__skip_array_overrides__ and
np.something.__array_implementation__.

>> So we might as well pick a name that works for both, e.g.,
>> __skip_array_overrides__ rather than __skip_array_function__. This would
>> let us save our users a bit of pain by not requiring them to make changes
>> like np.where.__skip_array_function__ -> np.where.__skip_array_ufunc__.
>>
>
> Note that for ufuncs it is not currently possible to skip the override. I
> don't think it is super-hard to do it, but I'm not sure I see the need to
> add a crutch where none has been needed so far. More generally, it is not
> obvious there is any C code where skipping the override is useful, since
> the C code relies much more directly on inputs being ndarray.
>

To be entirely clear: I was thinking of
ufunc.method.__skip_array_overrides__() as "equivalent to ufunc.method()
except not checking for __array_ufunc__ attributes".

I think the use-cases would be for Python code that calls ufuncs, in much
the same way that there are use-cases for Python code that calls other
NumPy functions, e.g.,
- np.sin.__skip_array_overrides__() could be slightly faster than
np.sin(), because it avoids checking for __array_ufunc__ attributes.
- np.add.__skip_array_overrides__(x, y) is definitely going to be faster
than np.add(np.asarray(x), np.asarray(y)), because it avoids the overhead
of two Python function calls.
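
For reference, the coerce-first spelling that such an attribute would
replace can be sketched as below. The class Veto and the helper
add_ignoring_overrides are hypothetical names used only for this
illustration, and no timing claims are made here.

```python
import numpy as np


class Veto:
    """Hypothetical type whose __array_ufunc__ intercepts every ufunc."""

    def __init__(self, data):
        self.data = np.asarray(data)

    def __array__(self, dtype=None, copy=None):
        return self.data if dtype is None else self.data.astype(dtype)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        return "intercepted"


def add_ignoring_overrides(x, y):
    # Coerce first, so only plain ndarrays reach np.add and no
    # third-party __array_ufunc__ implementation is consulted.
    return np.add(np.asarray(x), np.asarray(y))


x, y = Veto([1, 2]), Veto([10, 20])
dispatched = np.add(x, y)               # handled by Veto.__array_ufunc__
coerced = add_ignoring_overrides(x, y)  # ordinary ndarray addition
```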

The use cases here are certainly not as compelling as those for
__array_function__, because __array_ufunc__'s arguments are in a
standardized form, but I think they're still meaningful. Not to mention, we
can refactor np.ndarray.__array_ufunc__ to work exactly like
np.ndarray.__array_function__, eliminating the special case in NEP-13's
dispatch rules.

I agree that it wouldn't make sense to call the "generic duck-array
implementation" of a ufunc (these don't exist), but that wasn't what I was
proposing here.