[Numpy-discussion] Adding to the non-dispatched implementation of NumPy methods

Ralf Gommers ralf.gommers at gmail.com
Sat May 4 15:29:06 EDT 2019


We seem to have run out of steam a bit here.



On Tue, Apr 30, 2019 at 7:24 AM Stephan Hoyer <shoyer at gmail.com> wrote:

> On Mon, Apr 29, 2019 at 5:49 AM Marten van Kerkwijk <
> m.h.vankerkwijk at gmail.com> wrote:
>
>>> The uses that I've seen so far (in CuPy and JAX) involve a handful of
>>> functions that are directly re-exported from NumPy, e.g.,
>>> jax.numpy.array_repr is the exact same object as numpy.array_repr:
>>>
>>> https://github.com/cupy/cupy/blob/c3f1be602bf6951b007beaae644a5662f910048b/cupy/__init__.py#L341-L366
>>>
>>> https://github.com/google/jax/blob/5edb23679f2605654949156da84e330205840695/jax/numpy/lax_numpy.py#L89-L132
>>>
>>>
>>> I suspect this will be less common in the future if __array_function__
>>> takes off, but for now it's convenient because users don't need to know
>>> exactly which functions have been reimplemented. They can just use "import
>>> jax.numpy as np" and everything works.
>>>
>>> These libraries are indeed passing CuPy or JAX arrays into NumPy
>>> functions, which currently happen to have the desired behavior, thanks to
>>> accidental details about how NumPy currently supports duck-typing and/or
>>> coercions.
>>>
>>> To this end, it would be really nice to have an alias that *is*
>>> guaranteed to work exactly as if __array_function__ didn't exist, and not
>>> only for numpy.ndarray arrays.
>>>
>>
>> Just to be clear: for this purpose, being able to call the implementation
>> is still mostly a convenient crutch, correct? For classes that define
>> __array_function__, would you expect more than the guarantee I wrote above,
>> that the wrapped version will continue to work as advertised for ndarray
>> input only?
>>
>
> I'm not sure I agree -- what would be the more principled alternative here?
>
> Modules that emulate NumPy's public API for a new array type are both
> pretty common (cupy, jax.numpy, autograd, dask.array, pydata/sparse, etc)
> and also the best early candidates for adopting NEP-18, because they don't
> need to do much extra work to write a __array_function__ method. I want to
> make it as easy as possible for these early adopters, because their success
> will make or break the entire __array_function__ protocol.
>
> In the long term, I agree that the importance of these numpy-like
> namespaces will diminish, because it will be possible to use the original
> NumPy namespace instead. Possibly new projects will decide that they don't
> need to bother with them at all. But there are still lots of plausible
> reasons for keeping them around even for a project that implements
> __array_function__, e.g.,
> (a) to avoid the overhead of NumPy's dispatching
> (b) to access functions like np.ones that return a different array type
> (c) to make use of optional duck-array specific arguments, e.g., the
> split_every argument to dask.array.sum()
> (d) if they care about supporting versions of NumPy older than 1.17
>
> In practice, I suspect we'll see these modules continue to exist for a
> long time. And they really do rely upon the exact behavior of NumPy today,
> whatever that happens to be (e.g., the undocumented fact that
> np.result_type supports duck-typing with the .dtype attribute rather than
> coercing arguments to NumPy arrays).
>
>> In particular, suppose we change an implementation to use different
>> numpy functions inside (which are of course overridden using
>> __array_function__). I could imagine situations where that would work fine
>> for everything that does not define __array_ufunc__, but where it would not
>> for classes that do define it. Is that then a problem for numpy or for the
>> project that has a class that defines __array_function__?
>>
>
> If we change an existing NumPy function to start calling ufuncs directly
> on input arguments, rather than calling np.asarray() on its inputs,
>

This wasn't really the question, I believe. It was more like: if numpy function A
now calls B under the hood, and we replace B with C (in a way that's fully
backwards compatible for users of A), will that be a problem in the
future? I think that in practice this doesn't happen a lot, and is quite
unlikely to be a problem.
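
To make the A/B/C scenario concrete, here is a toy pure-Python sketch. It
does not use NumPy's real dispatch machinery: the decorator `dispatchable`,
the functions `sort_values` (B), `minimum` (C), `smallest_old`/`smallest_new`
(A before and after the refactor) and the class `Duck` are all hypothetical
names made up for illustration.

```python
import functools


def dispatchable(implementation):
    """Toy override check; hypothetical and far simpler than NumPy's."""

    @functools.wraps(implementation)
    def public_api(x):
        override = getattr(type(x), "__array_function__", None)
        if override is None:
            return implementation(x)
        result = override(x, public_api, (type(x),), (x,), {})
        if result is NotImplemented:
            raise TypeError(f"no implementation found for {public_api.__name__}")
        return result

    return public_api


@dispatchable
def sort_values(x):  # plays the role of "B"
    return sorted(x)


@dispatchable
def minimum(x):  # plays the role of "C"
    return min(x)


def smallest_old(x):
    """'A' before the refactor: calls B under the hood."""
    return sort_values(x)[0]


def smallest_new(x):
    """'A' after the refactor: calls C; same result for plain lists."""
    return minimum(x)


class Duck:
    """Duck array that overrides B only."""

    def __init__(self, data):
        self.data = list(data)

    def __array_function__(self, func, types, args, kwargs):
        if func is sort_values:
            return sorted(self.data)
        return NotImplemented


plain_old, plain_new = smallest_old([3, 1, 2]), smallest_new([3, 1, 2])
duck_old = smallest_old(Duck([3, 1, 2]))
try:
    smallest_new(Duck([3, 1, 2]))
    duck_new_failed = False
except TypeError:
    duck_new_failed = True
```

For plain lists the refactor is invisible; for `Duck` the old implementation
of A works and the new one raises TypeError, because `Duck` never overrode C.
How often a real duck array would override B but not C is a separate
question that the sketch does not answer.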

> that will already (potentially) be a breaking change. We lost the ability
> to do these sorts of refactors without breaking backwards compatibility when
> we added __array_ufunc__. So I think it's already our problem, unless we're
> willing to risk breaking __array_ufunc__ users.
>
> That said, I doubt this would actually be a major issue in practice. The
> projects for which __array_function__ makes the most sense are "full duck
> arrays," and all these projects are going to implement __array_ufunc__,
> too, in a mostly compatible way.
>
> I'm a little puzzled by why you are concerned about retaining this
> flexibility to reuse the attribute I'm asking for here for a function that
> works differently. What I want is a special attribute that is guaranteed to
> work like the public version of a NumPy function, but without checking for
> an __array_function__ attribute.
>
> If we later decide we want to expose an attribute that provides a
> non-coercing function that calls ufuncs directly instead of np.asarray,
> what do we lose by giving it a new name so users don't need to worry about
> changed behavior? There is plenty of room for special attributes on NumPy
> functions. We can have both np.something.__skip_array_overrides__ and
> np.something.__array_implementation__.
>

That's a good argument I think.
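
For readers following along, a minimal pure-Python sketch of the attribute
requested in the quoted text above: something that behaves like the public
function but does not check for __array_function__. This is a toy model, not
NumPy's implementation; `dispatchable`, `total` and `LoggedArray` are
hypothetical names, and `__skip_array_overrides__` is only one of the
spellings proposed in this thread.

```python
import functools


def dispatchable(implementation):
    """Toy stand-in for NumPy's override machinery (hypothetical helper)."""

    @functools.wraps(implementation)
    def public_api(*args, **kwargs):
        overrides_seen = False
        for arg in args:
            override = getattr(type(arg), "__array_function__", None)
            if override is None:
                continue
            overrides_seen = True
            result = override(arg, public_api, (type(arg),), args, kwargs)
            if result is not NotImplemented:
                return result
        if overrides_seen:
            # Every override declined, so give up as NEP-18 describes.
            raise TypeError(f"no implementation found for {public_api.__name__}")
        return implementation(*args, **kwargs)

    # Same behaviour as the public function, minus the override check.
    public_api.__skip_array_overrides__ = implementation
    return public_api


@dispatchable
def total(values):
    # The non-dispatched implementation: relies only on iteration.
    return sum(v for v in values)


class LoggedArray:
    """Duck array that records calls, then reuses the plain implementation."""

    def __init__(self, data):
        self.data = list(data)
        self.calls = []

    def __iter__(self):
        return iter(self.data)

    def __array_function__(self, func, types, args, kwargs):
        self.calls.append(func.__name__)
        # Calling func(*args) here would re-enter this method forever;
        # the skip attribute goes straight to the implementation.
        return func.__skip_array_overrides__(*args, **kwargs)


arr = LoggedArray([1, 2, 3])
dispatched = total(arr)                        # goes through the override
direct = total.__skip_array_overrides__(arr)   # bypasses the override
```

Both calls return the same value, but only the first is recorded in
`arr.calls`. A differently behaved variant (say, one that avoids coercion)
could live under a second attribute name without changing what this one means.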

Ralf



>>> So we might as well pick a name that works for both, e.g.,
>>> __skip_array_overrides__ rather than __skip_array_function__. This would
>>> let us save our users a bit of pain by not requiring them to make changes
>>> like np.where.__skip_array_function__ -> np.where.__skip_array_ufunc__.
>>>
>>
>> Note that for ufuncs it is not currently possible to skip the override. I
>> don't think it is super-hard to do it, but I'm not sure I see the need to
>> add a crutch where none has been needed so far. More generally, it is not
>> obvious there is any C code where skipping the override is useful, since
>> the C code relies much more directly on inputs being ndarray.
>>
>
> To be entirely clear: I was thinking of
> ufunc.method.__skip_array_overrides__() as "equivalent to ufunc.method()
> except not checking for __array_ufunc__ attributes".
>
> I think the use-cases would be for Python code that calls ufuncs, in much the
> same way that there are use-cases for Python code that call other NumPy
> functions, e.g.,
> - np.sin.__skip_array_overrides__() could be slightly faster than
> np.sin(), because it avoids checking for __array_ufunc__ attributes.
> - np.add.__skip_array_overrides__(x, y) is definitely going to be faster
> than np.add(np.asarray(x), np.asarray(y)), because it avoids the overhead
> of two Python function calls.
>
> The use cases here are certainly not as compelling as those for
> __array_function__, because __array_ufunc__'s arguments are in a
> standardized form, but I think they're still meaningful. Not to mention, we
> can refactor np.ndarray.__array_ufunc__ to work exactly like
> np.ndarray.__array_function__, eliminating the special case in NEP-13's
> dispatch rules.
>
> I agree that it wouldn't make sense to call the "generic duck-array
> implementation" of a ufunc (these don't exist), but that wasn't what I was
> proposing here.