We seem to have run out of steam a bit here.



On Tue, Apr 30, 2019 at 7:24 AM Stephan Hoyer <shoyer@gmail.com> wrote:
On Mon, Apr 29, 2019 at 5:49 AM Marten van Kerkwijk <m.h.vankerkwijk@gmail.com> wrote:
The uses that I've seen so far (in CuPy and JAX), involve a handful of functions that are directly re-exported from NumPy, e.g., jax.numpy.array_repr is the exact same object as numpy.array_repr:

I suspect this will be less common in the future if __array_function__ takes off, but for now it's convenient because users don't need to know exactly which functions have been reimplemented. They can just use "import jax.numpy as np" and everything works.
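The re-export pattern described here can be sketched with a stand-in module (the `fakenp` namespace and `_my_sum` are made-up names for illustration, not jax.numpy's actual internals):

```python
import types
import numpy as np

# A hypothetical numpy-like namespace, mirroring how e.g. jax.numpy mixes
# reimplementations with functions re-exported directly from NumPy.
fakenp = types.ModuleType("fakenp")

def _my_sum(a, axis=None):
    # Stand-in for a library-specific reimplementation.
    return np.sum(np.asarray(a), axis=axis)

fakenp.sum = _my_sum                 # reimplemented for the new array type
fakenp.array_repr = np.array_repr    # the exact same object as NumPy's

# "import fakenp as np" now covers both kinds of functions transparently.
```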

These libraries are indeed passing CuPy or JAX arrays into NumPy functions, which currently happen to have the desired behavior thanks to incidental details of how NumPy supports duck typing and/or coercion.

To this end, it would be really nice to have an alias that *is* guaranteed to work exactly as if __array_function__ didn't exist, and not only for numpy.ndarray arrays.

Just to be clear: for this purpose, being able to call the implementation is still mostly a convenient crutch, correct? For classes that define __array_function__, would you expect more than the guarantee I wrote above, that the wrapped version will continue to work as advertised for ndarray input only?

I'm not sure I agree -- what would be the more principled alternative here?

Modules that emulate NumPy's public API for a new array type are both pretty common (cupy, jax.numpy, autograd, dask.array, pydata/sparse, etc) and also the best early candidates for adopting NEP-18, because they don't need to do much extra work to write a __array_function__ method. I want to make it as easy as possible for these early adopters, because their success will make or break the entire __array_function__ protocol.
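As a sketch of how little extra work this is, the NEP-18 pattern looks roughly like the following (requires NumPy 1.17+; the `DuckArray` class, registry, and decorator names are made up, but the `__array_function__` signature is the one the NEP specifies):

```python
import numpy as np

HANDLED_FUNCTIONS = {}

class DuckArray:
    """Minimal sketch of a duck array adopting NEP-18."""
    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_function__(self, func, types, args, kwargs):
        if func not in HANDLED_FUNCTIONS:
            return NotImplemented
        # Defer unless every argument type is one we know how to handle.
        if not all(issubclass(t, (DuckArray, np.ndarray)) for t in types):
            return NotImplemented
        return HANDLED_FUNCTIONS[func](*args, **kwargs)

def implements(numpy_function):
    """Register a DuckArray implementation for a NumPy function."""
    def decorator(func):
        HANDLED_FUNCTIONS[numpy_function] = func
        return func
    return decorator

@implements(np.sum)
def _sum(arr, axis=None):
    return DuckArray(np.sum(arr.data, axis=axis))

# np.sum now dispatches to _sum for DuckArray inputs.
result = np.sum(DuckArray([1, 2, 3]))
```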

In the long term, I agree that the importance of these numpy-like namespaces will diminish, because it will be possible to use the original NumPy namespace instead. Possibly new projects will decide that they don't need to bother with them at all. But there are still lots of plausible reasons for keeping them around even for a project that implements __array_function__, e.g., 
(a) to avoid the overhead of NumPy's dispatching
(b) to access functions like np.ones that return a different array type
(c) to make use of optional duck-array specific arguments, e.g., the split_every argument to dask.array.sum()
(d) if they care about supporting versions of NumPy older than 1.17

In practice, I suspect we'll see these modules continue to exist for a long time. And they really do rely upon the exact behavior of NumPy today, whatever that happens to be (e.g., the undocumented fact that np.result_type supports duck typing via the .dtype attribute rather than coercing arguments to NumPy arrays).
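For illustration, the np.result_type behavior mentioned above can be demonstrated with a minimal duck object (the `Duck` class is made up; the point is that only a .dtype attribute is needed, and no coercion to an object-dtype array happens):

```python
import numpy as np

class Duck:
    """Hypothetical duck array: carries only a dtype attribute, no data."""
    dtype = np.dtype('float32')

# np.result_type reads the .dtype attribute instead of coercing the object
# with np.asarray (which would have produced dtype('O')):
print(np.result_type(Duck()))                       # uses Duck.dtype
print(np.result_type(Duck(), np.dtype('float64')))  # normal promotion
```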

In particular, suppose we change an implementation to use other NumPy functions internally (which are of course overridden using __array_function__). I could imagine situations where that would work fine for everything that does not define __array_ufunc__, but would not for classes that do define it. Is that then a problem for NumPy, or for the project whose class defines __array_function__?

If we change an existing NumPy function to start calling ufuncs directly on input arguments, rather than calling np.asarray() on its inputs,

This wasn't really the question, I believe. It was more like: if NumPy function A now calls B under the hood, and we replace B with C (in a way that's fully backwards compatible for users of A), will that be a problem in the future? I think that in practice this doesn't happen a lot, and it is quite unlikely to be a problem.

that will already (potentially) be a breaking change. We lost the ability to do these sorts of refactors without breaking backwards compatibility when we added __array_ufunc__. So I think it's already our problem, unless we're willing to risk breaking __array_ufunc__ users.

That said, I doubt this would actually be a major issue in practice. The projects for which __array_function__ makes the most sense are "full duck arrays," and all these projects are going to implement __array_ufunc__, too, in a mostly compatible way.

I'm a little puzzled about why you're concerned with retaining the flexibility to reuse the attribute I'm asking for here for a function that works differently. What I want is a special attribute that is guaranteed to work like the public version of a NumPy function, but without checking for an __array_function__ attribute.

If we later decide we want to expose an attribute that provides a non-coercing function, one that calls ufuncs directly instead of going through np.asarray, what do we lose by giving it a new name so users don't need to worry about changed behavior? There is plenty of room for special attributes on NumPy functions. We can have both np.something.__skip_array_overrides__ and np.something.__array_implementation__.

That's a good argument, I think.

Ralf



So we might as well pick a name that works for both, e.g., __skip_array_overrides__ rather than __skip_array_function__. This would let us save our users a bit of pain by not requiring them to make changes like np.where.__skip_array_function__ -> np.where.__skip_array_ufunc__.

Note that for ufuncs it is not currently possible to skip the override. I don't think it is super-hard to do it, but I'm not sure I see the need to add a crutch where none has been needed so far. More generally, it is not obvious there is any C code where skipping the override is useful, since the C code relies much more directly on inputs being ndarray.

To be entirely clear: I was thinking of ufunc.method.__skip_array_overrides__() as "equivalent to ufunc.method() except not checking for __array_ufunc__ attributes".

I think the use cases would be for Python code that calls ufuncs, in much the same way that there are use cases for Python code that calls other NumPy functions, e.g.,
- np.sin.__skip_array_overrides__() could be slightly faster than np.sin(), because it avoids checking for __array_ufunc__ attributes.
- np.add.__skip_array_overrides__(x, y) is definitely going to be faster than np.add(np.asarray(x), np.asarray(y)), because it avoids the overhead of two Python function calls.
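Since __skip_array_overrides__ is only a proposal and does not exist in NumPy, the closest existing analogue of "call the ufunc without __array_ufunc__ dispatch" is to invoke ndarray's default __array_ufunc__ directly. A rough sketch, using a made-up `Loud` subclass as the overriding type:

```python
import numpy as np

class Loud(np.ndarray):
    """Toy ndarray subclass that intercepts every ufunc call."""
    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Unwrap to plain ndarrays, then call the ufunc; with no overriding
        # inputs left, this no longer re-dispatches to us.
        inputs = tuple(np.asarray(i) for i in inputs)
        return getattr(ufunc, method)(*inputs, **kwargs)

x = np.array([1.0, 2.0]).view(Loud)
y = np.array([3.0, 4.0])

# Normal call: goes through Loud.__array_ufunc__.
a = np.add(x, y)

# Closest existing analogue of the proposed np.add.__skip_array_overrides__:
# call ndarray's base __array_ufunc__ directly, bypassing Loud's override.
b = np.ndarray.__array_ufunc__(np.asarray(x), np.add, '__call__',
                               np.asarray(x), y)
```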

The use cases here are certainly not as compelling as those for __array_function__, because __array_ufunc__'s arguments are already in a standardized form, but I think they're still meaningful. Not to mention that we can refactor np.ndarray.__array_ufunc__ to work exactly like np.ndarray.__array_function__, eliminating the special case in NEP-13's dispatch rules.

I agree that it wouldn't make sense to call the "generic duck-array implementation" of a ufunc (these don't exist), but that wasn't what I was proposing here.
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion