On Sun, Feb 23, 2020 at 3:59 PM Ralf Gommers <ralf.gommers@gmail.com> wrote:


On Sun, Feb 23, 2020 at 3:31 PM Stephan Hoyer <shoyer@gmail.com> wrote:
On Thu, Feb 6, 2020 at 12:20 PM Sebastian Berg <sebastian@sipsolutions.net> wrote:

Another thing about backward compatibility: What is our vision there
actually?
This NEP will *not* give the *end user* the option to opt in! Here,
opting in is really reserved for the *library user* (e.g. sklearn). (I
did not realize this clearly before.)

Thinking about that for a bit now, that seems like the right choice.
But it also means that the library requires an easy way of giving a
FutureWarning, to notify the end-user of the upcoming change. The end-
user will easily be able to convert to a NumPy array to keep the old
behaviour.
Once this warning is given (maybe during `get_array_module()`), the
array module object/context would preferably be passed around,
hopefully even between libraries. That provides a reasonable way to
opt in to the new behaviour without a warning (mainly for library
users; end-users can silence the warning if they so wish).

I don't think NumPy needs to do anything about warnings. It is straightforward for libraries that want to use get_array_module() to issue their own warnings before calling get_array_module(), if desired.
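For concreteness, a minimal sketch of what that could look like, assuming this NEP's np.get_array_module (fit() and the warning text are of course hypothetical and up to the library):

    import warnings
    import numpy as np

    def fit(X):
        # Hypothetical sklearn-style function. Warn end users that duck
        # arrays will soon be handled natively instead of being coerced
        # to ndarray, then dispatch via the protocol.
        if type(X) is not np.ndarray:
            warnings.warn(
                "fit() will soon operate on X's own array type; call "
                "np.asarray(X) first to keep the old behaviour",
                FutureWarning,
            )
        xp = np.get_array_module(X)  # as proposed in this NEP
        return xp.mean(X, axis=0)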

Or alternatively, if a library is about to add a new __array_module__ method, it is straightforward to issue a warning inside the new __array_module__ method before returning the NumPy functions. 
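Roughly like this, for a hypothetical DuckArray class (all names made up; the method keeps the legacy behaviour by returning plain NumPy while warning about the upcoming switch):

    import warnings
    import numpy as np

    class DuckArray:
        def __array_module__(self, types):
            if not all(issubclass(t, (DuckArray, np.ndarray)) for t in types):
                return NotImplemented
            # Transitional: return plain NumPy for now, but warn that a
            # future release will return this library's own (hypothetical)
            # implementation of NumPy's API instead.
            warnings.warn(
                "NumPy functions will start returning DuckArray results in "
                "a future release; use np.asarray(x) to keep ndarray results",
                FutureWarning,
            )
            return np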

I don't think this is quite enough. Sebastian points out a fairly important issue. One of the main rationales for the whole NEP, and the argument in multiple places (https://numpy.org/neps/nep-0037-array-module.html#opt-in-vs-opt-out-for-users) is that it's now opt-in while __array_function__ was opt-out. This isn't really true - the problem is simply *moved*, from the duck array libraries to the array-consuming libraries. The end user will still see the backwards incompatible change, with no way to turn it off. It will be easier with __array_module__ to warn users, but this should be expanded on in the NEP.

Ralf, thanks for sharing your thoughts.

I'm not quite sure I understand the concerns about backwards incompatibility:
1. The intention is that implementing a __array_module__ method should be backwards compatible with all current uses of NumPy. This satisfies backwards compatibility concerns for an array-implementing library like JAX.
2. In contrast, calling get_array_module() offers no guarantees about backwards compatibility. This seems nearly impossible to provide, because the entire point of the protocol is to make it possible to opt in to new behavior. So backwards compatibility isn't solved for Scikit-Learn switching to use get_array_module(), and after Scikit-Learn does so, adding __array_module__ to new types of arrays could have backwards incompatible consequences for Scikit-Learn (unless sklearn uses default=None). The sketch below makes this concrete.
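To make (2) concrete (again assuming np.get_array_module from this NEP, with default=np as the proposed default; normalize() is a made-up stand-in for an array-consuming library's function):

    import numpy as np

    def normalize(X):
        # Today this returns an np.ndarray for any input. The moment the
        # library behind type(X) adds __array_module__, the return type
        # silently changes to type(X): backwards incompatible from the
        # perspective of an array-consuming library like sklearn.
        xp = np.get_array_module(X, default=np)
        return X / xp.linalg.norm(X)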

Are you suggesting just adding something like what I'm writing here into the NEP? Perhaps along with advice to consider issuing warnings inside __array_module__ and falling back to legacy behavior when first implementing it on a new type?

We could also potentially make a few changes to make backwards compatibility even easier, by making the protocol less aggressive about assuming that NumPy is a safe fallback. Some non-exclusive options:
a. We could switch the default value of "default" on get_array_module() to None, so an exception is raised if nothing implements __array_module__.
b. We could include *all* argument types in "types", not just types that implement __array_module__. NumPy's ndarray.__array_module__ could then recognize and refuse to return an implementation if there are other arguments that might implement __array_module__ in the future (e.g., anything outside the standard library?). Rough sketches of both options follow.
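For illustration only (everything here is hypothetical, and the heuristic in (b) is deliberately crude):

    import numpy as np

    # Option (a): with default=None there is no silent NumPy fallback;
    # the call raises if no argument implements __array_module__.
    def process(X):
        xp = np.get_array_module(X, default=None)
        return xp.sin(X)

    # Option (b): a rough version of the "outside the standard library"
    # heuristic that ndarray.__array_module__ could apply, refusing to
    # claim arguments whose types might implement the protocol later.
    def ndarray_array_module(self, types):
        for t in types:
            if t.__module__.partition(".")[0] not in ("numpy", "builtins"):
                return NotImplemented
        return np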

The downside of making either of these choices is that it would potentially make get_array_module() a bit less usable, because it is more likely to fail, e.g., if called on a float, or some custom type that should be treated as a scalar.

Also, I'm still not sure I agree with the tone of the discussion on this topic. It's very heavily inspired by what the JAX devs are telling you (the NEP still says PyTorch and scipy.sparse as well, but that's not true in both cases). If you ask Dask and CuPy for example, they're quite happy with __array_function__ and there haven't been many complaints about backwards compat breakage.

I'm linking to comments you wrote in reference to PyTorch and scipy.sparse in the current draft of the NEP, so I certainly want to make sure that you agree with my characterization :).

Would it be fair to say:
- JAX is reluctant to implement __array_function__ because of concerns about breaking existing code. JAX developers think that when users use NumPy functions on JAX arrays, they are explicitly choosing to convert from JAX to NumPy. This model is fundamentally incompatible with __array_function__, which we designed to override the existing numpy namespace.
- PyTorch and scipy.sparse are not yet in a position to implement __array_function__ (due to the lack of a direct implementation of NumPy's API), but these projects take backwards compatibility seriously.

Does "take backwards compatibility seriously" sound about right to you? I'm very open to specific suggestions here. (TensorFlow could probably also be safely added to this second list.)

Best,
Stephan