[Numpy-discussion] NEP 37: A dispatch protocol for NumPy-like modules

Stephan Hoyer shoyer at gmail.com
Mon Feb 24 01:44:29 EST 2020


On Sun, Feb 23, 2020 at 3:59 PM Ralf Gommers <ralf.gommers at gmail.com> wrote:

>
>
> On Sun, Feb 23, 2020 at 3:31 PM Stephan Hoyer <shoyer at gmail.com> wrote:
>
>> On Thu, Feb 6, 2020 at 12:20 PM Sebastian Berg <
>> sebastian at sipsolutions.net> wrote:
>>
>>>
>>> Another thing about backward compatibility: What is our vision there
>>> actually?
>>> This NEP will *not* give the *end user* the option to opt-in! Here,
>>> opt-in is really reserved to the *library user* (e.g. sklearn). (I did
>>> not realize this clearly before)
>>>
>>> Thinking about that for a bit now, that seems like the right choice.
>>> But it also means that the library requires an easy way of giving a
>>> FutureWarning, to notify the end-user of the upcoming change. The end-
>>> user will easily be able to convert to a NumPy array to keep the old
>>> behaviour.
>>> Once this warning is given (maybe during `get_array_module()`), the
>>> array module object/context would preferably be passed around,
>>> hopefully even between libraries. That provides a reasonable way to
>>> opt-in to the new behaviour without a warning (mainly for library
>>> users; end-users can silence the warning if they wish).
>>>
>>
>> I don't think NumPy needs to do anything about warnings. It is
>> straightforward for libraries that want to use get_array_module() to
>> issue their own warnings before calling get_array_module(), if desired.
>>
>
>> Or alternatively, if a library is about to add a new __array_module__
>> method, it is straightforward to issue a warning inside the new
>> __array_module__ method before returning the NumPy functions.
>>
>
> I don't think this is quite enough. Sebastian points out a fairly
> important issue. One of the main rationales for the whole NEP, and the
> argument in multiple places (
> https://numpy.org/neps/nep-0037-array-module.html#opt-in-vs-opt-out-for-users)
> is that it's now opt-in while __array_function__ was opt-out. This isn't
> really true - the problem is simply *moved*, from the duck array libraries
> to the array-consuming libraries. The end user will still see the backwards
> incompatible change, with no way to turn it off. It will be easier with
> __array_module__ to warn users, but this should be expanded on in the NEP.
>

Ralf, thanks for sharing your thoughts.

I'm not quite sure I understand the concerns about backwards incompatibility:
1. The intention is that implementing a __array_module__ method should be
backwards compatible with all current uses of NumPy. This satisfies
backwards compatibility concerns for an array-implementing library like JAX.
2. In contrast, calling get_array_module() offers no guarantees about
backwards compatibility. This seems nearly impossible, because the entire
point of the protocol is to make it possible to opt-in to new behavior. So
backwards compatibility isn't solved for Scikit-Learn switching to use
get_array_module(), and after Scikit-Learn does so, adding __array_module__
to new types of arrays could potentially have backwards incompatible
consequences for Scikit-Learn (unless sklearn uses default=None).
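To make the opt-in semantics concrete, here is a simplified sketch of the
dispatch the NEP proposes. The names get_array_module, __array_module__, and
the "default" keyword follow the NEP draft; DuckArray and duck_module are
hypothetical stand-ins, and the dispatch logic is illustrative, not NumPy's
actual implementation:

```python
import numpy as np

def get_array_module(*arrays, default=np):
    # Simplified sketch of NEP 37 dispatch: ask each argument's type, in
    # order, for an array module via __array_module__; fall back to
    # `default` if nothing opts in.
    types = tuple({type(a) for a in arrays})
    for a in arrays:
        method = getattr(type(a), '__array_module__', None)
        if method is not None:
            module = method(a, types)
            if module is not NotImplemented:
                return module
    if default is None:
        raise TypeError('no array module found for types %r' % (types,))
    return default

duck_module = object()  # stand-in for a library's NumPy-like namespace

class DuckArray:
    # Opts in: claims the dispatch only for type combinations it knows.
    def __array_module__(self, types):
        if all(issubclass(t, (DuckArray, np.ndarray)) for t in types):
            return duck_module
        return NotImplemented
```

With default=np, plain NumPy arrays (and scalars) keep working unchanged;
with default=None, option (a) below, the same call would raise instead.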

Are you suggesting just adding something like what I'm writing here into
the NEP? Perhaps along with advice to consider issuing warnings inside
__array_module__ and falling back to legacy behavior when first
implementing it on a new type?
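For example (a hypothetical sketch, not text from the NEP), a library first
implementing __array_module__ could warn and keep returning plain NumPy
until a later release flips the switch; MyArray and the message are made up:

```python
import warnings
import numpy as np

class MyArray:
    """Hypothetical duck array that is just starting to opt in."""

    def __array_module__(self, types):
        # Transitional behavior: warn the end user, then return plain
        # NumPy so existing code behaves exactly as before. A later
        # release would return the library's own namespace instead.
        warnings.warn(
            "In a future release, NumPy functions called on MyArray "
            "will dispatch to mylib instead of converting to "
            "numpy.ndarray.",
            FutureWarning,
            stacklevel=2,
        )
        return np  # legacy behavior for now
```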

We could also potentially make a few changes to make backwards
compatibility even easier, by making the protocol less aggressive about
assuming that NumPy is a safe fallback. Some non-exclusive options:
a. We could switch the default value of "default" on get_array_module() to
None, so an exception is raised if nothing implements __array_module__.
b. We could include *all* argument types in "types", not just types that
implement __array_module__. NumPy's ndarray.__array_module__ could then
recognize and refuse to return an implementation if there are other
arguments that might implement __array_module__ in the future (e.g.,
anything outside the standard library?).
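As a sketch of option (b), ndarray's implementation could refuse to claim
the dispatch whenever an unrecognized type is present. The heuristic below
(checking __module__) is purely illustrative, not a proposal for NumPy's
actual behavior:

```python
import numpy as np

def conservative_array_module(self, types):
    """Sketch of option (b): refuse to return an implementation if any
    argument type is outside builtins/NumPy, since such a type might
    implement __array_module__ in the future. Illustrative only."""
    for t in types:
        if t.__module__ not in ('builtins', 'numpy'):
            return NotImplemented
    return np
```

Under this rule, a call mixing ndarray with float still resolves to NumPy,
but mixing ndarray with any third-party type returns NotImplemented, which
(combined with default=None) would surface as an error rather than a silent
coercion.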

The downside of making either of these choices is that it would potentially
make get_array_module() a bit less usable, because it is more likely to
fail, e.g., if called on a float, or some custom type that should be
treated as a scalar.

Also, I'm still not sure I agree with the tone of the discussion on this
> topic. It's very heavily inspired by what the JAX devs are telling you (the
> NEP still says PyTorch and scipy.sparse as well, but that's not true in
> both cases). If you ask Dask and CuPy for example, they're quite happy with
> __array_function__ and there haven't been many complaints about backwards
> compat breakage.
>

I'm linking to comments you wrote in reference to PyTorch and scipy.sparse
in the current draft of the NEP, so I certainly want to make sure that you
agree with my characterization :).

Would it be fair to say:
- JAX is reluctant to implement __array_function__ because of concerns
about breaking existing code. JAX developers think that when users use
NumPy functions on JAX arrays, they are explicitly choosing to convert from
JAX to NumPy. This model is fundamentally incompatible with
__array_function__, which we chose to have override the existing numpy
namespace.
- PyTorch and scipy.sparse are not yet in a position to implement
__array_function__ (due to a lack of a direct implementation of NumPy's
API), but these projects take backwards compatibility seriously.

Does "take backwards compatibility seriously" sound about right to you? I'm
very open to specific suggestions here. (TensorFlow could probably also be
safely added to this second list.)

Best,
Stephan