[Numpy-discussion] NEP 37: A dispatch protocol for NumPy-like modules

Sebastian Berg sebastian at sipsolutions.net
Tue Mar 3 19:20:31 EST 2020


On Sun, 2020-02-23 at 22:44 -0800, Stephan Hoyer wrote:
> On Sun, Feb 23, 2020 at 3:59 PM Ralf Gommers <ralf.gommers at gmail.com>
> wrote:
> > 
> > On Sun, Feb 23, 2020 at 3:31 PM Stephan Hoyer <shoyer at gmail.com>
> > wrote:
> > > On Thu, Feb 6, 2020 at 12:20 PM Sebastian Berg <
> > > sebastian at sipsolutions.net> wrote:
<snip>
> > > 
> > > I don't think NumPy needs to do anything about warnings. It is
> > > straightforward for libraries that want to use use
> > > get_array_module() to issue their own warnings before calling
> > > get_array_module(), if desired.
> > > 
> > > Or alternatively, if a library is about to add a new
> > > __array_module__ method, it is straightforward to issue a warning
> > > inside the new __array_module__ method before returning the NumPy
> > > functions. 
> > > 
> > 
> > I don't think this is quite enough. Sebastian points out a fairly
> > important issue. One of the main rationales for the whole NEP, and
> > the argument in multiple places (
> > https://numpy.org/neps/nep-0037-array-module.html#opt-in-vs-opt-out-for-users
> > ) is that it's now opt-in while __array_function__ was opt-out.
> > This isn't really true - the problem is simply *moved*, from the
> > duck array libraries to the array-consuming libraries. The end user
> > will still see the backwards incompatible change, with no way to
> > turn it off. It will be easier with __array_module__ to warn users,
> > but this should be expanded on in the NEP.
> > 
> 
> Ralf, thanks for sharing your thoughts.
> 
> I'm not quite I understand the concerns about backwards
> incompatibility:
> 1. The intention is that implementing a __array_module__ method
> should be backwards compatible with all current uses of NumPy. This
> satisfies backwards compatibility concerns for an array-implementing
> library like JAX.
> 2. In contrast, calling get_array_module() offers no guarantees about
> backwards compatibility. This seems nearly impossible, because the
> entire point of the protocol is to make it possible to opt-in to new
> behavior. So backwards compatibility isn't solved for Scikit-Learn
> switching to use get_array_module(), and after Scikit-Learn does so,
> adding __array_module__ to new types of arrays could potentially have
> backwards incompatible consequences for Scikit-Learn (unless sklearn
> uses default=None).
> 
> Are you suggesting just adding something like what I'm writing here
> into the NEP? Perhaps along with advice to consider issuing warnings
> inside __array_module__  and falling back to legacy behavior when
> first implementing it on a new type?

I think that should be sufficient, personally. We could mention that
scikit-learn will likely use a context manager to do this.
We can also think about providing a global default (which sklearn can
use as its own default if they wish so, but that is reserved to the
end-user).
That would be a small amendment, and I think we could add it even after
accepting the NEP as it is.

> 
> We could also potentially make a few changes to make backwards
> compatibility even easier, by making the protocol less aggressive
> about assuming that NumPy is a safe fallback. Some non-exclusive
> options:
> a. We could switch the default value of "default" on
> get_array_module() to None, so an exception is raised if nothing
> implements __array_module__.

I am not sure that I feel switching the default to None makes much of a
difference to be honest. Unless we use it to signal a super strict mode
similar to b. below.

> b. We could includes *all* argument types in "types", not just types
> that implement __array_module__. NumPy's ndarray.__array_module__
> could then recognize and refuse to return an implementation if there
> are other arguments that might implement __array_module__ in the
> future (e.g., anything outside the standard library?).

That is a good point, anything that is not NumPy recognized could
simply be rejected. It does mean that you have to call
`module.asarray()` manually more often though.
For `list`, it could also make sense to just add np.ndarray to types.

If we want to be conservative, maybe we could also just error before
calling `__array_module__`.  Whenever there is something that we do not
know how to interpret force the user to clarify?

> 
> The downside of making either of these choices is that it would
> potentially make get_array_function() a bit less usable, because it
> is more likely to fail, e.g., if called on a float, or some custom
> type that should be treated as a scalar.

Right, although we could relax it later if it seems overly annoying.

> 
> > Also, I'm still not sure I agree with the tone of the discussion on
> > this topic. It's very heavily inspired by what the JAX devs are
> > telling you (the NEP still says PyTorch and scipy.sparse as well,
> > but that's not true in both cases). If you ask Dask and CuPy for
> > example, they're quite happy with __array_function__ and there
> > haven't been many complaints about backwards compat breakage.
> > 
> 
> I'm linking to comments you wrote in reference to PyTorch and
> scipy.sparse in the current draft of the NEP, so I certainly want to
> make sure that you agree my characterization :).
> 
> Would it be fair to say:
> - JAX is reluctant to implement __array_function__ because of
> concerns about breaking existing code. JAX developers think that when
> users use NumPy functions on JAX arrays, they are explicitly choosing
> to convert from JAX to NumPy. This model is fundamentally
> incompatible __array_function__, which we chose to override the
> existing numpy namespace.
> - PyTorch and scipy.sparse are not yet in position to implement
> __array_function__ (due to a lack of a direct implementation of
> NumPy's API), but these projects take backwards compatibility
> seriously.
> 
> Does "take backwards compatibility seriously" sound about right to
> you? I'm very open to specific suggestions here. (TensorFlow could
> probably also be safely added to this second list.)

This will need input from Ralf, my personal main concern is backward
compatibility in libraries: I am pretty sure sklearn would only use a
potential `np.asduckarray` when the user opted in. But in that case my
personal feeling is that the `get_array_module` solution is cleaner and
makes it easier to expand functionality slowly (for libraries).

Two other points:

First, I am wondering if we should add something like a `__qualname__`
to the contract. I.e. a returned module must have a well defined
`module.__name__` (that is usually already correct), so that sklearn
could do:

module = np.get_array_module(*arrays)
if module.__name__ not in ("numpy", "sparse"):
    raise TypeError("Currently only numpy and sparse are supported")

if they wish so (that is trivial, but if you return a class acting as a
module it may be important).

Second, we have to make progress on whether or not the "restricted"
namespace idea should have priority.  My personal opinion is tending
strongly towards no.
The NumPy version should normally be older than other libraries, and if
NumPy updates the API so do the downstream implementers.
E.g. dask may have to provide multiple version of the same function
depending on the installed NumPy version, but that seems OK to me? It
is just as downstream libraries currently have to support multiple
NumPy versions.
We could add a contract that the first time `get_array_module` is used
to e.g. get the dask namespace and the NumPy version is too new, a
warning should be given.

The practical thing seems to me that we ignore this for the moment (as
something we can do later on)? If there is missing API, in most cases
an AttributeError will be raised which could provide some additional
information to the user?
The only alternative seems the complete opposite?: Create a new module,
and make even NumPy only one of the implementers of that new
(restricted) module. That may be cleaner, but I fear that it is
impractical to be honest.


I will put this on the agenda for tomorrow, even if we discuss it only
very briefly. My feeling (and hope) is that we are nearing a point
where we can make a final decision.

Best,

Sebastian


> 
> Best,
> Stephan
> 
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20200303/04105f87/attachment.sig>


More information about the NumPy-Discussion mailing list