[Numpy-discussion] NEP 37: A dispatch protocol for NumPy-like modules

Thu Apr 9 07:52:05 EDT 2020

On Wed, Mar 4, 2020 at 1:22 AM Sebastian Berg <sebastian at sipsolutions.net>
wrote:

> On Sun, 2020-02-23 at 22:44 -0800, Stephan Hoyer wrote:
> > On Sun, Feb 23, 2020 at 3:59 PM Ralf Gommers <ralf.gommers at gmail.com>
> > wrote:
> > >
> > > On Sun, Feb 23, 2020 at 3:31 PM Stephan Hoyer <shoyer at gmail.com>
> > > wrote:
> > > > On Thu, Feb 6, 2020 at 12:20 PM Sebastian Berg <
> > > > sebastian at sipsolutions.net> wrote:
> <snip>
> > > >
> > > > I don't think NumPy needs to do anything about warnings. It is
> > > > straightforward for libraries that want to use use
> > > > get_array_module() to issue their own warnings before calling
> > > > get_array_module(), if desired.
> > > >
> > > > Or alternatively, if a library is about to add a new
> > > > __array_module__ method, it is straightforward to issue a warning
> > > > inside the new __array_module__ method before returning the NumPy
> > > > functions.
> > > >
> > >
> > > I don't think this is quite enough. Sebastian points out a fairly
> > > important issue. One of the main rationales for the whole NEP, and
> > > the argument in multiple places (
> > >
> https://numpy.org/neps/nep-0037-array-module.html#opt-in-vs-opt-out-for-users
> > > ) is that it's now opt-in while __array_function__ was opt-out.
> > > This isn't really true - the problem is simply *moved*, from the
> > > duck array libraries to the array-consuming libraries. The end user
> > > will still see the backwards incompatible change, with no way to
> > > turn it off. It will be easier with __array_module__ to warn users,
> > > but this should be expanded on in the NEP.
> > >
> >
> > Ralf, thanks for sharing your thoughts.
>

Sorry, this never made it back to the top of my todo list.

>
> > I'm not quite I understand the concerns about backwards
> > incompatibility:
> > 1. The intention is that implementing a __array_module__ method
> > should be backwards compatible with all current uses of NumPy. This
> > satisfies backwards compatibility concerns for an array-implementing
> > library like JAX.
> > 2. In contrast, calling get_array_module() offers no guarantees about
> > backwards compatibility. This seems nearly impossible, because the
> > entire point of the protocol is to make it possible to opt-in to new
> > behavior.

Indeed, it is nearly impossible. Except if there's a context manager or
some other control mechanism exposed to the end user. Hence that should be
a part of the design I think. Otherwise you're just solving something for
the JAX devs, but not for the scikit-learn/scipy/etc devs who will then
each have to invent their own wheel for backwards compat.

So backwards compatibility isn't solved for Scikit-Learn
> > switching to use get_array_module(), and after Scikit-Learn does so,
> > adding __array_module__ to new types of arrays could potentially have
> > backwards incompatible consequences for Scikit-Learn (unless sklearn
> > uses default=None).
> >
> > Are you suggesting just adding something like what I'm writing here
> > into the NEP? Perhaps along with advice to consider issuing warnings
> > inside __array_module__  and falling back to legacy behavior when
> > first implementing it on a new type?
>
> I think that should be sufficient, personally. We could mention that
> scikit-learn will likely use a context manager to do this.
> We can also think about providing a global default (which sklearn can
> use as its own default if they wish so, but that is reserved to the
> end-user).
>

+1

That would be a small amendment, and I think we could add it even after
> accepting the NEP as it is.
>
> >
> > We could also potentially make a few changes to make backwards
> > compatibility even easier, by making the protocol less aggressive
> > about assuming that NumPy is a safe fallback. Some non-exclusive
> > options:
> > a. We could switch the default value of "default" on
> > get_array_module() to None, so an exception is raised if nothing
> > implements __array_module__.
>
> I am not sure that I feel switching the default to None makes much of a
> difference to be honest. Unless we use it to signal a super strict mode
> similar to b. below.
>

I agree, that doesn't make a difference.

> > b. We could includes *all* argument types in "types", not just types
> > that implement __array_module__. NumPy's ndarray.__array_module__
> > could then recognize and refuse to return an implementation if there
> > are other arguments that might implement __array_module__ in the
> > future (e.g., anything outside the standard library?).
>
> That is a good point, anything that is not NumPy recognized could
> simply be rejected. It does mean that you have to call
> `module.asarray()` manually more often though.
> For `list`, it could also make sense to just add np.ndarray to types.
>
> If we want to be conservative, maybe we could also just error before
> calling `__array_module__`.  Whenever there is something that we do not
> know how to interpret force the user to clarify?
>
> >
> > The downside of making either of these choices is that it would
> > potentially make get_array_function() a bit less usable, because it
> > is more likely to fail, e.g., if called on a float, or some custom
> > type that should be treated as a scalar.
>
> Right, although we could relax it later if it seems overly annoying.
>

Interesting point. Not accepting sequences could be considered here. It may
help a lot with robustness and typing to only accept ndarray, other objects
with __array__, and scalars.

> >
> > > Also, I'm still not sure I agree with the tone of the discussion on
> > > this topic. It's very heavily inspired by what the JAX devs are
> > > telling you (the NEP still says PyTorch and scipy.sparse as well,
> > > but that's not true in both cases). If you ask Dask and CuPy for
> > > example, they're quite happy with __array_function__ and there
> > > haven't been many complaints about backwards compat breakage.
> > >
> >
> > I'm linking to comments you wrote in reference to PyTorch and
> > scipy.sparse in the current draft of the NEP, so I certainly want to
> > make sure that you agree my characterization :).
> >
> > Would it be fair to say:
> > - JAX is reluctant to implement __array_function__ because of
> > concerns about breaking existing code. JAX developers think that when
> > users use NumPy functions on JAX arrays, they are explicitly choosing
> > to convert from JAX to NumPy. This model is fundamentally
> > incompatible __array_function__, which we chose to override the
> > existing numpy namespace.
>

agreed

> - PyTorch and scipy.sparse are not yet in position to implement
> > __array_function__ (due to a lack of a direct implementation of
> > NumPy's API), but these projects take backwards compatibility
> > seriously.
>

True. I would say though that scipy.sparse will never implement either
__array_function__ or array_module__ due to semantic imcompatibilities (it
acts like np.matrix). So it's kind of irrelevant. And if PyTorch gets
around to adding a numpy-compatible API, they're fine with
__array_function__.

>
> > Does "take backwards compatibility seriously" sound about right to
> > you? I'm very open to specific suggestions here. (TensorFlow could
> > probably also be safely added to this second list.)
>
> This will need input from Ralf, my personal main concern is backward
> compatibility in libraries: I am pretty sure sklearn would only use a
> potential `np.asduckarray` when the user opted in. But in that case my
> personal feeling is that the `get_array_module` solution is cleaner and
> makes it easier to expand functionality slowly (for libraries).
>
> Two other points:
>
> First, I am wondering if we should add something like a `__qualname__`
> to the contract. I.e. a returned module must have a well defined
> `module.__name__` (that is usually already correct), so that sklearn
> could do:
>
> module = np.get_array_module(*arrays)
> if module.__name__ not in ("numpy", "sparse"):
>     raise TypeError("Currently only numpy and sparse are supported")
>
> if they wish so (that is trivial, but if you return a class acting as a
> module it may be important).
>
> Second, we have to make progress on whether or not the "restricted"
> namespace idea should have priority.  My personal opinion is tending
> strongly towards no.
>

I think it's quite important, and __array_module__ gives a chance to
introduce it. However, it's not ready - so I'd say that if __array_module__
implementation is ready and there's no well-defined restricted API proposal
(I expect to have that in August), then we can move ahead without it.

The NumPy version should normally be older than other libraries, and if
> NumPy updates the API so do the downstream implementers.
> E.g. dask may have to provide multiple version of the same function
> depending on the installed NumPy version, but that seems OK to me?

That seems unworkable, and I don't think any libraries do this. Coupling
the semantics of a single Dask function to the installed numpy version is
odd.

It is just as downstream libraries currently have to support multiple
> NumPy versions.
> We could add a contract that the first time `get_array_module` is used
> to e.g. get the dask namespace and the NumPy version is too new, a
> warning should be given.
>

I think we can't solve this until we have a well-defined API, which is the
restricted API + API versioning. Until then it just remains with the
current status, compatibility is implementation-defined.

Cheers,
Ralf

> The practical thing seems to me that we ignore this for the moment (as
> something we can do later on)? If there is missing API, in most cases
> an AttributeError will be raised which could provide some additional
> information to the user?
> The only alternative seems the complete opposite?: Create a new module,
> and make even NumPy only one of the implementers of that new
> (restricted) module. That may be cleaner, but I fear that it is
> impractical to be honest.
>
>
> I will put this on the agenda for tomorrow, even if we discuss it only
> very briefly. My feeling (and hope) is that we are nearing a point
> where we can make a final decision.
>
> Best,
>
> Sebastian
>
>
> >
> > Best,
> > Stephan
> >
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion at python.org
> > https://mail.python.org/mailman/listinfo/numpy-discussion
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20200409/7d2da506/attachment-0001.html>