[Numpy-discussion] NEP 37: A dispatch protocol for NumPy-like modules

Sebastian Berg sebastian at sipsolutions.net
Thu Apr 9 12:48:46 EDT 2020


On Thu, 2020-04-09 at 13:52 +0200, Ralf Gommers wrote:
> On Wed, Mar 4, 2020 at 1:22 AM Sebastian Berg <
> sebastian at sipsolutions.net>
> wrote:
> 
> > On Sun, 2020-02-23 at 22:44 -0800, Stephan Hoyer wrote:
> > > On Sun, Feb 23, 2020 at 3:59 PM Ralf Gommers <
> > > ralf.gommers at gmail.com>
> > > wrote:
> > > > On Sun, Feb 23, 2020 at 3:31 PM Stephan Hoyer <shoyer at gmail.com
> > > > >
> > > > wrote:
> > > > > On Thu, Feb 6, 2020 at 12:20 PM Sebastian Berg <
> > > > > sebastian at sipsolutions.net> wrote:
> > <snip>
> > > > > I don't think NumPy needs to do anything about warnings. It
> > > > > is
> > > > > straightforward for libraries that want to use use
> > > > > get_array_module() to issue their own warnings before calling
> > > > > get_array_module(), if desired.
> > > > > 
> > > > > Or alternatively, if a library is about to add a new
> > > > > __array_module__ method, it is straightforward to issue a
> > > > > warning
> > > > > inside the new __array_module__ method before returning the
> > > > > NumPy
> > > > > functions.
> > > > > 
> > > > 
> > > > I don't think this is quite enough. Sebastian points out a
> > > > fairly
> > > > important issue. One of the main rationales for the whole NEP,
> > > > and
> > > > the argument in multiple places (
> > > > 
> > https://numpy.org/neps/nep-0037-array-module.html#opt-in-vs-opt-out-for-users
> > > > ) is that it's now opt-in while __array_function__ was opt-out.
> > > > This isn't really true - the problem is simply *moved*, from
> > > > the
> > > > duck array libraries to the array-consuming libraries. The end
> > > > user
> > > > will still see the backwards incompatible change, with no way
> > > > to
> > > > turn it off. It will be easier with __array_module__ to warn
> > > > users,
> > > > but this should be expanded on in the NEP.
> > > > 
> > > 
> > > Ralf, thanks for sharing your thoughts.
> 
> Sorry, this never made it back to the top of my todo list.
> 
> > > I'm not quite I understand the concerns about backwards
> > > incompatibility:
> > > 1. The intention is that implementing a __array_module__ method
> > > should be backwards compatible with all current uses of NumPy.
> > > This
> > > satisfies backwards compatibility concerns for an array-
> > > implementing
> > > library like JAX.
> > > 2. In contrast, calling get_array_module() offers no guarantees
> > > about
> > > backwards compatibility. This seems nearly impossible, because
> > > the
> > > entire point of the protocol is to make it possible to opt-in to
> > > new
> > > behavior.
> 
> Indeed, it is nearly impossible. Except if there's a context manager
> or
> some other control mechanism exposed to the end user. Hence that
> should be
> a part of the design I think. Otherwise you're just solving something
> for
> the JAX devs, but not for the scikit-learn/scipy/etc devs who will
> then
> each have to invent their own wheel for backwards compat.
> 
> So backwards compatibility isn't solved for Scikit-Learn
> > > switching to use get_array_module(), and after Scikit-Learn does
> > > so,
> > > adding __array_module__ to new types of arrays could potentially
> > > have
> > > backwards incompatible consequences for Scikit-Learn (unless
> > > sklearn
> > > uses default=None).
> > > 
> > > Are you suggesting just adding something like what I'm writing
> > > here
> > > into the NEP? Perhaps along with advice to consider issuing
> > > warnings
> > > inside __array_module__  and falling back to legacy behavior when
> > > first implementing it on a new type?
> > 
> > I think that should be sufficient, personally. We could mention
> > that
> > scikit-learn will likely use a context manager to do this.
> > We can also think about providing a global default (which sklearn
> > can
> > use as its own default if they wish so, but that is reserved to the
> > end-user).
> > 
> 
> +1
> 
> That would be a small amendment, and I think we could add it even
> after
> > accepting the NEP as it is.
> > 
> > > We could also potentially make a few changes to make backwards
> > > compatibility even easier, by making the protocol less aggressive
> > > about assuming that NumPy is a safe fallback. Some non-exclusive
> > > options:
> > > a. We could switch the default value of "default" on
> > > get_array_module() to None, so an exception is raised if nothing
> > > implements __array_module__.
> > 
> > I am not sure that I feel switching the default to None makes much
> > of a
> > difference to be honest. Unless we use it to signal a super strict
> > mode
> > similar to b. below.
> > 
> 
> I agree, that doesn't make a difference.
> 
> 
> > > b. We could includes *all* argument types in "types", not just
> > > types
> > > that implement __array_module__. NumPy's ndarray.__array_module__
> > > could then recognize and refuse to return an implementation if
> > > there
> > > are other arguments that might implement __array_module__ in the
> > > future (e.g., anything outside the standard library?).
> > 
> > That is a good point, anything that is not NumPy recognized could
> > simply be rejected. It does mean that you have to call
> > `module.asarray()` manually more often though.
> > For `list`, it could also make sense to just add np.ndarray to
> > types.
> > 
> > If we want to be conservative, maybe we could also just error
> > before
> > calling `__array_module__`.  Whenever there is something that we do
> > not
> > know how to interpret force the user to clarify?
> > 
> > > The downside of making either of these choices is that it would
> > > potentially make get_array_function() a bit less usable, because
> > > it
> > > is more likely to fail, e.g., if called on a float, or some
> > > custom
> > > type that should be treated as a scalar.
> > 
> > Right, although we could relax it later if it seems overly
> > annoying.
> > 
> 
> Interesting point. Not accepting sequences could be considered here.
> It may
> help a lot with robustness and typing to only accept ndarray, other
> objects
> with __array__, and scalars.
> 
> 
> > > > Also, I'm still not sure I agree with the tone of the
> > > > discussion on
> > > > this topic. It's very heavily inspired by what the JAX devs are
> > > > telling you (the NEP still says PyTorch and scipy.sparse as
> > > > well,
> > > > but that's not true in both cases). If you ask Dask and CuPy
> > > > for
> > > > example, they're quite happy with __array_function__ and there
> > > > haven't been many complaints about backwards compat breakage.
> > > > 
> > > 
> > > I'm linking to comments you wrote in reference to PyTorch and
> > > scipy.sparse in the current draft of the NEP, so I certainly want
> > > to
> > > make sure that you agree my characterization :).
> > > 
> > > Would it be fair to say:
> > > - JAX is reluctant to implement __array_function__ because of
> > > concerns about breaking existing code. JAX developers think that
> > > when
> > > users use NumPy functions on JAX arrays, they are explicitly
> > > choosing
> > > to convert from JAX to NumPy. This model is fundamentally
> > > incompatible __array_function__, which we chose to override the
> > > existing numpy namespace.
> 
> agreed
> 
> > - PyTorch and scipy.sparse are not yet in position to implement
> > > __array_function__ (due to a lack of a direct implementation of
> > > NumPy's API), but these projects take backwards compatibility
> > > seriously.
> 
> True. I would say though that scipy.sparse will never implement
> either
> __array_function__ or array_module__ due to semantic
> imcompatibilities (it
> acts like np.matrix). So it's kind of irrelevant. And if PyTorch gets
> around to adding a numpy-compatible API, they're fine with
> __array_function__.
> 
> > > Does "take backwards compatibility seriously" sound about right
> > > to
> > > you? I'm very open to specific suggestions here. (TensorFlow
> > > could
> > > probably also be safely added to this second list.)
> > 
> > This will need input from Ralf, my personal main concern is
> > backward
> > compatibility in libraries: I am pretty sure sklearn would only use
> > a
> > potential `np.asduckarray` when the user opted in. But in that case
> > my
> > personal feeling is that the `get_array_module` solution is cleaner
> > and
> > makes it easier to expand functionality slowly (for libraries).
> > 
> > Two other points:
> > 
> > First, I am wondering if we should add something like a
> > `__qualname__`
> > to the contract. I.e. a returned module must have a well defined
> > `module.__name__` (that is usually already correct), so that
> > sklearn
> > could do:
> > 
> > module = np.get_array_module(*arrays)
> > if module.__name__ not in ("numpy", "sparse"):
> >     raise TypeError("Currently only numpy and sparse are
> > supported")
> > 
> > if they wish so (that is trivial, but if you return a class acting
> > as a
> > module it may be important).
> > 
> > Second, we have to make progress on whether or not the "restricted"
> > namespace idea should have priority.  My personal opinion is
> > tending
> > strongly towards no.
> > 
> 
> I think it's quite important, and __array_module__ gives a chance to
> introduce it. However, it's not ready - so I'd say that if
> __array_module__
> implementation is ready and there's no well-defined restricted API
> proposal
> (I expect to have that in August), then we can move ahead without it.
> 
> The NumPy version should normally be older than other libraries, and
> if
> > NumPy updates the API so do the downstream implementers.
> > E.g. dask may have to provide multiple version of the same function
> > depending on the installed NumPy version, but that seems OK to me?
> 
> That seems unworkable, and I don't think any libraries do this.
> Coupling
> the semantics of a single Dask function to the installed numpy
> version is
> odd.

Is it all that odd? Libraries (not array providers) already need to
test for NumPy version occasionally due to API changes, so they also
have two versions of the same thing around (e.g. a fallback).
This simply would move the burden to the array-object implementer to
some degree. Assume that we have a versioned API in some form or
another, it seems to me we either require:

   module = np.get_array_module(..., api_version=2)

or define `module.__api_version__`.

Where the latter means that sklearn/SciPy may have to check
`__api_version__` on every function call, while currently such checks
usually happen at import time. On the other hand, the former means that
sklearn/scipy can only opt-in to new API after 3+ years easily?

Saying that the NumPy version is what pins the api-version, is not much
more than assuming/requiring that NumPy will be the least up-to-date
package?

Of course it is unworkable to get 100% right in practice but are you
saying that because it seems like an impractical approach, or because
the API surface is currently so large that, of course, we will never
get it 100% right (but that is generally true, nobody will be able to
implement NumPy 100% compatible)?

`__array_function__` has same issue? If we change our API, Dask has to
catch up. If SciPy expects it to be the old version though (based on
the NumPy import) it will incorrectly assume the old-api will be used.

- Sebastian


> 
> It is just as downstream libraries currently have to support multiple
> > NumPy versions.
> > We could add a contract that the first time `get_array_module` is
> > used
> > to e.g. get the dask namespace and the NumPy version is too new, a
> > warning should be given.
> > 
> 
> I think we can't solve this until we have a well-defined API, which
> is the
> restricted API + API versioning. Until then it just remains with the
> current status, compatibility is implementation-defined.
> 
> Cheers,
> Ralf
> 
> 
> > The practical thing seems to me that we ignore this for the moment
> > (as
> > something we can do later on)? If there is missing API, in most
> > cases
> > an AttributeError will be raised which could provide some
> > additional
> > information to the user?
> > The only alternative seems the complete opposite?: Create a new
> > module,
> > and make even NumPy only one of the implementers of that new
> > (restricted) module. That may be cleaner, but I fear that it is
> > impractical to be honest.
> > 
> > 
> > I will put this on the agenda for tomorrow, even if we discuss it
> > only
> > very briefly. My feeling (and hope) is that we are nearing a point
> > where we can make a final decision.
> > 
> > Best,
> > 
> > Sebastian
> > 
> > 
> > > Best,
> > > Stephan
> > > 
> > > _______________________________________________
> > > NumPy-Discussion mailing list
> > > NumPy-Discussion at python.org
> > > https://mail.python.org/mailman/listinfo/numpy-discussion
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion at python.org
> > https://mail.python.org/mailman/listinfo/numpy-discussion
> > 
> 
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: This is a digitally signed message part
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20200409/d7e5fcf4/attachment.sig>


More information about the NumPy-Discussion mailing list