<div dir="ltr"><div dir="ltr"><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Mar 4, 2020 at 1:22 AM Sebastian Berg <<a href="mailto:sebastian@sipsolutions.net">sebastian@sipsolutions.net</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On Sun, 2020-02-23 at 22:44 -0800, Stephan Hoyer wrote:<br>

> On Sun, Feb 23, 2020 at 3:59 PM Ralf Gommers <<a href="mailto:ralf.gommers@gmail.com" target="_blank">ralf.gommers@gmail.com</a>><br>

> wrote:<br>

> > <br>

> > On Sun, Feb 23, 2020 at 3:31 PM Stephan Hoyer <<a href="mailto:shoyer@gmail.com" target="_blank">shoyer@gmail.com</a>><br>

> > wrote:<br>

> > > On Thu, Feb 6, 2020 at 12:20 PM Sebastian Berg <<br>

> > > <a href="mailto:sebastian@sipsolutions.net" target="_blank">sebastian@sipsolutions.net</a>> wrote:<br>

<snip><br>

> > > <br>

> > > I don't think NumPy needs to do anything about warnings. It is<br>

> > > straightforward for libraries that want to use use<br>

> > > get_array_module() to issue their own warnings before calling<br>

> > > get_array_module(), if desired.<br>

> > > <br>

> > > Or alternatively, if a library is about to add a new<br>

> > > __array_module__ method, it is straightforward to issue a warning<br>

> > > inside the new __array_module__ method before returning the NumPy<br>

> > > functions. <br>

> > > <br>

> > <br>

> > I don't think this is quite enough. Sebastian points out a fairly<br>

> > important issue. One of the main rationales for the whole NEP, and<br>

> > the argument in multiple places (<br>

> > <a href="https://numpy.org/neps/nep-0037-array-module.html#opt-in-vs-opt-out-for-users" rel="noreferrer" target="_blank">https://numpy.org/neps/nep-0037-array-module.html#opt-in-vs-opt-out-for-users</a><br>

> > ) is that it's now opt-in while __array_function__ was opt-out.<br>

> > This isn't really true - the problem is simply *moved*, from the<br>

> > duck array libraries to the array-consuming libraries. The end user<br>

> > will still see the backwards incompatible change, with no way to<br>

> > turn it off. It will be easier with __array_module__ to warn users,<br>

> > but this should be expanded on in the NEP.<br>

> > <br>

> <br>

> Ralf, thanks for sharing your thoughts.<br></blockquote><div><br></div><div>Sorry, this never made it back to the top of my todo list.<br></div><div> <br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

> <br>

> I'm not quite I understand the concerns about backwards<br>

> incompatibility:<br>

> 1. The intention is that implementing a __array_module__ method<br>

> should be backwards compatible with all current uses of NumPy. This<br>

> satisfies backwards compatibility concerns for an array-implementing<br>

> library like JAX.<br>

> 2. In contrast, calling get_array_module() offers no guarantees about<br>

> backwards compatibility. This seems nearly impossible, because the<br>

> entire point of the protocol is to make it possible to opt-in to new<br>

> behavior. </blockquote><div><br></div><div>Indeed, it is nearly impossible. Except if there's a context manager or some other control mechanism exposed to the end user. Hence that should be a part of the design I think. Otherwise you're just solving something for the JAX devs, but not for the scikit-learn/scipy/etc devs who will then each have to invent their own wheel for backwards compat.<br></div><div> <br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">So backwards compatibility isn't solved for Scikit-Learn<br>

> switching to use get_array_module(), and after Scikit-Learn does so,<br>

> adding __array_module__ to new types of arrays could potentially have<br>

> backwards incompatible consequences for Scikit-Learn (unless sklearn<br>

> uses default=None).<br>

> <br>

> Are you suggesting just adding something like what I'm writing here<br>

> into the NEP? Perhaps along with advice to consider issuing warnings<br>

> inside __array_module__  and falling back to legacy behavior when<br>

> first implementing it on a new type?<br>

<br>

I think that should be sufficient, personally. We could mention that<br>

scikit-learn will likely use a context manager to do this.<br>

We can also think about providing a global default (which sklearn can<br>

use as its own default if they wish so, but that is reserved to the<br>

end-user).<br></blockquote><div><br></div><div>+1</div><div> <br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

That would be a small amendment, and I think we could add it even after<br>

accepting the NEP as it is.<br>

<br>

> <br>

> We could also potentially make a few changes to make backwards<br>

> compatibility even easier, by making the protocol less aggressive<br>

> about assuming that NumPy is a safe fallback. Some non-exclusive<br>

> options:<br>

> a. We could switch the default value of "default" on<br>

> get_array_module() to None, so an exception is raised if nothing<br>

> implements __array_module__.<br>

<br>

I am not sure that I feel switching the default to None makes much of a<br>

difference to be honest. Unless we use it to signal a super strict mode<br>

similar to b. below.<br></blockquote><div><br></div><div>I agree, that doesn't make a difference.</div><div> <br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

<br>

> b. We could includes *all* argument types in "types", not just types<br>

> that implement __array_module__. NumPy's ndarray.__array_module__<br>

> could then recognize and refuse to return an implementation if there<br>

> are other arguments that might implement __array_module__ in the<br>

> future (e.g., anything outside the standard library?).<br>

<br>

That is a good point, anything that is not NumPy recognized could<br>

simply be rejected. It does mean that you have to call<br>

`module.asarray()` manually more often though.<br>

For `list`, it could also make sense to just add np.ndarray to types.<br>

<br>

If we want to be conservative, maybe we could also just error before<br>

calling `__array_module__`.  Whenever there is something that we do not<br>

know how to interpret force the user to clarify?<br>

<br>

> <br>

> The downside of making either of these choices is that it would<br>

> potentially make get_array_function() a bit less usable, because it<br>

> is more likely to fail, e.g., if called on a float, or some custom<br>

> type that should be treated as a scalar.<br>

<br>

Right, although we could relax it later if it seems overly annoying.<br></blockquote><div><br></div><div>Interesting point. Not accepting sequences could be considered here. It may help a lot with robustness and typing to only accept ndarray, other objects with __array__, and scalars. <br></div><div> <br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

<br>

> <br>

> > Also, I'm still not sure I agree with the tone of the discussion on<br>

> > this topic. It's very heavily inspired by what the JAX devs are<br>

> > telling you (the NEP still says PyTorch and scipy.sparse as well,<br>

> > but that's not true in both cases). If you ask Dask and CuPy for<br>

> > example, they're quite happy with __array_function__ and there<br>

> > haven't been many complaints about backwards compat breakage.<br>

> > <br>

> <br>

> I'm linking to comments you wrote in reference to PyTorch and<br>

> scipy.sparse in the current draft of the NEP, so I certainly want to<br>

> make sure that you agree my characterization :).<br>

> <br>

> Would it be fair to say:<br>

> - JAX is reluctant to implement __array_function__ because of<br>

> concerns about breaking existing code. JAX developers think that when<br>

> users use NumPy functions on JAX arrays, they are explicitly choosing<br>

> to convert from JAX to NumPy. This model is fundamentally<br>

> incompatible __array_function__, which we chose to override the<br>

> existing numpy namespace.<br></blockquote><div><br></div><div>agreed</div><div> <br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

> - PyTorch and scipy.sparse are not yet in position to implement<br>

> __array_function__ (due to a lack of a direct implementation of<br>

> NumPy's API), but these projects take backwards compatibility<br>

> seriously.<br></blockquote><div><br></div><div>True. I would say though that scipy.sparse will never implement either __array_function__ or array_module__ due to semantic imcompatibilities (it acts like np.matrix). So it's kind of irrelevant. And if PyTorch gets around to adding a numpy-compatible API, they're fine with __array_function__.<br></div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

> <br>

> Does "take backwards compatibility seriously" sound about right to<br>

> you? I'm very open to specific suggestions here. (TensorFlow could<br>

> probably also be safely added to this second list.)<br>

<br>

This will need input from Ralf, my personal main concern is backward<br>

compatibility in libraries: I am pretty sure sklearn would only use a<br>

potential `np.asduckarray` when the user opted in. But in that case my<br>

personal feeling is that the `get_array_module` solution is cleaner and<br>

makes it easier to expand functionality slowly (for libraries).<br>

<br>

Two other points:<br>

<br>

First, I am wondering if we should add something like a `__qualname__`<br>

to the contract. I.e. a returned module must have a well defined<br>

`module.__name__` (that is usually already correct), so that sklearn<br>

could do:<br>

<br>

module = np.get_array_module(*arrays)<br>

if module.__name__ not in ("numpy", "sparse"):<br>

    raise TypeError("Currently only numpy and sparse are supported")<br>

<br>

if they wish so (that is trivial, but if you return a class acting as a<br>

module it may be important).<br>

<br>

Second, we have to make progress on whether or not the "restricted"<br>

namespace idea should have priority.  My personal opinion is tending<br>

strongly towards no.<br></blockquote><div><br></div><div>I think it's quite important, and __array_module__ gives a chance to introduce it. However, it's not ready - so I'd say that if __array_module__ implementation is ready and there's no well-defined restricted API proposal (I expect to have that in August), then we can move ahead without it.<br></div><div> <br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

The NumPy version should normally be older than other libraries, and if<br>

NumPy updates the API so do the downstream implementers.<br>

E.g. dask may have to provide multiple version of the same function<br>

depending on the installed NumPy version, but that seems OK to me? </blockquote><div><br></div><div>That seems unworkable, and I don't think any libraries do this. Coupling the semantics of a single Dask function to the installed numpy version is odd.<br></div><div> <br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">It is just as downstream libraries currently have to support multiple<br>

NumPy versions.<br>

We could add a contract that the first time `get_array_module` is used<br>

to e.g. get the dask namespace and the NumPy version is too new, a<br>

warning should be given.<br></blockquote><div><br></div><div>I think we can't solve this until we have a well-defined API, which is the restricted API + API versioning. Until then it just remains with the current status, compatibility is implementation-defined.</div><div><br></div><div>Cheers,</div><div>Ralf</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

<br>

The practical thing seems to me that we ignore this for the moment (as<br>

something we can do later on)? If there is missing API, in most cases<br>

an AttributeError will be raised which could provide some additional<br>

information to the user?<br>

The only alternative seems the complete opposite?: Create a new module,<br>

and make even NumPy only one of the implementers of that new<br>

(restricted) module. That may be cleaner, but I fear that it is<br>

impractical to be honest.<br>

<br>

<br>

I will put this on the agenda for tomorrow, even if we discuss it only<br>

very briefly. My feeling (and hope) is that we are nearing a point<br>

where we can make a final decision.<br>

<br>

Best,<br>

<br>

Sebastian<br>

<br>

<br>

> <br>

> Best,<br>

> Stephan<br>

> <br>

> _______________________________________________<br>

> NumPy-Discussion mailing list<br>

> <a href="mailto:NumPy-Discussion@python.org" target="_blank">NumPy-Discussion@python.org</a><br>

> <a href="https://mail.python.org/mailman/listinfo/numpy-discussion" rel="noreferrer" target="_blank">https://mail.python.org/mailman/listinfo/numpy-discussion</a><br>

_______________________________________________<br>

NumPy-Discussion mailing list<br>

<a href="mailto:NumPy-Discussion@python.org" target="_blank">NumPy-Discussion@python.org</a><br>

<a href="https://mail.python.org/mailman/listinfo/numpy-discussion" rel="noreferrer" target="_blank">https://mail.python.org/mailman/listinfo/numpy-discussion</a><br>

</blockquote></div></div>