Hello all,
It was recently brought to my attention that my mails to NumPy-discussion were probably going into the spam folder for many people, so here I am trying from another email address. Probably Google trying to force people onto their products as usual. 😉
Ralf Gommers, Peter Bell (both cc’d) and I have come up with a proposal on how to solve the array creation and duck array problems. The solution is outlined in NEP-31, currently in the form of a PR [1], following the high-level discussion in NEP-22 [2].
It would be nice to get some feedback.
Full-text of the NEP:
============================================================
NEP 31 — Context-local and global overrides of the NumPy API
============================================================

:Author: Hameer Abbasi <habbasi@quansight.com>
:Author: Ralf Gommers <rgommers@quansight.com>
:Author: Peter Bell <peterbell10@live.co.uk>
:Status: Draft
:Type: Standards Track
:Created: 2019-08-22

Abstract
--------
This NEP proposes to make all of NumPy's public API overridable via an extensible backend mechanism, using a library called ``uarray`` `[1]`_.
``uarray`` provides global and context-local overrides, as well as a dispatch mechanism similar to NEP-18 `[2]`_. First experiences with ``__array_function__`` show that it is necessary to be able to override NumPy functions that *do not take an array-like argument*, and hence aren't overridable via ``__array_function__``. The most pressing need is for array creation and coercion functions; see e.g. NEP-30 `[9]`_.
This NEP proposes to allow, in an opt-in fashion, overriding any part of the NumPy API. It is intended as a comprehensive resolution to NEP-22 `[3]`_, and obviates the need to add an ever-growing list of new protocols for each new type of function or object that needs to become overridable.
Motivation and Scope
--------------------

The motivation behind ``uarray`` is manifold: First, there have been several attempts to allow dispatch of parts of the NumPy API, most prominently the ``__array_ufunc__`` protocol in NEP-13 `[4]`_ and the ``__array_function__`` protocol in NEP-18 `[2]`_, but these have shown the need for further protocols to be developed, including a protocol for coercion (see `[5]`_). The reasons these overrides are needed have been extensively discussed in the references, and this NEP will not attempt to go into the details of why these are needed. Another pain point requiring yet another protocol is the duck-array protocol (see `[9]`_).
This NEP takes a more holistic approach: It assumes that there are parts of the API that need to be overridable, and that these will grow over time. It provides a general framework and a mechanism that avoids designing a new protocol each time this is required.
This NEP proposes the following: That ``unumpy`` `[8]`_ becomes the recommended override mechanism for the parts of the NumPy API not yet covered by ``__array_function__`` or ``__array_ufunc__``, and that ``uarray`` is vendored into a new namespace within NumPy to give users and downstream dependencies access to these overrides. This vendoring mechanism is similar to what SciPy decided to do for making ``scipy.fft`` overridable (see `[10]`_).
Detailed description
--------------------

**Note:** *This section will not attempt to explain the specifics or the mechanism of ``uarray``; that is explained in the ``uarray`` documentation.* `[1]`_ *However, the NumPy community will have input into the design of ``uarray``, and any backward-incompatible changes will be discussed on the mailing list.*
The way we propose the overrides will be used by end users is::

    import numpy.overridable as np

    with np.set_backend(backend):
        x = np.asarray(my_array, dtype=dtype)

And a library that implements a NumPy-like API will use it in the following manner (as an example)::

    import numpy.overridable as np

    _ua_implementations = {}

    __ua_domain__ = "numpy"

    def __ua_function__(func, args, kwargs):
        fn = _ua_implementations.get(func, None)
        return fn(*args, **kwargs) if fn is not None else NotImplemented

    def implements(ua_func):
        def inner(func):
            _ua_implementations[ua_func] = func
            return func

        return inner

    @implements(np.asarray)
    def asarray(a, dtype=None, order=None):
        # Code here
        # Either this method or __ua_convert__ must
        # return NotImplemented for unsupported types,
        # or they shouldn't be marked as dispatchable.
        ...

    # Provides a default implementation for ones and zeros.
    @implements(np.full)
    def full(shape, fill_value, dtype=None, order='C'):
        # Code here
        ...

The only change this NEP proposes at its acceptance is to make ``unumpy`` the officially recommended way to override NumPy. ``unumpy`` will remain a separate repository/package for the time being (which we propose to vendor rather than depend on, to avoid a hard dependency, using the separate ``unumpy`` package only if it is installed), and will be developed primarily with the input of duck-array authors and, secondarily, custom dtype authors, via the usual GitHub workflow. There are a few reasons for this:

* Faster iteration in the case of bugs or issues.
* Faster design changes when new functionality is needed.
* ``unumpy`` will work with older versions of NumPy as well.
* The user and library author opt in to the override process, rather than breakages happening when it is least expected. In simple terms, bugs in ``unumpy`` mean that ``numpy`` remains unaffected.

Advantages of ``unumpy`` over other solutions
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``unumpy`` offers a number of advantages over the approach of defining a new protocol for every problem encountered: Whenever there is something requiring an override, ``unumpy`` will be able to offer a unified API with very minor changes. For example:

* ``ufunc`` objects can be overridden via their ``__call__``, ``reduce`` and other methods.
* ``dtype`` objects can be overridden via the dispatch/backend mechanism, going as far as to allow ``np.float32`` et al. to be overridden by overriding ``__get__``.
* Other functions can be overridden in a similar fashion.
* ``np.asduckarray`` goes away, and becomes ``np.asarray`` with a backend set.
* The same holds for array creation functions such as ``np.zeros``, ``np.empty`` and so on.
This also holds for the future: Making something overridable would require only minor changes to ``unumpy``.
Another promise ``unumpy`` holds is one of default implementations. Default implementations can be provided for any multimethod, in terms of others. This allows one to override a large part of the NumPy API by defining only a small part of it. This eases the creation of new duck-arrays by providing default implementations of many functions that can easily be expressed in terms of others, as well as a repository of utility functions that most duck-array implementations would need.
The last benefit is a clear way to coerce to a given backend, and a protocol for coercing not only arrays, but also ``dtype`` objects and ``ufunc`` objects with similar ones from other libraries. This is due to the existence of actual third-party dtype packages, and their desire to blend into the NumPy ecosystem (see `[6]`_). This is a separate issue from the C-level dtype redesign proposed in `[7]`_; it is about allowing third-party dtype implementations to work with NumPy, much like third-party array implementations.

Mixing NumPy and ``unumpy`` in the same file
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
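Coercion to a backend can be sketched as a per-backend conversion hook: the dispatcher hands the backend the arguments marked as dispatchable, and the backend either converts them or returns ``NotImplemented`` to decline. This is loosely modeled on the ``__ua_convert__`` hook mentioned in the library example; the ``Dispatchable`` class with its ``kind`` field, ``ListArray`` and ``ListBackend`` are hypothetical. Only arrays are converted here, although the same hook could map ``dtype`` objects as well.

```python
from dataclasses import dataclass


@dataclass
class Dispatchable:
    """Hypothetical marker: an argument value plus the kind of object it stands for."""
    value: object
    kind: str  # e.g. "array" or "dtype"


class ListArray:
    """Stand-in for a third-party duck array."""

    def __init__(self, data):
        self.data = list(data)


class ListBackend:
    """Toy backend with a conversion hook loosely modeled on __ua_convert__."""

    @staticmethod
    def __ua_convert__(dispatchables, coerce):
        converted = []
        for d in dispatchables:
            if d.kind != "array" or isinstance(d.value, ListArray):
                # Non-array arguments and native arrays pass through unchanged.
                converted.append(d.value)
            elif coerce and isinstance(d.value, (list, tuple)):
                # Only convert foreign array-likes when coercion was requested.
                converted.append(ListArray(d.value))
            else:
                # Signal that this backend cannot handle these arguments.
                return NotImplemented
        return converted


args = [Dispatchable([1, 2], "array"), Dispatchable("float64", "dtype")]
print(ListBackend.__ua_convert__(args, coerce=False))  # NotImplemented
result = ListBackend.__ua_convert__(args, coerce=True)
print(result[0].data, result[1])                        # [1, 2] float64
```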
Normally, one would want to import only one of ``unumpy`` or ``numpy``, as ``np`` for familiarity. However, there may be situations where one wishes to mix NumPy and the overrides, and there are a few ways to do this, depending on the user's style::

    import numpy.overridable as unumpy
    import numpy as np

or::

    import numpy as np

    # Use unumpy via np.overridable

Related Work
------------

Previous override mechanisms
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

* NEP-18, the ``__array_function__`` protocol. `[2]`_
* NEP-13, the ``__array_ufunc__`` protocol. `[4]`_
Existing NumPy-like array implementations
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

* Dask: https://dask.org/
* CuPy: https://cupy.chainer.org/
* PyData/Sparse: https://sparse.pydata.org/
* Xnd: https://xnd.readthedocs.io/
* Astropy's Quantity: https://docs.astropy.org/en/stable/units/

Existing and potential consumers of alternative arrays
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

* Dask: https://dask.org/
* scikit-learn: https://scikit-learn.org/
* Xarray: https://xarray.pydata.org/
* TensorLy: http://tensorly.org/

Existing alternate dtype implementations
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

* ``ndtypes``: https://ndtypes.readthedocs.io/en/latest/
* Datashape: https://datashape.readthedocs.io
* Plum: https://plum-py.readthedocs.io/

Implementation
--------------
The implementation of this NEP will require the following steps:
* Implementation of ``uarray`` multimethods corresponding to the NumPy API, including classes for overriding ``dtype``, ``ufunc`` and ``array`` objects, in the ``unumpy`` repository.
* Moving backends from ``unumpy`` into the respective array libraries.

Backward compatibility
----------------------
There are no backward incompatible changes proposed in this NEP.
Alternatives
------------
The current alternative to this problem is NEP-30 plus adding more protocols (not yet specified) in addition to it. Even then, some parts of the NumPy API will remain non-overridable, so it's a partial alternative.
The main alternative to vendoring ``unumpy`` is to simply move it into NumPy completely and not distribute it as a separate package. This would also achieve the proposed goals; however, we prefer to keep it a separate package for now, for the reasons already stated above.

Discussion
----------

* ``uarray`` blogpost: https://labs.quansight.org/blog/2019/07/uarray-update-api-changes-overhead-a...
* The discussion section of NEP-18: https://numpy.org/neps/nep-0018-array-function-protocol.html#discussion
* NEP-22: https://numpy.org/neps/nep-0022-ndarray-duck-typing-overview.html
* Dask issue #4462: https://github.com/dask/dask/issues/4462
* PR #13046: https://github.com/numpy/numpy/pull/13046
* Dask issue #4883: https://github.com/dask/dask/issues/4883
* Issue #13831: https://github.com/numpy/numpy/issues/13831
* Discussion PR 1: https://github.com/hameerabbasi/numpy/pull/3
* Discussion PR 2: https://github.com/hameerabbasi/numpy/pull/4

References and Footnotes
------------------------
.. _[1]:
[1] uarray, A general dispatch mechanism for Python: https://uarray.readthedocs.io
.. _[2]:
[2] NEP 18 — A dispatch mechanism for NumPy’s high level array functions: https://numpy.org/neps/nep-0018-array-function-protocol.html
.. _[3]:
[3] NEP 22 — Duck typing for NumPy arrays – high level overview: https://numpy.org/neps/nep-0022-ndarray-duck-typing-overview.html
.. _[4]:
[4] NEP 13 — A Mechanism for Overriding Ufuncs: https://numpy.org/neps/nep-0013-ufunc-overrides.html
.. _[5]:
[5] Reply to Adding to the non-dispatched implementation of NumPy methods: http://numpy-discussion.10968.n7.nabble.com/Adding-to-the-non-dispatched-imp...
.. _[6]:
[6] Custom Dtype/Units discussion: http://numpy-discussion.10968.n7.nabble.com/Custom-Dtype-Units-discussion-td...
.. _[7]:
[7] The epic dtype cleanup plan: https://github.com/numpy/numpy/issues/2899
.. _[8]:
[8] unumpy: NumPy, but implementation-independent: https://unumpy.readthedocs.io
.. _[9]:
[9] NEP 30 — Duck Typing for NumPy Arrays - Implementation: https://www.numpy.org/neps/nep-0030-duck-array-protocol.html
.. _[10]:
[10] http://scipy.github.io/devdocs/fft.html#backend-control
Copyright
---------
This document has been placed in the public domain.
Best regards, Hameer Abbasi
[1] https://github.com/numpy/numpy/pull/14389
[2] https://numpy.org/neps/nep-0022-ndarray-duck-typing-overview.html
On Mon, Sep 2, 2019 at 2:15 AM Hameer Abbasi einstein.edison@gmail.com wrote:
Ralf Gommers, Peter Bell (both cc’d) and I have come up with a proposal on how to solve the array creation and duck array problems. The solution is outlined in NEP-31, currently in the form of a PR [1]
Thanks for putting this together! It'd be great to have more engagement between uarray and numpy.
============================================================
NEP 31 — Context-local and global overrides of the NumPy API
============================================================
Now that I've read this over, my main feedback is that right now it seems too vague and high-level to give it a fair evaluation? The idea of a NEP is to lay out a problem and proposed solution in enough detail that it can be evaluated and critiqued, but this felt to me more like it was pointing at some other documents for all the details and then promising that uarray has solutions for all our problems.
This NEP takes a more holistic approach: It assumes that there are parts of the API that need to be overridable, and that these will grow over time. It provides a general framework and a mechanism that avoids designing a new protocol each time this is required.
The idea of a holistic approach makes me nervous, because I'm not sure we have holistic problems. Sometimes a holistic approach is the right thing; other times it means sweeping the actual problems under the rug, so things *look* simple and clean but in fact nothing has been solved, and they just end up biting us later. And from the NEP as currently written, I can't tell whether this is the good kind of holistic or the bad kind of holistic.
Now I'm writing vague handwavey things, so let me follow my own advice and make it more concrete with an example :-).
When Stephan and I were writing NEP 22, the single thing we spent the most time discussing was the problem of duck-array coercion, and in particular what to do about existing code that does np.asarray(duck_array_obj).
The reason this is challenging is that there's a lot of code written in Cython/C/C++ that calls np.asarray, and then blindly casts the return value to a PyArray struct and starts accessing the raw memory fields. If np.asarray starts returning anything besides a real-actual np.ndarray object, then this code will start corrupting random memory, leading to a segfault at best.
Stephan felt strongly that this meant that existing np.asarray calls *must not* ever return anything besides an np.ndarray object, and therefore we needed to add a new function np.asduckarray(), or maybe an explicit opt-in flag like np.asarray(..., allow_duck_array=True).
I agreed that this was a problem, but thought we might be able to get away with an "opt-out" system, where we add an allow_duck_array= flag, but make it *default* to True, and document that the Cython/C/C++ users who want to work with a raw np.ndarray object should modify their code to explicitly call np.asarray(obj, allow_duck_array=False). This would mean that for a while people who tried to pass duck-arrays into a legacy library would get segfaults, but there would be a clear path for fixing these issues as they were discovered.
Either way, there are also some other details to figure out: how does this affect the C version of asarray? What about np.asfortranarray – probably that should default to allow_duck_array=False, even if we did make np.asarray default to allow_duck_array=True, right?
Now if I understand right, your proposal would be to make it so any code in any package could arbitrarily change the behavior of np.asarray for all inputs, e.g. I could just decide that np.asarray([1, 2, 3]) should return some arbitrary non-np.ndarray object. It seems like this has a much greater potential for breaking existing Cython/C/C++ code, and the NEP doesn't currently describe why this extra power is useful, and it doesn't currently describe how it plans to mitigate the downsides. (For example, if a caller needs a real np.ndarray, then is there some way to explicitly request one? The NEP doesn't say.) Maybe this is all fine and there are solutions to these issues, but any proposal to address duck array coercion needs to at least talk about these issues!
And that's just one example... array coercion is a particularly central and tricky problem, but the numpy API is big, and there are probably other problems like this. For another example, I don't understand what the NEP is proposing to do about dtypes at all.
That's why I think the NEP needs to be fleshed out a lot more before it will be possible to evaluate fairly.
-n
On Mon, Sep 2, 2019 at 2:09 PM Nathaniel Smith njs@pobox.com wrote:
On Mon, Sep 2, 2019 at 2:15 AM Hameer Abbasi einstein.edison@gmail.com wrote:
Ralf Gommers, Peter Bell (both cc’d) and I have come up with a proposal on how to solve the array creation and duck array problems. The solution is outlined in NEP-31, currently in the form of a PR, [1]
Thanks for putting this together! It'd be great to have more engagement between uarray and numpy.
============================================================
NEP 31 — Context-local and global overrides of the NumPy API
============================================================
Now that I've read this over, my main feedback is that right now it seems too vague and high-level to give it a fair evaluation? The idea of a NEP is to lay out a problem and proposed solution in enough detail that it can be evaluated and critiqued, but this felt to me more like it was pointing at some other documents for all the details and then promising that uarray has solutions for all our problems.
This is fair enough, I think. We'll need to put some more thought into where to refer to other NEPs, and where to be more concrete.
This NEP takes a more holistic approach: It assumes that there are parts of the API that need to be overridable, and that these will grow over time. It provides a general framework and a mechanism that avoids designing a new protocol each time this is required.
The idea of a holistic approach makes me nervous, because I'm not sure we have holistic problems. Sometimes a holistic approach is the right thing; other times it means sweeping the actual problems under the rug, so things *look* simple and clean but in fact nothing has been solved, and they just end up biting us later. And from the NEP as currently written, I can't tell whether this is the good kind of holistic or the bad kind of holistic.
Now I'm writing vague handwavey things, so let me follow my own advice and make it more concrete with an example :-).
When Stephan and I were writing NEP 22, the single thing we spent the most time discussing was the problem of duck-array coercion, and in particular what to do about existing code that does np.asarray(duck_array_obj).
The reason this is challenging is that there's a lot of code written in Cython/C/C++ that calls np.asarray,
Cython code only perhaps? It would surprise me if there's a lot of C/C++ code that explicitly calls into our Python rather than C API.
and then blindly casts the return value to a PyArray struct and starts accessing the raw memory fields. If np.asarray starts returning anything besides a real-actual np.ndarray object, then this code will start corrupting random memory, leading to a segfault at best.
Stephan felt strongly that this meant that existing np.asarray calls *must not* ever return anything besides an np.ndarray object, and therefore we needed to add a new function np.asduckarray(), or maybe an explicit opt-in flag like np.asarray(..., allow_duck_array=True).
I agreed that this was a problem, but thought we might be able to get away with an "opt-out" system, where we add an allow_duck_array= flag, but make it *default* to True, and document that the Cython/C/C++ users who want to work with a raw np.ndarray object should modify their code to explicitly call np.asarray(obj, allow_duck_array=False). This would mean that for a while people who tried to pass duck-arrays into a legacy library would get segfaults, but there would be a clear path for fixing these issues as they were discovered.
Either way, there are also some other details to figure out: how does this affect the C version of asarray? What about np.asfortranarray – probably that should default to allow_duck_array=False, even if we did make np.asarray default to allow_duck_array=True, right?
Now if I understand right, your proposal would be to make it so any code in any package could arbitrarily change the behavior of np.asarray for all inputs, e.g. I could just decide that np.asarray([1, 2, 3]) should return some arbitrary non-np.ndarray object.
No, definitely not! It's all opt-in, by explicitly importing from `numpy.overridable` or `unumpy`. No behavior of anything in the existing numpy namespaces should be affected in any way.
I agree with the concerns below, hence it should stay opt-in.
Cheers, Ralf
It seems like this has a much greater potential for breaking existing Cython/C/C++ code, and the NEP doesn't currently describe why this extra power is useful, and it doesn't currently describe how it plans to mitigate the downsides. (For example, if a caller needs a real np.ndarray, then is there some way to explicitly request one? The NEP doesn't say.) Maybe this is all fine and there are solutions to these issues, but any proposal to address duck array coercion needs to at least talk about these issues!
And that's just one example... array coercion is a particularly central and tricky problem, but the numpy API is big, and there are probably other problems like this. For another example, I don't understand what the NEP is proposing to do about dtypes at all.
That's why I think the NEP needs to be fleshed out a lot more before it will be possible to evaluate fairly.
-n
-- Nathaniel J. Smith -- https://vorpus.org _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion
On Mon, Sep 2, 2019 at 11:21 PM Ralf Gommers ralf.gommers@gmail.com wrote:
On Mon, Sep 2, 2019 at 2:09 PM Nathaniel Smith njs@pobox.com wrote:
The reason this is challenging is that there's a lot of code written in Cython/C/C++ that calls np.asarray,
Cython code only perhaps? It would surprise me if there's a lot of C/C++ code that explicitly calls into our Python rather than C API.
I think there's also code written as Python-wrappers-around-C-code where the Python layer handles the error-checking/coercion, and the C code trusts it to have done so.
Now if I understand right, your proposal would be to make it so any code in any package could arbitrarily change the behavior of np.asarray for all inputs, e.g. I could just decide that np.asarray([1, 2, 3]) should return some arbitrary non-np.ndarray object.
No, definitely not! It's all opt-in, by explicitly importing from `numpy.overridable` or `unumpy`. No behavior of anything in the existing numpy namespaces should be affected in any way.
Ah, whoops, I definitely missed that :-). That does change things!
So one of the major decision points for any duck-array API work, is whether to modify the numpy semantics "in place", so user code automatically gets access to the new semantics, or else to make a new namespace, that users have to switch over to manually.
The major disadvantage of doing changes "in place" is, of course, that we have to do all this careful work to move incrementally and make sure that we don't break things. The major (potential) advantage is that we have a much better chance of moving the ecosystem with us.
The major advantage of making a new namespace is that it's *much* easier to experiment, because there's no chance of breaking any projects that didn't opt in. The major disadvantage is that numpy is super strongly entrenched, and convincing every project to switch to something else is incredibly difficult and costly. (I just searched github for "import numpy" and got 17.7 million hits. That's a lot of imports to update!) Also, empirically, we've seen multiple projects try to do this (e.g. DyND), and so far they all failed.
It sounds like unumpy is an interesting approach that hasn't been tried before – in particular, the promise that you can "just switch your imports" is a much easier transition than e.g. DyND offered. Of course, that promise is somewhat undermined by the reality that all these potential backend libraries *aren't* 100% compatible with numpy, and can't be... it might turn out that this ends up like asanyarray, where you can't really use it reliably because the thing that comes out will generally support *most* of the normal ndarray semantics, but you don't know which part. Is scipy planning to switch to using this everywhere, including in C code? If not, then how do you expect projects like matplotlib to switch, given that matplotlib likes to pass array objects into scipy functions? Are you planning to take the opportunity to clean up some of the obscure corners of the numpy API?
But those are general questions about unumpy, and I'm guessing no-one knows all the answers yet... and these questions actually aren't super relevant to the NEP. The NEP isn't inventing unumpy. IIUC, the main thing the NEP proposes is simply to make "numpy.overridable" an alias for "unumpy".
It's not clear to me what problem this alias is solving. If all downstream users have to update their imports anyway, then they can write "import unumpy as np" just as easily as they can write "import numpy.overridable as np". I guess the main reason this is a NEP is because the unumpy project is hoping to get an "official stamp of approval" from numpy? But even that could be accomplished by just putting something in the docs. And adding the alias has substantial risks: it makes unumpy tied to the numpy release cycle and compatibility rules, and it means that we're committing to maintaining unumpy ~forever even if Hameer or Quansight move onto other things. That seems like a lot to take on for such vague benefits?
On Tue, Sep 3, 2019 at 2:04 AM Hameer Abbasi einstein.edison@gmail.com wrote:
The fact that we're having to design more and more protocols for a lot of very similar things is, to me, an indicator that we do have holistic problems that ought to be solved by a single protocol.
But the reason we've had trouble designing these protocols is that they're each different :-). If it was just a matter of copying __array_ufunc__ we'd have been done in a few minutes...
-n
-- Nathaniel J. Smith -- https://vorpus.org
That's a lot of very good questions! Let me see if I can answer them one-by-one.
On 06.09.19 09:49, Nathaniel Smith wrote:
Ah, whoops, I definitely missed that :-). That does change things!
So one of the major decision points for any duck-array API work, is whether to modify the numpy semantics "in place", so user code automatically gets access to the new semantics, or else to make a new namespace, that users have to switch over to manually.
The major disadvantage of doing changes "in place" is, of course, that we have to do all this careful work to move incrementally and make sure that we don't break things. The major (potential) advantage is that we have a much better chance of moving the ecosystem with us.
The major advantage of making a new namespace is that it's *much* easier to experiment, because there's no chance of breaking any projects that didn't opt in. The major disadvantage is that numpy is super strongly entrenched, and convincing every project to switch to something else is incredibly difficult and costly. (I just searched github for "import numpy" and got 17.7 million hits. That's a lot of imports to update!) Also, empirically, we've seen multiple projects try to do this (e.g. DyND), and so far they all failed.
It sounds like unumpy is an interesting approach that hasn't been tried before – in particular, the promise that you can "just switch your imports" is a much easier transition than e.g. DyND offered. Of course, that promise is somewhat undermined by the reality that all these potential backend libraries *aren't* 100% compatible with numpy, and can't be...
This is true; however, with minor adjustments it should be possible to make your code work across backends, as long as you don't use a few obscure parts of NumPy.
it might turn out that this ends up like asanyarray, where you can't really use it reliably because the thing that comes out will generally support *most* of the normal ndarray semantics, but you don't know which part. Is scipy planning to switch to using this everywhere, including in C code?
Not at present, I think. However, it should be possible to "re-write" parts of scipy on top of unumpy to make that work and, where speed is required and an efficient implementation isn't available in terms of NumPy functions, to make dispatchable multimethods and allow library authors to provide those implementations. We'll call this project uscipy, but that's an endgame at this point. Right now, we're focusing on unumpy.
If not, then how do you expect projects like matplotlib to switch, given that matplotlib likes to pass array objects into scipy functions? Are you planning to take the opportunity to clean up some of the obscure corners of the numpy API?
That's a completely different thing, and answering that question requires a distinction between uarray and unumpy... uarray is a backend mechanism, independent of array computing. We hope that matplotlib will adopt it to switch around its GUI backends, for example.
But those are general questions about unumpy, and I'm guessing no-one knows all the answers yet... and these questions actually aren't super relevant to the NEP. The NEP isn't inventing unumpy. IIUC, the main thing the NEP proposes is simply to make "numpy.overridable" an alias for "unumpy".
It's not clear to me what problem this alias is solving. If all downstream users have to update their imports anyway, then they can write "import unumpy as np" just as easily as they can write "import numpy.overridable as np". I guess the main reason this is a NEP is because the unumpy project is hoping to get an "official stamp of approval" from numpy?
That's part of it. The concrete problems it's solving are threefold:
1. Array creation functions can be overridden.
2. Array coercion is now covered.
3. "Default implementations" will make it easier to re-write your own NumPy-like array, where efficient implementations exist in terms of other NumPy functions. That will also help achieve similar semantics, but as I said, they're just "default"...

The import numpy.overridable part is meant to help garner adoption, and to prefer the unumpy module if it is available (which will continue to be developed separately). That way it isn't so tightly coupled to the release cycle. One alternative Sebastian Berg mentioned (and I am on board with) is just moving unumpy into the NumPy organisation. What we fear about keeping it separate is that the simple act of a pip install unumpy will keep people from using it or trying it out.
But even that could be accomplished by just putting something in the docs. And adding the alias has substantial risks: it makes unumpy tied to the numpy release cycle and compatibility rules, and it means that we're committing to maintaining unumpy ~forever even if Hameer or Quansight move onto other things. That seems like a lot to take on for such vague benefits?
I can assure you Travis has had the goal of "replatforming SciPy" for as long as I've known him; he's spawned quite a few efforts in that direction along with others from Quansight (and they've led to nice projects). Quansight, as I see it, is unlikely to abandon something like this if it becomes successful (and acceptance of this NEP will be a huge success story).
On Tue, Sep 3, 2019 at 2:04 AM Hameer Abbasi einstein.edison@gmail.com wrote:
The fact that we're having to design more and more protocols for a lot of very similar things is, to me, an indicator that we do have holistic problems that ought to be solved by a single protocol.
But the reason we've had trouble designing these protocols is that they're each different :-). If it was just a matter of copying __array_ufunc__ we'd have been done in a few minutes...
uarray borrows heavily from __array_function__. It allows substituting (for example) __array_ufunc__ by overriding ufunc.__call__, ufunc.reduce and so on. It takes, as I mentioned, a holistic approach: there are callables that need to be overridden, possibly with nothing to dispatch on. And then it builds on top of that, adding coercion/conversion.
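The phrase "callables that need to be overridden, possibly with nothing to dispatch on" can be illustrated with a creation function like `zeros(n)`, which has no array argument for type-based dispatch to inspect. The sketch below assumes a context-local override built on Python's `contextvars`; `set_backend`, `ListBackend` and `TupleBackend` are hypothetical names, not uarray's real API.

```python
import contextlib
import contextvars


class ListBackend:
    def zeros(self, n):
        return [0.0] * n


class TupleBackend:
    def zeros(self, n):
        return (0.0,) * n


_global_backend = ListBackend()
_local_backend = contextvars.ContextVar("local_backend", default=None)


@contextlib.contextmanager
def set_backend(backend):
    """Select `backend` for the current context only (e.g. per thread or task)."""
    token = _local_backend.set(backend)
    try:
        yield backend
    finally:
        _local_backend.reset(token)


def zeros(n):
    # Nothing to inspect in the arguments: dispatch on the active backend.
    backend = _local_backend.get()
    if backend is None:
        backend = _global_backend
    return backend.zeros(n)


print(zeros(3))              # [0.0, 0.0, 0.0]
with set_backend(TupleBackend()):
    print(zeros(3))          # (0.0, 0.0, 0.0)
print(zeros(3))              # [0.0, 0.0, 0.0]
```

Using a context variable rather than a plain global keeps the override local to the current thread or async task.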
-n
-- Nathaniel J. Smith -- https://vorpus.org _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion
On Fri, Sep 6, 2019 at 1:32 AM Hameer Abbasi einstein.edison@gmail.com wrote:
That's a lot of very good questions! Let me see if I can answer them one-by-one.
On 06.09.19 09:49, Nathaniel Smith wrote:
But even that could be accomplished by just putting something in the docs. And adding the alias has substantial risks: it makes unumpy tied to the numpy release cycle and compatibility rules, and it means that we're committing to maintaining unumpy ~forever even if Hameer or Quansight move onto other things. That seems like a lot to take on for such vague benefits?
I can assure you Travis has had the goal of "replatforming SciPy" for as long as I've known him; he's spawned quite a few efforts in that direction along with others from Quansight (and they've led to nice projects). Quansight, as I see it, is unlikely to abandon something like this if it becomes successful (and acceptance of this NEP will be a huge success story).
Let me address this separately, since it's not really a technical concern.
First, this is not what we say for other contributions. E.g. we didn't say no to Pocketfft because Martin Reinecke may move on, or __array_function__ because Stephan may get other interests at some point, or a whole new numpy.random, etc.
Second, this is not about Quansight. At Quansight Labs we've been able to create time for Hameer to build this, and me and others to contribute - which is very nice, but the two are not tied inextricably together. In the end it's still individuals submitting this NEP. I have been a NumPy dev for ~10 years before joining Quansight, and my future NumPy contributions are not dependent on staying at Quansight (not that I plan to go anywhere!). I'm guessing the same is true for others.
Third, unumpy is a fairly thin layer over uarray, which already has another user in SciPy.
Cheers, Ralf
On Fri, Sep 6, 2019 at 11:44 AM Ralf Gommers ralf.gommers@gmail.com wrote:
On Fri, Sep 6, 2019 at 1:32 AM Hameer Abbasi einstein.edison@gmail.com wrote:
That's a lot of very good questions! Let me see if I can answer them one-by-one.
On 06.09.19 09:49, Nathaniel Smith wrote:
But even that could be accomplished by just putting something in the docs. And adding the alias has substantial risks: it makes unumpy tied to the numpy release cycle and compatibility rules, and it means that we're committing to maintaining unumpy ~forever even if Hameer or Quansight move onto other things. That seems like a lot to take on for such vague benefits?
I can assure you Travis has had the goal of "replatforming SciPy" for as long as I've known him; he's spawned quite a few efforts in that direction along with others from Quansight (and they've led to nice projects). Quansight, as I see it, is unlikely to abandon something like this if it becomes successful (and acceptance of this NEP will be a huge success story).
Let me address this separately, since it's not really a technical concern.
First, this is not what we say for other contributions. E.g. we didn't say no to Pocketfft because Martin Reinecke may move on, or __array_function__ because Stephan may get other interests at some point, or a whole new numpy.random, etc.
Second, this is not about Quansight. At Quansight Labs we've been able to create time for Hameer to build this, and me and others to contribute - which is very nice, but the two are not tied inextricably together. In the end it's still individuals submitting this NEP. I have been a NumPy dev for ~10 years before joining Quansight, and my future NumPy contributions are not dependent on staying at Quansight (not that I plan to go anywhere!). I'm guessing the same is true for others.
Third, unumpy is a fairly thin layer over uarray, which already has another user in SciPy.
I'm sorry if that came across as some kind of snipe at Quansight specifically. I didn't mean it that way. It's a much more general concern: software projects are inherently risky, and often fail; companies and research labs change focus and funding shifts around. This is just a general risk that we need to take into account when making decisions. And when there are proposals to add new submodules to numpy, we always put them under intense scrutiny, exactly because of the support commitments.
The new fft and random code are replacing/extending our existing public APIs that we already committed to, so that's a very different situation. And __array_function__ was something that couldn't work at all without being built into numpy, and even then it was controversial and merged on an experimental basis. It's always about trade-offs. My concern here is that the NEP is proposing that the numpy maintainers take on this large commitment, *and* AFAICT there's no compensating benefit to justify that: everything that can be done with numpy.overridable can be done just as well with a standalone unumpy package... right?
-n
On Fri, Sep 6, 2019 at 5:16 PM Nathaniel Smith njs@pobox.com wrote:
On Fri, Sep 6, 2019 at 11:44 AM Ralf Gommers ralf.gommers@gmail.com wrote:
On Fri, Sep 6, 2019 at 1:32 AM Hameer Abbasi einstein.edison@gmail.com wrote:
That's a lot of very good questions! Let me see if I can answer them one-by-one.
On 06.09.19 09:49, Nathaniel Smith wrote:
But even that could be accomplished by just putting something in the docs. And adding the alias has substantial risks: it makes unumpy tied to the numpy release cycle and compatibility rules, and it means that we're committing to maintaining unumpy ~forever even if Hameer or Quansight move onto other things. That seems like a lot to take on for such vague benefits?
I can assure you Travis has had the goal of "replatforming SciPy" for as long as I've known him; he's spawned quite a few efforts in that direction along with others from Quansight (and they've led to nice projects). Quansight, as I see it, is unlikely to abandon something like this if it becomes successful (and acceptance of this NEP will be a huge success story).
Let me address this separately, since it's not really a technical concern.
First, this is not what we say for other contributions. E.g. we didn't say no to Pocketfft because Martin Reinecke may move on, or __array_function__ because Stephan may get other interests at some point, or a whole new numpy.random, etc.
Second, this is not about Quansight. At Quansight Labs we've been able to create time for Hameer to build this, and me and others to contribute - which is very nice, but the two are not tied inextricably together. In the end it's still individuals submitting this NEP. I have been a NumPy dev for ~10 years before joining Quansight, and my future NumPy contributions are not dependent on staying at Quansight (not that I plan to go anywhere!). I'm guessing the same is true for others.
Third, unumpy is a fairly thin layer over uarray, which already has another user in SciPy.
I'm sorry if that came across as some kind of snipe at Quansight specifically. I didn't mean it that way. It's a much more general concern: software projects are inherently risky, and often fail; companies and research labs change focus and funding shifts around. This is just a general risk that we need to take into account when making decisions. And when there are proposals to add new submodules to numpy, we always put them under intense scrutiny, exactly because of the support commitments.
Yes, that's fair, and we should be critical here. All code we accept is indeed a maintenance burden.
The new fft and random code are replacing/extending our existing public APIs that we already committed to, so that's a very different situation. And __array_function__ was something that couldn't work at all without being built into numpy, and even then it was controversial and merged on an experimental basis. It's always about trade-offs. My concern here is that the NEP is proposing that the numpy maintainers take on this large commitment,
Again, not just the NumPy maintainers. There really isn't that much in `unumpy` that's all that complicated. And again, `uarray` has multiple maintainers (note that Peter is also a SciPy core dev) and has another user in SciPy.
*and* AFAICT there's no compensating benefit to justify that: everything that can be done with numpy.overridable can be done just as well with a standalone unumpy package... right?
True, mostly. But at that point, if we say that it's the way to do array coercion, and creation (and perhaps some other things as well), we're saying at the same time that every other package that needs this (e.g. Dask, CuPy) should take unumpy as a hard dependency. Which is a much bigger ask than when it comes with NumPy. We can discuss it of course.
The major exception is if we want to make it the default for some functionality, like for example numpy.fft (I'll answer your other email for that).
Cheers, Ralf
On Fri, Sep 6, 2019 at 1:32 AM Hameer Abbasi einstein.edison@gmail.com wrote:
That's a lot of very good questions! Let me see if I can answer them one-by-one.
On 06.09.19 09:49, Nathaniel Smith wrote:
But those are general questions about unumpy, and I'm guessing no-one knows all the answers yet... and these questions actually aren't super relevant to the NEP. The NEP isn't inventing unumpy. IIUC, the main thing the NEP proposes is simply to make "numpy.overridable" an alias for "unumpy".
It's not clear to me what problem this alias is solving. If all downstream users have to update their imports anyway, then they can write "import unumpy as np" just as easily as they can write "import numpy.overridable as np". I guess the main reason this is a NEP is because the unumpy project is hoping to get an "official stamp of approval" from numpy?
Also because we have NEP 30 for yet another protocol, and there's likely another NEP to follow after that for array creation. Those use cases are covered by unumpy, so it makes sense to have a NEP for that as well, so they can be considered side-by-side.
That's part of it. The concrete problems it's solving are threefold:
- Array creation functions can be overridden.
- Array coercion is now covered.
- "Default implementations" will allow you to re-write your NumPy array more easily, when such efficient implementations exist in terms of other NumPy functions. That will also help achieve similar semantics, but as I said, they're just "default"...
There may be another very concrete one (that's not yet in the NEP): allowing other libraries that consume ndarrays to use overrides. An example is numpy.fft: currently both mkl_fft and pyfftw monkeypatch NumPy, something we don't like all that much (in particular for mkl_fft, because it's the default in Anaconda). `__array_function__` isn't able to help here, because it will always choose NumPy's own implementation for ndarray input. With unumpy you can support multiple libraries that consume ndarrays.
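The difference between the two dispatch styles for ndarray input can be sketched with a toy model. Plain lists stand in for ndarrays, and the hook name `__toy_array_function__` and all functions are hypothetical; this is not NumPy's real machinery, only an illustration of why type-based dispatch cannot route ndarray input to an mkl_fft- or pyfftw-style implementation while an explicitly selected backend can.

```python
# Hypothetical toy model; lists play the role of ndarrays.


def reference_fft(x):
    """Stand-in for NumPy's own implementation."""
    return ("numpy", list(x))


class DuckArray:
    """Stand-in for a duck array (e.g. a Dask array) that overrides via its type."""

    def __init__(self, data):
        self.data = list(data)

    def __toy_array_function__(self, func, args):
        return ("duck", self.data)


def type_dispatched_fft(x):
    # __array_function__ style: only the argument's type can override,
    # so "ndarray" (list) input always reaches the reference implementation.
    hook = getattr(type(x), "__toy_array_function__", None)
    if hook is not None:
        return hook(x, type_dispatched_fft, (x,))
    return reference_fft(x)


def fast_fft_backend(x):
    """Stand-in for an mkl_fft/pyfftw-style backend: handles lists only."""
    if isinstance(x, list):
        return ("fast", list(x))
    return NotImplemented  # defer on anything else


_active_backend = None  # selected explicitly by the user or a library


def backend_dispatched_fft(x):
    # unumpy style: the selected backend gets the first try for any input
    # type; otherwise type-based dispatch runs as before.
    if _active_backend is not None:
        result = _active_backend(x)
        if result is not NotImplemented:
            return result
    return type_dispatched_fft(x)


print(type_dispatched_fft([1, 2]))             # ('numpy', [1, 2])
_active_backend = fast_fft_backend
print(backend_dispatched_fft([1, 2]))          # ('fast', [1, 2])
print(backend_dispatched_fft(DuckArray([1])))  # ('duck', [1])
```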
Another example is einsum: if you want to use opt_einsum for all inputs (including ndarrays), then you cannot use np.einsum. And yet another is using bottleneck (https://kwgoodman.github.io/bottleneck-doc/reference.html) for nan-functions and partition. There's likely more of these.
The point is: sometimes the array protocols are preferred (e.g. Dask/Xarray-style meta-arrays), sometimes unumpy-style dispatch works better. It's also not necessarily an either-or; they can be complementary.
Actually, after writing this I just realized something. With 1.17.x we have:
```
In [1]: import dask.array as da

In [2]: d = da.from_array(np.linspace(0, 1))

In [3]: np.fft.fft(d)
Out[3]: dask.array<fft, shape=(50,), dtype=complex128, chunksize=(50,)>
```
In Anaconda `np.fft.fft` *is* `mkl_fft._numpy_fft.fft`, so this won't work. We have no bug report yet because 1.17.x hasn't landed in conda defaults yet (perhaps this is a/the reason why?), but it will be a problem.
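Why the monkeypatch breaks this can be shown with a toy model: NumPy's public functions are, roughly, wrapped so that they first check their arguments for `__array_function__`, and replacing the module attribute with a plain function drops that check. `array_function_dispatch` below is a simplified stand-in written for illustration (not NumPy's internal decorator), and `fake_fft_module`, `LazyArray` and `patched_fft` are hypothetical.

```python
from types import SimpleNamespace


def array_function_dispatch(impl):
    """Simplified stand-in for NumPy's dispatch wrapper."""
    def public(x):
        override = getattr(type(x), "__array_function__", None)
        if override is not None:
            return override(x, public, (type(x),), (x,), {})
        return impl(x)
    return public


class LazyArray:
    """Stand-in for a Dask-like array implementing __array_function__."""

    def __array_function__(self, func, types, args, kwargs):
        return "handled by LazyArray"


def patched_fft(x):
    # A drop-in replacement that knows nothing about __array_function__.
    return "fast fft of " + type(x).__name__


fake_fft_module = SimpleNamespace(fft=array_function_dispatch(lambda x: "reference fft"))
print(fake_fft_module.fft(LazyArray()))  # handled by LazyArray

fake_fft_module.fft = patched_fft        # the monkeypatch
print(fake_fft_module.fft(LazyArray()))  # fast fft of LazyArray
```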
The import numpy.overridable part is meant to help garner adoption, and to prefer the unumpy module if it is available (which will continue to be developed separately). That way it isn't so tightly coupled to the release cycle. One alternative Sebastian Berg mentioned (and I am on board with) is just moving unumpy into the NumPy organisation. What we fear about keeping it separate is that the simple act of having to pip install unumpy will keep people from using it or trying it out.
Note that this is not the most critical aspect. I pushed for vendoring as numpy.overridable because I want to not derail the comparison with NEP 30 et al. with a "should we add a dependency" discussion. The interesting part to decide on first is: do we need the unumpy override mechanism? Vendoring opt-in vs. making it default vs. adding a dependency is of secondary interest right now.
Cheers, Ralf
On Fri, Sep 6, 2019 at 2:45 PM Ralf Gommers ralf.gommers@gmail.com wrote:
There may be another very concrete one (that's not yet in the NEP): allowing other libraries that consume ndarrays to use overrides. An example is numpy.fft: currently both mkl_fft and pyfftw monkeypatch NumPy, something we don't like all that much (in particular for mkl_fft, because it's the default in Anaconda). `__array_function__` isn't able to help here, because it will always choose NumPy's own implementation for ndarray input. With unumpy you can support multiple libraries that consume ndarrays.
unumpy doesn't help with this either though, does it? unumpy is double-opt-in: the code using np.fft has to switch to using unumpy.fft instead, and then someone has to enable the backend. But MKL/pyfftw started out as opt-in – you could `import mkl_fft` or `import pyfftw` – and the whole reason they switched to monkeypatching is that they decided that opt-in wasn't good enough for them.
The import numpy.overridable part is meant to help garner adoption, and to prefer the unumpy module if it is available (which will continue to be developed separately). That way it isn't so tightly coupled to the release cycle. One alternative Sebastian Berg mentioned (and I am on board with) is just moving unumpy into the NumPy organisation. What we fear about keeping it separate is that the simple act of having to pip install unumpy will keep people from using it or trying it out.
Note that this is not the most critical aspect. I pushed for vendoring as numpy.overridable because I want to not derail the comparison with NEP 30 et al. with a "should we add a dependency" discussion. The interesting part to decide on first is: do we need the unumpy override mechanism? Vendoring opt-in vs. making it default vs. adding a dependency is of secondary interest right now.
Wait, but I thought the only reason we would have a dependency is if we're exporting it as part of the numpy namespace. If we keep the import as `import unumpy`, then it works just as well, without any dependency *or* vendoring in numpy, right?
-n
On Fri, Sep 6, 2019 at 4:51 PM Nathaniel Smith njs@pobox.com wrote:
On Fri, Sep 6, 2019 at 2:45 PM Ralf Gommers ralf.gommers@gmail.com wrote:
There may be another very concrete one (that's not yet in the NEP): allowing other libraries that consume ndarrays to use overrides. An example is numpy.fft: currently both mkl_fft and pyfftw monkeypatch NumPy, something we don't like all that much (in particular for mkl_fft, because it's the default in Anaconda). `__array_function__` isn't able to help here, because it will always choose NumPy's own implementation for ndarray input. With unumpy you can support multiple libraries that consume ndarrays.
unumpy doesn't help with this either though, does it? unumpy is double-opt-in: the code using np.fft has to switch to using unumpy.fft instead, and then someone has to enable the backend.
Very good point. It would make a lot of sense to at least make unumpy default on fft/linalg/random, even if we want to keep it opt-in for the functions in the main namespace.
But MKL/pyfftw started out as opt-in – you could `import mkl_fft` or `import pyfftw` – and the whole reason they switched to monkeypatching is that they decided that opt-in wasn't good enough for them.
No, that's not correct. The MKL team has asked for a proper backend system, so they can plug into numpy rather than monkeypatch it. Oleksey, Chuck and I discussed that two years ago already at the NumFOCUS Summit 2017.
This has been explicitly on the NumPy roadmap for quite a while: "A backend system for numpy.fft (so that e.g. fft-mkl doesn’t need to monkeypatch numpy)" (see https://numpy.org/neps/roadmap.html#other-functionality)
And if Anaconda would like to default to it, that's possible - because one registered backend needs to be chosen as the default, that could be mkl-fft. That is still a major improvement over the situation today.
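A minimal sketch of "one registered backend needs to be chosen as the default", assuming a simple name-based registry. `register_backend`, `set_global_backend` and the `backend=` keyword are hypothetical illustrations, and the two lambdas merely stand in for pocketfft- and MKL-based implementations.

```python
# Hypothetical registry: a distributor picks the global default,
# while callers can still request another registered backend.

_registry = {}
_global_default = None


def register_backend(name, impl):
    _registry[name] = impl


def set_global_backend(name):
    global _global_default
    if name not in _registry:
        raise KeyError(f"unknown backend {name!r}")
    _global_default = name


def fft(x, backend=None):
    # An explicit choice wins; otherwise use the global default.
    name = backend if backend is not None else _global_default
    return _registry[name](x)


register_backend("pocketfft", lambda x: ("pocketfft", x))
register_backend("mkl", lambda x: ("mkl", x))
set_global_backend("mkl")  # e.g. configured by a distribution such as Anaconda

print(fft([1, 2]))                       # ('mkl', [1, 2])
print(fft([1, 2], backend="pocketfft"))  # ('pocketfft', [1, 2])
```

Unlike a monkeypatch, the default here is an explicit, inspectable setting that can be changed back at runtime.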
The import numpy.overridable part is meant to help garner adoption, and to prefer the unumpy module if it is available (which will continue to be developed separately). That way it isn't so tightly coupled to the release cycle. One alternative Sebastian Berg mentioned (and I am on board with) is just moving unumpy into the NumPy organisation. What we fear about keeping it separate is that the simple act of having to pip install unumpy will keep people from using it or trying it out.
Note that this is not the most critical aspect. I pushed for vendoring as numpy.overridable because I want to not derail the comparison with NEP 30 et al. with a "should we add a dependency" discussion. The interesting part to decide on first is: do we need the unumpy override mechanism? Vendoring opt-in vs. making it default vs. adding a dependency is of secondary interest right now.
Wait, but I thought the only reason we would have a dependency is if we're exporting it as part of the numpy namespace. If we keep the import as `import unumpy`, then it works just as well, without any dependency *or* vendoring in numpy, right?
Vendoring means "include the code". So no dependency on an external package. If we don't vendor, it's going to be either unused, or end up as a dependency for the whole SciPy/PyData stack.
Actually, now that we've discussed the fft issue, I'd suggest to change the NEP to: vendor, and make default for fft, random, and linalg.
Cheers, Ralf
On Fri, Sep 6, 2019 at 11:04 PM Ralf Gommers ralf.gommers@gmail.com wrote:
Vendoring means "include the code". So no dependency on an external package. If we don't vendor, it's going to be either unused, or end up as a dependency for the whole SciPy/PyData stack.
If we vendor it then it also ends up as a dependency for the whole SciPy/PyData stack...
Actually, now that we've discussed the fft issue, I'd suggest to change the NEP to: vendor, and make default for fft, random, and linalg.
There's no way we can have an effective discussion of duck arrays, fft backends, random backends, and linalg backends all at once in a single thread.
Can you write separate NEPs for each of these? Some questions I'd like to see addressed:
For fft:
- fft is an entirely self-contained operation, with no interactions with the rest of the system; the only difference between implementations is speed. What problems are caused by monkeypatching, and how is uarray materially different from monkeypatching?

For random:
- I thought the new random implementation with pluggable generators etc. was supposed to solve this problem already. Why doesn't it?
- The biggest issue with MKL monkeypatching random is that it breaks stream stability. How does the uarray approach address this?

For linalg:
- linalg already supports __array_ufunc__ for overrides. Why do we need a second override system? Isn't that redundant?
-n
On Sat, Sep 7, 2019 at 4:16 PM Nathaniel Smith njs@pobox.com wrote:
On Fri, Sep 6, 2019 at 11:04 PM Ralf Gommers ralf.gommers@gmail.com wrote:
Vendoring means "include the code". So no dependency on an external package. If we don't vendor, it's going to be either unused, or end up as a dependency for the whole SciPy/PyData stack.
If we vendor it then it also ends up as a dependency for the whole SciPy/PyData stack...
It seems you're just using an unusual definition here. Dependency == a package you have to install, is present in pyproject.toml/install_requires, shows up in https://github.com/numpy/numpy/network/dependencies, etc.
Actually, now that we've discussed the fft issue, I'd suggest to change the NEP to: vendor, and make default for fft, random, and linalg.
There's no way we can have an effective discussion of duck arrays, fft backends, random backends, and linalg backends all at once in a single thread.
Can you write separate NEPs for each of these? Some questions I'd like to see addressed:
For fft:
- fft is an entirely self-contained operation, with no interactions with the rest of the system; the only difference between implementations is speed. What problems are caused by monkeypatching,
It was already explained in this thread, it's been on our roadmap for ~2 years at least, and monkeypatching is pretty much universally understood to be bad. If that's not enough, please search the NumPy issues for "monkeypatching". You'll find issues like https://github.com/numpy/numpy/issues/12374#issuecomment-438725645. At the moment this is very confusing, and hard to diagnose - you have to install a whole new NumPy and then find that the problem is gone (or not). Being able to switch backends in one line of code and re-test would be very valuable.
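The "switch backends in one line of code and re-test" workflow can be sketched as running the same computation under two implementations and comparing the results. Both implementations below are toy O(n^2) DFTs written for illustration (the "suspect" one deliberately uses the wrong sign), and `use_backend` is a hypothetical helper; for simplicity it swaps a module-level global, which, unlike a context variable, is not thread-safe.

```python
import cmath
import contextlib


def naive_dft(x):
    """Reference O(n^2) DFT."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def suspect_dft(x):
    """Stand-in for a third-party backend under suspicion (wrong sign on purpose)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


_current = naive_dft


@contextlib.contextmanager
def use_backend(impl):
    global _current
    previous, _current = _current, impl
    try:
        yield
    finally:
        _current = previous


def fft(x):
    return _current(x)


def close(a, b, tol=1e-9):
    return all(abs(p - q) < tol for p, q in zip(a, b))


samples = [0.0, 1.0, 0.0, 0.0]
reference = fft(samples)
with use_backend(suspect_dft):  # the one-line switch
    candidate = fft(samples)
print(close(reference, candidate))  # False: the suspect backend disagrees
```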
It seems perhaps more useful to have a call so we can communicate with higher bandwidth, rather than lots of writing new NEPs here? In preparation, we need to write up in more detail how __array_function__ and unumpy fit together, rather than treat different pieces all separately (because the problems and pros/cons really won't change much between functions and submodules). I'll defer answering your other questions till that's done, so the discussion is hopefully a bit more structured.
Cheers, Ralf
and how is uarray materially different from monkeypatching?
For random:
- I thought the new random implementation with pluggable generators etc. was supposed to solve this problem already. Why doesn't it?
- The biggest issue with MKL monkeypatching random is that it breaks stream stability. How does the uarray approach address this?
For linalg:
- linalg already supports __array_ufunc__ for overrides. Why do we need a second override system? Isn't that redundant?
-n
On Sat, Sep 7, 2019 at 5:08 PM Ralf Gommers ralf.gommers@gmail.com wrote:
On Sat, Sep 7, 2019 at 4:16 PM Nathaniel Smith njs@pobox.com wrote:
On Fri, Sep 6, 2019 at 11:04 PM Ralf Gommers ralf.gommers@gmail.com wrote:
Vendoring means "include the code". So no dependency on an external package. If we don't vendor, it's going to be either unused, or end up as a dependency for the whole SciPy/PyData stack.
If we vendor it then it also ends up as a dependency for the whole SciPy/PyData stack...
It seems you're just using an unusual definition here. Dependency == a package you have to install, is present in pyproject.toml/install_requires, shows up in https://github.com/numpy/numpy/network/dependencies, etc.
That's a pretty trivial definition though. Surely the complexity of the installed code and its maintainer structure is what matters, not the exact details of how the install happens.
Actually, now that we've discussed the fft issue, I'd suggest to change the NEP to: vendor, and make default for fft, random, and linalg.
There's no way we can have an effective discussion of duck arrays, fft backends, random backends, and linalg backends all at once in a single thread.
Can you write separate NEPs for each of these? Some questions I'd like to see addressed:
For fft:
- fft is an entirely self-contained operation, with no interactions with the rest of the system; the only difference between implementations is speed. What problems are caused by monkeypatching,
It was already explained in this thread, it's been on our roadmap for ~2 years at least, and monkeypatching is pretty much universally understood to be bad. If that's not enough, please search the NumPy issues for "monkeypatching". You'll find issues like https://github.com/numpy/numpy/issues/12374#issuecomment-438725645. At the moment this is very confusing, and hard to diagnose - you have to install a whole new NumPy and then find that the problem is gone (or not). Being able to switch backends in one line of code and re-test would be very valuable.
Sure, it's not meant as a trick question, I'm just saying you should write down the reasons and how you solve them in one place. Maybe some of the reasons monkeypatching is bad don't apply here, or maybe some of them do apply, but uarray doesn't solve them – we can't tell without doing the work. The link you gave doesn't involve monkeypatching or np.fft, so I'm not sure how it's relevant...?
It seems perhaps more useful to have a call so we can communicate with higher bandwidth, rather than lots of writing new NEPs here? In preparation, we need to write up in more detail how __array_function__ and unumpy fit together, rather than treat different pieces all separately (because the problems and pros/cons really won't change much between functions and submodules). I'll defer answering your other questions till that's done, so the discussion is hopefully a bit more structured.
I don't have a lot of time for calls, and you'd still have to write it up for everyone who isn't on the call...
-n
-- Nathaniel J. Smith -- https://vorpus.org
There may be another very concrete one (that's not yet in the NEP): allowing other libraries that consume ndarrays to use overrides. An example is numpy.fft: currently both mkl_fft and pyfftw monkeypatch NumPy, something we don't like all that much (in particular for mkl_fft, because it's the default in Anaconda). `__array_function__` isn't able to help here, because it will always choose NumPy's own implementation for ndarray input. With unumpy you can support multiple libraries that consume ndarrays.
unumpy doesn't help with this either though, does it? unumpy is double-opt-in: the code using np.fft has to switch to using unumpy.fft instead, and then someone has to enable the backend. But MKL/pyfftw started out as opt-in – you could `import mkl_fft` or `import pyfftw` – and the whole reason they switched to monkeypatching is that they decided that opt-in wasn't good enough for them.
Because numpy functions are used to write many library functions, the end user isn't always able to opt-in by changing imports. So, for library functions, monkey patching is not simply convenient but actually necessary. Take for example scipy.signal.fftconvolve: SciPy can't change to pyfftw for licensing reasons so with SciPy < 1.4 your only option is to monkey patch scipy.fftpack and numpy.fft. However in SciPy >= 1.4, thanks to the uarray-based backend support in scipy.fft, I can write
```python
import numpy as np
from scipy import fft, signal
import pyfftw.interfaces.scipy_fft as pyfftw_fft

x = np.random.randn(1024, 1024)
with fft.set_backend(pyfftw_fft):
    y = signal.fftconvolve(x, x)  # Calls pyfftw's rfft, irfft
```
Yes, we had to opt-in in the library function (signal moved from scipy.fftpack to scipy.fft). But because there can be distance between the set_backend call and the FFT calls, the library is now much more configurable. Generally speaking, any library written to use unumpy would be configurable: (i) by the user, (ii) at runtime, (iii) without changing library code and (iv) without monkey patching.
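The "distance between the set_backend call and the FFT calls" can be sketched without SciPy: a library function is written once against a dispatched primitive, and the end user picks the implementation at runtime from outside the library. Everything here is hypothetical and simplified (`transform` only records which backend served it, and `library_convolve` is a toy pointwise product, not a real convolution); it is not scipy.fft's actual code.

```python
import contextlib
import contextvars

_backend = contextvars.ContextVar("backend", default="default")
calls = []  # records which backend served each primitive call


@contextlib.contextmanager
def set_backend(name):
    token = _backend.set(name)
    try:
        yield
    finally:
        _backend.reset(token)


def transform(x):
    # The dispatched primitive: here it only records who handled it.
    calls.append(_backend.get())
    return list(x)


# --- library code, written once against the dispatched primitive ---
def library_convolve(a, b):
    fa, fb = transform(a), transform(b)
    return [p * q for p, q in zip(fa, fb)]  # toy stand-in, not a real convolution


# --- user code: choose the backend at runtime, library unchanged ---
library_convolve([1, 2], [3, 4])
with set_backend("pyfftw-like"):
    library_convolve([1, 2], [3, 4])
print(calls)  # ['default', 'default', 'pyfftw-like', 'pyfftw-like']
```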
In scipy.fft I actually did it slightly differently than unumpy: the scipy.fft interface itself has the uarray dispatch and I set SciPy's version of pocketfft as the default global backend. This means that normal users don't need to set a backend, and thus don't need to opt-in in any way. For NumPy to follow this pattern as well would require more change to NumPy's code base than the current NEP's suggestion, mainly in separating the interface from the implementation that would become the default backend.
- Peter
On Fri, 2019-09-06 at 14:45 -0700, Ralf Gommers wrote:
<snip>
That's part of it. The concrete problems it's solving are threefold:
- Array creation functions can be overridden.
- Array coercion is now covered.
- "Default implementations" will allow you to re-write your NumPy array more easily, when such efficient implementations exist in terms of other NumPy functions. That will also help achieve similar semantics, but as I said, they're just "default"...
There may be another very concrete one (that's not yet in the NEP): allowing other libraries that consume ndarrays to use overrides. An example is numpy.fft: currently both mkl_fft and pyfftw monkeypatch NumPy, something we don't like all that much (in particular for mkl_fft, because it's the default in Anaconda). `__array_function__` isn't able to help here, because it will always choose NumPy's own implementation for ndarray input. With unumpy you can support multiple libraries that consume ndarrays.
Another example is einsum: if you want to use opt_einsum for all inputs (including ndarrays), then you cannot use np.einsum. And yet another is using bottleneck (https://kwgoodman.github.io/bottleneck-doc/reference.html) for nan-functions and partition. There's likely more of these.
The point is: sometimes the array protocols are preferred (e.g. Dask/Xarray-style meta-arrays), sometimes unumpy-style dispatch works better. It's also not necessarily an either-or; they can be complementary.
Let me try to move the discussion from the github issue here (this may not be the best place). (https://github.com/numpy/numpy/issues/14441 which asked for easier creation functions together with `__array_function__`).
I think an important note mentioned here is how users interact with unumpy vs. __array_function__. The former is an explicit opt-in, while the latter is an implicit choice based on an `array-like` abstract base class and functional type-based dispatching.
To quote NEP 18 on this: "The downsides are that this would require an explicit opt-in from all existing code, e.g., import numpy.api as np, and in the long term would result in the maintenance of two separate NumPy APIs. Also, many functions from numpy itself are already overloaded (but inadequately), so confusion about high vs. low level APIs in NumPy would still persist." (I do think this is a point we should not just ignore, `uarray` is a thin layer, but it has a big surface area)
Now there are things where explicit opt-in is obvious. And the FFT example is one of those: there is no way to implicitly choose another backend (except by just replacing it, i.e. monkeypatching) [1]. And right now I think these are _very_ different.
Now, for end users choosing one array-like over another, an implicit mechanism seems nicer (why should I not mix sparse, dask and numpy arrays!?). This is the promise `__array_function__` tries to make. Unless convinced otherwise, my guess is that most library authors would strive for implicit support (e.g. sklearn, skimage, scipy).
Circling back to creation and coercion. In a purely object-oriented type system, these would be classmethods, I guess, but in NumPy and the libraries above, we are lost.
Solution 1: Create explicit opt-in, e.g. through uarray. (NEP-31)
* Requires end-user opt-in.
* Seems cleaner in many ways.
* Requires a full copy of the API.
Solution 2: Add some coercion "protocol" (NEP-30) and expose a way to create new arrays more conveniently. This would practically mean adding an `array_type=np.ndarray` argument. * _Not_ used by end-users! End users should use dask.linspace! * Adds "strange" API somewhere in numpy, and possible a new "protocol" (additionally to coercion).[2]
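To make Solution 2 concrete, here is a minimal, NumPy-free sketch of a creation function that takes an `array_type` argument and defers to a creation hook on that type, used from a simplified `my_library_func` (the `np.exp` step is dropped). `ToyArray`, `ToyDuckArray` and the `_arange` classmethod are hypothetical stand-ins for `np.ndarray`, a Dask array, and whatever hook such a protocol would define; none of them is real NumPy or Dask API.

```python
class ToyArray:
    """Hypothetical stand-in for np.ndarray: just wraps a list."""

    def __init__(self, data):
        self.data = list(data)

    @classmethod
    def _arange(cls, n):
        # Hypothetical per-type creation hook (not a real NumPy API).
        return cls(range(n))


class ToyDuckArray:
    """Hypothetical stand-in for a duck array such as a Dask array."""

    def __init__(self, data):
        self.data = list(data)

    @classmethod
    def _arange(cls, n):
        return cls(range(n))


def arange(n, array_type=ToyArray):
    # Solution 2: the caller names the array type to create;
    # the default keeps today's behaviour (a plain "ndarray").
    return array_type._arange(n)


def my_library_func(array_like):
    # The library forwards type(array_like), so the new array matches
    # whatever type the end user passed in, with no end-user opt-in.
    return arange(len(array_like.data), array_type=type(array_like))
```

An end user who passes in a `ToyDuckArray` gets a `ToyDuckArray` back; the cost is that every array type must provide the extra creation hook.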
I still feel these solve different issues. The second one is intended to make array-likes work implicitly in libraries (without end users having to do anything), while the first seems to force the end user to opt in, sometimes unnecessarily:
    def my_library_func(array_like):
        exp = np.exp(array_like)
        idx = np.arange(len(exp))
        return idx, exp
This function has all the information needed for implicit opt-in/array-like support, but this cannot be done right now. This is what I have been wondering: whether uarray/unumpy can in some way help me make this work (even _without_ the end user opting in). The reason is simply that right now I am very clear on the need for this use case, but not sure about the need for end-user opt-in, since end users can just use dask.arange().
Cheers,
Sebastian
[1] To be honest, I do think a lot of the "issues" around monkeypatching exist just as much with backend choosing; the main differences seem to me to be:
1. Monkeypatching was not done explicitly (import mkl_fft; mkl_fft.monkeypatch_numpy())?
2. A backend system allows libraries to prefer one locally? (which I think is a big advantage)
[2] The options are adding `linspace_like` functions somewhere in a numpy submodule, adding `linspace(..., array_type=np.ndarray)`, or simply inventing a new "protocol" (which is not really a protocol?) and making it `ndarray.__numpy_like_creation_functions__.arange()`.
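The second point of footnote [1], that a backend system lets code prefer an implementation locally, can be illustrated with a toy sketch built on the standard library's `contextvars`. `numpy_fft`, `mkl_fft`, `prefer_fft`, `np_fft` and `monkeypatch_numpy` are hypothetical stand-ins, and this is not a description of how `uarray` is implemented.

```python
import contextvars
import types
from contextlib import contextmanager


def numpy_fft(x):
    return "numpy"


def mkl_fft(x):
    return "mkl"


# Context-local preference, visible only to code running in this context.
_preferred_fft = contextvars.ContextVar("preferred_fft", default=numpy_fft)


def fft(x):
    # Dispatching entry point: looks up the preferred implementation per call.
    return _preferred_fft.get()(x)


@contextmanager
def prefer_fft(impl):
    """Local preference: only affects calls made inside the with-block."""
    token = _preferred_fft.set(impl)
    try:
        yield
    finally:
        _preferred_fft.reset(token)


# Hypothetical stand-in for the public "np.fft" namespace.
np_fft = types.SimpleNamespace(fft=fft)


def monkeypatch_numpy():
    """Global and process-wide: every caller of np_fft.fft now gets mkl_fft."""
    np_fft.fft = mkl_fft
```

Inside `with prefer_fft(mkl_fft):` only calls made in that context are affected, whereas `monkeypatch_numpy()` changes what every caller in the process gets.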
Actually, after writing this I just realized something. With 1.17.x we have:
In [1]: import dask.array as da
In [2]: d = da.from_array(np.linspace(0, 1))
In [3]: np.fft.fft(d)
Out[3]: dask.array<fft, shape=(50,), dtype=complex128, chunksize=(50,)>
In Anaconda `np.fft.fft` *is* `mkl_fft._numpy_fft.fft`, so this won't work. We have no bug report yet because 1.17.x hasn't landed in conda defaults yet (perhaps this is a/the reason why?), but it will be a problem.
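A toy model of why the Dask example depends on `np.fft.fft` still being NumPy's own dispatching function: the `__array_function__` check lives in the wrapper, so a replacement that skips the wrapper never gives the duck array a chance to take over. `array_function_dispatch`, `mkl_fft` and `DuckArray` here are simplified stand-ins, not NumPy's or mkl_fft's actual code; only the argument order `(self, func, types, args, kwargs)` follows the real protocol.

```python
def array_function_dispatch(implementation):
    """Simplified stand-in for NumPy's dispatch wrapper: if the argument's
    type defines __array_function__, defer to it; otherwise use the default."""

    def public_api(x):
        override = getattr(type(x), "__array_function__", None)
        if override is not None:
            # Argument order follows the real protocol:
            # (self, func, types, args, kwargs)
            return override(x, public_api, (type(x),), (x,), {})
        return implementation(x)

    return public_api


@array_function_dispatch
def fft(x):
    return "numpy fft"


def mkl_fft(x):
    # A drop-in replacement that does NOT go through the wrapper.
    return "mkl fft"


class DuckArray:
    def __array_function__(self, func, types, args, kwargs):
        return "duck fft"
```

With the wrapper in place, `fft(DuckArray())` is routed to the duck array; once `np.fft.fft` is bound to `mkl_fft`, the same call goes straight to the replacement.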
The `import numpy.overridable` part is meant to help garner adoption, and to prefer the unumpy module if it is available (it will continue to be developed separately). That way it isn't so tightly coupled to the release cycle. One alternative Sebastian Berg mentioned (which I am on board with) is just moving unumpy into the NumPy organisation. What we fear about keeping it separate is that the simple act of a `pip install unumpy` will keep people from using it or trying it out.
Note that this is not the most critical aspect. I pushed for vendoring as `numpy.overridable` because I don't want to derail the comparison with NEP 30 et al. with a "should we add a dependency" discussion. The interesting part to decide on first is: do we need the unumpy override mechanism? Vendoring opt-in vs. making it default vs. adding a dependency is of secondary interest right now.
Cheers, Ralf
NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion
On Sat, Sep 7, 2019 at 1:07 PM Sebastian Berg sebastian@sipsolutions.net wrote:
On Fri, 2019-09-06 at 14:45 -0700, Ralf Gommers wrote:
<snip>
That's part of it. The concrete problems it solves are threefold: (1) array creation functions can be overridden; (2) array coercion is now covered; (3) "default implementations" will make it easier to re-implement the NumPy API for your own array type, where efficient implementations exist in terms of other NumPy functions. That will also help achieve similar semantics, but as I said, they're just "defaults"...
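The "default implementations" point can be sketched as follows: a backend implements only a small core (`full` here) and picks up `zeros` and `ones` through defaults written in terms of that core. This is a toy model with hypothetical names (`Multimethod`, `ListBackend`); to keep it short, the backend is passed explicitly rather than selected by a dispatch mechanism, and it does not reflect uarray's or unumpy's actual API.

```python
class Multimethod:
    """Toy overridable function. A backend may implement it directly;
    otherwise a default written in terms of other multimethods is used."""

    def __init__(self, name, default=None):
        self.name = name
        self.default = default

    def __call__(self, backend, *args):
        impl = getattr(backend, self.name, None)
        if impl is not None:
            return impl(*args)
        if self.default is not None:
            return self.default(backend, *args)
        raise NotImplementedError(f"{backend.__name__} does not implement {self.name}")


full = Multimethod("full")
# Defaults are expressed via another multimethod, not via a concrete library.
zeros = Multimethod("zeros", default=lambda backend, n: full(backend, n, 0))
ones = Multimethod("ones", default=lambda backend, n: full(backend, n, 1))


class ListBackend:
    """A hypothetical backend that implements only `full`."""

    @staticmethod
    def full(n, value):
        return [value] * n
```

A backend can still override `zeros` directly if it has a faster path; the default only applies when it does not.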
There may be another very concrete one (that's not yet in the NEP): allowing other libraries that consume ndarrays to use overrides. An example is numpy.fft: currently both mkl_fft and pyfftw monkeypatch NumPy, something we don't like all that much (in particular for mkl_fft, because it's the default in Anaconda). `__array_function__` isn't able to help here, because it will always choose NumPy's own implementation for ndarray input. With unumpy you can support multiple libraries that consume ndarrays.
Another example is einsum: if you want to use opt_einsum for all inputs (including ndarrays), then you cannot use np.einsum. And yet another is using bottleneck (https://kwgoodman.github.io/bottleneck-doc/reference.html) for nan-functions and partition. There are likely more of these.
The point is: sometimes the array protocols are preferred (e.g. Dask/Xarray-style meta-arrays), sometimes unumpy-style dispatch works better. It's also not necessarily either/or; they can be complementary.
Let me try to move the discussion from the GitHub issue here (this may not be the best place): https://github.com/numpy/numpy/issues/14441, which asked for easier creation functions together with `__array_function__`.
I think an important point here is how users interact with unumpy vs. `__array_function__`. The former is an explicit opt-in, while the latter is an implicit choice based on an `array-like` abstract base class and per-function, type-based dispatching.
To quote NEP 18 on this: "The downsides are that this would require an explicit opt-in from all existing code, e.g., import numpy.api as np, and in the long term would result in the maintenance of two separate NumPy APIs. Also, many functions from numpy itself are already overloaded (but inadequately), so confusion about high vs. low level APIs in NumPy would still persist." (I do think this is a point we should not just ignore: `uarray` is a thin layer, but it has a big surface area.)
Now, there are cases where explicit opt-in is the obvious choice, and the FFT example is one of them: there is no way to implicitly choose another backend (except by just replacing it, i.e. monkeypatching) [1]. Right now I think these are _very_ different.
For end users, however, choosing one array-like over another seems nicer as an implicit mechanism (why should I not mix sparse, dask and numpy arrays!?). This is the promise `__array_function__` tries to make. Unless convinced otherwise, my guess is that most library authors (e.g. sklearn, skimage, scipy) would strive for implicit support.
Circling back to creation and coercion: in a purely object-oriented type system these would be classmethods, I guess, but in NumPy and the libraries above, we are lost.
Solution 1: Create an explicit opt-in, e.g. through uarray (NEP-31).
- Requires end-user opt-in.
- Seems cleaner in many ways.
- Requires a full copy of the API.
Bullets 1 and 3 are not required. If we decide to make it the default, then there's no separate namespace.
Solution 2: Add some coercion "protocol" (NEP-30) and expose a way to create new arrays more conveniently. This would practically mean adding an `array_type=np.ndarray` argument.
- _Not_ used by end-users! End users should use dask.linspace!
- Adds "strange" API somewhere in numpy, and possibly a new "protocol" (in addition to coercion) [2].
I still feel these solve different issues. The second one is intended to make array-likes work implicitly in libraries (without end users having to do anything), while the first seems to force the end user to opt in, sometimes unnecessarily:
    def my_library_func(array_like):
        exp = np.exp(array_like)
        idx = np.arange(len(exp))
        return idx, exp
This function has all the information needed for implicit opt-in/array-like support, but this cannot be done right now.
Can you explain this a bit more? `len(exp)` is a number, so `np.arange(number)` doesn't really have any information here.
This is what I have been wondering: whether uarray/unumpy can in some way help me make this work (even _without_ the end user opting in).
Good question. If that needs to work in the absence of the user doing anything, it should be something like
    with unumpy.determine_backend(exp):
        unumpy.arange(len(exp))  # or np.arange if we make unumpy default
to get the equivalent to `np.arange_like(len(exp), array_type=exp)`.
Note that this `determine_backend` thing doesn't exist today.
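Since `determine_backend` does not exist, here is one hypothetical shape it could take: a context manager that looks up a backend from the type of an argument and makes it current for the enclosed calls. Lists and tuples stand in for ndarrays and Dask arrays, and the registry, `arange`, and both backends are assumptions made for illustration, not unumpy API.

```python
import contextvars
from contextlib import contextmanager


class ListBackend:
    """Hypothetical stand-in for the NumPy backend."""

    @staticmethod
    def arange(n):
        return list(range(n))


class TupleBackend:
    """Hypothetical stand-in for a duck-array backend such as Dask."""

    @staticmethod
    def arange(n):
        return tuple(range(n))


# Hypothetical registry: array type -> backend.
_BACKENDS = {list: ListBackend, tuple: TupleBackend}
_current = contextvars.ContextVar("current_backend", default=None)


@contextmanager
def determine_backend(value):
    """Hypothetical: make the backend registered for type(value) current."""
    for cls in type(value).__mro__:
        if cls in _BACKENDS:
            backend = _BACKENDS[cls]
            break
    else:
        raise TypeError(f"no backend registered for {type(value).__name__}")
    token = _current.set(backend)
    try:
        yield backend
    finally:
        _current.reset(token)


def arange(n):
    # Use the backend chosen by the innermost determine_backend, else the default.
    backend = _current.get()
    if backend is None:
        backend = ListBackend
    return backend.arange(n)


def my_library_func(array_like):
    with determine_backend(array_like):
        return arange(len(array_like))
```

`my_library_func` then returns a tuple for tuple input and a list for list input without the end user opting in, which is the behaviour the `np.arange_like(len(exp), array_type=exp)` comparison describes.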
The reason is simply that right now I am very clear on the need for this use case, but not sure about the need for end-user opt-in, since end users can just use dask.arange().
I don't get the last part. The arange is inside a library function, so a user can't just go in and change things there.
Cheers, Ralf
Cheers,
Sebastian
[1] To be honest, I do think a lot of the "issues" around monkeypatching exist just as much with backend choosing; the main differences seem to me to be:
1. Monkeypatching was not done explicitly (import mkl_fft; mkl_fft.monkeypatch_numpy())?
2. A backend system allows libraries to prefer one locally? (which I think is a big advantage)
[2] The options are adding `linspace_like` functions somewhere in a numpy submodule, adding `linspace(..., array_type=np.ndarray)`, or simply inventing a new "protocol" (which is not really a protocol?) and making it `ndarray.__numpy_like_creation_functions__.arange()`.
Actually, after writing this I just realized something. With 1.17.x we have:
In [1]: import dask.array as da
In [2]: d = da.from_array(np.linspace(0, 1))
In [3]: np.fft.fft(d)
Out[3]: dask.array<fft, shape=(50,), dtype=complex128, chunksize=(50,)>
In Anaconda `np.fft.fft` *is* `mkl_fft._numpy_fft.fft`, so this won't work. We have no bug report yet because 1.17.x hasn't landed in conda defaults yet (perhaps this is a/the reason why?), but it will be a problem.
The `import numpy.overridable` part is meant to help garner adoption, and to prefer the unumpy module if it is available (it will continue to be developed separately). That way it isn't so tightly coupled to the release cycle. One alternative Sebastian Berg mentioned (which I am on board with) is just moving unumpy into the NumPy organisation. What we fear about keeping it separate is that the simple act of a `pip install unumpy` will keep people from using it or trying it out.
Note that this is not the most critical aspect. I pushed for vendoring as `numpy.overridable` because I don't want to derail the comparison with NEP 30 et al. with a "should we add a dependency" discussion. The interesting part to decide on first is: do we need the unumpy override mechanism? Vendoring opt-in vs. making it default vs. adding a dependency is of secondary interest right now.
Cheers, Ralf
On 2019-09-07 15:33, Ralf Gommers wrote:
On Sat, Sep 7, 2019 at 1:07 PM Sebastian Berg sebastian@sipsolutions.net wrote:
On Fri, 2019-09-06 at 14:45 -0700, Ralf Gommers wrote:
<snip>
That's part of it. The concrete problems it solves are threefold: (1) array creation functions can be overridden; (2) array coercion is now covered; (3) "default implementations" will make it easier to re-implement the NumPy API for your own array type, where efficient implementations exist in terms of other NumPy functions. That will also help achieve similar semantics, but as I said, they're just "defaults"...
There may be another very concrete one (that's not yet in the NEP): allowing other libraries that consume ndarrays to use overrides. An example is numpy.fft: currently both mkl_fft and pyfftw monkeypatch NumPy, something we don't like all that much (in particular for mkl_fft, because it's the default in Anaconda). `__array_function__` isn't able to help here, because it will always choose NumPy's own implementation for ndarray input. With unumpy you can support multiple libraries that consume ndarrays.
Another example is einsum: if you want to use opt_einsum for all inputs (including ndarrays), then you cannot use np.einsum. And yet another is using bottleneck (https://kwgoodman.github.io/bottleneck-doc/reference.html) for nan-functions and partition. There are likely more of these.
The point is: sometimes the array protocols are preferred (e.g. Dask/Xarray-style meta-arrays), sometimes unumpy-style dispatch works better. It's also not necessarily either/or; they can be complementary.
Let me try to move the discussion from the GitHub issue here (this may not be the best place): https://github.com/numpy/numpy/issues/14441, which asked for easier creation functions together with `__array_function__`.
I think an important point here is how users interact with unumpy vs. `__array_function__`. The former is an explicit opt-in, while the latter is an implicit choice based on an `array-like` abstract base class and per-function, type-based dispatching.
To quote NEP 18 on this: "The downsides are that this would require an explicit opt-in from all existing code, e.g., import numpy.api as np, and in the long term would result in the maintenance of two separate NumPy APIs. Also, many functions from numpy itself are already overloaded (but inadequately), so confusion about high vs. low level APIs in NumPy would still persist." (I do think this is a point we should not just ignore: `uarray` is a thin layer, but it has a big surface area.)
Now, there are cases where explicit opt-in is the obvious choice, and the FFT example is one of them: there is no way to implicitly choose another backend (except by just replacing it, i.e. monkeypatching) [1]. Right now I think these are _very_ different.
For end users, however, choosing one array-like over another seems nicer as an implicit mechanism (why should I not mix sparse, dask and numpy arrays!?). This is the promise `__array_function__` tries to make. Unless convinced otherwise, my guess is that most library authors (e.g. sklearn, skimage, scipy) would strive for implicit support.
Circling back to creation and coercion: in a purely object-oriented type system these would be classmethods, I guess, but in NumPy and the libraries above, we are lost.
Solution 1: Create an explicit opt-in, e.g. through uarray (NEP-31).
- Requires end-user opt-in.
- Seems cleaner in many ways.
- Requires a full copy of the API.
Bullets 1 and 3 are not required. If we decide to make it the default, then there's no separate namespace.
It does require explicit opt-in to have any benefits to the user.
Solution 2: Add some coercion "protocol" (NEP-30) and expose a way to create new arrays more conveniently. This would practically mean adding an `array_type=np.ndarray` argument.
- _Not_ used by end-users! End users should use dask.linspace!
- Adds "strange" API somewhere in numpy, and possibly a new "protocol" (in addition to coercion) [2].
I still feel these solve different issues. The second one is intended to make array-likes work implicitly in libraries (without end users having to do anything), while the first seems to force the end user to opt in, sometimes unnecessarily:
    def my_library_func(array_like):
        exp = np.exp(array_like)
        idx = np.arange(len(exp))
        return idx, exp
This function has all the information needed for implicit opt-in/array-like support, but this cannot be done right now.
Can you explain this a bit more? `len(exp)` is a number, so `np.arange(number)` doesn't really have any information here.
Right, but as a library author, I want a way to make it use the same type as `array_like` in this particular function; that is the point! The end user already signaled that they prefer, say, dask, through the array that was actually passed in. (But this is just repeating what is below, I think.)
This is what I have been wondering: whether uarray/unumpy can in some way help me make this work (even _without_ the end user opting in).
Good question. If that needs to work in the absence of the user doing anything, it should be something like
    with unumpy.determine_backend(exp):
        unumpy.arange(len(exp))  # or np.arange if we make unumpy default
to get the equivalent to `np.arange_like(len(exp), array_type=exp)`.
Note that this `determine_backend` thing doesn't exist today.
Exactly, that is what I have been wondering about; there may be more issues around that. If it existed, we might be able to solve the implicit library usage by making libraries use unumpy (or similar). Although at that point we would maybe half-replace `__array_function__`. However, the main point is that without such functionality, NEP 30 and NEP 31 seem to solve slightly different issues with respect to how they interact with the end user (opt-in)?
We may decide that we do not want to solve the library authors' issue of wanting to support implicit opt-in for array-like inputs because it is a rabbit hole. But we may need to discuss/argue a bit more that it really is a deep enough rabbit hole that it is not worth the trouble.
The reason is simply that right now I am very clear on the need for this use case, but not sure about the need for end-user opt-in, since end users can just use dask.arange().
I don't get the last part. The arange is inside a library function, so a user can't just go in and change things there.
A "user" here means "end user". An end user writes a script, and they can easily change `arr = np.linspace(10)` to `arr = dask.linspace(10)`, or more likely just use one in one script and the other in another script, while both use the same sklearn functions. (Although backend switching may be nicer in some contexts.)
A library provider (library user of unumpy/numpy) of course cannot just use dask conveniently, unless they write their own `guess_numpy_like_module()` function first.
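For comparison, this is roughly what such a hand-written `guess_numpy_like_module()` might look like today. It is a sketch under clear assumptions: the mapping from a type's top-level package to a module (`dask` to `dask.array`, and so on) is illustrative, `my_library_func` needs the guessed library to be installed, and the heuristic breaks for subclasses or wrappers defined in other packages.

```python
import importlib

# Assumed mapping from a type's top-level package to the module that
# provides its NumPy-like functions; illustrative only.
_KNOWN_NAMESPACES = {
    "numpy": "numpy",
    "dask": "dask.array",
    "cupy": "cupy",
    "sparse": "sparse",
}


def guess_numpy_like_module(array_like, known=None, default="numpy"):
    """Hypothetical helper: guess which NumPy-like module to use for
    `array_like` from the top-level package its type is defined in."""
    if known is None:
        known = _KNOWN_NAMESPACES
    top_level = type(array_like).__module__.split(".")[0]
    return importlib.import_module(known.get(top_level, default))


def my_library_func(array_like):
    # Every call into the array library goes through the guessed module.
    xp = guess_numpy_like_module(array_like)
    exp = xp.exp(array_like)
    idx = xp.arange(len(exp))
    return idx, exp
```

Each library would have to maintain its own copy of such a table, which is part of why a shared mechanism is being discussed.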
Cheers,
Ralf
Cheers,
Sebastian
[1] To be honest, I do think a lot of the "issues" around monkeypatching exist just as much with backend choosing; the main differences seem to me to be:
1. Monkeypatching was not done explicitly (import mkl_fft; mkl_fft.monkeypatch_numpy())?
2. A backend system allows libraries to prefer one locally? (which I think is a big advantage)
[2] The options are adding `linspace_like` functions somewhere in a numpy submodule, adding `linspace(..., array_type=np.ndarray)`, or simply inventing a new "protocol" (which is not really a protocol?) and making it `ndarray.__numpy_like_creation_functions__.arange()`.
Actually, after writing this I just realized something. With 1.17.x we have:
In [1]: import dask.array as da
In [2]: d = da.from_array(np.linspace(0, 1))
In [3]: np.fft.fft(d)
Out[3]: dask.array<fft, shape=(50,), dtype=complex128, chunksize=(50,)>
In Anaconda `np.fft.fft` *is* `mkl_fft._numpy_fft.fft`, so this won't work. We have no bug report yet because 1.17.x hasn't landed in conda defaults yet (perhaps this is a/the reason why?), but it will be a problem.
The `import numpy.overridable` part is meant to help garner adoption, and to prefer the unumpy module if it is available (it will continue to be developed separately). That way it isn't so tightly coupled to the release cycle. One alternative Sebastian Berg mentioned (which I am on board with) is just moving unumpy into the NumPy organisation. What we fear about keeping it separate is that the simple act of a `pip install unumpy` will keep people from using it or trying it out.
Note that this is not the most critical aspect. I pushed for vendoring as `numpy.overridable` because I don't want to derail the comparison with NEP 30 et al. with a "should we add a dependency" discussion. The interesting part to decide on first is: do we need the unumpy override mechanism? Vendoring opt-in vs. making it default vs. adding a dependency is of secondary interest right now.
Cheers, Ralf
On Sat, Sep 7, 2019 at 2:18 PM sebastian sebastian@sipsolutions.net wrote:
On 2019-09-07 15:33, Ralf Gommers wrote:
On Sat, Sep 7, 2019 at 1:07 PM Sebastian Berg sebastian@sipsolutions.net wrote:
On Fri, 2019-09-06 at 14:45 -0700, Ralf Gommers wrote:
<snip>
That's part of it. The concrete problems it solves are threefold: (1) array creation functions can be overridden; (2) array coercion is now covered; (3) "default implementations" will make it easier to re-implement the NumPy API for your own array type, where efficient implementations exist in terms of other NumPy functions. That will also help achieve similar semantics, but as I said, they're just "defaults"...
There may be another very concrete one (that's not yet in the NEP): allowing other libraries that consume ndarrays to use overrides. An example is numpy.fft: currently both mkl_fft and pyfftw monkeypatch NumPy, something we don't like all that much (in particular for mkl_fft, because it's the default in Anaconda). `__array_function__` isn't able to help here, because it will always choose NumPy's own implementation for ndarray input. With unumpy you can support multiple libraries that consume ndarrays.
Another example is einsum: if you want to use opt_einsum for all inputs (including ndarrays), then you cannot use np.einsum. And yet another is using bottleneck (https://kwgoodman.github.io/bottleneck-doc/reference.html) for nan-functions and partition. There are likely more of these.
The point is: sometimes the array protocols are preferred (e.g. Dask/Xarray-style meta-arrays), sometimes unumpy-style dispatch works better. It's also not necessarily either/or; they can be complementary.
Let me try to move the discussion from the GitHub issue here (this may not be the best place): https://github.com/numpy/numpy/issues/14441, which asked for easier creation functions together with `__array_function__`.
I think an important point here is how users interact with unumpy vs. `__array_function__`. The former is an explicit opt-in, while the latter is an implicit choice based on an `array-like` abstract base class and per-function, type-based dispatching.
To quote NEP 18 on this: "The downsides are that this would require an explicit opt-in from all existing code, e.g., import numpy.api as np, and in the long term would result in the maintenance of two separate NumPy APIs. Also, many functions from numpy itself are already overloaded (but inadequately), so confusion about high vs. low level APIs in NumPy would still persist." (I do think this is a point we should not just ignore: `uarray` is a thin layer, but it has a big surface area.)
Now, there are cases where explicit opt-in is the obvious choice, and the FFT example is one of them: there is no way to implicitly choose another backend (except by just replacing it, i.e. monkeypatching) [1]. Right now I think these are _very_ different.
For end users, however, choosing one array-like over another seems nicer as an implicit mechanism (why should I not mix sparse, dask and numpy arrays!?). This is the promise `__array_function__` tries to make. Unless convinced otherwise, my guess is that most library authors (e.g. sklearn, skimage, scipy) would strive for implicit support.
Circling back to creation and coercion: in a purely object-oriented type system these would be classmethods, I guess, but in NumPy and the libraries above, we are lost.
Solution 1: Create an explicit opt-in, e.g. through uarray (NEP-31).
- Requires end-user opt-in.
- Seems cleaner in many ways.
- Requires a full copy of the API.
Bullets 1 and 3 are not required. If we decide to make it the default, then there's no separate namespace.
It does require explicit opt-in to have any benefits to the user.
Solution 2: Add some coercion "protocol" (NEP-30) and expose a way to create new arrays more conveniently. This would practically mean adding an `array_type=np.ndarray` argument.
- _Not_ used by end-users! End users should use dask.linspace!
- Adds "strange" API somewhere in numpy, and possibly a new "protocol" (in addition to coercion) [2].
I still feel these solve different issues. The second one is intended to make array-likes work implicitly in libraries (without end users having to do anything), while the first seems to force the end user to opt in, sometimes unnecessarily:
    def my_library_func(array_like):
        exp = np.exp(array_like)
        idx = np.arange(len(exp))
        return idx, exp
This function has all the information needed for implicit opt-in/array-like support, but this cannot be done right now.
Can you explain this a bit more? `len(exp)` is a number, so `np.arange(number)` doesn't really have any information here.
Right, but as a library author, I want a way to make it use the same type as `array_like` in this particular function; that is the point! The end user already signaled that they prefer, say, dask, through the array that was actually passed in. (But this is just repeating what is below, I think.)
Okay, you meant conceptually :)
This is what I have been wondering: whether uarray/unumpy can in some way help me make this work (even _without_ the end user opting in).
Good question. If that needs to work in the absence of the user doing anything, it should be something like
    with unumpy.determine_backend(exp):
        unumpy.arange(len(exp))  # or np.arange if we make unumpy default
to get the equivalent to `np.arange_like(len(exp), array_type=exp)`.
Note that this `determine_backend` thing doesn't exist today.
Exactly, that is what I have been wondering about; there may be more issues around that. If it existed, we might be able to solve the implicit library usage by making libraries use unumpy (or similar). Although at that point we would maybe half-replace `__array_function__`.
I don't really think so. Libraries can/will still use `__array_function__` for most functionality, and just add a `with determine_backend` for the places where `__array_function__` doesn't work.
However, the main point is that without such functionality, NEP 30 and NEP 31 seem to solve slightly different issues with respect to how they interact with the end user (opt-in)?
Yes, I agree with that.
Cheers, Ralf
We may decide that we do not want to solve the library authors' issue of wanting to support implicit opt-in for array-like inputs because it is a rabbit hole. But we may need to discuss/argue a bit more that it really is a deep enough rabbit hole that it is not worth the trouble.
The reason is simply that right now I am very clear on the need for this use case, but not sure about the need for end-user opt-in, since end users can just use dask.arange().
I don't get the last part. The arange is inside a library function, so a user can't just go in and change things there.
A "user" here means "end user". An end user writes a script, and they can easily change `arr = np.linspace(10)` to `arr = dask.linspace(10)`, or more likely just use one in one script and the other in another script, while both use the same sklearn functions. (Although backend switching may be nicer in some contexts.)
A library provider (library user of unumpy/numpy) of course cannot just use dask conveniently, unless they write their own `guess_numpy_like_module()` function first.
Cheers,
Ralf
Cheers,
Sebastian
[1] To be honest, I do think a lot of the "issues" around monkeypatching exist just as much with backend choosing; the main differences seem to me to be:
1. Monkeypatching was not done explicitly (import mkl_fft; mkl_fft.monkeypatch_numpy())?
2. A backend system allows libraries to prefer one locally? (which I think is a big advantage)
[2] The options are adding `linspace_like` functions somewhere in a numpy submodule, adding `linspace(..., array_type=np.ndarray)`, or simply inventing a new "protocol" (which is not really a protocol?) and making it `ndarray.__numpy_like_creation_functions__.arange()`.
Actually, after writing this I just realized something. With 1.17.x we have:
In [1]: import dask.array as da
In [2]: d = da.from_array(np.linspace(0, 1))
In [3]: np.fft.fft(d)
Out[3]: dask.array<fft, shape=(50,), dtype=complex128, chunksize=(50,)>
In Anaconda `np.fft.fft` *is* `mkl_fft._numpy_fft.fft`, so this won't work. We have no bug report yet because 1.17.x hasn't landed in conda defaults yet (perhaps this is a/the reason why?), but it will be a problem.
The `import numpy.overridable` part is meant to help garner adoption, and to prefer the unumpy module if it is available (it will continue to be developed separately). That way it isn't so tightly coupled to the release cycle. One alternative Sebastian Berg mentioned (which I am on board with) is just moving unumpy into the NumPy organisation. What we fear about keeping it separate is that the simple act of a `pip install unumpy` will keep people from using it or trying it out.
Note that this is not the most critical aspect. I pushed for vendoring as `numpy.overridable` because I don't want to derail the comparison with NEP 30 et al. with a "should we add a dependency" discussion. The interesting part to decide on first is: do we need the unumpy override mechanism? Vendoring opt-in vs. making it default vs. adding a dependency is of secondary interest right now.
Cheers, Ralf
On 07.09.19 22:06, Sebastian Berg wrote:
On Fri, 2019-09-06 at 14:45 -0700, Ralf Gommers wrote:
<snip>
Let me try to move the discussion from the GitHub issue here (this may not be the best place): https://github.com/numpy/numpy/issues/14441, which asked for easier creation functions together with `__array_function__`.
I think an important point here is how users interact with unumpy vs. `__array_function__`. The former is an explicit opt-in, while the latter is an implicit choice based on an `array-like` abstract base class and per-function, type-based dispatching.
To quote NEP 18 on this: "The downsides are that this would require an explicit opt-in from all existing code, e.g., import numpy.api as np, and in the long term would result in the maintenance of two separate NumPy APIs. Also, many functions from numpy itself are already overloaded (but inadequately), so confusion about high vs. low level APIs in NumPy would still persist." (I do think this is a point we should not just ignore: `uarray` is a thin layer, but it has a big surface area.)
Now, there are cases where explicit opt-in is the obvious choice, and the FFT example is one of them: there is no way to implicitly choose another backend (except by just replacing it, i.e. monkeypatching) [1]. Right now I think these are _very_ different.
For end users, however, choosing one array-like over another seems nicer as an implicit mechanism (why should I not mix sparse, dask and numpy arrays!?). This is the promise `__array_function__` tries to make. Unless convinced otherwise, my guess is that most library authors (e.g. sklearn, skimage, scipy) would strive for implicit support.
You can: once you register a backend, it becomes implicit, so all registered backends are tried until one succeeds, unless you explicitly say "I do not want another backend" (`only=True`/`coerce=True`).
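The fall-through behaviour described here can be modelled in a few lines: implicitly registered backends are tried in order until one does not return `NotImplemented`, and a backend set with `only=True` stops the search. This is a toy loosely modelled on uarray's `register_backend`/`set_backend`, not its implementation; `call`, `IntBackend` and `StrBackend` are hypothetical, and `coerce` is not modelled.

```python
import contextvars
from contextlib import contextmanager

_registered = []  # backends tried implicitly, in registration order
_local = contextvars.ContextVar("local_backends", default=())


def register_backend(backend):
    """Make a backend available implicitly, for every caller."""
    _registered.append(backend)


@contextmanager
def set_backend(backend, only=False):
    """Prefer `backend` inside the with-block; with only=True, do not
    fall through to any other backend if it cannot handle the call."""
    token = _local.set(_local.get() + ((backend, only),))
    try:
        yield
    finally:
        _local.reset(token)


def _try(backend, name, args):
    impl = getattr(backend, name, None)
    return NotImplemented if impl is None else impl(*args)


def call(name, *args):
    # Innermost set_backend first, then every registered backend in order.
    for backend, only in reversed(_local.get()):
        result = _try(backend, name, args)
        if result is not NotImplemented:
            return result
        if only:
            raise TypeError(f"{name}: {backend.__name__} cannot handle this and only=True")
    for backend in _registered:
        result = _try(backend, name, args)
        if result is not NotImplemented:
            return result
    raise TypeError(f"{name}: no backend could handle the arguments")


class IntBackend:
    @staticmethod
    def double(x):
        return x * 2 if isinstance(x, int) else NotImplemented


class StrBackend:
    @staticmethod
    def double(x):
        return x + x if isinstance(x, str) else NotImplemented


register_backend(IntBackend)
register_backend(StrBackend)
```

Without any `with` block, `call("double", 3)` and `call("double", "ab")` each find a backend implicitly; inside `with set_backend(StrBackend, only=True):` the integer call fails instead of falling through to `IntBackend`.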
Circling back to creation and coercion: in a purely object-oriented type system these would be classmethods, I guess, but in NumPy and the libraries above, we are lost.
Solution 1: Create an explicit opt-in, e.g. through uarray (NEP-31).
- Requires end-user opt-in.
- Seems cleaner in many ways.
- Requires a full copy of the API.
Solution 2: Add some coercion "protocol" (NEP-30) and expose a way to create new arrays more conveniently. This would practically mean adding an `array_type=np.ndarray` argument.
- _Not_ used by end-users! End users should use dask.linspace!
- Adds "strange" API somewhere in numpy, and possibly a new "protocol" (in addition to coercion) [2].
I still feel these solve different issues. The second one is intended to make array-likes work implicitly in libraries (without end users having to do anything), while the first seems to force the end user to opt in, sometimes unnecessarily:
    import numpy as np

    def my_library_func(array_like):
        exp = np.exp(array_like)
        # np.arange always returns a NumPy array, whatever array_like was
        idx = np.arange(len(exp))
        return idx, exp
This would have all the information needed for implicit opt-in/array-like support, but it cannot use it right now. What I have been wondering is whether uarray/unumpy can in some way help me make this work (even _without_ the end user opting in). The reason is simply that right now I am very clear on the need for this use case, but not sure about the need for end-user opt-in, since end users can just use dask.arange().
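To make the library-author use case concrete, here is a minimal sketch, assuming a hypothetical duck array whose creation functions are classmethods (the object-oriented option mentioned earlier in this message): the library derives the output type from its input, so no end-user opt-in is needed. `ToyArray`, `OtherArray` and `.map()` are illustrative names, not part of NumPy or any library.

```python
import math


class ToyArray:
    """Hypothetical duck array whose creation functions are classmethods."""

    def __init__(self, values):
        self.values = [float(v) for v in values]

    def __len__(self):
        return len(self.values)

    @classmethod
    def arange(cls, n):
        # Creation lives on the type, so other array types can override it.
        return cls(range(n))

    def map(self, f):
        return type(self)(f(v) for v in self.values)


class OtherArray(ToyArray):
    """Stands in for a different array library (e.g. a GPU array)."""


def my_library_func(array_like):
    exp = array_like.map(math.exp)
    # Creation follows the input's type instead of hard-coding np.arange.
    idx = type(exp).arange(len(exp))
    return idx, exp


idx, exp = my_library_func(OtherArray([0.0, 1.0]))
print(type(idx).__name__, idx.values)  # OtherArray [0.0, 1.0]
```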
Sure, the end user can, but library authors cannot. And end users may want to easily port code to GPU or between back-ends, just as library authors might.
Cheers,
Sebastian
[1] To be honest, I do think a lot of the "issues" around monkeypatching exist just as much with backend choosing; the main differences seem to me to be: 1. monkeypatching was not done explicitly (import mkl_fft; mkl_fft.monkeypatch_numpy())? 2. A backend system allows libraries to prefer one locally? (which I think is a big advantage)
[2] There are the options of adding `linspace_like` functions somewhere in a numpy submodule, or adding `linspace(..., array_type=np.ndarray)`, or simply inventing a new "protocol" (which is not really a protocol?), and making it `ndarray.__numpy_like_creation_functions__.arange()`.
Handling things like RandomState can get complicated here.
<snip>
On Tue, 2019-09-10 at 17:28 +0200, Hameer Abbasi wrote:
On 07.09.19 22:06, Sebastian Berg wrote:
On Fri, 2019-09-06 at 14:45 -0700, Ralf Gommers wrote:
<snip>
Let me try to move the discussion from the github issue here (this may not be the best place). ( https://github.com/numpy/numpy/issues/14441 which asked for easier creation functions together with `__array_function__`).
I think an important note mentioned here is how users interact with unumpy, vs. __array_function__. The former is an explicit opt-in, while the latter is implicit choice based on an `array-like` abstract base class and functional type based dispatching.
To quote NEP 18 on this: "The downsides are that this would require an explicit opt-in from all existing code, e.g., import numpy.api as np, and in the long term would result in the maintenance of two separate NumPy APIs. Also, many functions from numpy itself are already overloaded (but inadequately), so confusion about high vs. low level APIs in NumPy would still persist." (I do think this is a point we should not just ignore; `uarray` is a thin layer, but it has a big surface area.)
Now, there are things where explicit opt-in is obvious, and the FFT example is one of those: there is no way to implicitly choose another backend (except by just replacing it, i.e. monkeypatching) [1]. Right now I think these are _very_ different.
Now, for end users, choosing one array-like over another seems nicer as an implicit mechanism (why should I not mix sparse, dask and numpy arrays!?). This is the promise `__array_function__` tries to make. Unless convinced otherwise, my guess is that most library authors would strive for implicit support (e.g. sklearn, skimage, scipy).
You can: once you register the backend, it becomes implicit, so all backends are tried until one succeeds, unless you explicitly say "I do not want another backend" (only/coerce=True).
The thing here being "once you register the backend", thus requiring at least some form of explicit opt-in by the end user. Also, unless you use the with statement (with all the scoping rules attached), you cannot plug the coercion/creation hole left by `__array_function__`.
Circling back to creation and coercion. In a purely Object type system, these would be classmethods, I guess, but in NumPy and the
<snip>
    import numpy as np

    def my_library_func(array_like):
        exp = np.exp(array_like)
        # np.arange always returns a NumPy array, whatever array_like was
        idx = np.arange(len(exp))
        return idx, exp
This would have all the information needed for implicit opt-in/array-like support, but it cannot use it right now. What I have been wondering is whether uarray/unumpy can in some way help me make this work (even _without_ the end user opting in). The reason is simply that right now I am very clear on the need for this use case, but not sure about the need for end-user opt-in, since end users can just use dask.arange().
Sure, the end user can, but library authors cannot. And end users may want to easily port code to GPU or between back-ends, just as library authors might.
Yes, but library authors want to solve the particular thing above right now, and I am still not sure how uarray helps there. If it does, then only with added complexity _and_ (at least currently) explicit end-user opt-in.
Now, I am not a particularly good judge of these things, but I have been trying to figure out how things can improve with it, and I am still tempted to say that uarray is a giant step in no particular direction at all. Of course it _can_ solve everything, but right now it seems like it might require a py2 -> py3-like transition. And even then, it is so powerful that it probably comes with its own bunch of issues (such as far-away side effects due to the scoping of with statements).
Best,
Sebastian
Cheers,
Sebastian
[1] To be honest, I do think a lot of the "issues" around monkeypatching exist just as much with backend choosing; the main differences seem to me to be: 1. monkeypatching was not done explicitly (import mkl_fft; mkl_fft.monkeypatch_numpy())? 2. A backend system allows libraries to prefer one locally? (which I think is a big advantage)
[2] There are the options of adding `linspace_like` functions somewhere in a numpy submodule, or adding `linspace(..., array_type=np.ndarray)`, or simply inventing a new "protocol" (which is not really a protocol?), and making it `ndarray.__numpy_like_creation_functions__.arange()`.
Handling things like RandomState can get complicated here.
<snip>
NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion
On Tue, Sep 10, 2019 at 10:53 AM Sebastian Berg sebastian@sipsolutions.net wrote:
On Tue, 2019-09-10 at 17:28 +0200, Hameer Abbasi wrote:
On 07.09.19 22:06, Sebastian Berg wrote:
Now, for end users, choosing one array-like over another seems nicer as an implicit mechanism (why should I not mix sparse, dask and numpy arrays!?). This is the promise `__array_function__` tries to make. Unless convinced otherwise, my guess is that most library authors would strive for implicit support (e.g. sklearn, skimage, scipy).
You can: once you register the backend, it becomes implicit, so all backends are tried until one succeeds, unless you explicitly say "I do not want another backend" (only/coerce=True).
The thing here being "once you register the backend", thus requiring at least some form of explicit opt-in by the end user. Also, unless you use the with statement (with all the scoping rules attached), you cannot plug the coercion/creation hole left by `__array_function__`.
The need for this is clear, I think. We're discussing on the unumpy repo whether this can be done with a minor change to how unumpy works, or by having backends auto-register somehow on import. It should be possible without mandating that the end user explicitly do something, but it needs some thought. Stay tuned.
Cheers, Ralf
On Fri, Sep 6, 2019 at 12:53 AM Nathaniel Smith njs@pobox.com wrote:
On Mon, Sep 2, 2019 at 11:21 PM Ralf Gommers ralf.gommers@gmail.com wrote:
On Mon, Sep 2, 2019 at 2:09 PM Nathaniel Smith njs@pobox.com wrote:
On Tue, Sep 3, 2019 at 2:04 AM Hameer Abbasi einstein.edison@gmail.com wrote:
The fact that we're having to design more and more protocols for a lot of very similar things is, to me, an indicator that we do have holistic problems that ought to be solved by a single protocol.
But the reason we've had trouble designing these protocols is that they're each different :-). If it was just a matter of copying __array_ufunc__ we'd have been done in a few minutes...
I don't think that argument is correct. That we now have two very similar protocols is simply a matter of history and limited developer time. NEP 18 discusses in several places that __array_ufunc__ should be brought in line with __array_function__, and that we can migrate a function from one protocol to the other. There's no technical reason other than backwards compat and dev time why we couldn't use __array_function__ for ufuncs also.
Cheers, Ralf
On Fri, Sep 6, 2019 at 11:53 AM Ralf Gommers ralf.gommers@gmail.com wrote:
On Fri, Sep 6, 2019 at 12:53 AM Nathaniel Smith njs@pobox.com wrote:
On Tue, Sep 3, 2019 at 2:04 AM Hameer Abbasi einstein.edison@gmail.com wrote:
The fact that we're having to design more and more protocols for a lot of very similar things is, to me, an indicator that we do have holistic problems that ought to be solved by a single protocol.
But the reason we've had trouble designing these protocols is that they're each different :-). If it was just a matter of copying __array_ufunc__ we'd have been done in a few minutes...
I don't think that argument is correct. That we now have two very similar protocols is simply a matter of history and limited developer time. NEP 18 discusses in several places that __array_ufunc__ should be brought in line with __array_function__, and that we can migrate a function from one protocol to the other. There's no technical reason other than backwards compat and dev time why we couldn't use __array_function__ for ufuncs also.
Huh, that's interesting! Apparently we have a profoundly different understanding of what we're doing here. To me, __array_ufunc__ and __array_function__ are completely different. In fact I'd say __array_ufunc__ is a good idea and __array_function__ is a bad idea, and would definitely not be in favor of combining them together.
The key difference is that __array_ufunc__ allows for *generic* implementations. Most duck array libraries can write a single implementation of __array_ufunc__ that works for *all* ufuncs, even new third-party ufuncs that the duck array library has never heard of, because ufuncs all share the same structure of a loop wrapped around a core operation, and they can treat the core operation as a black box. For example:
- Dask can split up the operation across its tiled sub-arrays, and then for each tile it invokes the core operation.
- xarray can do its label-based axis matching, and then invoke the core operation.
- bcolz can loop over the array uncompressing one block at a time, invoking the core operation on each.
- sparse arrays can check the ufunc .identity attribute to find out whether 0 is an identity, and if so invoke the operation directly on the non-zero entries; otherwise, it can loop over the array and densify it in blocks and invoke the core operation on each. (It would be useful to have a bit more metadata on the ufunc, so e.g. np.subtract could declare that zero is a right-identity but not a left-identity, but that's a simple enough extension to make at some point.)
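A minimal sketch of such a generic `__array_ufunc__`, assuming NumPy is installed: a toy 1-D array stored as chunks (a stand-in for a Dask-like tiled array, not Dask's actual code) applies whatever ufunc it receives to each chunk, treating it as a black box. Only plain elementwise calls with a single output are handled; anything else (e.g. `np.add.reduce`) returns `NotImplemented`, so NumPy raises `TypeError`.

```python
import numpy as np


class ChunkedArray:
    """Toy 1-D array stored as a list of NumPy chunks (hypothetical)."""

    def __init__(self, chunks):
        self.chunks = [np.asarray(c) for c in chunks]

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Handle only plain elementwise calls with one output and no
        # keyword arguments; decline reduce/accumulate/out=... etc.
        if method != "__call__" or ufunc.nout != 1 or kwargs:
            return NotImplemented
        # Assumes every ChunkedArray input uses the same chunking.
        results = []
        for i in range(len(self.chunks)):
            args = [x.chunks[i] if isinstance(x, ChunkedArray) else x
                    for x in inputs]
            # The ufunc is the black-box "core operation", applied per chunk.
            results.append(ufunc(*args))
        return ChunkedArray(results)

    def to_numpy(self):
        return np.concatenate(self.chunks)


a = ChunkedArray([[1.0, 4.0], [9.0]])
print(np.sqrt(a).to_numpy())  # [1. 2. 3.]

# A "new" ufunc that ChunkedArray has never heard of works too.
plus_one = np.frompyfunc(lambda v: v + 1, 1, 1)
print(plus_one(a).to_numpy().tolist())
```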
Result: __array_ufunc__ makes it totally possible to take a ufunc from scipy.special or a random new one created with numba, and have it immediately work on an xarray wrapped around dask wrapped around bcolz, out-of-the-box. That's a clean, generic interface. [1]
OTOH, __array_function__ doesn't allow this kind of simplification: if we were using __array_function__ for ufuncs, every library would have to special-case every individual ufunc, which leads to dramatically more work and more potential for bugs.
To me, the whole point of interfaces is to reduce coupling. When you have N interacting modules, it's unmaintainable if every change requires considering every N! combination. From this perspective, __array_function__ isn't good, but it is still somewhat constrained: the result of each operation is still determined by the objects involved, nothing else. In this regard, uarray is even more extreme than __array_function__, because arbitrary operations can be arbitrarily changed by arbitrarily distant code. It sort of feels like the argument for uarray is: well, designing maintainable interfaces is a lot of work, so forget it, let's just make it easy to monkeypatch everything and call it a day.
That said, in my replies in this thread I've been trying to stay productive and focus on narrower concrete issues. I'm pretty sure that __array_function__ and uarray will turn out to be bad ideas and will fail, but that's not a proven fact, it's just an informed guess. And the road that I favor also has lots of risks and uncertainty. So I don't have a problem with trying both as experiments and learning more! But hopefully that explains why it's not at all obvious that uarray solves the protocol design problems we've been talking about.
-n
[1] There are also some cases that __array_ufunc__ doesn't handle as nicely. One obvious one is that GPU/TPU libraries still need to special-case individual ufuncs. But that's not a limitation of __array_ufunc__, it's a limitation of GPUs – they can't run CPU code, so they can't use the CPU implementation of the core operations. Another limitation is that __array_ufunc__ is weak at handling operations that involve mixed libraries (e.g. np.add(bcolz_array, sparse_array)) – to work well, this might require that bcolz have special-case handling for sparse arrays, or vice-versa, so you still potentially have some N**2 special cases, though at least here N is the number of duck array libraries, not the number of ufuncs. I think this is an interesting target for future work. But in general, __array_ufunc__ goes a long way to taming the complexity of interacting libraries and ufuncs.
-- Nathaniel J. Smith -- https://vorpus.org
On 08.09.19 09:53, Nathaniel Smith wrote:
On Fri, Sep 6, 2019 at 11:53 AM Ralf Gommers ralf.gommers@gmail.com wrote:
On Fri, Sep 6, 2019 at 12:53 AM Nathaniel Smith njs@pobox.com wrote:
On Tue, Sep 3, 2019 at 2:04 AM Hameer Abbasi einstein.edison@gmail.com wrote:
The fact that we're having to design more and more protocols for a lot of very similar things is, to me, an indicator that we do have holistic problems that ought to be solved by a single protocol.
But the reason we've had trouble designing these protocols is that they're each different :-). If it was just a matter of copying __array_ufunc__ we'd have been done in a few minutes...
I don't think that argument is correct. That we now have two very similar protocols is simply a matter of history and limited developer time. NEP 18 discusses in several places that __array_ufunc__ should be brought in line with __array_function__, and that we can migrate a function from one protocol to the other. There's no technical reason other than backwards compat and dev time why we couldn't use __array_function__ for ufuncs also.
Huh, that's interesting! Apparently we have a profoundly different understanding of what we're doing here. To me, __array_ufunc__ and __array_function__ are completely different. In fact I'd say __array_ufunc__ is a good idea and __array_function__ is a bad idea, and would definitely not be in favor of combining them together.
The key difference is that __array_ufunc__ allows for *generic* implementations. Most duck array libraries can write a single implementation of __array_ufunc__ that works for *all* ufuncs, even new third-party ufuncs that the duck array library has never heard of, because ufuncs all share the same structure of a loop wrapped around a core operation, and they can treat the core operation as a black box. For example:
- Dask can split up the operation across its tiled sub-arrays, and then for each tile it invokes the core operation.
- xarray can do its label-based axis matching, and then invoke the core operation.
- bcolz can loop over the array uncompressing one block at a time, invoking the core operation on each.
- sparse arrays can check the ufunc .identity attribute to find out whether 0 is an identity, and if so invoke the operation directly on the non-zero entries; otherwise, it can loop over the array and densify it in blocks and invoke the core operation on each. (It would be useful to have a bit more metadata on the ufunc, so e.g. np.subtract could declare that zero is a right-identity but not a left-identity, but that's a simple enough extension to make at some point.)
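The sparse case in the last bullet can be sketched the same way, assuming NumPy is installed: a toy sparse vector (hypothetical, not pydata/sparse or scipy.sparse) checks `ufunc.identity` and, when 0 is the identity (so `op(0, 0) == 0`), evaluates the ufunc only on stored entries; otherwise it densifies. A real implementation would densify block by block rather than all at once.

```python
import numpy as np


class SparseVector:
    """Toy 1-D sparse vector: a size plus a dict of non-zero entries."""

    def __init__(self, size, data):
        self.size = size
        self.data = dict(data)  # index -> value; zeros are not stored

    def todense(self):
        out = np.zeros(self.size)
        for i, v in self.data.items():
            out[i] = v
        return out

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Only elementwise binary ufuncs on two SparseVectors are handled.
        if method != "__call__" or kwargs or ufunc.nin != 2 or ufunc.nout != 1:
            return NotImplemented
        if not all(isinstance(x, SparseVector) for x in inputs):
            return NotImplemented
        a, b = inputs
        if ufunc.identity is not None and ufunc.identity == 0:
            # 0 is an identity, so op(0, 0) == 0: positions where both
            # inputs are zero stay zero and never need to be visited.
            result = {}
            for i in a.data.keys() | b.data.keys():
                v = ufunc(a.data.get(i, 0.0), b.data.get(i, 0.0))
                if v != 0:
                    result[i] = v
            return SparseVector(a.size, result)
        # Otherwise densify (a real implementation would go block by block).
        return ufunc(a.todense(), b.todense())


a = SparseVector(5, {0: 1.0, 3: 2.0})
b = SparseVector(5, {3: 5.0, 4: -1.0})
print(sorted(np.add(a, b).data))  # [0, 3, 4]
print(np.subtract(a, b))  # dense result: np.subtract has no identity
```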
Result: __array_ufunc__ makes it totally possible to take a ufunc from scipy.special or a random new one created with numba, and have it immediately work on an xarray wrapped around dask wrapped around bcolz, out-of-the-box. That's a clean, generic interface. [1]
OTOH, __array_function__ doesn't allow this kind of simplification: if we were using __array_function__ for ufuncs, every library would have to special-case every individual ufunc, which leads to dramatically more work and more potential for bugs.
But uarray does allow this kind of simplification. You would do the following inside a uarray backend:
    def __ua_function__(func, args, kwargs):
        with ua.skip_backend(self_backend):
            # Do code here, dispatches to everything but
This is possible today and is done in the dask backend inside unumpy for example.
To me, the whole point of interfaces is to reduce coupling. When you have N interacting modules, it's unmaintainable if every change requires considering every N! combination. From this perspective, __array_function__ isn't good, but it is still somewhat constrained: the result of each operation is still determined by the objects involved, nothing else. In this regard, uarray is even more extreme than __array_function__, because arbitrary operations can be arbitrarily changed by arbitrarily distant code. It sort of feels like the argument for uarray is: well, designing maintainable interfaces is a lot of work, so forget it, let's just make it easy to monkeypatch everything and call it a day.
That said, in my replies in this thread I've been trying to stay productive and focus on narrower concrete issues. I'm pretty sure that __array_function__ and uarray will turn out to be bad ideas and will fail, but that's not a proven fact, it's just an informed guess. And the road that I favor also has lots of risks and uncertainty. So I don't have a problem with trying both as experiments and learning more! But hopefully that explains why it's not at all obvious that uarray solves the protocol design problems we've been talking about.
-n
[1] There are also some cases that __array_ufunc__ doesn't handle as nicely. One obvious one is that GPU/TPU libraries still need to special-case individual ufuncs. But that's not a limitation of __array_ufunc__, it's a limitation of GPUs – they can't run CPU code, so they can't use the CPU implementation of the core operations. Another limitation is that __array_ufunc__ is weak at handling operations that involve mixed libraries (e.g. np.add(bcolz_array, sparse_array)) – to work well, this might require that bcolz have special-case handling for sparse arrays, or vice-versa, so you still potentially have some N**2 special cases, though at least here N is the number of duck array libraries, not the number of ufuncs. I think this is an interesting target for future work. But in general, __array_ufunc__ goes a long way to taming the complexity of interacting libraries and ufuncs.
-- Nathaniel J. Smith -- https://vorpus.org
On Sun, Sep 8, 2019 at 1:04 AM Hameer Abbasi einstein.edison@gmail.com wrote:
On 08.09.19 09:53, Nathaniel Smith wrote:
OTOH, __array_function__ doesn't allow this kind of simplification: if we were using __array_function__ for ufuncs, every library would have to special-case every individual ufunc, which leads to dramatically more work and more potential for bugs.
But uarray does allow this kind of simplification. You would do the following inside a uarray backend:
    def __ua_function__(func, args, kwargs):
        with ua.skip_backend(self_backend):
            # Do code here, dispatches to everything but
You can dispatch to the underlying operation, sure, but you can't implement a generic ufunc loop because you don't know that 'func' is actually a bound ufunc method, or have any way to access the underlying ufunc object. (E.g. consider the case where 'func' is 'np.add.reduce'.) The critical part of my example was that it's a new ufunc that none of these libraries have ever heard of before.
Ufuncs have a lot of consistent structure beyond what generic Python callables have, and the whole point of __array_ufunc__ is that implementors can rely on that structure. You get to work at a higher level of abstraction.
A similar but simpler example would be the protocol we've sketched out for concatenation: the idea would be to capture the core similarity between np.concatenate/np.hstack/np.vstack/np.dstack/np.column_stack/np.row_stack/any other variants, so that implementors only have to worry about the higher-level concept of "concatenation" rather than the raw APIs of all those individual functions.
-n
On 08.09.19 10:56, Nathaniel Smith wrote:
On Sun, Sep 8, 2019 at 1:04 AM Hameer Abbasi einstein.edison@gmail.com wrote:
On 08.09.19 09:53, Nathaniel Smith wrote:
OTOH, __array_function__ doesn't allow this kind of simplification: if we were using __array_function__ for ufuncs, every library would have to special-case every individual ufunc, which leads to dramatically more work and more potential for bugs.
But uarray does allow this kind of simplification. You would do the following inside a uarray backend:
    def __ua_function__(func, args, kwargs):
        with ua.skip_backend(self_backend):
            # Do code here, dispatches to everything but
You can dispatch to the underlying operation, sure, but you can't implement a generic ufunc loop because you don't know that 'func' is actually a bound ufunc method, or have any way to access the underlying ufunc object. (E.g. consider the case where 'func' is 'np.add.reduce'.) The critical part of my example was that it's a new ufunc that none of these libraries have ever heard of before.
Ufuncs have a lot of consistent structure beyond what generic Python callables have, and the whole point of __array_ufunc__ is that implementors can rely on that structure. You get to work at a higher level of abstraction.
A similar but simpler example would be the protocol we've sketched out for concatenation: the idea would be to capture the core similarity between np.concatenate/np.hstack/np.vstack/np.dstack/np.column_stack/np.row_stack/any other variants, so that implementors only have to worry about the higher-level concept of "concatenation" rather than the raw APIs of all those individual functions.
There's a solution for that too: Default implementations. Implement concatenate, and you've got a default implementation for all of those you mentioned.
Similarly for transpose/swapaxes/moveaxis and family.
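A sketch of the default-implementation idea, assuming NumPy is installed: if a backend supplies only `concatenate` and `transpose`, the variants can be derived from them (the backend's own functions can be passed in; NumPy's are the defaults here). The helpers `default_hstack`, `default_vstack` and `default_swapaxes` are hypothetical, written to follow NumPy's documented semantics; they are not the defaults unumpy actually ships, and in a real backend the `atleast_*` helpers would also need to dispatch.

```python
import numpy as np


def default_hstack(arrays, concatenate=np.concatenate):
    # np.hstack joins 1-D inputs along axis 0, everything else along axis 1.
    arrays = [np.atleast_1d(a) for a in arrays]
    axis = 0 if arrays[0].ndim == 1 else 1
    return concatenate(arrays, axis=axis)


def default_vstack(arrays, concatenate=np.concatenate):
    # np.vstack promotes inputs to at least 2-D, then joins along axis 0.
    return concatenate([np.atleast_2d(a) for a in arrays], axis=0)


def default_swapaxes(a, axis1, axis2, transpose=np.transpose):
    # Swapping two axes is a transpose with a permuted axis order.
    axes = list(range(np.ndim(a)))
    axes[axis1], axes[axis2] = axes[axis2], axes[axis1]
    return transpose(a, axes)


x = np.arange(6).reshape(2, 3)
print(default_hstack([x, x]).shape)  # (2, 6)
print(default_swapaxes(np.zeros((2, 3, 4)), 0, 2).shape)  # (4, 3, 2)
```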
-n
On 08.09.19 10:56, Nathaniel Smith wrote:
On Sun, Sep 8, 2019 at 1:04 AM Hameer Abbasi einstein.edison@gmail.com wrote:
On 08.09.19 09:53, Nathaniel Smith wrote:
OTOH, __array_function__ doesn't allow this kind of simplification: if we were using __array_function__ for ufuncs, every library would have to special-case every individual ufunc, which leads to dramatically more work and more potential for bugs.
But uarray does allow this kind of simplification. You would do the following inside a uarray backend:
    def __ua_function__(func, args, kwargs):
        with ua.skip_backend(self_backend):
            # Do code here, dispatches to everything but
You can dispatch to the underlying operation, sure, but you can't implement a generic ufunc loop because you don't know that 'func' is actually a bound ufunc method, or have any way to access the underlying ufunc object. (E.g. consider the case where 'func' is 'np.add.reduce'.) The critical part of my example was that it's a new ufunc that none of these libraries have ever heard of before.
You don't get np.add.reduce, you get np.ufunc.reduce with self=np.add. So you can access the underlying ufunc and the method, nothing limiting about that.
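This point can be checked directly in NumPy: a bound method such as `np.add.reduce` still carries the ufunc it belongs to via `__self__`. A small helper (hypothetical, not part of NumPy or uarray) that recovers the ufunc and method name from an arbitrary callable, so a generic implementation can inspect e.g. `.identity`, might look like this, assuming NumPy is installed:

```python
import numpy as np


def unwrap_ufunc(func):
    """Return (ufunc, method_name) for a ufunc or a bound ufunc method,
    or (None, None) for any other callable."""
    if isinstance(func, np.ufunc):
        return func, "__call__"
    owner = getattr(func, "__self__", None)
    if isinstance(owner, np.ufunc):
        return owner, func.__name__
    return None, None


ufunc, method = unwrap_ufunc(np.add.reduce)
print(ufunc is np.add, method, ufunc.identity)  # True reduce 0
print(getattr(ufunc, method)([1, 2, 3]))  # 6
```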
Ufuncs have a lot of consistent structure beyond what generic Python callables have, and the whole point of __array_ufunc__ is that implementors can rely on that structure. You get to work at a higher level of abstraction.
A similar but simpler example would be the protocol we've sketched out for concatenation: the idea would be to capture the core similarity between np.concatenate/np.hstack/np.vstack/np.dstack/np.column_stack/np.row_stack/any other variants, so that implementors only have to worry about the higher-level concept of "concatenation" rather than the raw APIs of all those individual functions.
-n
On Sun, Sep 8, 2019 at 12:54 AM Nathaniel Smith njs@pobox.com wrote:
On Fri, Sep 6, 2019 at 11:53 AM Ralf Gommers ralf.gommers@gmail.com wrote:
On Fri, Sep 6, 2019 at 12:53 AM Nathaniel Smith njs@pobox.com wrote:
On Tue, Sep 3, 2019 at 2:04 AM Hameer Abbasi einstein.edison@gmail.com wrote:
The fact that we're having to design more and more protocols for a lot of very similar things is, to me, an indicator that we do have holistic problems that ought to be solved by a single protocol.
But the reason we've had trouble designing these protocols is that they're each different :-). If it was just a matter of copying __array_ufunc__ we'd have been done in a few minutes...
I don't think that argument is correct. That we now have two very similar protocols is simply a matter of history and limited developer time. NEP 18 discusses in several places that __array_ufunc__ should be brought in line with __array_function__, and that we can migrate a function from one protocol to the other. There's no technical reason other than backwards compat and dev time why we couldn't use __array_function__ for ufuncs also.
Huh, that's interesting! Apparently we have a profoundly different understanding of what we're doing here.
That is interesting indeed. We should figure this out first - no point discussing a NEP about plugging the gaps in our override system when we don't have a common understanding of why we wanted/needed an override system in the first place.
To me, __array_ufunc__ and __array_function__ are completely different. In fact I'd say __array_ufunc__ is a good idea and __array_function__ is a bad idea,
It's early days, but "customer feedback" certainly has been more enthusiastic for __array_function__. Also from what I've seen so far it works well. Example: at the SciPy sprints someone put together Xarray plus pydata/sparse to use distributed sparse arrays for visualizing some large genetic (I think) data sets. That was made to work in a single day, with impressively little code.
and would definitely not be in favor of combining them together.
I'm not saying we should. But __array_ufunc__ is basically a slight specialization - knowing that the function that was called is a ufunc can be handy but is usually irrelevant.
The key difference is that __array_ufunc__ allows for *generic* implementations.
Implementations of what?
Most duck array libraries can write a single implementation of __array_ufunc__ that works for *all* ufuncs, even new third-party ufuncs that the duck array library has never heard of,
I see where you're going with this. You are thinking of reusing the ufunc implementation to do a computation. That's a minor use case (imho), and I can't remember seeing it used.
The original use case was scipy.sparse matrices. The executive summary of NEP 13 talks about this. It's about calling `np.some_ufunc(other_ndarray_like)` and "handing over control" to that object rather than the numpy function starting to execute. Also note that NEP 13 states in the summary "This covers some of the same ground as Travis Oliphant’s proposal to retro-fit NumPy with multi-methods" (reminds one of uarray....).
For scipy.sparse, the layout of the data doesn't make sense to numpy. All that was desired was that the sparse matrix needs to know what function was called, so it can call its own implementation of that function instead.
because ufuncs all share the same structure of a loop wrapped around a core operation, and they can treat the core operation as a black box. For example:
- Dask can split up the operation across its tiled sub-arrays, and then for each tile it invokes the core operation.
Works for __array_function__ too. Note, *not* by explicitly reusing the numpy function. Dask anyway has its own functions that mirror the numpy API. Dask's __array_function__ just does the forwarding to its own functions.
Also, a Dask array could be a collection of CuPy arrays, and CuPy implements __array_ufunc__. So explicitly reusing the NumPy ufunc implementation on whatever comes in would be, well, not so nice.
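The forwarding described here follows the pattern NEP 18 itself documents: `__array_function__` looks the called NumPy function up in a table of the duck array's own implementations. A minimal sketch, assuming NumPy >= 1.17 (where the protocol is enabled by default); `MyArray`, `HANDLED_FUNCTIONS` and `implements` are illustrative names:

```python
import numpy as np

HANDLED_FUNCTIONS = {}


def implements(np_function):
    """Register an implementation of `np_function` for MyArray."""
    def decorator(func):
        HANDLED_FUNCTIONS[np_function] = func
        return func
    return decorator


class MyArray:
    """Toy duck array that forwards NumPy calls to its own functions."""

    def __init__(self, values):
        self.values = list(values)

    def __array_function__(self, func, types, args, kwargs):
        if func not in HANDLED_FUNCTIONS:
            return NotImplemented  # NumPy then raises TypeError
        if not all(issubclass(t, MyArray) for t in types):
            return NotImplemented
        return HANDLED_FUNCTIONS[func](*args, **kwargs)


@implements(np.sum)
def _sum(a):
    return sum(a.values)


@implements(np.concatenate)
def _concatenate(arrays):
    return MyArray(v for a in arrays for v in a.values)


print(np.sum(MyArray([1, 2, 3])))  # 6
print(np.concatenate([MyArray([1]), MyArray([2, 3])]).values)  # [1, 2, 3]
```

No NumPy implementation is reused here; control is simply handed over to the duck array's own functions.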
- xarray can do its label-based axis matching, and then invoke the core operation.
Could do this with __array_function__ too
- bcolz can loop over the array uncompressing one block at a time, invoking the core operation on each.
not sure about this one
- sparse arrays can check the ufunc .identity attribute
This is a case where knowing whether something is a ufunc lets you use a property of it, so there the more specialized nature of __array_ufunc__ helps. Seems niche though, and could probably also be done by checking whether a function is an instance of np.ufunc via __array_function__.
to find out whether 0 is an identity, and if so invoke the operation directly on the non-zero entries; otherwise, it can loop over the array and densify it in blocks and invoke the core operation on each. (It would be useful to have a bit more metadata on the ufunc, so e.g. np.subtract could declare that zero is a right-identity but not a left-identity, but that's a simple enough extension to make at some point.)
Result: __array_ufunc__ makes it totally possible to take a ufunc from scipy.special or a random new one created with numba, and have it immediately work on an xarray wrapped around dask wrapped around bcolz, out-of-the-box. That's a clean, generic interface. [1]
This last point, using third-party ufuncs, is the interesting one to me. They have to be generated with the NumPy ufunc machinery, so the dispatch mechanism is attached to them. We could do third party functions for __array_function__ too, but that would require making @array_function_dispatch public, which we haven't done (yet?).
OTOH, __array_function__ doesn't allow this kind of simplification: if we were using __array_function__ for ufuncs, every library would have to special-case every individual ufunc, which leads to dramatically more work and more potential for bugs.
This all assumes that "reusing the ufunc's implementation" is the one thing that matters. To me that's a small side benefit, which we haven't seen a whole lot of use of in the 2+ years that __array_ufunc__ has been available. I think that what (for example) CuPy does, using __array_ufunc__ to simply take over control, is both the major use case and the original motivation for introducing the protocol.
To me, the whole point of interfaces is to reduce coupling. When you have N interacting modules, it's unmaintainable if every change requires considering every N! combination. From this perspective, __array_function__ isn't good, but it is still somewhat constrained: the result of each operation is still determined by the objects involved, nothing else. In this regard, uarray is even more extreme than __array_function__, because arbitrary operations can be arbitrarily changed by arbitrarily distant code. It sort of feels like the argument for uarray is: well, designing maintainable interfaces is a lot of work, so forget it, let's just make it easy to monkeypatch everything and call it a day.
That said, in my replies in this thread I've been trying to stay productive and focus on narrower concrete issues. I'm pretty sure that __array_function__ and uarray will turn out to be bad ideas and will fail, but that's not a proven fact, it's just an informed guess. And the road that I favor also has lots of risks and uncertainty.
But what is that road, and what do you think the goal is? To me it's: separate our API from our implementation. Yours seems to be "reuse our implementations" for __array_ufunc__, but I can't see how that generalizes beyond ufuncs.
So I don't have a problem with trying both as experiments and learning more! But hopefully that explains why it's not at all obvious that uarray solves the protocol design problems we've been talking about.
-n
[1] There are also some cases that __array_ufunc__ doesn't handle as nicely. One obvious one is that GPU/TPU libraries still need to special-case individual ufuncs. But that's not a limitation of __array_ufunc__, it's a limitation of GPUs
I think this is an important point. GPUs are massively popular, and will very likely just continue to grow in importance. So anything we do in this space that says "well it works, just not for GPUs" is probably not going to solve our most pressing problems.
– they can't run CPU code, so they can't use the CPU implementation of the core operations. Another limitation is that __array_ufunc__ is weak at handling operations that involve mixed libraries (e.g. np.add(bcolz_array, sparse_array)) – to work well, this might require that bcolz have special-case handling for sparse arrays, or vice-versa, so you still potentially have some N**2 special cases, though at least here N is the number of duck array libraries, not the number of ufuncs. I think this is an interesting target for future work. But in general, __array_ufunc__ goes a long way to taming the complexity of interacting libraries and ufuncs.
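To make the mixed-library point concrete, here is a minimal sketch of two hypothetical duck array classes, `A` and `B`. `A` only knows about itself and plain ndarrays, so it returns `NotImplemented` when it sees a `B`; NumPy then tries `B`'s override, which carries explicit special-case knowledge of `A`. Each pair of libraries that should interoperate needs this kind of knowledge on at least one side, which is where the N**2 comes from.

```python
import numpy as np


class A:
    """Hypothetical duck array that only understands itself and ndarrays."""

    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        if method != "__call__" or kwargs:
            return NotImplemented
        # Unknown operand type (e.g. a B): defer, so NumPy tries the next override.
        if not all(isinstance(x, (A, np.ndarray, int, float)) for x in inputs):
            return NotImplemented
        args = [x.data if isinstance(x, A) else x for x in inputs]
        return A(ufunc(*args))


class B:
    """Hypothetical duck array with explicit special-case handling for A."""

    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        if method != "__call__" or kwargs:
            return NotImplemented
        # B knows how to unwrap A, so it can handle mixed A/B operations.
        args = [x.data if isinstance(x, (A, B)) else x for x in inputs]
        return B(ufunc(*args))


mixed = np.add(A([1, 2]), B([10, 20]))  # A defers, B handles it
```

If neither class knew about the other, both overrides would return `NotImplemented` and NumPy would raise a `TypeError`.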
With *only* ufuncs you can't create that many interesting applications, you need the other functions too......
Cheers, Ralf
On Sun, Sep 8, 2019 at 8:40 AM Ralf Gommers ralf.gommers@gmail.com wrote:
On Sun, Sep 8, 2019 at 12:54 AM Nathaniel Smith njs@pobox.com wrote:
On Fri, Sep 6, 2019 at 11:53 AM Ralf Gommers ralf.gommers@gmail.com wrote:
On Fri, Sep 6, 2019 at 12:53 AM Nathaniel Smith njs@pobox.com wrote:
On Tue, Sep 3, 2019 at 2:04 AM Hameer Abbasi einstein.edison@gmail.com wrote:
The fact that we're having to design more and more protocols for a lot of very similar things is, to me, an indicator that we do have holistic problems that ought to be solved by a single protocol.
But the reason we've had trouble designing these protocols is that they're each different :-). If it was just a matter of copying __array_ufunc__ we'd have been done in a few minutes...
I don't think that argument is correct. That we now have two very similar protocols is simply a matter of history and limited developer time. NEP 18 discusses in several places that __array_ufunc__ should be brought in line with __array_function__, and that we can migrate a function from one protocol to the other. There's no technical reason other than backwards compat and dev time why we couldn't use __array_function__ for ufuncs also.
Huh, that's interesting! Apparently we have a profoundly different understanding of what we're doing here.
That is interesting indeed. We should figure this out first - no point discussing a NEP about plugging the gaps in our override system when we don't have a common understanding of why we wanted/needed an override system in the first place.
To me, __array_ufunc__ and __array_function__ are completely different. In fact I'd say __array_ufunc__ is a good idea and __array_function__ is a bad idea,
It's early days, but "customer feedback" certainly has been more enthusiastic for __array_function__. Also from what I've seen so far it works well. Example: at the SciPy sprints someone put together Xarray plus pydata/sparse to use distributed sparse arrays for visualizing some large genetic (I think) data sets. That was made to work in a single day, with impressively little code.
Yeah, it's true, and __array_function__ made a bunch of stuff that used to be impossible become possible, I'm not saying it didn't. My prediction is that the longer we live with it, the more limits we'll hit and the more problems we'll have with long-term maintainability. I don't think initial enthusiasm is a good predictor of that either way.
The key difference is that __array_ufunc__ allows for *generic* implementations.
Implementations of what?
Generic in the sense that you can write __array_ufunc__ once and have it work for all ufuncs.
Most duck array libraries can write a single implementation of __array_ufunc__ that works for *all* ufuncs, even new third-party ufuncs that the duck array library has never heard of,
I see where you're going with this. You are thinking of reusing the ufunc implementation to do a computation. That's a minor use case (imho), and I can't remember seeing it used.
I mean, I just looked at dask and xarray, and they're both doing exactly what I said, right now in shipping code. What use cases are you targeting here if you consider dask and xarray out-of-scope? :-)
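A minimal sketch of the generic pattern described above (the class name `Wrapped` and the clipping ufunc are hypothetical): a single `__array_ufunc__` unwraps its inputs, calls whichever ufunc method was requested, and rewraps the result. Because it never names a specific ufunc, it also works for a "third-party" ufunc, here one created with `np.frompyfunc`, that the class has never heard of. The `out=` argument is deliberately left unsupported to keep the sketch short.

```python
import numpy as np


class Wrapped:
    """Hypothetical duck array wrapping an ndarray."""

    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        if "out" in kwargs:
            return NotImplemented  # out= handling omitted in this sketch
        args = [x.data if isinstance(x, Wrapped) else x for x in inputs]
        # Works for any ufunc and any method: __call__, reduce, outer, ...
        result = getattr(ufunc, method)(*args, **kwargs)
        if result is None:  # e.g. ufunc.at operates in place
            return None
        if isinstance(result, tuple):  # ufuncs with several outputs
            return tuple(Wrapped(r) for r in result)
        return Wrapped(result)


# A "third-party" ufunc that the Wrapped class knows nothing about.
clip01 = np.frompyfunc(lambda v: min(max(v, 0.0), 1.0), 1, 1)

clipped = clip01(Wrapped([-1.0, 0.5, 2.0]))
total = np.add.reduce(Wrapped([1, 2, 3]))
```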
this is a case where knowing if something is a ufunc helps us use a property of it. So there the more specialized nature of __array_ufunc__ helps. Seems niche though, and could probably also be done by checking if a function is an instance of np.ufunc via __array_function__
Sparse arrays aren't very niche... and the isinstance trick is possible in some cases, but (a) it's relying on an undocumented implementation detail of __array_function__; according to __array_function__'s API contract, you could just as easily get passed the ufunc's __call__ method instead of the object itself, and (b) it doesn't work at all for ufunc methods like reduce, outer, accumulate. These are both show-stoppers IMO.
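The split referred to in (b) can be seen by tracing which protocol NumPy calls. In this sketch (the `Traced` class and the module-level `calls` log are hypothetical), `np.mean` arrives through `__array_function__` with only the function object, while ufunc methods such as `reduce` and `outer` arrive through `__array_ufunc__` together with a `method` string naming them; `__array_function__` has no equivalent of that `method` argument.

```python
import numpy as np

calls = []  # records (protocol, callable, method) for each dispatched call


class Traced:
    """Hypothetical duck array that logs which override protocol is used."""

    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        calls.append(("__array_ufunc__", ufunc, method))
        args = [x.data if isinstance(x, Traced) else x for x in inputs]
        return getattr(ufunc, method)(*args, **kwargs)

    def __array_function__(self, func, types, args, kwargs):
        calls.append(("__array_function__", func, None))
        args = [x.data if isinstance(x, Traced) else x for x in args]
        return func(*args, **kwargs)


t = Traced([1.0, 2.0, 3.0])
np.mean(t)               # a regular function
np.cos(t)                # a ufunc call
np.add.reduce(t)         # a ufunc method
np.multiply.outer(t, t)  # another ufunc method
```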
This last point, using third-party ufuncs, is the interesting one to me. They have to be generated with the NumPy ufunc machinery, so the dispatch mechanism is attached to them. We could do third party functions for __array_function__ too, but that would require making @array_function_dispatch public, which we haven't done (yet?).
With __array_function__ it's theoretically possible to do the dispatch on third-party functions, but when someone defines a new function they always have to go update all the duck array libraries to hard-code in some special knowledge of their new function. So in my example, even if we made @array_function_dispatch public, you still couldn't use your nice new numba-created gufunc unless you first convinced dask, xarray, and bcolz to all accept patches to support your new gufunc. With __array_ufunc__, it works out-of-the-box.
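For comparison, the usual `__array_function__` pattern (modelled on the `HANDLED_FUNCTIONS` registry example in NEP 18; `MyArray` and `implements` are illustrative names) keeps an explicit table of supported functions. Any function the duck array library has not registered, whether from NumPy or a third party, falls through to `NotImplemented`, which NumPy turns into a `TypeError`.

```python
import numpy as np

HANDLED_FUNCTIONS = {}


def implements(numpy_function):
    """Register an implementation of `numpy_function` for MyArray."""
    def decorator(func):
        HANDLED_FUNCTIONS[numpy_function] = func
        return func
    return decorator


class MyArray:
    """Hypothetical duck array using an explicit registry."""

    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_function__(self, func, types, args, kwargs):
        if func not in HANDLED_FUNCTIONS:
            return NotImplemented  # NumPy raises TypeError if nobody handles it
        if not all(issubclass(t, MyArray) for t in types):
            return NotImplemented
        return HANDLED_FUNCTIONS[func](*args, **kwargs)


@implements(np.sum)
def _sum(arr, axis=None):
    return MyArray(np.sum(arr.data, axis=axis))


total = np.sum(MyArray([1, 2, 3]))  # registered: handled by _sum
```

Calling an unregistered function such as `np.median(MyArray([1, 2, 3]))` raises `TypeError` until someone adds an `@implements(np.median)` entry.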
But what is that road, and what do you think the goal is? To me it's: separate our API from our implementation. Yours seems to be "reuse our implementations" for __array_ufunc__, but I can't see how that generalizes beyond ufuncs.
The road is to define *abstractions* for the operations we expose through our API, so that duck array implementors can work against a contract with well-defined preconditions and postconditions, so they can write code that works reliably even when the surrounding environment changes. That's the only way to keep things maintainable AFAICT. If the API contract is just a vague handwave at the numpy API, then no-one knows which details actually matter, it's impossible to test, implementations will inevitably end up with subtle long-standing bugs, and literally any change in numpy could potentially break duck array users, we don't know. So my motivation is that I like testing, I don't like bugs, and I like being able to maintain things with confidence :-). The principles are much more general than ufuncs; that's just a pertinent example.
I think this is an important point. GPUs are massively popular, and will very likely just continue to grow in importance. So anything we do in this space that says "well it works, just not for GPUs" is probably not going to solve our most pressing problems.
I'm not saying "__array_ufunc__ doesn't work for GPUs". I'm saying that when it comes to GPUs, there's an upper bound for how good you can hope to do, and __array_ufunc__ achieves that upper bound. So does __array_function__. So if we only care about GPUs, they're about equally good. But if we also care about dask and xarray and compressed storage and sparse storage and ... then __array_ufunc__ is strictly superior in those cases. So replacing __array_ufunc__ with __array_function__ would be a major backwards step.
-n
On Sun, Sep 8, 2019 at 7:27 PM Nathaniel Smith njs@pobox.com wrote:
On Sun, Sep 8, 2019 at 8:40 AM Ralf Gommers ralf.gommers@gmail.com wrote:
On Sun, Sep 8, 2019 at 12:54 AM Nathaniel Smith njs@pobox.com wrote:
On Fri, Sep 6, 2019 at 11:53 AM Ralf Gommers ralf.gommers@gmail.com wrote:
On Fri, Sep 6, 2019 at 12:53 AM Nathaniel Smith njs@pobox.com wrote:
On Tue, Sep 3, 2019 at 2:04 AM Hameer Abbasi einstein.edison@gmail.com wrote:
The fact that we're having to design more and more protocols for a lot of very similar things is, to me, an indicator that we do have holistic problems that ought to be solved by a single protocol.
But the reason we've had trouble designing these protocols is that they're each different :-). If it was just a matter of copying __array_ufunc__ we'd have been done in a few minutes...
I don't think that argument is correct. That we now have two very similar protocols is simply a matter of history and limited developer time. NEP 18 discusses in several places that __array_ufunc__ should be brought in line with __array_function__, and that we can migrate a function from one protocol to the other. There's no technical reason other than backwards compat and dev time why we couldn't use __array_function__ for ufuncs also.
Huh, that's interesting! Apparently we have a profoundly different understanding of what we're doing here.
That is interesting indeed. We should figure this out first - no point discussing a NEP about plugging the gaps in our override system when we don't have a common understanding of why we wanted/needed an override system in the first place.
To me, __array_ufunc__ and __array_function__ are completely different. In fact I'd say __array_ufunc__ is a good idea and __array_function__ is a bad idea,
It's early days, but "customer feedback" certainly has been more enthusiastic for __array_function__. Also from what I've seen so far it works well. Example: at the SciPy sprints someone put together Xarray plus pydata/sparse to use distributed sparse arrays for visualizing some large genetic (I think) data sets. That was made to work in a single day, with impressively little code.
Yeah, it's true, and __array_function__ made a bunch of stuff that used to be impossible become possible, I'm not saying it didn't. My prediction is that the longer we live with it, the more limits we'll hit and the more problems we'll have with long-term maintainability. I don't think initial enthusiasm is a good predictor of that either way.
The key difference is that __array_ufunc__ allows for *generic* implementations.
Implementations of what?
Generic in the sense that you can write __array_ufunc__ once and have it work for all ufuncs.
Most duck array libraries can write a single implementation of __array_ufunc__ that works for *all* ufuncs, even new third-party ufuncs that the duck array library has never heard of,
I see where you're going with this. You are thinking of reusing the ufunc implementation to do a computation. That's a minor use case (imho), and I can't remember seeing it used.
I mean, I just looked at dask and xarray, and they're both doing exactly what I said, right now in shipping code. What use cases are you targeting here if you consider dask and xarray out-of-scope? :-)
this is a case where knowing if something is a ufunc helps us use a property of it. So there the more specialized nature of __array_ufunc__ helps. Seems niche though, and could probably also be done by checking if a function is an instance of np.ufunc via __array_function__
Sparse arrays aren't very niche... and the isinstance trick is possible in some cases, but (a) it's relying on an undocumented implementation detail of __array_function__; according to __array_function__'s API contract, you could just as easily get passed the ufunc's __call__ method instead of the object itself, and (b) it doesn't work at all for ufunc methods like reduce, outer, accumulate. These are both show-stoppers IMO.
This last point, using third-party ufuncs, is the interesting one to me. They have to be generated with the NumPy ufunc machinery, so the dispatch mechanism is attached to them. We could do third party functions for __array_function__ too, but that would require making @array_function_dispatch public, which we haven't done (yet?).
With __array_function__ it's theoretically possible to do the dispatch on third-party functions, but when someone defines a new function they always have to go update all the duck array libraries to hard-code in some special knowledge of their new function. So in my example, even if we made @array_function_dispatch public, you still couldn't use your nice new numba-created gufunc unless you first convinced dask, xarray, and bcolz to all accept patches to support your new gufunc. With __array_ufunc__, it works out-of-the-box.
But what is that road, and what do you think the goal is? To me it's: separate our API from our implementation. Yours seems to be "reuse our implementations" for __array_ufunc__, but I can't see how that generalizes beyond ufuncs.
The road is to define *abstractions* for the operations we expose through our API, so that duck array implementors can work against a contract with well-defined preconditions and postconditions, so they can write code that works reliably even when the surrounding environment changes. That's the only way to keep things maintainable AFAICT. If the API contract is just a vague handwave at the numpy API, then no-one knows which details actually matter, it's impossible to test, implementations will inevitably end up with subtle long-standing bugs, and literally any change in numpy could potentially break duck array users, we don't know. So my motivation is that I like testing, I don't like bugs, and I like being able to maintain things with confidence :-). The principles are much more general than ufuncs; that's just a pertinent example.
I think this is an important point. GPUs are massively popular, and will very likely just continue to grow in importance. So anything we do in this space that says "well it works, just not for GPUs" is probably not going to solve our most pressing problems.
I'm not saying "__array_ufunc__ doesn't work for GPUs". I'm saying that when it comes to GPUs, there's an upper bound for how good you can hope to do, and __array_ufunc__ achieves that upper bound. So does __array_function__. So if we only care about GPUs, they're about equally good. But if we also care about dask and xarray and compressed storage and sparse storage and ... then __array_ufunc__ is strictly superior in those cases. So replacing __array_ufunc__ with __array_function__ would be a major backwards step.
One case that hasn’t been brought up in this thread is unit-handling. For example, unyt’s __array_ufunc__ implementation explicitly handles ufuncs and will bail if someone tries to use a ufunc that unyt doesn’t know about. I tried to implement a completely generic solution but ended up concluding I couldn’t do that without silently generating answers with incorrect units.
I definitely agree with your analysis that this sort of implementation is error-prone, in fact we just had to do a bugfix release to fix clip suddenly not working now that it’s a ufunc in numpy 1.17.
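The explicit-handling approach described for unyt can be sketched as a whitelist of unit rules. The `Quantity` class and its string-based unit rules below are hypothetical and far simpler than unyt's; the point is only that a ufunc without a known unit rule gets `NotImplemented` (and hence a `TypeError`) rather than a result with silently wrong units.

```python
import numpy as np

# Hypothetical unit rules for the ufuncs this class explicitly supports.
# Each rule maps the input units to the output unit, or None if incompatible.
_UNIT_RULES = {
    np.add: lambda u1, u2: u1 if u1 == u2 else None,
    np.subtract: lambda u1, u2: u1 if u1 == u2 else None,
    np.multiply: lambda u1, u2: f"{u1}*{u2}",
}


class Quantity:
    """Hypothetical array-with-units; far simpler than unyt."""

    def __init__(self, value, unit):
        self.value = np.asarray(value)
        self.unit = unit

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        rule = _UNIT_RULES.get(ufunc)
        # Unknown ufunc or method: refuse instead of guessing the result unit.
        if rule is None or method != "__call__" or kwargs:
            return NotImplemented
        if not all(isinstance(x, Quantity) for x in inputs):
            return NotImplemented
        unit = rule(*(x.unit for x in inputs))
        if unit is None:
            raise ValueError(f"incompatible units for {ufunc.__name__}")
        return Quantity(ufunc(*(x.value for x in inputs)), unit)
```

With this design, every new ufunc (including one that NumPy itself starts using internally) needs an explicit rule before it works, which is the maintenance cost being discussed.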
-n
-- Nathaniel J. Smith -- https://vorpus.org
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion
On Sun, Sep 8, 2019 at 6:27 PM Nathaniel Smith njs@pobox.com wrote:
On Sun, Sep 8, 2019 at 8:40 AM Ralf Gommers ralf.gommers@gmail.com wrote:
On Sun, Sep 8, 2019 at 12:54 AM Nathaniel Smith njs@pobox.com wrote:
On Fri, Sep 6, 2019 at 11:53 AM Ralf Gommers ralf.gommers@gmail.com wrote:
On Fri, Sep 6, 2019 at 12:53 AM Nathaniel Smith njs@pobox.com wrote:
On Tue, Sep 3, 2019 at 2:04 AM Hameer Abbasi einstein.edison@gmail.com wrote:
The fact that we're having to design more and more protocols for a lot of very similar things is, to me, an indicator that we do have holistic problems that ought to be solved by a single protocol.
But the reason we've had trouble designing these protocols is that they're each different :-). If it was just a matter of copying __array_ufunc__ we'd have been done in a few minutes...
I don't think that argument is correct. That we now have two very similar protocols is simply a matter of history and limited developer time. NEP 18 discusses in several places that __array_ufunc__ should be brought in line with __array_function__, and that we can migrate a function from one protocol to the other. There's no technical reason other than backwards compat and dev time why we couldn't use __array_function__ for ufuncs also.
Huh, that's interesting! Apparently we have a profoundly different understanding of what we're doing here.
That is interesting indeed. We should figure this out first - no point discussing a NEP about plugging the gaps in our override system when we don't have a common understanding of why we wanted/needed an override system in the first place.
To me, __array_ufunc__ and __array_function__ are completely different. In fact I'd say __array_ufunc__ is a good idea and __array_function__ is a bad idea,
It's early days, but "customer feedback" certainly has been more enthusiastic for __array_function__. Also from what I've seen so far it works well. Example: at the SciPy sprints someone put together Xarray plus pydata/sparse to use distributed sparse arrays for visualizing some large genetic (I think) data sets. That was made to work in a single day, with impressively little code.
Yeah, it's true, and __array_function__ made a bunch of stuff that used to be impossible become possible, I'm not saying it didn't. My prediction is that the longer we live with it, the more limits we'll hit and the more problems we'll have with long-term maintainability. I don't think initial enthusiasm is a good predictor of that either way.
The key difference is that __array_ufunc__ allows for *generic* implementations.
Implementations of what?
Generic in the sense that you can write __array_ufunc__ once and have it work for all ufuncs.
Most duck array libraries can write a single implementation of __array_ufunc__ that works for *all* ufuncs, even new third-party ufuncs that the duck array library has never heard of,
I see where you're going with this. You are thinking of reusing the ufunc implementation to do a computation. That's a minor use case (imho), and I can't remember seeing it used.
I mean, I just looked at dask and xarray, and they're both doing exactly what I said, right now in shipping code. What use cases are you targeting here if you consider dask and xarray out-of-scope? :-)
I don't think that's the interesting part, or even right. When you call `np.cos(dask_array_of_cupy_arrays)`, it certainly will not reuse the NumPy ufunc np.cos. It will call da.cos, and that will in turn call cupy.cos. Yes it will call np.cos if you feed it a dask array that contains a NumPy ndarray under the hood. But that's equally true of np.mean, which is not a ufunc. The story here is ~95% parallel for __array_ufunc__ and __array_function__. When I said I hadn't seen it used, I meant in ways that are fundamentally different between those two protocols.
this is a case where knowing if something is a ufunc helps us use a property of it. So there the more specialized nature of __array_ufunc__ helps. Seems niche though, and could probably also be done by checking if a function is an instance of np.ufunc via __array_function__
Sparse arrays aren't very niche... and the isinstance trick is possible in some cases, but (a) it's relying on an undocumented implementation detail of __array_function__; according to __array_function__'s API contract, you could just as easily get passed the ufunc's __call__ method instead of the object itself,
That seems to be a matter of making it documented? Currently the dispatcher is only attached to functions, not methods.
and (b) it doesn't work at all for ufunc methods like reduce, outer, accumulate.
No idea without looking in more detail if this can be made to work, but a quick count in the SciPy code base says ~10 uses of .reduce, 2 of .outer and 0 of .accumulate. So hardly showstoppers I'd say.
These are both show-stoppers IMO.
This last point, using third-party ufuncs, is the interesting one to me. They have to be generated with the NumPy ufunc machinery, so the dispatch mechanism is attached to them. We could do third party functions for __array_function__ too, but that would require making @array_function_dispatch public, which we haven't done (yet?).
With __array_function__ it's theoretically possible to do the dispatch on third-party functions, but when someone defines a new function they always have to go update all the duck array libraries to hard-code in some special knowledge of their new function. So in my example, even if we made @array_function_dispatch public, you still couldn't use your nice new numba-created gufunc unless you first convinced dask, xarray, and bcolz to all accept patches to support your new gufunc. With __array_ufunc__, it works out-of-the-box.
Yep that's true. May still be better than not doing anything though, in some cases. You'll get a TypeError with a clear message for functions that aren't implemented, for something that otherwise likely doesn't work either.
But what is that road, and what do you think the goal is? To me it's: separate our API from our implementation. Yours seems to be "reuse our implementations" for __array_ufunc__, but I can't see how that generalizes beyond ufuncs.
The road is to define *abstractions* for the operations we expose through our API, so that duck array implementors can work against a contract with well-defined preconditions and postconditions, so they can write code that works reliably even when the surrounding environment changes. That's the only way to keep things maintainable AFAICT. If the API contract is just a vague handwave at the numpy API, then no-one knows which details actually matter, it's impossible to test, implementations will inevitably end up with subtle long-standing bugs, and literally any change in numpy could potentially break duck array users, we don't know. So my motivation is that I like testing, I don't like bugs, and I like being able to maintain things with confidence :-). The principles are much more general than ufuncs; that's just a pertinent example.
Well, it's hard to argue with that in the abstract. I like all those things too:)
The question is, what does that mean concretely? Most of the NumPy API, (g)ufuncs excepted, doesn't have well-defined abstractions, and it's hard to imagine we'll get those even if we could be more liberal with backwards compat. Most functions are just, well, functions. You can dispatch on them, or not. Your preference seems to be the latter, but I have a hard time figuring out how that translates into anything but "do nothing". Do you have a concrete alternative?
I think we've chosen to try the former - dispatch on functions so we can reuse the NumPy API. It could work out well, it could give some long-term maintenance issues, time will tell. The question is now if and how to plug the gap that __array_function__ left. Its main limitation is "doesn't work for functions that don't have an array-like input" - that left out ~10-20% of functions. So now we have a proposal for a structural solution to that last 10-20%. It seems logical to want that gap plugged, rather than go back and say "we shouldn't have gone for the first 80%, so let's go no further".
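A small illustration of that gap, using a hypothetical `Duck` class: `np.zeros_like` receives the duck array as an argument and can therefore dispatch to it, but `np.zeros` only receives a shape, so there is nothing for `__array_function__` to dispatch on and it always returns a NumPy array.

```python
import numpy as np


class Duck:
    """Hypothetical duck array that only implements np.zeros_like."""

    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_function__(self, func, types, args, kwargs):
        if func is np.zeros_like:
            return Duck(np.zeros_like(self.data, **kwargs))
        return NotImplemented


d = Duck([1, 2, 3])
from_template = np.zeros_like(d)  # d is an argument, so Duck's override runs
from_shape = np.zeros(3)          # only a shape: nothing to dispatch on
```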
I think this is an important point. GPUs are massively popular, and will very likely just continue to grow in importance. So anything we do in this space that says "well it works, just not for GPUs" is probably not going to solve our most pressing problems.
I'm not saying "__array_ufunc__ doesn't work for GPUs". I'm saying that when it comes to GPUs, there's an upper bound for how good you can hope to do, and __array_ufunc__ achieves that upper bound. So does __array_function__. So if we only care about GPUs, they're about equally good.
Indeed.
But if we also care about dask and xarray and compressed storage and sparse storage and ... then __array_ufunc__ is strictly superior in those cases.
That it's superior isn't really interesting though, is it? Their main characteristic (the actual override) is identical, and then ufuncs go a bit further. I think to convince me you're going to have to come up with an actual alternative plan to `__array_ufunc__ + __array_function__ + unumpy-or-alternative-to-it`.
And re maintenance worries: I think cleaning up our API surface and namespaces will go *much* further than yes/no on overrides.
So replacing __array_ufunc__ with __array_function__ would be a major backwards step.
To be 100% clear, no one is actually proposing this.
Cheers, Ralf
On Mon, Sep 9, 2019 at 6:27 PM Ralf Gommers ralf.gommers@gmail.com wrote:
I think we've chosen to try the former - dispatch on functions so we can reuse the NumPy API. It could work out well, it could give some long-term maintenance issues, time will tell. The question is now if and how to plug the gap that __array_function__ left. Its main limitation is "doesn't work for functions that don't have an array-like input" - that left out ~10-20% of functions. So now we have a proposal for a structural solution to that last 10-20%. It seems logical to want that gap plugged, rather than go back and say "we shouldn't have gone for the first 80%, so let's go no further".
I'm excited about solving the remaining 10-20% of use cases for flexible array dispatching, but the unumpy interface suggested here (numpy.overridable) feels like a redundant redo of __array_function__ and __array_ufunc__.
I would much rather continue to develop specialized protocols for the remaining use cases. Summarizing those I've seen in this thread, these include:
1. Overrides for customizing array creation and coercion.
2. Overrides to implement operations for new dtypes.
3. Overriding implementations of NumPy functions, e.g., FFT and ufuncs with MKL.
(1) could mostly be solved by adding np.duckarray() and another function for duck array coercion. There is still the matter of overriding np.zeros and the like, which perhaps justifies another new protocol, but in my experience the use-cases for truly creating an array from scratch are quite rare.
(2) should be tackled as part of overhauling NumPy's dtype system to better support user defined dtypes. But it should definitely be in the form of specialized protocols, e.g., which pass in preallocated arrays into ufuncs for a new dtype. By design, new dtypes should not be able to customize the semantics of array *structure*.
(3) could potentially motivate a new solution, but it should exist *inside* of select existing NumPy implementations, after checking for overrides with __array_function__. If the only option NumPy provides for overriding np.fft is to implement np.overridable.fft, I doubt that would suffice to dissuade MKL developers from monkey patching it -- they already decided that a separate namespace is not good enough for them.
I also share Nathaniel's concern that the overrides in unumpy are too powerful, by allowing for control from arbitrary function arguments and even *non-local* control (i.e., global variables) from context managers. This level of flexibility can make code very hard to debug, especially in larger codebases.
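To make the "non-local control" concern concrete, here is a minimal sketch of context-local backend selection built on Python's standard `contextvars` module. All names (`set_backend`, `zeros`, the backend strings) are hypothetical stand-ins and not uarray's actual API; the sketch only shows the general mechanism. Note how `library_code` is written once, yet its result depends on state set by whoever calls it.

```python
import contextvars
from contextlib import contextmanager

# Hypothetical names throughout; this is not uarray's actual API.
_backend = contextvars.ContextVar("backend", default="numpy")

_ZEROS = {
    "numpy": lambda n: [0.0] * n,             # stand-in for np.zeros
    "fake_gpu": lambda n: ("gpu-buffer", n),  # stand-in for a GPU library
}


@contextmanager
def set_backend(name):
    """Select a backend for everything called inside the with-block."""
    token = _backend.set(name)
    try:
        yield
    finally:
        _backend.reset(token)


def zeros(n):
    # No array argument to dispatch on: the choice comes from ambient state.
    return _ZEROS[_backend.get()](n)


def library_code():
    # Written once, but its result depends on the caller's context.
    return zeros(2)
```

The same property that lets this plug the array-creation gap is what makes it harder to reason about locally: reading `library_code` alone does not tell you which backend runs.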
Best, Stephan
On Mon, 2019-09-09 at 20:32 -0700, Stephan Hoyer wrote:
On Mon, Sep 9, 2019 at 6:27 PM Ralf Gommers ralf.gommers@gmail.com wrote:
I think we've chosen to try the former - dispatch on functions so we can reuse the NumPy API. It could work out well, it could give some long-term maintenance issues, time will tell. The question is now if and how to plug the gap that __array_function__ left. Its main limitation is "doesn't work for functions that don't have an array-like input" - that left out ~10-20% of functions. So now we have a proposal for a structural solution to that last 10-20%. It seems logical to want that gap plugged, rather than go back and say "we shouldn't have gone for the first 80%, so let's go no further".
I'm excited about solving the remaining 10-20% of use cases for flexible array dispatching, but the unumpy interface suggested here (numpy.overridable) feels like a redundant redo of __array_function__ and __array_ufunc__.
I would much rather continue to develop specialized protocols for the remaining use cases. Summarizing those I've seen in this thread, these include:
1. Overrides for customizing array creation and coercion.
2. Overrides to implement operations for new dtypes.
3. Overriding implementations of NumPy functions, e.g., FFT and ufuncs with MKL.
(1) could mostly be solved by adding np.duckarray() and another function for duck array coercion. There is still the matter of overriding np.zeros and the like, which perhaps justifies another new protocol, but in my experience the use-cases for truly creating an array from scratch are quite rare.
There is an issue open about adding more functions for that. It made me wonder whether providing a way to choose which duck array's `__array_function__` is used could solve it reasonably, similar to explicitly choosing a specific template version to call in templated code. In other words `np.arange<type(duck_array)>(100)` (but with a completely different syntax, probably hidden away only for libraries to use).
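Something close to this can be emulated today by calling a chosen duck array's `__array_function__` directly. The `Duck` class and the `call_as` helper below are hypothetical, and the helper bypasses NumPy's own dispatch entirely, so this is a sketch of the idea rather than a proposal for syntax.

```python
import numpy as np


class Duck:
    """Hypothetical duck array that knows how to build itself via np.arange."""

    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_function__(self, func, types, args, kwargs):
        if func is np.arange:
            return Duck(np.arange(*args, **kwargs))
        return NotImplemented


def call_as(template, func, *args, **kwargs):
    """Hypothetical helper: run `func` through `template`'s __array_function__."""
    result = template.__array_function__(func, (type(template),), args, kwargs)
    if result is NotImplemented:
        raise TypeError(f"{type(template).__name__} does not implement {func.__name__}")
    return result


r = call_as(Duck([0]), np.arange, 5)  # roughly "np.arange<Duck>(5)"
```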
Maybe it is indeed time to write up a list of options to plug that hole, and then see where it brings us.
Best,
Sebastian
(2) should be tackled as part of overhauling NumPy's dtype system to better support user defined dtypes. But it should definitely be in the form of specialized protocols, e.g., which pass in preallocated arrays into ufuncs for a new dtype. By design, new dtypes should not be able to customize the semantics of array *structure*.
(3) could potentially motivate a new solution, but it should exist *inside* of select existing NumPy implementations, after checking for overrides with __array_function__. If the only option NumPy provides for overriding np.fft is to implement np.overridable.fft, I doubt that would suffice to dissuade MKL developers from monkey patching it -- they already decided that a separate namespace is not good enough for them.
I also share Nathaniel's concern that the overrides in unumpy are too powerful, by allowing for control from arbitrary function arguments and even *non-local* control (i.e., global variables) from context managers. This level of flexibility can make code very hard to debug, especially in larger codebases.
Best, Stephan
In other words `np.arange<type(duck_array)>(100)` (but with a completely different syntax, probably hidden away only for libraries to use).
It sounds a bit like you're describing factory classmethods there. Is the solution to this problem to move (leaving behind aliases) `np.arange` to `ndarray.arange`, `np.zeros` to `ndarray.zeros`, etc.? Callers would then use `type(duckarray).zeros` if they're trying to generalize.
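A minimal sketch of the factory-classmethod idea, assuming duck array libraries each expose creation functions as classmethods. `DuckArray`, `SparseDuck` and `fresh_like` are hypothetical names; plain `np.ndarray` has no `zeros` classmethod today, so generic code like this would only cover ndarrays after the move suggested above.

```python
import numpy as np


class DuckArray:
    """Hypothetical duck array exposing creation functions as classmethods."""

    def __init__(self, data):
        self.data = np.asarray(data)

    @classmethod
    def zeros(cls, shape, dtype=float):
        # `cls` is whatever type the caller used, so subclasses get their own type.
        return cls(np.zeros(shape, dtype=dtype))


class SparseDuck(DuckArray):
    """Stand-in for a second library's array type."""


def fresh_like(x, shape):
    # Generic library code: create via the type of x, not via np.zeros.
    return type(x).zeros(shape)


made = fresh_like(SparseDuck([1, 2]), 3)
```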
Eric
On Mon, Sep 9, 2019, 21:18 Sebastian Berg sebastian@sipsolutions.net wrote:
On Mon, 2019-09-09 at 20:32 -0700, Stephan Hoyer wrote:
On Mon, Sep 9, 2019 at 6:27 PM Ralf Gommers ralf.gommers@gmail.com wrote:
I think we've chosen to try the former - dispatch on functions so we can reuse the NumPy API. It could work out well, it could give some long-term maintenance issues, time will tell. The question is now if and how to plug the gap that __array_function__ left. Its main limitation is "doesn't work for functions that don't have an array-like input" - that left out ~10-20% of functions. So now we have a proposal for a structural solution to that last 10-20%. It seems logical to want that gap plugged, rather than go back and say "we shouldn't have gone for the first 80%, so let's go no further".
I'm excited about solving the remaining 10-20% of use cases for flexible array dispatching, but the unumpy interface suggested here (numpy.overridable) feels like a redundant redo of __array_function__ and __array_ufunc__.
I would much rather continue to develop specialized protocols for the remaining use cases. Summarizing those I've seen in this thread, these include:
1. Overrides for customizing array creation and coercion.
2. Overrides to implement operations for new dtypes.
3. Overriding implementations of NumPy functions, e.g., FFT and ufuncs with MKL.
(1) could mostly be solved by adding np.duckarray() and another function for duck array coercion. There is still the matter of overriding np.zeros and the like, which perhaps justifies another new protocol, but in my experience the use-cases for truly creating an array from scratch are quite rare.
There is an issue open about adding more functions for that. It made me wonder whether providing a way to choose the duck array whose `__array_function__` is used could solve it reasonably, similar to explicitly choosing a specific template instantiation to call in templated code. In other words `np.arange<type(duck_array)>(100)` (but with a completely different syntax, probably hidden away only for libraries to use).
Maybe it is indeed time to write up a list of options to plug that hole, and then see where that leads us.
Best,
Sebastian
(2) should be tackled as part of overhauling NumPy's dtype system to better support user defined dtypes. But it should definitely be in the form of specialized protocols, e.g., which pass preallocated arrays into ufuncs for a new dtype. By design, new dtypes should not be able to customize the semantics of array *structure*.
On Mon, 2019-09-09 at 22:26 -0700, Eric Wieser wrote:
In other words `np.arange<type(duck_array)>(100)` (but with a completely different syntax, probably hidden away only for libraries to use).
It sounds a bit like you're describing factory classmethods there. Is the solution to this problem to move (leaving behind aliases) `np.arange` to `ndarray.arange`, `np.zeros` to `ndarray.zeros`, etc.? Callers would then use `type(duckarray).zeros` if they're trying to generalize.
Yeah, factory classmethod is probably the better way to describe it. The question is where you hide them away conveniently (and how to access them). And of course if/what completely different alternatives exist.
In a sense, `__array_function__` is a bit like a collection of operator dunder methods, I guess. So, we need another collection for classmethods. And that was the quick, possibly silly, idea to also use `__array_function__`.
So yeah, there is not much of a point in not simply creating another place for them, or even using individual dunder classmethods. But we still need an "operator"/function to access them nicely, unless we want to force `type(duckarray).…` on library authors.
I guess the important thing is mostly what would be convenient for downstream implementers.
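A minimal sketch of the factory-classmethod idea, together with the kind of access function mentioned above so that library code does not have to spell out `type(duckarray).zeros` itself. `WrappedArray`, its `zeros` classmethod, `zeros_for` and `library_function` are all hypothetical names for illustration, not NumPy API; falling back to `np.zeros` for types without a factory is an assumption.

```python
import numpy as np


class WrappedArray:
    """Hypothetical duck array that stores a NumPy array internally."""

    def __init__(self, data):
        self._data = np.asarray(data)

    @classmethod
    def zeros(cls, shape, dtype=float):
        # Factory classmethod: the array type itself knows how to create
        # a new, zero-filled instance of that type.
        return cls(np.zeros(shape, dtype=dtype))

    @property
    def shape(self):
        return self._data.shape


def zeros_for(array_type, shape, dtype=float):
    """Hypothetical access function: use the type's `zeros` factory if it
    defines one, otherwise fall back to plain NumPy."""
    factory = getattr(array_type, "zeros", None)
    if factory is None:
        return np.zeros(shape, dtype=dtype)
    return factory(shape, dtype=dtype)


def library_function(x):
    # Generic library code: allocate a result of the same kind as `x`
    # without knowing its concrete type in advance.
    return zeros_for(type(x), (2, 3))


print(type(library_function(WrappedArray([1.0]))).__name__)  # WrappedArray
print(type(library_function(np.ones(1))).__name__)  # ndarray
```

The lookup of the factory lives in one helper rather than in every downstream library, which is the convenience question raised here; where such a helper would live is left open.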
- Sebastian
On 10.09.19 05:32, Stephan Hoyer wrote:
On Mon, Sep 9, 2019 at 6:27 PM Ralf Gommers ralf.gommers@gmail.com wrote:
I think we've chosen to try the former - dispatch on functions so we can reuse the NumPy API. It could work out well, it could give some long-term maintenance issues, time will tell. The question is now if and how to plug the gap that __array_function__ left. Its main limitation is "doesn't work for functions that don't have an array-like input" - that left out ~10-20% of functions. So now we have a proposal for a structural solution to that last 10-20%. It seems logical to want that gap plugged, rather than go back and say "we shouldn't have gone for the first 80%, so let's go no further".
I'm excited about solving the remaining 10-20% of use cases for flexible array dispatching, but the unumpy interface suggested here (numpy.overridable) feels like a redundant redo of __array_function__ and __array_ufunc__.
I would much rather continue to develop specialized protocols for the remaining use cases. Summarizing those I've seen in this thread, these include:
1. Overrides for customizing array creation and coercion.
2. Overrides to implement operations for new dtypes.
3. Overriding implementations of NumPy functions, e.g., FFT and ufuncs with MKL.
(1) could mostly be solved by adding np.duckarray() and another function for duck array coercion. There is still the matter of overriding np.zeros and the like, which perhaps justifies another new protocol, but in my experience the use-cases for truly creating an array from scratch are quite rare.
They may be rare for libraries like XArray, but CuPy, Dask and PyData/Sparse need these.
(2) should be tackled as part of overhauling NumPy's dtype system to better support user defined dtypes. But it should definitely be in the form of specialized protocols, e.g., which pass preallocated arrays into ufuncs for a new dtype. By design, new dtypes should not be able to customize the semantics of array *structure*.
We already have a split in the type system, e.g. Cython's buffers and Numba's parallel type system. This is a different issue altogether, e.g. allowing a unyt dtype to spawn a unyt array, rather than forcing a rewrite of unyt to cooperate with NumPy's new dtype system.
(3) could potentially motivate a new solution, but it should exist *inside* select existing NumPy implementations, after checking for overrides with __array_function__. If the only option NumPy provides for overriding np.fft is to implement np.overridable.fft, I doubt that would suffice to dissuade MKL developers from monkey patching it -- they have already decided that a separate namespace is not good enough for them.
That has already been addressed by Ralf in another email. We're proposing to merge that into NumPy proper.
Also, you're missing a few:
4. Having default implementations that allow overrides of a large part of the API while defining only a small part. This holds for e.g. transpose/concatenate.
5. Generation of random numbers (overriding RandomState). CuPy has its own implementation, which it would be nice to be able to override.
I also share Nathaniel's concern that the overrides in unumpy are too powerful, by allowing for control from arbitrary function arguments and even *non-local* control (i.e., global variables) from context managers. This level of flexibility can make code very hard to debug, especially in larger codebases.
Backend switching needs global context in any case. There isn't a good way around that other than the class dunder methods outlined in another thread, which would require rewriting large amounts of code.
Best, Stephan
On Tue, Sep 10, 2019 at 6:06 AM Hameer Abbasi einstein.edison@gmail.com wrote:
On 10.09.19 05:32, Stephan Hoyer wrote:
On Mon, Sep 9, 2019 at 6:27 PM Ralf Gommers ralf.gommers@gmail.com wrote:
I think we've chosen to try the former - dispatch on functions so we can reuse the NumPy API. It could work out well, it could give some long-term maintenance issues, time will tell. The question is now if and how to plug the gap that __array_function__ left. Its main limitation is "doesn't work for functions that don't have an array-like input" - that left out ~10-20% of functions. So now we have a proposal for a structural solution to that last 10-20%. It seems logical to want that gap plugged, rather than go back and say "we shouldn't have gone for the first 80%, so let's go no further".
I'm excited about solving the remaining 10-20% of use cases for flexible array dispatching, but the unumpy interface suggested here (numpy.overridable) feels like a redundant redo of __array_function__ and __array_ufunc__.
I would much rather continue to develop specialized protocols for the remaining use cases. Summarizing those I've seen in this thread, these include:
1. Overrides for customizing array creation and coercion.
2. Overrides to implement operations for new dtypes.
3. Overriding implementations of NumPy functions, e.g., FFT and ufuncs with MKL.
(1) could mostly be solved by adding np.duckarray() and another function for duck array coercion. There is still the matter of overriding np.zeros and the like, which perhaps justifies another new protocol, but in my experience the use-cases for truly creating an array from scratch are quite rare.
They may be rare for libraries like XArray, but CuPy, Dask and PyData/Sparse need these.
(2) should be tackled as part of overhauling NumPy's dtype system to better support user defined dtypes. But it should definitely be in the form of specialized protocols, e.g., which pass preallocated arrays into ufuncs for a new dtype. By design, new dtypes should not be able to customize the semantics of array *structure*.
We already have a split in the type system, e.g. Cython's buffers and Numba's parallel type system. This is a different issue altogether, e.g. allowing a unyt dtype to spawn a unyt array, rather than forcing a rewrite of unyt to cooperate with NumPy's new dtype system.
I guess you're proposing that operations like np.sum(numpy_array, dtype=other_dtype) could rely on other_dtype for the implementation and potentially return a non-NumPy array? I'm not sure this is well motivated -- it would be helpful to discuss actual use-cases.
The most commonly used NumPy functionality related to dtypes can be found only in methods on np.ndarray, e.g., astype() and view(). But I don't think there's any proposal to change that.
4. Having default implementations that allow overrides of a large part of the API while defining only a small part. This holds for e.g. transpose/concatenate.
I'm not sure how unumpy solves the problems we encountered when trying to do this with __array_function__ -- namely the way that it exposes all of NumPy's internals, or requires rewriting a lot of internal NumPy code to ensure it always casts inputs with asarray().
I think it would be useful to expose default implementations of NumPy operations somewhere to make it easier to implement __array_function__, but it doesn't make much sense to couple this to user-facing overrides. These could be exposed as a separate package or NumPy module (e.g., numpy.default_implementations) that uses np.duckarray(), which library authors can call inside their __array_function__ methods.
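A minimal sketch of this suggestion: a default implementation written against `np.duckarray()`-style coercion, reused by a duck array from inside its `__array_function__`. `duckarray` below is a local stand-in for the NEP-30 proposal, assuming a `__duckarray__` method as the opt-in; `default_swapaxes` and `TinyDuck` are hypothetical. It assumes NumPy 1.17 or later, where `np.swapaxes` dispatches through `__array_function__`.

```python
import numpy as np


def duckarray(obj):
    """Local stand-in for the proposed np.duckarray(): return objects that
    opt in via __duckarray__ unchanged, coerce everything else."""
    if hasattr(type(obj), "__duckarray__"):
        return obj.__duckarray__()
    return np.asarray(obj)


def default_swapaxes(a, axis1, axis2):
    """Hypothetical default implementation that only needs `ndim` and
    `transpose`, so any duck array providing those gets swapaxes."""
    a = duckarray(a)
    axes = list(range(a.ndim))
    axes[axis1], axes[axis2] = axes[axis2], axes[axis1]
    return a.transpose(axes)


class TinyDuck:
    """Hypothetical duck array implementing only ndim and transpose."""

    def __init__(self, data):
        self._data = np.asarray(data)

    def __duckarray__(self):
        return self

    @property
    def ndim(self):
        return self._data.ndim

    def transpose(self, axes):
        return TinyDuck(np.transpose(self._data, axes))

    def __array_function__(self, func, types, args, kwargs):
        # Reuse the shared default instead of writing swapaxes ourselves.
        if func is np.swapaxes:
            return default_swapaxes(*args, **kwargs)
        return NotImplemented


d = TinyDuck(np.zeros((2, 3, 4)))
print(np.swapaxes(d, 0, 2)._data.shape)  # (4, 3, 2)
```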
5. Generation of random numbers (overriding RandomState). CuPy has its own implementation, which it would be nice to be able to override.
I'm not sure that NumPy's random state objects make sense for duck arrays. Because these are stateful objects, they are pretty coupled to NumPy's implementation -- you cannot store any additional state on RandomState objects that might be needed for a new implementation. At a bare minimum, you would lose the reproducibility of random seeds, though this may be less of a concern with the new random API.
I also share Nathaniel's concern that the overrides in unumpy are too powerful, by allowing for control from arbitrary function arguments and even *non-local* control (i.e., global variables) from context managers. This level of flexibility can make code very hard to debug, especially in larger codebases.
Backend switching needs global context in any case. There isn't a good way around that other than the class dunder methods outlined in another thread, which would require rewriting large amounts of code.
Do we really need to support robust backend switching in NumPy? I'm not strongly opposed, but what use cases does it actually solve to be able to override np.fft.fft rather than using a new function?
At some point, if you want maximum performance you won't be writing the code using NumPy proper anyways. At best you'll be using a system with duck-array support like CuPy.
On Tue, Sep 10, 2019 at 10:59 AM Stephan Hoyer shoyer@gmail.com wrote:
On Tue, Sep 10, 2019 at 6:06 AM Hameer Abbasi einstein.edison@gmail.com wrote:
On 10.09.19 05:32, Stephan Hoyer wrote:
On Mon, Sep 9, 2019 at 6:27 PM Ralf Gommers ralf.gommers@gmail.com wrote:
I think we've chosen to try the former - dispatch on functions so we can reuse the NumPy API. It could work out well, it could give some long-term maintenance issues, time will tell. The question is now if and how to plug the gap that __array_function__ left. Its main limitation is "doesn't work for functions that don't have an array-like input" - that left out ~10-20% of functions. So now we have a proposal for a structural solution to that last 10-20%. It seems logical to want that gap plugged, rather than go back and say "we shouldn't have gone for the first 80%, so let's go no further".
I'm excited about solving the remaining 10-20% of use cases for flexible array dispatching,
Great! I think most (but not all) of us are on the same page here.
Actually, now that Peter has come up with the `like=` keyword idea for array creation functions, I'm very interested in seeing that worked out. It feels like it could be a nice solution for part of that 10-20% that looked pretty bad before.
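A minimal sketch of how a `like=` keyword on a creation function could reuse `__array_function__`: the function hands the call to the type of `like` and falls back to NumPy otherwise. The pure-Python `zeros` and the `MyDuckArray` class are hypothetical, not NumPy's `np.zeros`; letting plain `np.ndarray` templates take the ordinary NumPy path is an assumption of the sketch.

```python
import numpy as np


def zeros(shape, dtype=float, *, like=None):
    """Hypothetical creation function with a `like=` argument: if `like`
    is a non-NumPy array implementing __array_function__, let its type
    decide what to create."""
    if (
        like is not None
        and not isinstance(like, np.ndarray)
        and hasattr(type(like), "__array_function__")
    ):
        result = like.__array_function__(
            zeros, (type(like),), (shape,), {"dtype": dtype}
        )
        if result is not NotImplemented:
            return result
    # Default path: an ordinary NumPy array.
    return np.zeros(shape, dtype=dtype)


class MyDuckArray:
    """Hypothetical duck array that knows how to handle `zeros`."""

    def __init__(self, data):
        self._data = np.asarray(data)

    def __array_function__(self, func, types, args, kwargs):
        if func is zeros:
            return MyDuckArray(np.zeros(*args, **kwargs))
        return NotImplemented


template = MyDuckArray([1.0, 2.0])
print(type(zeros((2, 2), like=template)).__name__)  # MyDuckArray
print(type(zeros(3)).__name__)  # ndarray
```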
but the unumpy interface suggested here (numpy.overridable) feels like a redundant redo of __array_function__ and __array_ufunc__.
A bit of context: a big part of the reason I advocated for numpy.overridable is that library authors can use it *only* for the parts not already covered by the protocols we already have. If there's overlap, there are several ways to deal with that, such as including only part of the unumpy API surface. It does plug all the holes in one go (although you can then indeed argue it does too much), and there is no other coherent proposal/vision yet that does this. What you wrote below comes closest, and I'd love to see that worked out (e.g. the like= argument for array creation). What I don't like is ad-hoc plugging of one hole at a time without visibility into how many more protocols and new workaround functions in the API we would need. So hopefully we can come to an apples-to-apples comparison of two design alternatives.
Also, we just discussed this whole thread in the community call, and it's clear that it's a complex matter with many different angles. It's very hard to get a full overview. Our conclusion in the call was that this will benefit from an in-person discussion. The sprint in November may be a really good opportunity for that.
In the meantime we can of course keep working out ideas/docs. For now I think it's clear that we (the NEP authors) have some homework to do - that may take some time.
Do we really need to support robust backend switching in NumPy? I'm not strongly opposed, but what use cases does it actually solve to be able to override np.fft.fft rather than using a new function?
I don't know, but that feels like an odd question. We wanted an FFT backend system. Now applying __array_function__ to numpy.fft happened without a real discussion, but as a backend system I don't think it would have met the criteria. Something that works for CuPy, Dask and Xarray, but not for Pyfftw or mkl_fft is only half a solution.
Cheers, Ralf
On Wed, Sep 11, 2019 at 4:18 PM Ralf Gommers ralf.gommers@gmail.com wrote:
Also, we just discussed this whole thread in the community call, and it's clear that it's a complex matter with many different angles. It's very hard to get a full overview. Our conclusion in the call was that this will benefit from an in-person discussion. The sprint in November may be a really good opportunity for that.
Sounds good, I'm looking forward to the discussion at the November sprint!
In the meantime we can of course keep working out ideas/docs. For now I think it's clear that we (the NEP authors) have some homework to do - that may take some time.
Do we really need to support robust backend switching in NumPy? I'm not strongly opposed, but what use cases does it actually solve to be able to override np.fft.fft rather than using a new function?
I don't know, but that feels like an odd question. We wanted an FFT backend system. Now applying __array_function__ to numpy.fft happened without a real discussion, but as a backend system I don't think it would have met the criteria. Something that works for CuPy, Dask and Xarray, but not for Pyfftw or mkl_fft is only half a solution.
I agree, __array_function__ is not a backend system.
Cheers, Ralf
On 09.09.19 03:26, Nathaniel Smith wrote:
[snip] Generic in the sense that you can write __array_ufunc__ once and have it work for all ufuncs.
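A minimal sketch of this point: one `__array_ufunc__` method, with the NEP-13 signature `(self, ufunc, method, *inputs, **kwargs)`, covers every ufunc (including third-party ones) and every ufunc method such as `reduce`, `accumulate` and `outer`. `Wrapped` is a hypothetical wrapper class; `out=` and `ufunc.at` are deliberately left unsupported to keep it short.

```python
import numpy as np


class Wrapped:
    """Hypothetical duck array that wraps an ndarray."""

    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # A single implementation serves every ufunc (np.add, np.sqrt,
        # third-party ufuncs, ...) and every method ("__call__",
        # "reduce", "accumulate", "outer", ...).
        if "out" in kwargs or method == "at":
            return NotImplemented  # not handled in this sketch
        # Unwrap our own inputs, leave everything else untouched.
        raw = [x.data if isinstance(x, Wrapped) else x for x in inputs]
        result = getattr(ufunc, method)(*raw, **kwargs)
        # Multi-output ufuncs (e.g. np.divmod) return a tuple.
        if isinstance(result, tuple):
            return tuple(Wrapped(r) for r in result)
        return Wrapped(result)


w = Wrapped([1.0, 2.0, 3.0])
print(np.add.accumulate(w).data)  # [1. 3. 6.]
print(np.multiply.outer(w, w).data.shape)  # (3, 3)
```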
You can do that with __ua_function__ too: you get np.ufunc.__call__ with self=<any-ufunc>. The same holds for, say, RandomState objects, once implemented.
Most duck array libraries can write a single implementation of __array_ufunc__ that works for *all* ufuncs, even new third-party ufuncs that the duck array library has never heard of,
I see where you're going with this. You are thinking of reusing the ufunc implementation to do a computation. That's a minor use case (imho), and I can't remember seeing it used.
I mean, I just looked at dask and xarray, and they're both doing exactly what I said, right now in shipping code. What use cases are you targeting here if you consider dask and xarray out-of-scope? :-)
This is a case where knowing whether something is a ufunc lets you use a property of it, so there the more specialized nature of __array_ufunc__ helps. It seems niche, though, and could probably also be done by checking whether a function is an instance of np.ufunc via __array_function__.
Sparse arrays aren't very niche... and the isinstance trick is possible in some cases, but (a) it's relying on an undocumented implementation detail of __array_function__; according to __array_function__'s API contract, you could just as easily get passed the ufunc's __call__ method instead of the object itself, and (b) it doesn't work at all for ufunc methods like reduce, outer, accumulate. These are both show-stoppers IMO.
It does work for all ufunc methods. You just get passed in the appropriate method (ufunc.reduce, ufunc.accumulate, ...), with self=<any-ufunc>.
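A toy, pure-Python illustration of the reply above. It assumes, as the reply describes, that the backend receives the unbound method (``__call__``, ``reduce``, ...) with the ufunc itself passed as ``self``, i.e. as the first positional argument. ``ToyUfunc`` and ``ToyBackend`` are hypothetical stand-ins, not NumPy or ``uarray`` classes; the point is only that one ``__ua_function__`` can cover every ufunc and every method, including a ufunc the backend has never seen.

```python
from functools import reduce as _reduce


class ToyUfunc:
    """Hypothetical stand-in for np.ufunc: a named elementwise binary op."""

    def __init__(self, name, op):
        self.name = name
        self.op = op

    def __call__(self, a, b):
        return [self.op(x, y) for x, y in zip(a, b)]

    def reduce(self, a):
        return _reduce(self.op, a)


class ToyBackend:
    """One __ua_function__ that covers every ufunc and every ufunc method."""

    __ua_domain__ = "numpy"

    def __init__(self):
        self.seen = []  # (ufunc name, method name) pairs, for inspection

    def __ua_function__(self, func, args, kwargs):
        # func is the unbound method; the ufunc instance arrives as self,
        # i.e. as the first positional argument.
        if func in (ToyUfunc.__call__, ToyUfunc.reduce):
            ufunc, *operands = args
            self.seen.append((ufunc.name, func.__name__))
            # Generic strategy: reuse the ufunc's own method on the operands.
            return func(ufunc, *operands, **kwargs)
        return NotImplemented


backend = ToyBackend()
new_ufunc = ToyUfunc("elementwise_max", max)  # a ufunc the backend never saw
print(backend.__ua_function__(ToyUfunc.__call__, (new_ufunc, [1, 5], [3, 2]), {}))
print(backend.__ua_function__(ToyUfunc.reduce, (new_ufunc, [1, 5, 3]), {}))
print(backend.seen)
```

A real duck-array backend would unwrap its own containers before reusing the method; the toy skips that step.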
[snip]
Hi Nathaniel,
On 02.09.19 23:09, Nathaniel Smith wrote:
On Mon, Sep 2, 2019 at 2:15 AM Hameer Abbasi einstein.edison@gmail.com wrote:
Me, Ralf Gommers and Peter Bell (both cc’d) have come up with a proposal on how to solve the array creation and duck array problems. The solution is outlined in NEP-31, currently in the form of a PR, [1]
Thanks for putting this together! It'd be great to have more engagement between uarray and numpy.
============================================================
NEP 31 — Context-local and global overrides of the NumPy API
============================================================
Now that I've read this over, my main feedback is that right now it seems too vague and high-level to give it a fair evaluation? The idea of a NEP is to lay out a problem and proposed solution in enough detail that it can be evaluated and critiqued, but this felt to me more like it was pointing at some other documents for all the details and then promising that uarray has solutions for all our problems.
This NEP takes a more holistic approach: It assumes that there are parts of the API that need to be overridable, and that these will grow over time. It provides a general framework and a mechanism to avoid a design of a new protocol each time this is required.
The idea of a holistic approach makes me nervous, because I'm not sure we have holistic problems.
The fact that we're having to design more and more protocols for a lot of very similar things is, to me, an indicator that we do have holistic problems that ought to be solved by a single protocol.
Sometimes a holistic approach is the right thing; other times it means sweeping the actual problems under the rug, so things *look* simple and clean but in fact nothing has been solved, and they just end up biting us later. And from the NEP as currently written, I can't tell whether this is the good kind of holistic or the bad kind of holistic.
Now I'm writing vague handwavey things, so let me follow my own advice and make it more concrete with an example :-).
When Stephan and I were writing NEP 22, the single thing we spent the most time discussing was the problem of duck-array coercion, and in particular what to do about existing code that does np.asarray(duck_array_obj).
The reason this is challenging is that there's a lot of code written in Cython/C/C++ that calls np.asarray, and then blindly casts the return value to a PyArray struct and starts accessing the raw memory fields. If np.asarray starts returning anything besides a real-actual np.ndarray object, then this code will start corrupting random memory, leading to a segfault at best.
Stephan felt strongly that this meant that existing np.asarray calls *must not* ever return anything besides an np.ndarray object, and therefore we needed to add a new function np.asduckarray(), or maybe an explicit opt-in flag like np.asarray(..., allow_duck_array=True).
I agreed that this was a problem, but thought we might be able to get away with an "opt-out" system, where we add an allow_duck_array= flag, but make it *default* to True, and document that the Cython/C/C++ users who want to work with a raw np.ndarray object should modify their code to explicitly call np.asarray(obj, allow_duck_array=False). This would mean that for a while people who tried to pass duck-arrays into legacy library would get segfaults, but there would be a clear path for fixing these issues as they were discovered.
Either way, there are also some other details to figure out: how does this affect the C version of asarray? What about np.asfortranarray – probably that should default to allow_duck_array=False, even if we did make np.asarray default to allow_duck_array=True, right?
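The opt-out variant can be sketched in pure Python. Everything here is hypothetical: ``allow_duck_array`` was only a proposal in this discussion, ``ToyNdarray`` stands in for a real ``np.ndarray``, and the ``__duck_array__`` marker is borrowed from the NEP-30 draft. As suggested above, ``asarray`` defaults the flag to ``True`` while ``asfortranarray`` defaults it to ``False``.

```python
class ToyNdarray(list):
    """Stand-in for a real np.ndarray that C/Cython code can safely cast."""


class ToyDuckArray:
    """A duck array that advertises itself via a __duck_array__ method."""

    def __init__(self, data):
        self.data = list(data)

    def __iter__(self):
        return iter(self.data)

    def __duck_array__(self):
        return self


def asarray(obj, allow_duck_array=True):
    """Opt-out design: duck arrays pass through unless the caller refuses them."""
    if allow_duck_array and hasattr(obj, "__duck_array__"):
        return obj.__duck_array__()
    return obj if isinstance(obj, ToyNdarray) else ToyNdarray(obj)


def asfortranarray(obj, allow_duck_array=False):
    """Memory-layout specific, so it stays opt-in even under the opt-out design."""
    return asarray(obj, allow_duck_array=allow_duck_array)


duck = ToyDuckArray([1, 2, 3])
print(type(asarray(duck)).__name__)                          # ToyDuckArray
print(type(asarray(duck, allow_duck_array=False)).__name__)  # ToyNdarray
print(type(asfortranarray(duck)).__name__)                   # ToyNdarray
```

Legacy Cython/C code would switch to ``asarray(obj, allow_duck_array=False)``; under the opt-in design the default would simply be ``False`` everywhere.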
Now if I understand right, your proposal would be to make it so any code in any package could arbitrarily change the behavior of np.asarray for all inputs, e.g. I could just decide that np.asarray([1, 2, 3]) should return some arbitrary non-np.ndarray object. It seems like this has a much greater potential for breaking existing Cython/C/C++ code, and the NEP doesn't currently describe why this extra power is useful, and it doesn't currently describe how it plans to mitigate the downsides. (For example, if a caller needs a real np.ndarray, then is there some way to explicitly request one? The NEP doesn't say.) Maybe this is all fine and there are solutions to these issues, but any proposal to address duck array coercion needs to at least talk about these issues!
I believe I addressed this in a previous email, but the NEP doesn't suggest overriding numpy.asarray or numpy.array. It suggests overriding numpy.overridable.asarray and numpy.overridable.array, so existing code will continue to work as-is and overrides are opt-in rather than forced on you.
The argument about this kind of code could be applied to return values from other functions as well. That said, there is a way to request a NumPy array object explicitly:
    with ua.set_backend(np):
        x = np.asarray(...)
And that's just one example... array coercion is a particularly central and tricky problem, but the numpy API is big, and there are probably other problems like this. For another example, I don't understand what the NEP is proposing to do about dtypes at all.
Just as there are other kinds of arrays, there may be other kinds of dtypes that are not NumPy dtypes. They cannot be attached to a NumPy array object (as Sebastian pointed out to me in last week's Community meeting), but they can still provide other powerful features.
That's why I think the NEP needs to be fleshed out a lot more before it will be possible to evaluate fairly.
-n
I just pushed a new version of the NEP to my PR, the full-text of which is below.
============================================================
NEP 31 — Context-local and global overrides of the NumPy API
============================================================
:Author: Hameer Abbasi habbasi@quansight.com
:Author: Ralf Gommers rgommers@quansight.com
:Author: Peter Bell peterbell10@live.co.uk
:Status: Draft
:Type: Standards Track
:Created: 2019-08-22
Abstract
--------
This NEP proposes to make all of NumPy's public API overridable via an extensible backend mechanism, using a library called ``uarray`` `[1]`_.
``uarray`` provides global and context-local overrides, as well as a dispatch mechanism similar to NEP-18 `[2]`_. First experiences with ``__array_function__`` show that it is necessary to be able to override NumPy functions that *do not take an array-like argument*, and hence aren't overridable via ``__array_function__``. The most pressing need is array creation and coercion functions; see e.g. NEP-30 `[9]`_.
This NEP proposes to allow, in an opt-in fashion, overriding any part of the NumPy API. It is intended as a comprehensive resolution to NEP-22 `[3]`_, and obviates the need to add an ever-growing list of new protocols for each new type of function or object that needs to become overridable.
Motivation and Scope
--------------------
The motivation behind ``uarray`` is manifold: First, there have been several attempts to allow dispatch of parts of the NumPy API, including, most prominently, the ``__array_ufunc__`` protocol in NEP-13 `[4]`_ and the ``__array_function__`` protocol in NEP-18 `[2]`_, but this has shown the need for further protocols to be developed, including a protocol for coercion (see `[5]`_). The reasons these overrides are needed have been extensively discussed in the references, and this NEP will not attempt to go into the details of why these are needed. Another pain point requiring yet another protocol is the duck-array protocol (see `[9]`_).
This NEP takes a more holistic approach: It assumes that there are parts of the API that need to be overridable, and that these will grow over time. It provides a general framework and a mechanism to avoid a design of a new protocol each time this is required.
This NEP proposes the following: That ``unumpy`` `[8]`_ becomes the recommended override mechanism for the parts of the NumPy API not yet covered by ``__array_function__`` or ``__array_ufunc__``, and that ``uarray`` is vendored into a new namespace within NumPy to give users and downstream dependencies access to these overrides. This vendoring mechanism is similar to what SciPy decided to do for making ``scipy.fft`` overridable (see `[10]`_).
Detailed description
--------------------
**Note:** *This section will not attempt to go into too much detail about ``uarray``, that is the purpose of the ``uarray`` documentation.* `[1]`_ *However, the NumPy community will have input into the design of ``uarray``, via the issue tracker.*
``uarray`` Primer
^^^^^^^^^^^^^^^^^

Defining backends
~~~~~~~~~~~~~~~~~
``uarray`` consists of two main protocols: ``__ua_convert__`` and ``__ua_function__``, called in that order, along with ``__ua_domain__``, which is a string defining the domain of the backend. If any of the protocols return ``NotImplemented``, we fall back to the next backend.
``__ua_convert__`` is for conversion and coercion. It has the signature ``(dispatchables, coerce)``, where ``dispatchables`` is an iterable of ``ua.Dispatchable`` objects and ``coerce`` is a boolean indicating whether or not to force the conversion. ``ua.Dispatchable`` is a simple class consisting of three simple values: ``type``, ``value``, and ``coercible``. ``__ua_convert__`` returns an iterable of the converted values, or ``NotImplemented`` in the case of failure. Returning ``NotImplemented`` here will cause ``uarray`` to move to the next available backend.
``__ua_function__`` has the signature ``(func, args, kwargs)`` and defines the actual implementation of the function. It receives the function and its arguments. Returning ``NotImplemented`` will cause a move to the default implementation of the function if one exists, and failing that, the next backend.
If all backends are exhausted, a ``ua.BackendNotImplementedError`` is raised.
Backends can be registered for permanent use if required.
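The fallback rules above can be modelled in a few lines of pure Python. This is a toy model of the described semantics, not ``uarray``'s implementation; ``call_multimethod``, ``BackendNotImplementedError`` (standing in for ``ua.BackendNotImplementedError``) and both backend classes are hypothetical. Backends of another domain are skipped, a ``NotImplemented`` from ``__ua_convert__`` moves on to the next backend, a ``NotImplemented`` from ``__ua_function__`` tries the default implementation first, and exhausting all backends raises.

```python
class BackendNotImplementedError(NotImplementedError):
    """Toy counterpart of ua.BackendNotImplementedError."""


def call_multimethod(func, domain, dispatchables, replacer, args, kwargs,
                     backends, default=None):
    """Toy model of the dispatch rules: try each backend in order."""
    if not backends and default is not None:
        return default(*args, **kwargs)
    for backend in backends:
        if getattr(backend, "__ua_domain__", None) != domain:
            continue  # this backend serves a different domain
        # Step 1: __ua_convert__ converts the dispatchable arguments.
        converted = backend.__ua_convert__(dispatchables, False)
        if converted is NotImplemented:
            continue  # cannot handle these types: next backend
        new_args, new_kwargs = replacer(args, kwargs, tuple(converted))
        # Step 2: __ua_function__ provides the implementation.
        result = backend.__ua_function__(func, new_args, new_kwargs)
        if result is NotImplemented and default is not None:
            result = default(*new_args, **new_kwargs)
        if result is not NotImplemented:
            return result
    raise BackendNotImplementedError(f"no backend implements {func.__name__}")


def double(x):
    """Multimethod signature; its body is never called directly."""


def replace_x(args, kwargs, dispatchables):
    return (dispatchables[0],), kwargs


def default_double(x):
    return ("default", x + x)


class IntOnlyBackend:
    """Accepts only int arguments and implements only `double`."""

    __ua_domain__ = "toy"

    def __ua_convert__(self, dispatchables, coerce):
        if all(isinstance(d, int) for d in dispatchables):
            return dispatchables
        return NotImplemented

    def __ua_function__(self, func, args, kwargs):
        if func is double:
            return ("int-backend", args[0] * 2)
        return NotImplemented


class PassThroughBackend:
    """Accepts anything but implements nothing, so the default is used."""

    __ua_domain__ = "toy"

    def __ua_convert__(self, dispatchables, coerce):
        return dispatchables

    def __ua_function__(self, func, args, kwargs):
        return NotImplemented


def call_double(x, backends):
    return call_multimethod(double, "toy", (x,), replace_x, (x,), {},
                            backends, default=default_double)


backends = [IntOnlyBackend(), PassThroughBackend()]
print(call_double(21, backends))    # ('int-backend', 42)
print(call_double("ab", backends))  # ('default', 'abab')
```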
Defining overridable multimethods ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
To define an overridable function (a multimethod), one needs a few things:
1. A dispatcher that returns an iterable of ``ua.Dispatchable`` objects.
2. A reverse dispatcher that replaces dispatchable values with the supplied ones.
3. A domain.
4. Optionally, a default implementation, which can be provided in terms of other multimethods.
As an example, consider the following::

    import numpy as np
    import uarray as ua

    def full_argreplacer(args, kwargs, dispatchables):
        def full(shape, fill_value, dtype=None, order='C'):
            return (shape, fill_value), dict(
                dtype=dispatchables[0],
                order=order
            )

        return full(*args, **kwargs)

    @ua.create_multimethod(full_argreplacer, domain="numpy")
    def full(shape, fill_value, dtype=None, order='C'):
        return (ua.Dispatchable(dtype, np.dtype),)

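The argument replacer in the example is plain Python, so the round trip can be exercised without ``uarray`` or NumPy: the dispatcher (the body of ``full``) extracts ``dtype`` as the only dispatchable, a backend's ``__ua_convert__`` converts it, and the replacer writes the converted value back while leaving the other arguments untouched. In this sketch ``dispatcher`` mirrors the body of ``full`` but returns the raw value instead of a ``ua.Dispatchable``, and ``"converted-dtype"`` is a placeholder for whatever a backend would return.

```python
def full_argreplacer(args, kwargs, dispatchables):
    # Same replacer as in the example: re-parse the call with the real
    # signature, then substitute the converted dtype.
    def full(shape, fill_value, dtype=None, order='C'):
        return (shape, fill_value), dict(
            dtype=dispatchables[0],
            order=order,
        )

    return full(*args, **kwargs)


def dispatcher(shape, fill_value, dtype=None, order='C'):
    # Toy version of the multimethod body: only dtype is dispatchable.
    return (dtype,)


args, kwargs = (3, 0.0), {"dtype": "float64"}
extracted = dispatcher(*args, **kwargs)   # ('float64',)
converted = ("converted-dtype",)          # what a backend might return
print(full_argreplacer(args, kwargs, converted))
# ((3, 0.0), {'dtype': 'converted-dtype', 'order': 'C'})
```

Re-parsing through a nested function with the real signature means positional and keyword calls are handled alike, e.g. ``order`` passed positionally still ends up in the keyword dictionary.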
A large set of examples can be found in the ``unumpy`` repository, `[8]`_. This simple act of overriding callables allows us to override:
* Methods
* Properties, via ``fget`` and ``fset``
* Entire objects, via ``__get__``.
Using overrides
~~~~~~~~~~~~~~~
The way we propose the overrides will be used by end users is::

    import numpy.overridable as np

    with np.set_backend(backend):
        x = np.asarray(my_array, dtype=dtype)

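*Context-local* means a backend set by ``with np.set_backend(backend):`` is only in effect inside that block. One way to get those semantics in pure Python is a ``contextvars``-based stack; the sketch below is a toy model under that assumption, and its ``set_backend`` and ``active_backends`` are hypothetical helpers, not ``uarray``'s implementation. Backends are strings here purely for readability.

```python
import contextvars
from contextlib import contextmanager

# Innermost backend last; a tuple so each context holds an immutable snapshot.
_backends = contextvars.ContextVar("backends", default=())


@contextmanager
def set_backend(backend):
    token = _backends.set(_backends.get() + (backend,))
    try:
        yield
    finally:
        _backends.reset(token)  # restore the outer state, even on error


def active_backends():
    """Backends to try, innermost first."""
    return tuple(reversed(_backends.get()))


with set_backend("dask"):
    with set_backend("cupy"):
        print(active_backends())  # ('cupy', 'dask')
    print(active_backends())      # ('dask',)
print(active_backends())          # ()
```

Resetting via the token, rather than popping, guarantees the outer state is restored exactly, even when the block raises.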
And a library that implements a NumPy-like API will use it in the following manner (as an example)::

    import numpy.overridable as np

    _ua_implementations = {}

    __ua_domain__ = "numpy"

    def __ua_function__(func, args, kwargs):
        fn = _ua_implementations.get(func, None)
        return fn(*args, **kwargs) if fn is not None else NotImplemented

    def implements(ua_func):
        def inner(func):
            _ua_implementations[ua_func] = func
            return func

        return inner

    @implements(np.asarray)
    def asarray(a, dtype=None, order=None):
        # Code here
        # Either this method or __ua_convert__ must
        # return NotImplemented for unsupported types,
        # or they shouldn't be marked as dispatchable.
        ...

    # Provides a default implementation for ones and zeros.
    @implements(np.full)
    def full(shape, fill_value, dtype=None, order='C'):
        # Code here
        ...

The only change this NEP proposes at its acceptance is to make ``unumpy`` the officially recommended way to override NumPy. ``unumpy`` will remain a separate repository/package (which we propose to vendor to avoid a hard dependency, using the separate ``unumpy`` package only if it is installed, rather than depending on it for the time being), and will be developed primarily with the input of duck-array authors and secondarily, custom dtype authors, via the usual GitHub workflow. There are a few reasons for this:
* Faster iteration in the case of bugs or issues.
* Faster design changes, in the case of needed functionality.
* ``unumpy`` will work with older versions of NumPy as well.
* The user and library author opt in to the override process, rather than breakages happening when it is least expected. In simple terms, bugs in ``unumpy`` mean that ``numpy`` remains unaffected.
Duck-array coercion
~~~~~~~~~~~~~~~~~~~
There are inherent problems about returning objects that are not NumPy arrays from ``numpy.array`` or ``numpy.asarray``, particularly in the context of C/C++ or Cython code that may get an object with a different memory layout than the one it expects. However, we believe this problem may apply not only to these two functions but all functions that return NumPy arrays. For this reason, overrides are opt-in for the user, by using the submodule ``numpy.overridable`` rather than ``numpy``. NumPy will continue to work unaffected by anything in ``numpy.overridable``.
If the user wishes to obtain a NumPy array, there are two ways of doing it:
1. Use ``numpy.asarray`` (the non-overridable version).
2. Use ``numpy.overridable.asarray`` with the NumPy backend set and coercion enabled::

       import numpy.overridable as np
       import uarray as ua

       with ua.set_backend(np, coerce=True):
           x = np.asarray(...)

Advantages of ``unumpy`` over other solutions
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
``unumpy`` offers a number of advantages over the approach of defining a new protocol for every problem encountered: Whenever there is something requiring an override, ``unumpy`` will be able to offer a unified API with very minor changes. For example:
* ``ufunc`` objects can be overridden via their ``__call__``, ``reduce`` and other methods.
* Other functions can be overridden in a similar fashion.
* ``np.asduckarray`` goes away, and becomes ``np.asarray`` with a backend set.
* The same holds for array creation functions such as ``np.zeros``, ``np.empty`` and so on.
This also holds for the future: Making something overridable would require only minor changes to ``unumpy``.
Another promise ``unumpy`` holds is one of default implementations. Default implementations can be provided for any multimethod, in terms of others. This allows one to override a large part of the NumPy API by defining only a small part of it. This is to ease the creation of new duck-arrays, by providing default implementations of many functions that can be easily expressed in terms of others, as well as a repository of utility functions that help in the implementation of duck-arrays that most duck-arrays would require.
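A toy model of default implementations, using a plain attribute lookup instead of ``uarray``'s ``__ua_function__`` protocol; ``Multimethod``, ``ListBackend`` and ``BACKENDS`` are hypothetical. ``zeros`` and ``ones`` have defaults written in terms of the ``full`` multimethod, so a backend that implements only ``full`` gets the other two for free, the same pattern as the "default implementation for ones and zeros" comment in the library example.

```python
class Multimethod:
    """Toy multimethod: try backends, else fall back to a default."""

    def __init__(self, name, default=None):
        self.name = name
        self.default = default

    def __call__(self, *args, **kwargs):
        for backend in BACKENDS:
            impl = getattr(backend, self.name, None)
            if impl is not None:
                return impl(*args, **kwargs)
        if self.default is not None:
            return self.default(*args, **kwargs)
        raise NotImplementedError(f"no backend or default for {self.name}")


full = Multimethod("full")
# Defaults written in terms of another multimethod, so they dispatch to `full`.
zeros = Multimethod("zeros", default=lambda n: full(n, 0))
ones = Multimethod("ones", default=lambda n: full(n, 1))


class ListBackend:
    """Implements only `full`; `zeros` and `ones` come from the defaults."""

    @staticmethod
    def full(n, fill_value):
        return [fill_value] * n


BACKENDS = [ListBackend()]
print(zeros(3), ones(2))  # [0, 0, 0] [1, 1]
```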
The last benefit is a clear way to coerce to a given backend (via the ``coerce`` keyword in ``ua.set_backend``), and a protocol for coercing not only arrays, but also ``dtype`` objects and ``ufunc`` objects with similar ones from other libraries. This is due to the existence of actual, third party dtype packages, and their desire to blend into the NumPy ecosystem (see `[6]`_). This is a separate issue compared to the C-level dtype redesign proposed in `[7]`_, it's about allowing third-party dtype implementations to work with NumPy, much like third-party array implementations. These can provide features such as, for example, units, jagged arrays or other such features that are outside the scope of NumPy.
Mixing NumPy and ``unumpy`` in the same file
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Normally, one would import only one of ``unumpy`` or ``numpy``, and import it as ``np`` for familiarity. However, there may be situations where one wishes to mix NumPy and the overrides, and there are a few ways to do this, depending on the user's style::

    import numpy.overridable as unumpy
    import numpy as np

or::

    import numpy as np

    # Use unumpy via np.overridable

Related Work
------------

Previous override mechanisms
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
* NEP-18, the ``__array_function__`` protocol. `[2]`_
* NEP-13, the ``__array_ufunc__`` protocol. `[4]`_
Existing NumPy-like array implementations
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

* Dask: https://dask.org/
* CuPy: https://cupy.chainer.org/
* PyData/Sparse: https://sparse.pydata.org/
* Xnd: https://xnd.readthedocs.io/
* Astropy's Quantity: https://docs.astropy.org/en/stable/units/

Existing and potential consumers of alternative arrays
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

* Dask: https://dask.org/
* scikit-learn: https://scikit-learn.org/
* xarray: https://xarray.pydata.org/
* TensorLy: http://tensorly.org/

Existing alternate dtype implementations
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

* ``ndtypes``: https://ndtypes.readthedocs.io/en/latest/
* Datashape: https://datashape.readthedocs.io
* Plum: https://plum-py.readthedocs.io/

Implementation
--------------

The implementation of this NEP will require the following steps:

* Implementation of ``uarray`` multimethods corresponding to the NumPy API, including classes for overriding ``dtype``, ``ufunc`` and ``array`` objects, in the ``unumpy`` repository.
* Moving backends from ``unumpy`` into the respective array libraries.
Backward compatibility
----------------------

There are no backward incompatible changes proposed in this NEP.

Alternatives
------------
The current alternative to this problem is NEP-30 plus adding more protocols (not yet specified) in addition to it. Even then, some parts of the NumPy API will remain non-overridable, so it's a partial alternative.
The main alternative to vendoring ``unumpy`` is to simply move it into NumPy completely and not distribute it as a separate package. This would also achieve the proposed goals, however we prefer to keep it a separate package for now, for reasons already stated above.
Discussion
----------

* ``uarray`` blogpost: https://labs.quansight.org/blog/2019/07/uarray-update-api-changes-overhead-a...
* The discussion section of NEP-18: https://numpy.org/neps/nep-0018-array-function-protocol.html#discussion
* NEP-22: https://numpy.org/neps/nep-0022-ndarray-duck-typing-overview.html
* Dask issue #4462: https://github.com/dask/dask/issues/4462
* PR #13046: https://github.com/numpy/numpy/pull/13046
* Dask issue #4883: https://github.com/dask/dask/issues/4883
* Issue #13831: https://github.com/numpy/numpy/issues/13831
* Discussion PR 1: https://github.com/hameerabbasi/numpy/pull/3
* Discussion PR 2: https://github.com/hameerabbasi/numpy/pull/4

References and Footnotes
------------------------
.. _[1]:
[1] uarray, A general dispatch mechanism for Python: https://uarray.readthedocs.io
.. _[2]:
[2] NEP 18 — A dispatch mechanism for NumPy’s high level array functions: https://numpy.org/neps/nep-0018-array-function-protocol.html
.. _[3]:
[3] NEP 22 — Duck typing for NumPy arrays – high level overview: https://numpy.org/neps/nep-0022-ndarray-duck-typing-overview.html
.. _[4]:
[4] NEP 13 — A Mechanism for Overriding Ufuncs: https://numpy.org/neps/nep-0013-ufunc-overrides.html
.. _[5]:
[5] Reply to Adding to the non-dispatched implementation of NumPy methods: http://numpy-discussion.10968.n7.nabble.com/Adding-to-the-non-dispatched-imp...
.. _[6]:
[6] Custom Dtype/Units discussion: http://numpy-discussion.10968.n7.nabble.com/Custom-Dtype-Units-discussion-td...
.. _[7]:
[7] The epic dtype cleanup plan: https://github.com/numpy/numpy/issues/2899
.. _[8]:
[8] unumpy: NumPy, but implementation-independent: https://unumpy.readthedocs.io
.. _[9]:
[9] NEP 30 — Duck Typing for NumPy Arrays - Implementation: https://www.numpy.org/neps/nep-0030-duck-array-protocol.html
.. _[10]:
[10] http://scipy.github.io/devdocs/fft.html#backend-control
Copyright
---------
This document has been placed in the public domain.
Hello everyone;
Thanks to all the feedback from the community, in particular Sebastian Berg, we have a new draft of NEP-31.
Please find the full text quoted below for discussion and reference. Any feedback and discussion is welcome.
============================================================
NEP 31 — Context-local and global overrides of the NumPy API
============================================================

:Author: Hameer Abbasi habbasi@quansight.com
:Author: Ralf Gommers rgommers@quansight.com
:Author: Peter Bell pbell@quansight.com
:Status: Draft
:Type: Standards Track
:Created: 2019-08-22

Abstract
--------
This NEP proposes to make all of NumPy's public API overridable via an extensible backend mechanism.
Acceptance of this NEP means NumPy would provide global and context-local overrides, as well as a dispatch mechanism similar to NEP-18 [2]_. First experiences with ``__array_function__`` show that it is necessary to be able to override NumPy functions that *do not take an array-like argument*, and hence aren't overridable via ``__array_function__``. The most pressing need is array creation and coercion functions, such as ``numpy.zeros`` or ``numpy.asarray``; see e.g. NEP-30 [9]_.
This NEP proposes to allow, in an opt-in fashion, overriding any part of the NumPy API. It is intended as a comprehensive resolution to NEP-22 [3]_, and obviates the need to add an ever-growing list of new protocols for each new type of function or object that needs to become overridable.
Motivation and Scope
--------------------

The motivation behind ``uarray`` is manifold: First, there have been several attempts to allow dispatch of parts of the NumPy API, including, most prominently, the ``__array_ufunc__`` protocol in NEP-13 [4]_ and the ``__array_function__`` protocol in NEP-18 [2]_, but this has shown the need for further protocols to be developed, including a protocol for coercion (see [5]_, [9]_). The reasons these overrides are needed have been extensively discussed in the references, and this NEP will not attempt to go into the details of why these are needed; but in short: It is necessary for library authors to be able to coerce arbitrary objects into arrays of their own types, such as CuPy needing to coerce to a CuPy array, for example, instead of a NumPy array.
These kinds of overrides are useful for both the end-user as well as library authors. End-users may have written or wish to write code that they then later speed up or move to a different implementation, say PyData/Sparse. They can do this simply by setting a backend. Library authors may also wish to write code that is portable across array implementations, for example ``sklearn`` may wish to write code for a machine learning algorithm that is portable across array implementations while also using array creation functions.
This NEP takes a holistic approach: It assumes that there are parts of the API that need to be overridable, and that these will grow over time. It provides a general framework and a mechanism to avoid a design of a new protocol each time this is required. This was the goal of ``uarray``: to allow for overrides in an API without needing the design of a new protocol.
This NEP proposes the following: That ``unumpy`` [8]_ becomes the recommended override mechanism for the parts of the NumPy API not yet covered by ``__array_function__`` or ``__array_ufunc__``, and that ``uarray`` is vendored into a new namespace within NumPy to give users and downstream dependencies access to these overrides. This vendoring mechanism is similar to what SciPy decided to do for making ``scipy.fft`` overridable (see [10]_).
Detailed description
--------------------

Using overrides
~~~~~~~~~~~~~~~
The way we propose the overrides will be used by end users is::

    # On the library side
    import numpy.overridable as unp

    def library_function(array):
        array = unp.asarray(array)
        # Code using unumpy as usual
        return array

    # On the user side:
    import numpy.overridable as unp
    import uarray as ua
    import dask.array as da

    ua.register_backend(da)

    library_function(dask_array)  # works and returns dask_array

    with unp.set_backend(da):
        library_function([1, 2, 3, 4])  # actually returns a Dask array.

Here, the backend (``da`` in the example above) can be any compatible object defined either by NumPy or an external library, such as Dask or CuPy. Ideally, it should be the module ``dask.array`` or ``cupy`` itself.
Composing backends
~~~~~~~~~~~~~~~~~~

There are some backends which may depend on other backends, for example xarray depending on ``numpy.fft`` and transforming a time axis into a frequency axis, or Dask/xarray holding an array other than a NumPy array inside it. This would be handled in the following manner inside code::

    import cupy
    import dask.array
    import uarray as ua

    with ua.set_backend(cupy), ua.set_backend(dask.array):
        # Code that has distributed GPU arrays here
        ...

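How composition can work is easiest to see in a toy model. It assumes backends are tried innermost first and that a backend may re-enter dispatch while skipping itself (``uarray`` offers ``ua.skip_backend`` for this purpose; the helpers below are hypothetical and do not use it). The chunked, Dask-like backend splits the data and re-dispatches each chunk, which then lands on the outer, CuPy-like backend.

```python
def dispatch(name, *args, stack, skip=()):
    """Try backends innermost first (end of `stack`), ignoring those in `skip`."""
    for backend in reversed(stack):
        if backend in skip:
            continue
        result = backend.call(name, args, stack, skip)
        if result is not NotImplemented:
            return result
    raise NotImplementedError(name)


class DeviceBackend:
    """Stands in for CuPy: computes a chunk and counts kernel launches."""

    def __init__(self):
        self.launches = 0

    def call(self, name, args, stack, skip):
        if name != "sum":
            return NotImplemented
        self.launches += 1
        return sum(args[0])


class ChunkedBackend:
    """Stands in for Dask: splits work and delegates chunks to the next backend."""

    def __init__(self, chunk_size):
        self.chunk_size = chunk_size

    def call(self, name, args, stack, skip):
        if name != "sum":
            return NotImplemented
        data = args[0]
        chunks = [data[i:i + self.chunk_size]
                  for i in range(0, len(data), self.chunk_size)]
        # Re-dispatch each chunk with this backend skipped, then combine.
        partials = [dispatch("sum", c, stack=stack, skip=skip + (self,))
                    for c in chunks]
        return sum(partials)


device = DeviceBackend()
# Outer first, like set_backend(cupy) wrapping set_backend(dask.array).
stack = [device, ChunkedBackend(chunk_size=4)]
print(dispatch("sum", list(range(10)), stack=stack), device.launches)  # 45 3
```

Neither toy backend knows about the other; the composition comes entirely from the nesting order.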
Proposals
~~~~~~~~~

The only change this NEP proposes at its acceptance is to make ``unumpy`` the officially recommended way to override NumPy. ``unumpy`` will remain a separate repository/package (which we propose to vendor to avoid a hard dependency, using the separate ``unumpy`` package only if it is installed, rather than depending on it for the time being). In concrete terms, ``numpy.overridable`` becomes an alias for ``unumpy`` if it is available, with a fallback to a vendored version if not. ``uarray`` and ``unumpy`` will be developed primarily with the input of duck-array authors and secondarily, custom dtype authors, via the usual GitHub workflow. There are a few reasons for this:
* Faster iteration in the case of bugs or issues.
* Faster design changes, in the case of needed functionality.
* ``unumpy`` will work with older versions of NumPy as well.
* The user and library author opt in to the override process, rather than breakages happening when it is least expected. In simple terms, bugs in ``unumpy`` mean that ``numpy`` remains unaffected.
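The alias-with-fallback arrangement from the proposal can be sketched with ``importlib``. The module names in the comment, ``numpy._vendored.unumpy`` in particular, are hypothetical placeholders, since the NEP does not specify where the vendored copy would live; the demonstration uses standard-library modules so that it runs anywhere.

```python
import importlib


def import_first(*names):
    """Return the first module in `names` that imports successfully."""
    for name in names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    raise ImportError(f"none of {names} could be imported")


# What numpy/overridable.py might do (hypothetical vendored path):
#     unumpy = import_first("unumpy", "numpy._vendored.unumpy")

mod = import_first("a_module_that_is_not_installed", "json")
print(mod.__name__)  # json
```

Listing the separately installed package first is what lets it be updated faster than NumPy releases while the vendored copy guarantees availability.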
Advantages of ``unumpy`` over other solutions
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
``unumpy`` offers a number of advantages over the approach of defining a new protocol for every problem encountered: Whenever there is something requiring an override, ``unumpy`` will be able to offer a unified API with very minor changes. For example:
* ``ufunc`` objects can be overridden via their ``__call__``, ``reduce`` and other methods.
* Other functions can be overridden in a similar fashion.
* ``np.asduckarray`` goes away, and becomes ``np.overridable.asarray`` with a backend set.
* The same holds for array creation functions such as ``np.zeros``, ``np.empty`` and so on.
This also holds for the future: Making something overridable would require only minor changes to ``unumpy``.
Another promise ``unumpy`` holds is one of default implementations. Default implementations can be provided for any multimethod, in terms of others. This allows one to override a large part of the NumPy API by defining only a small part of it. This is to ease the creation of new duck-arrays, by providing default implementations of many functions that can be easily expressed in terms of others, as well as a repository of utility functions that help in the implementation of duck-arrays that most duck-arrays would require.
It also allows one to override functions in a manner which ``__array_function__`` simply cannot, such as overriding ``np.einsum`` with the version from the ``opt_einsum`` package, or Intel MKL overriding FFT, BLAS or ``ufunc`` objects. They would define a backend with the appropriate multimethods, and the user would select them via a ``with`` statement, or registering them as a backend.
The last benefit is a clear way to coerce to a given backend (via the ``coerce`` keyword in ``ua.set_backend``), and a protocol for coercing not only arrays, but also ``dtype`` objects and ``ufunc`` objects with similar ones from other libraries. This is due to the existence of actual, third party dtype packages, and their desire to blend into the NumPy ecosystem (see [6]_). This is a separate issue compared to the C-level dtype redesign proposed in [7]_, it's about allowing third-party dtype implementations to work with NumPy, much like third-party array implementations. These can provide features such as, for example, units, jagged arrays or other such features that are outside the scope of NumPy.
Mixing NumPy and ``unumpy`` in the same file
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Normally, one would import only one of ``unumpy`` or ``numpy``, and import it as ``np`` for familiarity. However, there may be situations where one wishes to mix NumPy and the overrides, and there are a few ways to do this, depending on the user's style::

    from numpy import overridable as unp
    import numpy as np

or::

    import numpy as np

    # Use unumpy via np.overridable

Duck-array coercion
~~~~~~~~~~~~~~~~~~~
There are inherent problems about returning objects that are not NumPy arrays from ``numpy.array`` or ``numpy.asarray``, particularly in the context of C/C++ or Cython code that may get an object with a different memory layout than the one it expects. However, we believe this problem may apply not only to these two functions but all functions that return NumPy arrays. For this reason, overrides are opt-in for the user, by using the submodule ``numpy.overridable`` rather than ``numpy``. NumPy will continue to work unaffected by anything in ``numpy.overridable``.
If the user wishes to obtain a NumPy array, there are two ways of doing it:
1. Use ``numpy.asarray`` (the non-overridable version).
2. Use ``numpy.overridable.asarray`` with the NumPy backend set and coercion enabled.
Related Work
------------

Other override mechanisms
~~~~~~~~~~~~~~~~~~~~~~~~~

* NEP-18, the ``__array_function__`` protocol. [2]_
* NEP-13, the ``__array_ufunc__`` protocol. [4]_
* NEP-30, the ``__duck_array__`` protocol. [9]_
Existing NumPy-like array implementations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

* Dask: https://dask.org/
* CuPy: https://cupy.chainer.org/
* PyData/Sparse: https://sparse.pydata.org/
* Xnd: https://xnd.readthedocs.io/
* Astropy's Quantity: https://docs.astropy.org/en/stable/units/

Existing and potential consumers of alternative arrays
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

* Dask: https://dask.org/
* scikit-learn: https://scikit-learn.org/
* xarray: https://xarray.pydata.org/
* TensorLy: http://tensorly.org/

Existing alternate dtype implementations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

* ``ndtypes``: https://ndtypes.readthedocs.io/en/latest/
* Datashape: https://datashape.readthedocs.io
* Plum: https://plum-py.readthedocs.io/

Implementation
--------------

The implementation of this NEP will require the following steps:

* Implementation of ``uarray`` multimethods corresponding to the NumPy API, including classes for overriding ``dtype``, ``ufunc`` and ``array`` objects, in the ``unumpy`` repository.
* Moving backends from ``unumpy`` into the respective array libraries.

``uarray`` Primer
~~~~~~~~~~~~~~~~~
**Note:** *This section will not attempt to go into too much detail about uarray, that is the purpose of the uarray documentation.* [1]_ *However, the NumPy community will have input into the design of uarray, via the issue tracker.*
``unumpy`` is the interface that defines a set of overridable functions (multimethods) compatible with the numpy API. To do this, it uses the ``uarray`` library. ``uarray`` is a general purpose tool for creating multimethods that dispatch to one of multiple different possible backend implementations. In this sense, it is similar to the ``__array_function__`` protocol but with the key difference that the backend is explicitly installed by the end-user and not coupled into the array type.
Decoupling the backend from the array type gives much more flexibility to end-users and backend authors. For example, it is possible to:
* override functions not taking arrays as arguments
* create backends out of source from the array type
* install multiple backends for the same array type
This decoupling also means that ``uarray`` is not constrained to dispatching over array-like types. The backend is free to inspect the entire set of function arguments to determine if it can implement the function, e.g. by dispatching on the ``dtype`` parameter.
Defining backends
^^^^^^^^^^^^^^^^^
``uarray`` consists of two main protocols: ``__ua_convert__`` and ``__ua_function__``, called in that order, along with ``__ua_domain__``. ``__ua_convert__`` is for conversion and coercion. It has the signature ``(dispatchables, coerce)``, where ``dispatchables`` is an iterable of ``ua.Dispatchable`` objects and ``coerce`` is a boolean indicating whether or not to force the conversion. ``ua.Dispatchable`` is a simple class consisting of three simple values: ``type``, ``value``, and ``coercible``. ``__ua_convert__`` returns an iterable of the converted values, or ``NotImplemented`` in the case of failure.
``__ua_function__`` has the signature ``(func, args, kwargs)`` and defines the actual implementation of the function. It receives the function and its arguments. Returning ``NotImplemented`` will cause a move to the default implementation of the function if one exists, and failing that, the next backend.
Here is what will happen assuming a ``uarray`` multimethod is called:
1. We canonicalise the arguments so any arguments without a default are placed in ``*args`` and those with one are placed in ``**kwargs``.
2. We check the list of backends.
a. If it is empty, we try the default implementation.
3. We check if the backend's ``__ua_convert__`` method exists. If it exists:
a. We pass it the output of the dispatcher, which is an iterable of ``ua.Dispatchable`` objects.
b. We feed this output, along with the arguments, to the argument replacer. ``NotImplemented`` means we move to 3 with the next backend.
c. We store the replaced arguments as the new arguments.
4. We feed the arguments into ``__ua_function__`` and, if the result isn't ``NotImplemented``, return it and exit.
5. If the default implementation exists, we try it with the current backend.
6. On failure, we move to 3 with the next backend. If there are no more backends, we move to 7.
7. We raise a ``ua.BackendNotImplementedError``.
Defining overridable multimethods
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
To define an overridable function (a multimethod), one needs a few things:
1. A dispatcher that returns an iterable of ``ua.Dispatchable`` objects.
2. A reverse dispatcher that replaces dispatchable values with the supplied ones.
3. A domain.
4. Optionally, a default implementation, which can be provided in terms of other multimethods.
As an example, consider the following::

    import numpy as np
    import uarray as ua

    # Reverse dispatcher: rebuilds (args, kwargs) with the converted dtype.
    def full_argreplacer(args, kwargs, dispatchables):
        def full(shape, fill_value, dtype=None, order='C'):
            return (shape, fill_value), dict(
                dtype=dispatchables[0],
                order=order
            )

        return full(*args, **kwargs)

    # Dispatcher: marks ``dtype`` as the dispatchable argument of ``full``.
    @ua.create_multimethod(full_argreplacer, domain="numpy")
    def full(shape, fill_value, dtype=None, order='C'):
        return (ua.Dispatchable(dtype, np.dtype),)

A large set of examples can be found in the ``unumpy`` repository [8]_. This simple act of overriding callables allows us to override:
* Methods
* Properties, via ``fget`` and ``fset``
* Entire objects, via ``__get__``.
Examples for NumPy
^^^^^^^^^^^^^^^^^^
A library that implements a NumPy-like API will use it in the following manner (as an example)::

    import numpy.overridable as unp

    _ua_implementations = {}

    __ua_domain__ = "numpy"

    def __ua_function__(func, args, kwargs):
        fn = _ua_implementations.get(func, None)
        return fn(*args, **kwargs) if fn is not None else NotImplemented

    def implements(ua_func):
        def inner(func):
            _ua_implementations[ua_func] = func
            return func

        return inner

    @implements(unp.asarray)
    def asarray(a, dtype=None, order=None):
        # Code here
        # Either this method or __ua_convert__ must
        # return NotImplemented for unsupported types,
        # or they shouldn't be marked as dispatchable.
        ...

    # Provides a default implementation for ones and zeros.
    @implements(unp.full)
    def full(shape, fill_value, dtype=None, order='C'):
        # Code here
        ...

Backward compatibility
----------------------
There are no backward incompatible changes proposed in this NEP.
Alternatives
------------
The current alternative to this problem is a combination of NEP-18 [2]_, NEP-13 [4]_ and NEP-30 [9]_, plus additional protocols (not yet specified). Even then, some parts of the NumPy API will remain non-overridable, so it's a partial alternative.
The main alternative to vendoring ``unumpy`` is to simply move it into NumPy completely and not distribute it as a separate package. This would also achieve the proposed goals, however we prefer to keep it a separate package for now, for reasons already stated above.
The third alternative is to move ``unumpy`` into the NumPy organisation and develop it as a NumPy project. This will also achieve the said goals, and is also a possibility that can be considered by this NEP. However, the act of doing an extra ``pip install`` or ``conda install`` may discourage some users from adopting this method.
Discussion
----------
* ``uarray`` blogpost: https://labs.quansight.org/blog/2019/07/uarray-update-api-changes-overhead-a...
* The discussion section of NEP-18: https://numpy.org/neps/nep-0018-array-function-protocol.html#discussion
* NEP-22: https://numpy.org/neps/nep-0022-ndarray-duck-typing-overview.html
* Dask issue #4462: https://github.com/dask/dask/issues/4462
* PR #13046: https://github.com/numpy/numpy/pull/13046
* Dask issue #4883: https://github.com/dask/dask/issues/4883
* Issue #13831: https://github.com/numpy/numpy/issues/13831
* Discussion PR 1: https://github.com/hameerabbasi/numpy/pull/3
* Discussion PR 2: https://github.com/hameerabbasi/numpy/pull/4
* Discussion PR 3: https://github.com/numpy/numpy/pull/14389
References and Footnotes
------------------------
.. [1] uarray, A general dispatch mechanism for Python: https://uarray.readthedocs.io
.. [2] NEP 18 — A dispatch mechanism for NumPy’s high level array functions: https://numpy.org/neps/nep-0018-array-function-protocol.html
.. [3] NEP 22 — Duck typing for NumPy arrays – high level overview: https://numpy.org/neps/nep-0022-ndarray-duck-typing-overview.html
.. [4] NEP 13 — A Mechanism for Overriding Ufuncs: https://numpy.org/neps/nep-0013-ufunc-overrides.html
.. [5] Reply to Adding to the non-dispatched implementation of NumPy methods: http://numpy-discussion.10968.n7.nabble.com/Adding-to-the-non-dispatched-imp...
.. [6] Custom Dtype/Units discussion: http://numpy-discussion.10968.n7.nabble.com/Custom-Dtype-Units-discussion-td...
.. [7] The epic dtype cleanup plan: https://github.com/numpy/numpy/issues/2899
.. [8] unumpy: NumPy, but implementation-independent: https://unumpy.readthedocs.io
.. [9] NEP 30 — Duck Typing for NumPy Arrays - Implementation: https://www.numpy.org/neps/nep-0030-duck-array-protocol.html
.. [10] http://scipy.github.io/devdocs/fft.html#backend-control
Copyright
---------
This document has been placed in the public domain.
Thanks to all the feedback, we have a new PR of NEP-31.
Please find the full-text quoted below:
============================================================
NEP 31 — Context-local and global overrides of the NumPy API
============================================================
:Author: Hameer Abbasi habbasi@quansight.com
:Author: Ralf Gommers rgommers@quansight.com
:Author: Peter Bell pbell@quansight.com
:Status: Draft
:Type: Standards Track
:Created: 2019-08-22
Abstract
--------
This NEP proposes to make all of NumPy's public API overridable via an
extensible backend mechanism.
Acceptance of this NEP means NumPy would provide global and context-local
overrides, as well as a dispatch mechanism similar to NEP-18 [2]_. First
experiences with ``__array_function__`` show that it is necessary to be able
to override NumPy functions that *do not take an array-like argument*, and
hence aren't overridable via ``__array_function__``. The most pressing need is
array creation and coercion functions, such as ``numpy.zeros`` or
``numpy.asarray``; see e.g. NEP-30 [9]_.
This NEP proposes to allow, in an opt-in fashion, overriding any part of the
NumPy API. It is intended as a comprehensive resolution to NEP-22 [3]_, and
obviates the need to add an ever-growing list of new protocols for each new
type of function or object that needs to become overridable.
Motivation and Scope
--------------------
The motivation behind ``uarray`` is manifold: First, there have been several
attempts to allow dispatch of parts of the NumPy API, including, most
prominently, the ``__array_ufunc__`` protocol in NEP-13 [4]_, and the
``__array_function__`` protocol in NEP-18 [2]_, but this has shown the need
for further protocols to be developed, including a protocol for coercion (see
[5]_, [9]_). The reasons these overrides are needed have been extensively
discussed in the references, and this NEP will not attempt to go into the
details of why these are needed; but in short: It is necessary for library
authors to be able to coerce arbitrary objects into arrays of their own types,
such as CuPy needing to coerce to a CuPy array instead of
a NumPy array.
These kinds of overrides are useful for both the end-user as well as library
authors. End-users may have written or wish to write code that they then later
speed up or move to a different implementation, say PyData/Sparse. They can do
this simply by setting a backend. Library authors may also wish to write code
that is portable across array implementations, for example ``sklearn`` may wish
to write code for a machine learning algorithm that is portable across array
implementations while also using array creation functions.
This NEP takes a holistic approach: It assumes that there are parts of
the API that need to be overridable, and that these will grow over time. It
provides a general framework and a mechanism to avoid designing a new
protocol each time this is required. This was the goal of ``uarray``: to
allow for overrides in an API without needing the design of a new protocol.
This NEP proposes the following: That ``unumpy`` [8]_ becomes the
recommended override mechanism for the parts of the NumPy API not yet covered
by ``__array_function__`` or ``__array_ufunc__``, and that ``uarray`` is
vendored into a new namespace within NumPy to give users and downstream
dependencies access to these overrides. This vendoring mechanism is similar
to what SciPy decided to do for making ``scipy.fft`` overridable (see [10]_).
Detailed description
--------------------
Using overrides
~~~~~~~~~~~~~~~
The way we propose the overrides will be used by end users is::

    # On the library side
    import numpy.overridable as unp

    def library_function(array):
        array = unp.asarray(array)
        # Code using unumpy as usual
        return array

    # On the user side:
    import numpy.overridable as unp
    import uarray as ua
    import dask.array as da

    ua.register_backend(da)

    dask_array = da.ones(4)  # any Dask array
    library_function(dask_array)  # works and returns dask_array

    with unp.set_backend(da):
        library_function([1, 2, 3, 4])  # actually returns a Dask array.

Here, the backend (``da`` above) can be any compatible object defined either by
NumPy or an external library, such as Dask or CuPy. Ideally, it should be the
module ``dask.array`` or ``cupy`` itself.
Composing backends
~~~~~~~~~~~~~~~~~~
There are some backends which may depend on other backends, for example xarray
depending on ``numpy.fft`` and transforming a time axis into a frequency axis,
or Dask/xarray holding an array other than a NumPy array inside it. This would
be handled in the following manner inside code::

    import cupy
    import dask.array
    import uarray as ua

    with ua.set_backend(cupy), ua.set_backend(dask.array):
        # Code that has distributed GPU arrays here
        ...

Proposals
~~~~~~~~~
The only change this NEP proposes at its acceptance is to make ``unumpy`` the
officially recommended way to override NumPy, along with making some submodules
overridable by default via ``uarray``. ``unumpy`` will remain a separate
repository/package, which we propose to vendor to avoid a hard dependency for
the time being, using the separate ``unumpy`` package only if it is installed.
In concrete terms, ``numpy.overridable`` becomes an alias for ``unumpy`` if it
is available, with a fallback to a vendored version if not. ``uarray`` and
``unumpy`` will be developed primarily with the input of duck-array authors
and secondarily, custom dtype authors, via the usual GitHub workflow. There
are a few reasons for this:
* Faster iteration in the case of bugs or issues.
* Faster design changes, in the case of needed functionality.
* ``unumpy`` will work with older versions of NumPy as well.
* The user and library author opt-in to the override process,
rather than breakages happening when it is least expected.
In simple terms, bugs in ``unumpy`` mean that ``numpy`` remains
unaffected.
* For ``numpy.fft``, ``numpy.linalg`` and ``numpy.random``, the functions in
the main namespace will mirror those in the ``numpy.overridable`` namespace.
The reason for this is that there may exist functions in these
submodules that need backends, even for ``numpy.ndarray`` inputs.
Advantages of ``unumpy`` over other solutions
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
``unumpy`` offers a number of advantages over the approach of defining a new
protocol for every problem encountered: Whenever there is something requiring
an override, ``unumpy`` will be able to offer a unified API with very minor
changes. For example:
* ``ufunc`` objects can be overridden via their ``__call__``, ``reduce`` and
other methods.
* Other functions can be overridden in a similar fashion.
* ``np.asduckarray`` goes away, and becomes ``np.overridable.asarray`` with a
backend set.
* The same holds for array creation functions such as ``np.zeros``,
``np.empty`` and so on.
This also holds for the future: Making something overridable would require only
minor changes to ``unumpy``.
Another promise ``unumpy`` holds is one of default implementations. Default
implementations can be provided for any multimethod, in terms of others. This
allows one to override a large part of the NumPy API by defining only a small
part of it. This is to ease the creation of new duck-arrays, by providing
default implementations of many functions that can be easily expressed in
terms of others, as well as a repository of utility functions that help in the
implementation of duck-arrays that most duck-arrays would require. This would
allow us to avoid designing entire protocols, e.g., a protocol for stacking
and concatenating would be replaced by simply implementing ``stack`` and/or
``concatenate`` and then providing default implementations for everything else
in that class. The same applies for transposing, and many other functions for
which protocols haven't been proposed, such as ``isin`` in terms of ``in1d``,
``setdiff1d`` in terms of ``unique``, and so on.
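The following is a minimal sketch of how default implementations can fill in for functions a backend does not provide, assuming a plain dictionary stands in for a backend. ``call``, ``default``, ``DEFAULTS`` and ``list_backend`` are hypothetical names chosen for illustration, not part of ``unumpy`` or ``uarray``. The toy backend implements only ``unique`` and ``concatenate``; ``setdiff1d`` and ``stack`` come from defaults written in terms of them.

.. code:: python

    from typing import Any, Callable, Dict

    # Hypothetical registry of default implementations, keyed by function name.
    DEFAULTS: Dict[str, Callable[..., Any]] = {}


    def default(name: str):
        """Register the decorated function as the default for ``name``."""
        def register(func: Callable[..., Any]) -> Callable[..., Any]:
            DEFAULTS[name] = func
            return func
        return register


    def call(backend: Dict[str, Callable[..., Any]], name: str, *args, **kwargs):
        """Use the backend's own implementation if it has one, else the default."""
        impl = backend.get(name)
        if impl is not None:
            return impl(*args, **kwargs)
        fallback = DEFAULTS.get(name)
        if fallback is None:
            raise NotImplementedError(f"{name} has no implementation or default")
        # The default receives the backend, so the functions it calls dispatch there too.
        return fallback(backend, *args, **kwargs)


    @default("setdiff1d")
    def _setdiff1d(backend, ar1, ar2):
        # Sorted unique values of ar1 that are not in ar2, using only ``unique``.
        excluded = set(call(backend, "unique", ar2))
        return [x for x in call(backend, "unique", ar1) if x not in excluded]


    @default("stack")
    def _stack(backend, arrays):
        # Stacking 1-D inputs: add a leading axis to each, then concatenate.
        return call(backend, "concatenate", [[a] for a in arrays])


    # A toy backend for 1-D Python lists that implements only two functions.
    list_backend = {
        "unique": lambda a: sorted(set(a)),
        "concatenate": lambda arrays: [x for arr in arrays for x in arr],
    }

With this, ``call(list_backend, "setdiff1d", [3, 1, 2, 3], [2])`` returns ``[1, 3]`` and ``call(list_backend, "stack", [[1, 2], [3, 4]])`` returns ``[[1, 2], [3, 4]]``, even though the backend never defined either function.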
It also allows one to override functions in a manner which
``__array_function__`` simply cannot, such as overriding ``np.einsum`` with the
version from the ``opt_einsum`` package, or Intel MKL overriding FFT, BLAS
or ``ufunc`` objects. They would define a backend with the appropriate
multimethods, and the user would select them via a ``with`` statement, or
registering them as a backend.
The last benefit is a clear way to coerce to a given backend (via the
``coerce`` keyword in ``ua.set_backend``), and a protocol
for coercing not only arrays, but also ``dtype`` objects and ``ufunc`` objects
with similar ones from other libraries. This is due to the existence of actual,
third party dtype packages, and their desire to blend into the NumPy ecosystem
(see [6]_). This is a separate issue from the C-level dtype redesign
proposed in [7]_; it is about allowing third-party dtype implementations to
work with NumPy, much like third-party array implementations. These can provide
features such as units, jagged arrays or other features that
are outside the scope of NumPy.
Mixing NumPy and ``unumpy`` in the same file
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Normally, one would want to import only one of ``unumpy`` or ``numpy``, and
would import it as ``np`` for familiarity. However, there may be situations
where one wishes to mix NumPy and the overrides, and there are a few ways to do
this, depending on the user's style::

    from numpy import overridable as unp
    import numpy as np

or::

    import numpy as np
    # Use unumpy via np.overridable

Duck-array coercion
~~~~~~~~~~~~~~~~~~~
There are inherent problems with returning objects that are not NumPy arrays
from ``numpy.array`` or ``numpy.asarray``, particularly in the context of C/C++
or Cython code that may get an object with a different memory layout than the
one it expects. However, we believe this problem may apply not only to these
two functions but to all functions that return NumPy arrays. For this reason,
overrides are opt-in for the user, by using the submodule ``numpy.overridable``
rather than ``numpy``. NumPy will continue to work unaffected by anything in
``numpy.overridable``.
If the user wishes to obtain a NumPy array, there are two ways of doing it:
1. Use ``numpy.asarray`` (the non-overridable version).
2. Use ``numpy.overridable.asarray`` with the NumPy backend set and coercion
enabled.
Aliases outside of the ``numpy.overridable`` namespace
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
All functionality in ``numpy.random``, ``numpy.linalg`` and ``numpy.fft``
will be aliased to their respective overridable versions inside
``numpy.overridable``. The reason for this is that there are alternative
implementations of RNGs (``mkl-random``), linear algebra routines (``eigen``,
``blis``) and FFT routines (``mkl-fft``, ``pyFFTW``) that need to operate on
``numpy.ndarray`` inputs, but still need the ability to switch behaviour.
This is different from monkeypatching in a few different ways:
* The caller-facing signature of the function is always the same,
so there is at least the loose sense of an API contract. Monkeypatching
does not provide this ability.
* There is the ability of locally switching the backend.
* It has been `suggested <http://numpy-discussion.10968.n7.nabble.com/NEP-31-Context-local-and-global-overrides-of-the-NumPy-API-tp47452p47472.html>`_
that the reason 1.17 hasn't landed in the Anaconda defaults channel is
the incompatibility between monkeypatching and ``__array_function__``,
as monkeypatching would bypass the protocol completely.
* Statements of the form ``from numpy import x; x`` and ``np.x`` would have
different results depending on whether the import was made before or
after monkeypatching happened.
All this isn't possible at all with ``__array_function__`` or
``__array_ufunc__``.
It has been formally realised (at least in part) that a backend system is
needed for this, in the `NumPy roadmap <https://numpy.org/neps/roadmap.html#other-functionality>`_.
For ``numpy.random``, it's still necessary to make the C-API fit the one
proposed in `NEP-19 <https://numpy.org/neps/nep-0019-rng-policy.html>`_.
This is impossible for ``mkl-random``, because then it would need to be
rewritten to fit that framework. The guarantees on stream
compatibility will be the same as before, but if there's a backend that affects
``numpy.random`` set, we make no guarantees about stream compatibility, and it
is up to the backend author to provide their own guarantees.
Providing a way for implicit dispatch
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
It has been suggested that the ability to dispatch methods which do not take
a dispatchable is needed, while guessing the backend from another dispatchable.
As a concrete example, consider the following:

.. code:: python

    with unumpy.determine_backend(array_like, np.ndarray):
        unumpy.arange(len(array_like))

While this does not exist yet in ``uarray``, it is trivial to add it. The need for
this kind of code exists because one might want to have an alternative for the
proposed ``*_like`` functions, or the ``like=`` keyword argument. The need for these
exists because there are functions in the NumPy API that do not take a dispatchable
argument, but there is still the need to select a backend based on a different
dispatchable.
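Since ``determine_backend`` does not exist yet, the sketch below shows only one possible shape for it, assuming backends are plain objects with a hypothetical ``accepts(value, dispatch_type)`` method plus one method per NumPy function. It keeps the active backends in a ``contextvars.ContextVar``, which makes the selection local to the current thread or async task, in the spirit of the context-local overrides this NEP proposes. Every name here is illustrative and not ``uarray`` API.

.. code:: python

    import contextlib
    import contextvars
    from typing import Any, List

    # Context-local stack of active backends; the innermost one comes first.
    _active: contextvars.ContextVar = contextvars.ContextVar("active_backends", default=())
    # Globally registered backends that determine_backend may choose from.
    _registered: List[Any] = []


    def register_backend(backend: Any) -> None:
        _registered.append(backend)


    @contextlib.contextmanager
    def set_backend(backend: Any):
        token = _active.set((backend,) + _active.get())
        try:
            yield
        finally:
            _active.reset(token)


    @contextlib.contextmanager
    def determine_backend(value: Any, dispatch_type: str):
        """Activate the first registered backend that accepts ``value``."""
        for backend in _registered:
            if backend.accepts(value, dispatch_type):
                with set_backend(backend):
                    yield
                return
        raise TypeError(f"no registered backend accepts {type(value).__name__}")


    def arange(n: int):
        # No array argument to dispatch on: only the active backend can be used.
        backends = _active.get()
        if not backends:
            raise RuntimeError("no backend is active")
        return backends[0].arange(n)


    class ListBackend:
        def accepts(self, value, dispatch_type):
            return dispatch_type == "array" and isinstance(value, list)

        def arange(self, n):
            return list(range(n))


    class TupleBackend:
        def accepts(self, value, dispatch_type):
            return dispatch_type == "array" and isinstance(value, tuple)

        def arange(self, n):
            return tuple(range(n))


    register_backend(ListBackend())
    register_backend(TupleBackend())

    array_like = (10, 20, 30)
    with determine_backend(array_like, "array"):
        result = arange(len(array_like))

Here ``result`` is ``(0, 1, 2)``: the tuple input selects ``TupleBackend``, even though ``arange`` itself only receives an integer, and the backend stack is empty again once the ``with`` block exits.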
The need for an opt-in module
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
An opt-in module is needed for a few reasons:
* There are parts of the API (like ``numpy.asarray``) that simply cannot be
overridden due to incompatibility concerns with C/Cython extensions; however,
one may want to coerce to a duck-array using ``asarray`` with a backend set.
* There are possible issues around an implicit option and monkeypatching, such
as those mentioned above.
NEP 18 notes that this may require maintenance of two separate APIs. However,
this burden may be lessened by, for example, parametrizing all tests over
``numpy.overridable`` separately via a fixture. This also has the side-effect
of thoroughly testing it, unlike ``__array_function__``. We also feel that it
provides an opportunity to separate the NumPy API contract properly from the
implementation.
Benefits to end-users and mixing backends
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Mixing backends is easy in ``uarray``; one only has to do:

.. code:: python

    # Explicitly say which backends you want to mix
    ua.register_backend(backend1)
    ua.register_backend(backend2)
    ua.register_backend(backend3)

    # Freely use code that mixes backends here.

The benefits to end-users extend beyond just writing new code. Old code
(usually in the form of scripts) can be easily ported to different backends
by a simple import switch and a line adding the preferred backend. This way,
users may find it easier to port existing code to GPU or distributed computing.
Related Work
------------
Other override mechanisms
~~~~~~~~~~~~~~~~~~~~~~~~~
* NEP-18, the ``__array_function__`` protocol. [2]_
* NEP-13, the ``__array_ufunc__`` protocol. [3]_
* NEP-30, the ``__duck_array__`` protocol. [9]_
Existing NumPy-like array implementations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
* Dask: https://dask.org/
* CuPy: https://cupy.chainer.org/
* PyData/Sparse: https://sparse.pydata.org/
* Xnd: https://xnd.readthedocs.io/
* Astropy's Quantity: https://docs.astropy.org/en/stable/units/
Existing and potential consumers of alternative arrays
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
* Dask: https://dask.org/
* scikit-learn: https://scikit-learn.org/
* xarray: https://xarray.pydata.org/
* TensorLy: http://tensorly.org/
Existing alternate dtype implementations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
* ``ndtypes``: https://ndtypes.readthedocs.io/en/latest/
* Datashape: https://datashape.readthedocs.io
* Plum: https://plum-py.readthedocs.io/
Alternate implementations of parts of the NumPy API
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
* ``mkl_random``: https://github.com/IntelPython/mkl_random
* ``mkl_fft``: https://github.com/IntelPython/mkl_fft
* ``bottleneck``: https://github.com/pydata/bottleneck
* ``opt_einsum``: https://github.com/dgasmith/opt_einsum
Implementation
--------------
The implementation of this NEP will require the following steps:
* Implementation of ``uarray`` multimethods corresponding to the
NumPy API, including classes for overriding ``dtype``, ``ufunc``
and ``array`` objects, in the ``unumpy`` repository.
* Moving backends from ``unumpy`` into the respective array libraries.
``uarray`` Primer
~~~~~~~~~~~~~~~~~
**Note:** *This section will not attempt to go into too much detail about
uarray, that is the purpose of the uarray documentation.* [1]_
*However, the NumPy community will have input into the design of
uarray, via the issue tracker.*
``unumpy`` is the interface that defines a set of overridable functions
(multimethods) compatible with the NumPy API. To do this, it uses the
``uarray`` library. ``uarray`` is a general purpose tool for creating
multimethods that dispatch to one of multiple different possible backend
implementations. In this sense, it is similar to the ``__array_function__``
protocol but with the key difference that the backend is explicitly installed
by the end-user and not coupled into the array type.
Decoupling the backend from the array type gives much more flexibility to
end-users and backend authors. For example, it is possible to:
* override functions not taking arrays as arguments
* create backends out of source from the array type
* install multiple backends for the same array type
This decoupling also means that ``uarray`` is not constrained to dispatching
over array-like types. The backend is free to inspect the entire set of
function arguments to determine if it can implement the function, e.g. by
dispatching on the ``dtype`` parameter.
Defining backends
^^^^^^^^^^^^^^^^^
``uarray`` consists of two main protocols: ``__ua_convert__`` and
``__ua_function__``, called in that order, along with ``__ua_domain__``.
``__ua_convert__`` is for conversion and coercion. It has the signature
``(dispatchables, coerce)``, where ``dispatchables`` is an iterable of
``ua.Dispatchable`` objects and ``coerce`` is a boolean indicating whether or
not to force the conversion. ``ua.Dispatchable`` is a simple class consisting
of three simple values: ``type``, ``value``, and ``coercible``.
``__ua_convert__`` returns an iterable of the converted values, or
``NotImplemented`` in the case of failure.
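As an illustration of the ``(dispatchables, coerce)`` contract, here is a toy ``__ua_convert__`` for a hypothetical backend whose array type is ``tuple``. The ``Dispatchable`` dataclass and the ``ARRAY`` marker are stand-ins written for this sketch, not ``uarray``'s own classes. Values already of the backend's type pass through, lists are converted only when both ``coerce`` and ``coercible`` allow it, and anything else yields ``NotImplemented`` so that another backend can try.

.. code:: python

    from dataclasses import dataclass
    from typing import Any, Iterable

    ARRAY = "array"  # hypothetical marker for the "array" dispatch type


    @dataclass
    class Dispatchable:
        # Stand-in for ``ua.Dispatchable``: the passed value, the kind of object
        # the multimethod expects at this position, and whether it may be coerced.
        value: Any
        type: Any
        coercible: bool = True


    class TupleArrayBackend:
        """Toy backend whose native array type is ``tuple``."""

        def __ua_convert__(self, dispatchables: Iterable[Dispatchable], coerce: bool):
            converted = []
            for d in dispatchables:
                if d.type != ARRAY or isinstance(d.value, tuple):
                    converted.append(d.value)  # not an array slot, or already native
                elif coerce and d.coercible and isinstance(d.value, list):
                    converted.append(tuple(d.value))  # forced conversion
                else:
                    return NotImplemented  # let the next backend try
            return converted

For instance, ``TupleArrayBackend().__ua_convert__([Dispatchable([1, 2], ARRAY)], True)`` returns ``[(1, 2)]``, while the same call with ``coerce=False`` returns ``NotImplemented``.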
``__ua_function__`` has the signature ``(func, args, kwargs)`` and defines
the actual implementation of the function. It receives the function and its
arguments. Returning ``NotImplemented`` will cause a move to the default
implementation of the function if one exists, and failing that, the next
backend.
Here is what will happen assuming a ``uarray`` multimethod is called:
1. We canonicalise the arguments so any arguments without a default
are placed in ``*args`` and those with one are placed in ``**kwargs``.
2. We check the list of backends.
a. If it is empty, we try the default implementation.
3. We check if the backend's ``__ua_convert__`` method exists. If it exists:
a. We pass it the output of the dispatcher,
which is an iterable of ``ua.Dispatchable`` objects.
b. We feed this output, along with the arguments,
to the argument replacer. ``NotImplemented`` means we move to 3
with the next backend.
c. We store the replaced arguments as the new arguments.
4. We feed the arguments into ``__ua_function__`` and, if the result isn't
``NotImplemented``, return it and exit.
5. If the default implementation exists, we try it with the current backend.
6. On failure, we move to 3 with the next backend. If there are no more
backends, we move to 7.
7. We raise a ``ua.BackendNotImplementedError``.
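The steps above can be modelled in a few dozen lines of plain Python. This is a simplified sketch, not ``uarray``'s implementation: the list of backends is passed explicitly instead of being read from global and context-local state, ``coerce`` is always ``False``, step 1 is assumed to have been done by the caller (arguments with defaults are passed as keywords), and ``ToyMultimethod``, ``ListBackend`` and the other names are hypothetical.

.. code:: python

    from dataclasses import dataclass
    from typing import Any, Callable, Optional, Sequence


    class BackendNotImplementedError(NotImplementedError):
        """Stand-in for ``ua.BackendNotImplementedError`` in this toy model."""


    @dataclass
    class Dispatchable:
        # The three values carried by ``ua.Dispatchable``.
        value: Any
        type: Any
        coercible: bool = True


    @dataclass(eq=False)  # eq=False keeps instances hashable (usable as dict keys)
    class ToyMultimethod:
        extractor: Callable[..., tuple]  # the dispatcher
        replacer: Callable[..., tuple]  # the reverse dispatcher
        default: Optional[Callable[..., Any]] = None

        def __call__(self, backends: Sequence[Any], *args, **kwargs):
            # Step 2a: with no backends, only the default implementation can run.
            if not backends:
                return self._run_default(backends, args, kwargs)
            for backend in backends:
                cur_args, cur_kwargs = args, kwargs
                # Step 3: optional conversion through __ua_convert__.
                convert = getattr(backend, "__ua_convert__", None)
                if convert is not None:
                    converted = convert(self.extractor(*args, **kwargs), False)
                    if converted is NotImplemented:
                        continue  # step 3b: try the next backend
                    cur_args, cur_kwargs = self.replacer(args, kwargs, tuple(converted))
                # Step 4: the backend's own implementation.
                result = backend.__ua_function__(self, cur_args, cur_kwargs)
                if result is not NotImplemented:
                    return result
                # Step 5: the default, with only the current backend active.
                try:
                    return self._run_default([backend], cur_args, cur_kwargs)
                except BackendNotImplementedError:
                    continue  # step 6: try the next backend
            # Step 7: every backend declined.
            raise BackendNotImplementedError("no backend could handle this call")

        def _run_default(self, backends, args, kwargs):
            if self.default is None:
                raise BackendNotImplementedError("no default implementation")
            return self.default(backends, *args, **kwargs)


    # A toy 1-D ``full``/``zeros`` pair; ``zeros`` only has a default.
    def _full_extractor(shape, fill_value, dtype=None):
        return (Dispatchable(dtype, "dtype"),)


    def _zeros_extractor(shape, dtype=None):
        return (Dispatchable(dtype, "dtype"),)


    def _replace_dtype(args, kwargs, dispatchables):
        return args, {**kwargs, "dtype": dispatchables[0]}


    full = ToyMultimethod(_full_extractor, _replace_dtype)
    zeros = ToyMultimethod(
        _zeros_extractor,
        _replace_dtype,
        default=lambda backends, shape, dtype=None: full(backends, shape, 0, dtype=dtype),
    )


    class ListBackend:
        """Implements ``full`` with Python lists and understands only ``"int"``."""

        def __ua_convert__(self, dispatchables, coerce):
            if any(d.value not in (None, "int") for d in dispatchables):
                return NotImplemented
            return ["int" for _ in dispatchables]

        def __ua_function__(self, func, args, kwargs):
            if func is full:
                shape, fill_value = args
                return [fill_value] * shape
            return NotImplemented

In this model ``zeros([ListBackend()], 3)`` passes step 3, gets ``NotImplemented`` from step 4, and returns ``[0, 0, 0]`` through the default in step 5, which calls ``full`` with the same backend.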
Defining overridable multimethods
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
To define an overridable function (a multimethod), one needs a few things:
1. A dispatcher that returns an iterable of ``ua.Dispatchable`` objects.
2. A reverse dispatcher that replaces dispatchable values with the supplied
ones.
3. A domain.
4. Optionally, a default implementation, which can be provided in terms of
other multimethods.
As an example, consider the following::

    import numpy as np
    import uarray as ua

    # Reverse dispatcher: rebuilds (args, kwargs) with the converted dtype.
    def full_argreplacer(args, kwargs, dispatchables):
        def full(shape, fill_value, dtype=None, order='C'):
            return (shape, fill_value), dict(
                dtype=dispatchables[0],
                order=order
            )

        return full(*args, **kwargs)

    # Dispatcher: marks ``dtype`` as the dispatchable argument of ``full``.
    @ua.create_multimethod(full_argreplacer, domain="numpy")
    def full(shape, fill_value, dtype=None, order='C'):
        return (ua.Dispatchable(dtype, np.dtype),)

A large set of examples can be found in the ``unumpy`` repository [8]_.
This simple act of overriding callables allows us to override:
* Methods
* Properties, via ``fget`` and ``fset``
* Entire objects, via ``__get__``.
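To make the list above concrete, here is a small sketch assuming a hypothetical ``overridable`` wrapper and a module-level stack of override dictionaries (neither is ``uarray`` API). Because a property is built from the callables passed as ``fget``/``fset``, and a method is just a function stored on the class, wrapping those callables is enough to make both overridable.

.. code:: python

    from typing import Any, Callable, Dict, List

    # Hypothetical stack of active overrides; later entries take precedence.
    _overrides: List[Dict[str, Callable[..., Any]]] = []


    def overridable(name: str, default: Callable[..., Any]) -> Callable[..., Any]:
        """Return a callable that defers to the innermost override for ``name``."""
        def dispatch(*args, **kwargs):
            for layer in reversed(_overrides):
                if name in layer:
                    return layer[name](*args, **kwargs)
            return default(*args, **kwargs)
        return dispatch


    class ToyArray:
        def __init__(self, data):
            self.data = list(data)

        # A property: overriding the callable behind ``fget`` overrides the property.
        size = property(fget=overridable("ToyArray.size", lambda self: len(self.data)))

        # A method: a plain function on a class becomes a bound method.
        total = overridable("ToyArray.total", lambda self: sum(self.data))


    arr = ToyArray([1, 2, 3])
    _overrides.append({"ToyArray.size": lambda self: -1})
    overridden_size = arr.size
    _overrides.pop()

While the override is active, ``arr.size`` reports ``-1``; once it is popped, ``arr.size`` is ``3`` again, and ``arr.total()`` returns ``6`` throughout.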
Examples for NumPy
^^^^^^^^^^^^^^^^^^
A library that implements a NumPy-like API will use it in the following
manner (as an example)::

    import numpy.overridable as unp

    _ua_implementations = {}

    __ua_domain__ = "numpy"

    def __ua_function__(func, args, kwargs):
        fn = _ua_implementations.get(func, None)
        return fn(*args, **kwargs) if fn is not None else NotImplemented

    def implements(ua_func):
        def inner(func):
            _ua_implementations[ua_func] = func
            return func

        return inner

    @implements(unp.asarray)
    def asarray(a, dtype=None, order=None):
        # Code here
        # Either this method or __ua_convert__ must
        # return NotImplemented for unsupported types,
        # or they shouldn't be marked as dispatchable.
        ...

    # Provides a default implementation for ones and zeros.
    @implements(unp.full)
    def full(shape, fill_value, dtype=None, order='C'):
        # Code here
        ...

Backward compatibility
----------------------
There are no backward incompatible changes proposed in this NEP.
Alternatives
------------
The current alternative to this problem is a combination of NEP-18 [2]_,
NEP-13 [4]_ and NEP-30 [9]_, plus additional protocols (not yet specified).
Even then, some parts of the NumPy API will remain
non-overridable, so it's a partial alternative.
The main alternative to vendoring ``unumpy`` is to simply move it into NumPy
completely and not distribute it as a separate package. This would also achieve
the proposed goals, however we prefer to keep it a separate package for now,
for reasons already stated above.
The third alternative is to move ``unumpy`` into the NumPy organisation and
develop it as a NumPy project. This will also achieve the said goals, and is
also a possibility that can be considered by this NEP. However, the act of
doing an extra ``pip install`` or ``conda install`` may discourage some users
from adopting this method.
An alternative to requiring opt-in is mainly to *not* override ``np.asarray``
and ``np.array``, make the rest of the NumPy API surface overridable, and
instead provide ``np.duckarray`` and ``np.asduckarray``
as duck-array friendly alternatives that use the respective overrides. However,
this has the downside of adding a minor overhead to NumPy calls.
Discussion
----------
* ``uarray`` blogpost: https://labs.quansight.org/blog/2019/07/uarray-update-api-changes-overhead-a...
* The discussion section of NEP-18: https://numpy.org/neps/nep-0018-array-function-protocol.html#discussion
* NEP-22: https://numpy.org/neps/nep-0022-ndarray-duck-typing-overview.html
* Dask issue #4462: https://github.com/dask/dask/issues/4462
* PR #13046: https://github.com/numpy/numpy/pull/13046
* Dask issue #4883: https://github.com/dask/dask/issues/4883
* Issue #13831: https://github.com/numpy/numpy/issues/13831
* Discussion PR 1: https://github.com/hameerabbasi/numpy/pull/3
* Discussion PR 2: https://github.com/hameerabbasi/numpy/pull/4
* Discussion PR 3: https://github.com/numpy/numpy/pull/14389
References and Footnotes
------------------------
.. [1] uarray, A general dispatch mechanism for Python: https://uarray.readthedocs.io
.. [2] NEP 18 — A dispatch mechanism for NumPy’s high level array functions: https://numpy.org/neps/nep-0018-array-function-protocol.html
.. [3] NEP 22 — Duck typing for NumPy arrays – high level overview: https://numpy.org/neps/nep-0022-ndarray-duck-typing-overview.html
.. [4] NEP 13 — A Mechanism for Overriding Ufuncs: https://numpy.org/neps/nep-0013-ufunc-overrides.html
.. [5] Reply to Adding to the non-dispatched implementation of NumPy methods: http://numpy-discussion.10968.n7.nabble.com/Adding-to-the-non-dispatched-imp...
.. [6] Custom Dtype/Units discussion: http://numpy-discussion.10968.n7.nabble.com/Custom-Dtype-Units-discussion-td...
.. [7] The epic dtype cleanup plan: https://github.com/numpy/numpy/issues/2899
.. [8] unumpy: NumPy, but implementation-independent: https://unumpy.readthedocs.io
.. [9] NEP 30 — Duck Typing for NumPy Arrays - Implementation: https://www.numpy.org/neps/nep-0030-duck-array-protocol.html
.. [10] http://scipy.github.io/devdocs/fft.html#backend-control
Copyright
---------
This document has been placed in the public domain.
From: NumPy-Discussion numpy-discussion-bounces+hameerabbasi=yahoo.com@python.org on behalf of Hameer Abbasi einstein.edison@gmail.com
Reply to: Discussion of Numerical Python numpy-discussion@python.org
Date: Thursday, 5. September 2019 at 17:12
To: numpy-discussion@python.org
Subject: Re: [Numpy-discussion] NEP 31 — Context-local and global overrides of the NumPy API
Hello everyone;
Thanks to all the feedback from the community, in particular Sebastian Berg, we have a new draft of NEP-31.
Please find the full text quoted below for discussion and reference. Any feedback and discussion is welcome.
============================================================
NEP 31 — Context-local and global overrides of the NumPy API
============================================================

:Author: Hameer Abbasi <habbasi@quansight.com>
:Author: Ralf Gommers <rgommers@quansight.com>
:Author: Peter Bell <pbell@quansight.com>
:Status: Draft
:Type: Standards Track
:Created: 2019-08-22

Abstract
--------
This NEP proposes to make all of NumPy's public API overridable via an extensible backend mechanism.
Acceptance of this NEP means NumPy would provide global and context-local overrides, as well as a dispatch mechanism similar to NEP-18 [2]_. First experiences with ``__array_function__`` show that it is necessary to be able to override NumPy functions that *do not take an array-like argument*, and hence aren't overridable via ``__array_function__``. The most pressing need is array creation and coercion functions, such as ``numpy.zeros`` or ``numpy.asarray``; see e.g. NEP-30 [9]_.
This NEP proposes to allow, in an opt-in fashion, overriding any part of the NumPy API. It is intended as a comprehensive resolution to NEP-22 [3]_, and obviates the need to add an ever-growing list of new protocols for each new type of function or object that needs to become overridable.
Motivation and Scope
--------------------

The motivation behind ``uarray`` is manifold. First, there have been several attempts to allow dispatch of parts of the NumPy API, most prominently the ``__array_ufunc__`` protocol in NEP-13 [4]_ and the ``__array_function__`` protocol in NEP-18 [2]_, but these have shown the need for further protocols to be developed, including a protocol for coercion (see [5]_, [9]_). The reasons these overrides are needed have been discussed extensively in the references, and this NEP will not attempt to go into the details. In short, library authors need to be able to coerce arbitrary objects into arrays of their own types; CuPy, for example, needs to coerce to a CuPy array instead of a NumPy array.

These kinds of overrides are useful both for end-users and for library authors. End-users may have written, or may wish to write, code that they later speed up or move to a different implementation, say PyData/Sparse. They can do this simply by setting a backend. Library authors may also wish to write code that is portable across array implementations; for example, ``sklearn`` may wish to write code for a machine learning algorithm that is portable across array implementations while also using array creation functions.

This NEP takes a holistic approach: it assumes that there are parts of the API that need to be overridable, and that these will grow over time. It provides a general framework and a mechanism to avoid designing a new protocol each time this is required. This was the goal of ``uarray``: to allow for overrides in an API without needing the design of a new protocol.
This NEP proposes the following: That ``unumpy`` [8]_ becomes the recommended override mechanism for the parts of the NumPy API not yet covered by ``__array_function__`` or ``__array_ufunc__``, and that ``uarray`` is vendored into a new namespace within NumPy to give users and downstream dependencies access to these overrides. This vendoring mechanism is similar to what SciPy decided to do for making ``scipy.fft`` overridable (see [10]_).
Detailed description
--------------------

Using overrides
~~~~~~~~~~~~~~~

The way we propose the overrides will be used by end users is::

    # On the library side
    import numpy.overridable as unp

    def library_function(array):
        array = unp.asarray(array)
        # Code using unumpy as usual
        return array

    # On the user side:
    import numpy.overridable as unp
    import uarray as ua
    import dask.array as da

    ua.register_backend(da)

    library_function(dask_array)  # works and returns dask_array

    with unp.set_backend(da):
        library_function([1, 2, 3, 4])  # actually returns a Dask array.

Here, the backend (``da`` in the example) can be any compatible object defined either by NumPy or by an external library, such as Dask or CuPy. Ideally, it should be the module ``dask.array`` or ``cupy`` itself.
Composing backends
~~~~~~~~~~~~~~~~~~

There are some backends which may depend on other backends, for example xarray depending on ``numpy.fft`` and transforming a time axis into a frequency axis, or Dask/xarray holding an array other than a NumPy array inside it. This would be handled in the following manner inside code::

    with ua.set_backend(cupy), ua.set_backend(dask.array):
        # Code that has distributed GPU arrays here
        ...
Proposals
~~~~~~~~~

The only change this NEP proposes at its acceptance is to make ``unumpy`` the officially recommended way to override NumPy. ``unumpy`` will remain a separate repository/package; for the time being, we propose to vendor it to avoid a hard dependency, and to use the separate ``unumpy`` package only if it is installed. In concrete terms, ``numpy.overridable`` becomes an alias for ``unumpy`` if it is available, with a fallback to a vendored version if not. ``uarray`` and ``unumpy`` will be developed primarily with the input of duck-array authors and, secondarily, custom dtype authors, via the usual GitHub workflow. There are a few reasons for this:

* Faster iteration in the case of bugs or issues.
* Faster design changes, in the case of needed functionality.
* ``unumpy`` will work with older versions of NumPy as well.
* The user and library author opt in to the override process, rather than breakages happening when it is least expected. In simple terms, bugs in ``unumpy`` mean that ``numpy`` remains unaffected.
Advantages of ``unumpy`` over other solutions
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``unumpy`` offers a number of advantages over the approach of defining a new protocol for every problem encountered: whenever something requires an override, ``unumpy`` will be able to offer a unified API with very minor changes. For example:

* ``ufunc`` objects can be overridden via their ``__call__``, ``reduce`` and other methods.
* Other functions can be overridden in a similar fashion.
* ``np.asduckarray`` goes away, and becomes ``np.overridable.asarray`` with a backend set.
* The same holds for array creation functions such as ``np.zeros``, ``np.empty`` and so on.

This also holds for the future: making something overridable would require only minor changes to ``unumpy``.
Another promise ``unumpy`` holds is one of default implementations. Default implementations can be provided for any multimethod, in terms of others. This allows one to override a large part of the NumPy API by defining only a small part of it. This eases the creation of new duck-arrays, both by providing default implementations of the many functions that can be easily expressed in terms of others, and by providing a repository of utility functions that most duck-arrays would require.
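As a rough illustration of default implementations, the toy below lets a backend define only ``full`` and derives ``zeros`` and ``ones`` from it. The ``call`` helper, the ``DEFAULTS`` table and ``NestedListBackend`` are hypothetical names used only in this sketch; in the proposal, defaults are provided per multimethod rather than through a separate lookup table.

```python
def _default_zeros(backend, shape):
    # Default written purely in terms of another multimethod, ``full``.
    return backend.full(shape, 0)


def _default_ones(backend, shape):
    return backend.full(shape, 1)


# Hypothetical table of defaults, keyed by function name.
DEFAULTS = {"zeros": _default_zeros, "ones": _default_ones}


def call(backend, name, *args):
    """Use the backend's own implementation if it has one, else the default."""
    impl = getattr(backend, name, None)
    if impl is not None:
        return impl(*args)
    default = DEFAULTS.get(name)
    if default is None:
        raise NotImplementedError(f"{name} is not available for {backend.__name__}")
    return default(backend, *args)


class NestedListBackend:
    """Hypothetical backend implementing only ``full`` for 2-D shapes."""

    @staticmethod
    def full(shape, fill_value):
        rows, cols = shape
        return [[fill_value] * cols for _ in range(rows)]


print(call(NestedListBackend, "zeros", (2, 3)))  # [[0, 0, 0], [0, 0, 0]]
print(call(NestedListBackend, "ones", (1, 2)))   # [[1, 1]]
```

A backend that later gains a faster native ``zeros`` simply defines it, and the lookup picks it up ahead of the default.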
It also allows one to override functions in a manner which ``__array_function__`` simply cannot, such as overriding ``np.einsum`` with the version from the ``opt_einsum`` package, or Intel MKL overriding FFT, BLAS or ``ufunc`` objects. They would define a backend with the appropriate multimethods, and the user would select them via a ``with`` statement, or by registering them as a backend.
The last benefit is a clear way to coerce to a given backend (via the ``coerce`` keyword in ``ua.set_backend``), and a protocol for coercing not only arrays, but also ``dtype`` and ``ufunc`` objects, into similar ones from other libraries. This matters because actual third-party dtype packages exist and want to blend into the NumPy ecosystem (see [6]_). This is a separate issue from the C-level dtype redesign proposed in [7]_: it is about allowing third-party dtype implementations to work with NumPy, much like third-party array implementations. These can provide features such as units, jagged arrays or others that are outside the scope of NumPy.
Mixing NumPy and ``unumpy`` in the same file
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Normally, one would want to import only one of ``unumpy`` or ``numpy``, and would import it as ``np`` for familiarity. However, there may be situations where one wishes to mix NumPy and the overrides, and there are a few ways to do this, depending on the user's style::

    from numpy import overridable as unp
    import numpy as np

or::

    import numpy as np

    # Use unumpy via np.overridable
Duck-array coercion
~~~~~~~~~~~~~~~~~~~

There are inherent problems with returning objects that are not NumPy arrays from ``numpy.array`` or ``numpy.asarray``, particularly in the context of C/C++ or Cython code that may get an object with a different memory layout than the one it expects. However, we believe this problem may apply not only to these two functions but to all functions that return NumPy arrays. For this reason, overrides are opt-in for the user, by using the submodule ``numpy.overridable`` rather than ``numpy``. NumPy will continue to work unaffected by anything in ``numpy.overridable``.
If the user wishes to obtain a NumPy array, there are two ways of doing it:

1. Use ``numpy.asarray`` (the non-overridable version).
2. Use ``numpy.overridable.asarray`` with the NumPy backend set and coercion enabled.
Related Work
------------

Other override mechanisms
~~~~~~~~~~~~~~~~~~~~~~~~~

* NEP-18, the ``__array_function__`` protocol. [2]_
* NEP-13, the ``__array_ufunc__`` protocol. [4]_
* NEP-30, the ``__duck_array__`` protocol. [9]_

Existing NumPy-like array implementations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

* Dask: https://dask.org/
* CuPy: https://cupy.chainer.org/
* PyData/Sparse: https://sparse.pydata.org/
* Xnd: https://xnd.readthedocs.io/
* Astropy's Quantity: https://docs.astropy.org/en/stable/units/

Existing and potential consumers of alternative arrays
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

* Dask: https://dask.org/
* scikit-learn: https://scikit-learn.org/
* xarray: https://xarray.pydata.org/
* TensorLy: http://tensorly.org/

Existing alternate dtype implementations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

* ``ndtypes``: https://ndtypes.readthedocs.io/en/latest/
* Datashape: https://datashape.readthedocs.io
* Plum: https://plum-py.readthedocs.io/
Implementation
--------------

The implementation of this NEP will require the following steps:

* Implementation of ``uarray`` multimethods corresponding to the NumPy API, including classes for overriding ``dtype``, ``ufunc`` and ``array`` objects, in the ``unumpy`` repository.
* Moving backends from ``unumpy`` into the respective array libraries.
``uarray`` Primer
~~~~~~~~~~~~~~~~~
**Note:** *This section will not attempt to go into too much detail about uarray, that is the purpose of the uarray documentation.* [1]_ *However, the NumPy community will have input into the design of uarray, via the issue tracker.*
``unumpy`` is the interface that defines a set of overridable functions (multimethods) compatible with the NumPy API. To do this, it uses the ``uarray`` library. ``uarray`` is a general purpose tool for creating multimethods that dispatch to one of multiple different possible backend implementations. In this sense, it is similar to the ``__array_function__`` protocol, but with the key difference that the backend is explicitly installed by the end-user and not coupled into the array type.
Decoupling the backend from the array type gives much more flexibility to end-users and backend authors. For example, it is possible to:

* override functions not taking arrays as arguments
* create backends outside the source code of the array type
* install multiple backends for the same array type

This decoupling also means that ``uarray`` is not constrained to dispatching over array-like types. The backend is free to inspect the entire set of function arguments to determine if it can implement the function, e.g. by dispatching on the ``dtype`` parameter.
Defining backends
^^^^^^^^^^^^^^^^^

``uarray`` consists of two main protocols, ``__ua_convert__`` and ``__ua_function__`` (called in that order), along with ``__ua_domain__``. ``__ua_convert__`` is for conversion and coercion. It has the signature ``(dispatchables, coerce)``, where ``dispatchables`` is an iterable of ``ua.Dispatchable`` objects and ``coerce`` is a boolean indicating whether or not to force the conversion. ``ua.Dispatchable`` is a simple class consisting of three values: ``type``, ``value``, and ``coercible``. ``__ua_convert__`` returns an iterable of the converted values, or ``NotImplemented`` in the case of failure.
``__ua_function__`` has the signature ``(func, args, kwargs)`` and defines the actual implementation of the function. It receives the function and its arguments. Returning ``NotImplemented`` causes dispatch to move on to the default implementation of the function if one exists and, failing that, to the next backend.
Here is what will happen assuming a ``uarray`` multimethod is called:
1. We canonicalise the arguments so any arguments without a default are placed in ``*args`` and those with one are placed in ``**kwargs``.
2. We check the list of backends.

   a. If it is empty, we try the default implementation.

3. We check if the backend's ``__ua_convert__`` method exists. If it exists:

   a. We pass it the output of the dispatcher, which is an iterable of ``ua.Dispatchable`` objects.
   b. We feed its output, along with the arguments, to the argument replacer. If ``__ua_convert__`` returned ``NotImplemented``, we move to 3 with the next backend.
   c. We store the replaced arguments as the new arguments.

4. We feed the arguments into ``__ua_function__`` and, if its output isn't ``NotImplemented``, return that output and exit.
5. If the default implementation exists, we try it with the current backend.
6. On failure, we move to 3 with the next backend. If there are no more backends, we move to 7.
7. We raise a ``ua.BackendNotImplementedError``.
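The steps above can be sketched as a plain-Python loop. This is a hypothetical simplification for illustration, not ``uarray``'s actual implementation: step 1 is skipped, ``extract`` and ``replace`` play the roles of the dispatcher and the argument replacer, and ``BackendNotImplementedError`` is a local stand-in for ``ua.BackendNotImplementedError``.

```python
class BackendNotImplementedError(NotImplementedError):
    """Local stand-in for ``ua.BackendNotImplementedError``."""


def dispatch(func, backends, args, kwargs, extract, replace, default=None):
    """Toy version of the dispatch steps (argument canonicalisation omitted)."""
    if not backends and default is not None:
        return default(*args, **kwargs)                        # step 2a
    for backend in backends:                                   # step 3 (step 6: loop on)
        call_args, call_kwargs = args, kwargs
        convert = getattr(backend, "__ua_convert__", None)
        if convert is not None:
            values = convert(extract(*args, **kwargs), False)  # step 3a
            if values is NotImplemented:
                continue                                       # step 3b: next backend
            call_args, call_kwargs = replace(args, kwargs, values)  # steps 3b-3c
        result = backend.__ua_function__(func, call_args, call_kwargs)  # step 4
        if result is not NotImplemented:
            return result
        if default is not None:                                # step 5
            # The real mechanism runs the default with this backend still active,
            # so multimethods it calls dispatch here again; the toy skips that.
            result = default(*call_args, **call_kwargs)
            if result is not NotImplemented:
                return result
    raise BackendNotImplementedError(func.__name__)            # step 7


# --- usage with two hypothetical backends ---------------------------------
def full(shape, fill_value):
    """The multimethod's identity; backends receive it as ``func``."""


class Declines:
    @staticmethod
    def __ua_function__(func, args, kwargs):
        return NotImplemented


class Handles:
    @staticmethod
    def __ua_function__(func, args, kwargs):
        return f"Handles.{func.__name__}{args}"


def no_dispatchables(*args, **kwargs):
    return ()


print(dispatch(full, [Declines, Handles], ((2,), 0), {}, no_dispatchables, None))
# Handles.full((2,), 0)
```

Neither toy backend defines ``__ua_convert__``, so the extractor and replacer are never called in this usage; they only matter for backends that convert their arguments.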
Defining overridable multimethods
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

To define an overridable function (a multimethod), one needs a few things:

1. A dispatcher that returns an iterable of ``ua.Dispatchable`` objects.
2. A reverse dispatcher that replaces dispatchable values with the supplied ones.
3. A domain.
4. Optionally, a default implementation, which can be provided in terms of other multimethods.
As an example, consider the following::

    import numpy as np
    import uarray as ua

    def full_argreplacer(args, kwargs, dispatchables):
        def full(shape, fill_value, dtype=None, order='C'):
            return (shape, fill_value), dict(
                dtype=dispatchables[0],
                order=order
            )

        return full(*args, **kwargs)

    @ua.create_multimethod(full_argreplacer, domain="numpy")
    def full(shape, fill_value, dtype=None, order='C'):
        return (ua.Dispatchable(dtype, np.dtype),)
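To see how the two halves of the ``full`` example fit together, the sketch below exercises them in plain Python, without ``uarray``. As we read the example, the decorated ``full`` acts as the dispatcher, returning the dispatchable arguments, while ``full_argreplacer`` puts converted values back in their place. Here a plain tuple stands in for ``ua.Dispatchable``, and ``fake_convert`` is a hypothetical conversion step that simply upper-cases a dtype string.

```python
def full_argreplacer(args, kwargs, dispatchables):
    # Same replacer as in the example: re-bind the call, swapping in the new dtype.
    def full(shape, fill_value, dtype=None, order='C'):
        return (shape, fill_value), dict(dtype=dispatchables[0], order=order)

    return full(*args, **kwargs)


def full_dispatcher(shape, fill_value, dtype=None, order='C'):
    # Body of the decorated ``full``; a plain tuple replaces ua.Dispatchable here.
    return (dtype,)


def fake_convert(values):
    """Hypothetical backend conversion: normalise dtype strings to upper case."""
    return [v.upper() if isinstance(v, str) else v for v in values]


args, kwargs = ((2, 3), 0.0), {"dtype": "float32"}
converted = fake_convert(full_dispatcher(*args, **kwargs))
new_args, new_kwargs = full_argreplacer(args, kwargs, converted)
print(new_args)    # ((2, 3), 0.0)
print(new_kwargs)  # {'dtype': 'FLOAT32', 'order': 'C'}
```

Re-binding through an inner function with the same signature means the replacer handles positional and keyword calls alike, and fills in defaults such as ``order='C'``.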
A large set of examples can be found in the ``unumpy`` repository [8]_. This simple act of overriding callables allows us to override:

* Methods
* Properties, via ``fget`` and ``fset``
* Entire objects, via ``__get__``.
Examples for NumPy
^^^^^^^^^^^^^^^^^^

A library that implements a NumPy-like API will use it in the following manner (as an example)::

    import numpy.overridable as unp
    _ua_implementations = {}

    __ua_domain__ = "numpy"

    def __ua_function__(func, args, kwargs):
        fn = _ua_implementations.get(func, None)
        return fn(*args, **kwargs) if fn is not None else NotImplemented

    def implements(ua_func):
        def inner(func):
            _ua_implementations[ua_func] = func
            return func

        return inner

    @implements(unp.asarray)
    def asarray(a, dtype=None, order=None):
        # Code here
        # Either this method or __ua_convert__ must
        # return NotImplemented for unsupported types,
        # or they shouldn't be marked as dispatchable.
        ...

    # Provides a default implementation for ones and zeros.
    @implements(unp.full)
    def full(shape, fill_value, dtype=None, order='C'):
        # Code here
        ...
Backward compatibility
----------------------
There are no backward incompatible changes proposed in this NEP.
Alternatives
------------

The current alternative to this problem is a combination of NEP-18 [2]_, NEP-13 [4]_ and NEP-30 [9]_, plus more protocols (not yet specified) on top of them. Even then, some parts of the NumPy API will remain non-overridable, so it is only a partial alternative.

The main alternative to vendoring ``unumpy`` is to simply move it into NumPy completely and not distribute it as a separate package. This would also achieve the proposed goals; however, we prefer to keep it a separate package for now, for the reasons already stated above.

The third alternative is to move ``unumpy`` into the NumPy organisation and develop it as a NumPy project. This would also achieve the stated goals, and is also a possibility that can be considered by this NEP. However, the act of doing an extra ``pip install`` or ``conda install`` may discourage some users from adopting this method.
Discussion
----------

* ``uarray`` blogpost: https://labs.quansight.org/blog/2019/07/uarray-update-api-changes-overhead-a...
* The discussion section of NEP-18: https://numpy.org/neps/nep-0018-array-function-protocol.html#discussion
* NEP-22: https://numpy.org/neps/nep-0022-ndarray-duck-typing-overview.html
* Dask issue #4462: https://github.com/dask/dask/issues/4462
* PR #13046: https://github.com/numpy/numpy/pull/13046
* Dask issue #4883: https://github.com/dask/dask/issues/4883
* Issue #13831: https://github.com/numpy/numpy/issues/13831
* Discussion PR 1: https://github.com/hameerabbasi/numpy/pull/3
* Discussion PR 2: https://github.com/hameerabbasi/numpy/pull/4
* Discussion PR 3: https://github.com/numpy/numpy/pull/14389
References and Footnotes
------------------------
.. [1] uarray, A general dispatch mechanism for Python: https://uarray.readthedocs.io
.. [2] NEP 18 — A dispatch mechanism for NumPy’s high level array functions: https://numpy.org/neps/nep-0018-array-function-protocol.html
.. [3] NEP 22 — Duck typing for NumPy arrays – high level overview: https://numpy.org/neps/nep-0022-ndarray-duck-typing-overview.html
.. [4] NEP 13 — A Mechanism for Overriding Ufuncs: https://numpy.org/neps/nep-0013-ufunc-overrides.html
.. [5] Reply to Adding to the non-dispatched implementation of NumPy methods: http://numpy-discussion.10968.n7.nabble.com/Adding-to-the-non-dispatched-imp...
.. [6] Custom Dtype/Units discussion: http://numpy-discussion.10968.n7.nabble.com/Custom-Dtype-Units-discussion-td...
.. [7] The epic dtype cleanup plan: https://github.com/numpy/numpy/issues/2899
.. [8] unumpy: NumPy, but implementation-independent: https://unumpy.readthedocs.io
.. [9] NEP 30 — Duck Typing for NumPy Arrays - Implementation: https://www.numpy.org/neps/nep-0030-duck-array-protocol.html
.. [10] http://scipy.github.io/devdocs/fft.html#backend-control
Copyright
---------

This document has been placed in the public domain.

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion
Hi all, and thank you for all your hard work with this.
I wanted to provide more of an "end user" perspective than I think has been present in this discussion so far. Over the past month, I've quickly skimmed some emails on this thread and skipped others altogether. I am far from a NumPy novice, but essentially *all* of the discussion went over my head. For a while my attitude was "Oh well, far smarter people than me are dealing with this, I'll let them figure it out." Looking at the participants in the thread, I worry that this is the attitude almost everyone has taken, and that the solution proposed will not be easy enough to deal with for any meaningful adoption. Certainly with `__array_function__` I only took interest when our tests broke with 1.17rc1.
Today I was particularly interested because I'm working to improve scikit-image support for pyopencl.Array inputs. I went back and read the original NEP and the latest iteration. Thank you again for the discussion, because the latest is indeed a vast improvement over the original.
I think the very motivation has the wrong focus. I would summarise it as "we've been coming up with all kinds of ways to do multiple dispatch for array-likes, and we've found that we need more ways, so let's come up with the One True Way." I think the focus should be on the users and community. Something along the lines of: "New implementations of array computing are cropping up left, right, and centre in Python (not to speak of other languages!). There are good reasons for this (GPUs, distributed computing, sparse data, etc), but it leaves users and library authors in a pickle: how can they ensure that their functions, written with NumPy array inputs and outputs in mind, work well in this ecosystem?"
With this new motivation in mind, I think that the user story below is (a) the best part of the NEP, but (b) underdeveloped. The NEP is all about "if I want my array implementation to work with this fancy dispatch system, what do I need to do?". But there should be more of "in user situations X, Y, and Z, what is the desired behaviour?"
The way we propose the overrides will be used by end users is::

    # On the library side
    import numpy.overridable as unp

    def library_function(array):
        array = unp.asarray(array)
        # Code using unumpy as usual
        return array

    # On the user side:
    import numpy.overridable as unp
    import uarray as ua
    import dask.array as da

    ua.register_backend(da)

    library_function(dask_array)  # works and returns dask_array

    with unp.set_backend(da):
        library_function([1, 2, 3, 4])  # actually returns a Dask array.
Here, ``backend`` can be any compatible object defined either by NumPy or an external library, such as Dask or CuPy. Ideally, it should be the module ``dask.array`` or ``cupy`` itself.
Some questions about the above:
- What happens if I call `library_function(dask_array)` without registering `da` as a backend first? Will `unp.asarray` try to instantiate a potentially 100GB array? This seems bad.
- To get `library_function`, I presumably have to do `from fancy_array_library import library_function`. Can the code in `fancy_array_library` itself register backends, and if so, should/would fancy array libraries that want to maximise compatibility pre-register a bunch of backends so that users don't have to?
Here are a couple of code snippets that I would *want* to "just work". Maybe it's unreasonable, but imho the NEP should provide these as use cases (specifically: how library_function should be written so that they work, and what dask.array and pytorch would need to do so that they work, OR, why the NEP doesn't solve them).
1.

    from dask import array as da
    from fancy_array_library import library_function  # hopefully skimage one day ;)

    data = da.from_zarr('myfile.zarr')
    result = library_function(data)  # result should still be dask, all things being equal
    result.to_zarr('output.zarr')

2.

    from dask import array as da
    from magic_library import pytorch_predict

    data = da.from_zarr('myfile.zarr')
    result = pytorch_predict(data)  # normally here I would use e.g. data.map_overlap, but could this be done magically?
    result.to_zarr('output.zarr')
There's probably a whole bunch of other "user stories" one can concoct, and no doubt many from the authors of the NEP themselves, but they don't come through in the NEP text. My apologies that I haven't read *all* the references: I understand that it is frustrating if the above are addressed there, but I think it's important to have this kind of context in the NEP itself.
Thank you again, and I hope the above is helpful rather than feels like more unnecessary churn.
Juan.
On Wed, Oct 9, 2019 at 6:00 PM Juan Nunez-Iglesias jni@fastmail.com wrote:
Thanks Juan, this feedback is amazing and I couldn't agree more. I think we have to have this "end user focus" for this NEP, as well as for other large-scope design efforts: we should do or have done this for __array_ufunc__, __array_function__, the dtype redesign, etc.
I think in this case, the user stories and a "vision" on the whole topic don't belong inside this NEP. Rather, it should be a separate one like https://numpy.org/neps/nep-0022-ndarray-duck-typing-overview.html. If you read the first paragraph of that NEP, it actually starts out exactly right in the detailed description. But then it dives straight into design.
In my experience, when you mix user stories or external requirements with design, it's extremely easy to ignore or be super brief about the former, and let design considerations/details lead rather than follow from those external requirements.
Note that this is also why I wanted to update the NEP template. We've done a tweak by adding the "Motivation and Scope" section, but that doesn't go nearly far enough.
Back to this NEP: I don't think we should significantly extend it, we should write a new separate one. Rationale: these user stories apply equally to __array_function__ et al., and will have to guide how NumPy as a whole and the usage of the NumPy API evolves over the next couple of years.
Cheers, Ralf
Thanks to all the feedback, we have a new PR of NEP-31.
Please find the full-text quoted below:
============================================================
NEP 31 — Context-local and global overrides of the NumPy API
============================================================

:Author: Hameer Abbasi <habbasi@quansight.com>
:Author: Ralf Gommers <rgommers@quansight.com>
:Author: Peter Bell <pbell@quansight.com>
:Status: Draft
:Type: Standards Track
:Created: 2019-08-22

Abstract
--------
This NEP proposes to make all of NumPy's public API overridable via an extensible backend mechanism.
Acceptance of this NEP means NumPy would provide global and context-local overrides, as well as a dispatch mechanism similar to NEP-18 [2]_. First experiences with ``__array_function__`` show that it is necessary to be able to override NumPy functions that *do not take an array-like argument*, and hence aren't overridable via ``__array_function__``. The most pressing need is array creation and coercion functions, such as ``numpy.zeros`` or ``numpy.asarray``; see e.g. NEP-30 [9]_.
This NEP proposes to allow, in an opt-in fashion, overriding any part of the NumPy API. It is intended as a comprehensive resolution to NEP-22 [3]_, and obviates the need to add an ever-growing list of new protocols for each new type of function or object that needs to become overridable.
Motivation and Scope --------------------
The motivation behind ``uarray`` is manyfold: First, there have been several attempts to allow dispatch of parts of the NumPy API, including (most prominently), the ``__array_ufunc__`` protocol in NEP-13 [4]_, and the ``__array_function__`` protocol in NEP-18 [2]_, but this has shown the need for further protocols to be developed, including a protocol for coercion (see [5]_, [9]_). The reasons these overrides are needed have been extensively discussed in the references, and this NEP will not attempt to go into the details of why these are needed; but in short: It is necessary for library authors to be able to coerce arbitrary objects into arrays of their own types, such as CuPy needing to coerce to a CuPy array, for example, instead of a NumPy array.
These kinds of overrides are useful for both end-users and library authors. End-users may have written or wish to write code that they later speed up or move to a different implementation, say PyData/Sparse. They can do this simply by setting a backend. Library authors may also wish to write code that is portable across array implementations; for example, ``sklearn`` may wish to implement a machine learning algorithm that works with any array implementation while still using array creation functions.
This NEP takes a holistic approach: It assumes that there are parts of the API that need to be overridable, and that these will grow over time. It provides a general framework and a mechanism that avoids designing a new protocol each time this is required. This was the goal of ``uarray``: to allow for overrides in an API without needing the design of a new protocol.
This NEP proposes the following: That ``unumpy`` [8]_ becomes the recommended override mechanism for the parts of the NumPy API not yet covered by ``__array_function__`` or ``__array_ufunc__``, and that ``uarray`` is vendored into a new namespace within NumPy to give users and downstream dependencies access to these overrides. This vendoring mechanism is similar to what SciPy decided to do for making ``scipy.fft`` overridable (see [10]_).
Detailed description
--------------------
Using overrides
~~~~~~~~~~~~~~~
We propose that end users use the overrides as follows::
    # On the library side
    import numpy.overridable as unp
    def library_function(array):
        array = unp.asarray(array)
        # Code using unumpy as usual
        return array
    # On the user side:
    import numpy.overridable as unp
    import uarray as ua
    import dask.array as da
    ua.register_backend(da)
    # dask_array is an existing Dask array
    library_function(dask_array)  # works and returns dask_array
    with ua.set_backend(da):
        library_function([1, 2, 3, 4])  # actually returns a Dask array.
Here, the backend (``da`` above) can be any compatible object defined either by NumPy or an external library, such as Dask or CuPy. Ideally, it should be the module ``dask.array`` or ``cupy`` itself.
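The difference between a globally registered backend and a context-local one can be illustrated without ``uarray`` itself. The following is a minimal pure-Python sketch that assumes a ``contextvars``-based stack. ``register_backend``, ``set_backend`` and ``active_backends`` are hypothetical stand-ins, not ``uarray``'s actual implementation, and the backends are plain strings for brevity.

.. code:: python

    import contextlib
    import contextvars

    # Backends registered for every context (hypothetical stand-in for ua.register_backend).
    _global_backends = []

    # Context-local backends, innermost last; each thread/asyncio task sees its own context.
    _local_backends = contextvars.ContextVar("local_backends", default=())


    def register_backend(backend):
        """Register a backend globally."""
        _global_backends.append(backend)


    @contextlib.contextmanager
    def set_backend(backend):
        """Prefer ``backend`` only inside the ``with`` block."""
        token = _local_backends.set(_local_backends.get() + (backend,))
        try:
            yield
        finally:
            _local_backends.reset(token)


    def active_backends():
        """Order in which backends would be tried: innermost local first, then global."""
        return list(reversed(_local_backends.get())) + _global_backends


    register_backend("numpy")
    with set_backend("dask"), set_backend("cupy"):
        inner = active_backends()  # ['cupy', 'dask', 'numpy']
    outer = active_backends()      # ['numpy']

A ``ContextVar`` is used here instead of a plain global list because it makes the override context-local: a ``with`` block in one thread or task does not change the backends seen by another.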
Composing backends
~~~~~~~~~~~~~~~~~~
Some backends may depend on other backends. For example, xarray may depend on ``numpy.fft`` to transform a time axis into a frequency axis, or Dask/xarray may hold an array other than a NumPy array inside it. This would be handled in code as follows::
    with ua.set_backend(cupy), ua.set_backend(dask.array):
        # Code that has distributed GPU arrays here
        ...
Proposals
~~~~~~~~~
The only change this NEP proposes at its acceptance is to make ``unumpy`` the officially recommended way to override NumPy, along with making some submodules overridable by default via ``uarray``. ``unumpy`` will remain a separate repository/package, which we propose to vendor for the time being to avoid a hard dependency, using the separate ``unumpy`` package only if it is installed. In concrete terms, ``numpy.overridable`` becomes an alias for ``unumpy`` if it is available, with a fallback to a vendored version if not. ``uarray`` and ``unumpy`` will be developed primarily with the input of duck-array authors and secondarily, custom dtype authors, via the usual GitHub workflow. There are a few reasons for this:
* Faster iteration in the case of bugs or issues.
* Faster design changes, in the case of needed functionality.
* ``unumpy`` will work with older versions of NumPy as well.
* The user and library author opt in to the override process, rather than breakages happening when it is least expected. In simple terms, bugs in ``unumpy`` mean that ``numpy`` remains unaffected.
* For ``numpy.fft``, ``numpy.linalg`` and ``numpy.random``, the functions in the main namespace will mirror those in the ``numpy.overridable`` namespace. The reason for this is that there may exist functions in these submodules that need backends, even for ``numpy.ndarray`` inputs.
Advantages of ``unumpy`` over other solutions
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
``unumpy`` offers a number of advantages over the approach of defining a new protocol for every problem encountered: Whenever there is something requiring an override, ``unumpy`` will be able to offer a unified API with very minor changes. For example:
* ``ufunc`` objects can be overridden via their ``__call__``, ``reduce`` and other methods.
* Other functions can be overridden in a similar fashion.
* ``np.asduckarray`` goes away, and becomes ``np.overridable.asarray`` with a backend set.
* The same holds for array creation functions such as ``np.zeros``, ``np.empty`` and so on.
This also holds for the future: Making something overridable would require only minor changes to ``unumpy``.
Another promise ``unumpy`` holds is one of default implementations. Default implementations can be provided for any multimethod, in terms of others. This allows one to override a large part of the NumPy API by defining only a small part of it. This is to ease the creation of new duck-arrays, by providing default implementations of many functions that can be easily expressed in terms of others, as well as a repository of utility functions that most duck-array implementations would require. This would allow us to avoid designing entire protocols: e.g., a protocol for stacking and concatenating would be replaced by simply implementing ``stack`` and/or ``concatenate`` and then providing default implementations for everything else in that class. The same applies for transposing, and many other functions for which protocols haven't been proposed, such as ``isin`` in terms of ``in1d``, ``setdiff1d`` in terms of ``unique``, and so on.
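Here is a minimal pure-Python sketch of default implementations. It assumes a hypothetical backend that implements only ``full``, and a hypothetical ``call`` helper that stands in for ``uarray``'s dispatch. ``zeros`` and ``ones`` are then derived from ``full``. Plain lists stand in for arrays.

.. code:: python

    class ListBackend:
        """Hypothetical backend that natively implements only ``full``."""

        def full(self, shape, fill_value):
            # A 1-D "array" represented as a Python list.
            return [fill_value] * shape


    # Default implementations, written purely in terms of ``full``.
    _defaults = {
        "zeros": lambda backend, shape: backend.full(shape, 0),
        "ones": lambda backend, shape: backend.full(shape, 1),
    }


    def call(backend, name, *args):
        """Prefer the backend's own implementation, else fall back to the default."""
        impl = getattr(backend, name, None)
        if impl is not None:
            return impl(*args)
        return _defaults[name](backend, *args)


    backend = ListBackend()
    zeros = call(backend, "zeros", 3)  # [0, 0, 0], via the default
    ones = call(backend, "ones", 2)    # [1, 1], via the default

The backend author wrote one function and received two more. The same pattern scales to ``stack`` and ``concatenate`` and the functions derived from them.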
It also allows one to override functions in a manner which ``__array_function__`` simply cannot, such as overriding ``np.einsum`` with the version from the ``opt_einsum`` package, or Intel MKL overriding FFT, BLAS or ``ufunc`` objects. They would define a backend with the appropriate multimethods, and the user would select them via a ``with`` statement, or by registering them as a backend.
The last benefit is a clear way to coerce to a given backend (via the ``coerce`` keyword in ``ua.set_backend``), and a protocol for coercing not only arrays, but also ``dtype`` objects and ``ufunc`` objects with similar ones from other libraries. This is motivated by the existence of actual third-party dtype packages, and their desire to blend into the NumPy ecosystem (see [6]_). This is a separate issue from the C-level dtype redesign proposed in [7]_; it is about allowing third-party dtype implementations to work with NumPy, much like third-party array implementations. These can provide features such as units, jagged arrays or others that are outside the scope of NumPy.
Mixing NumPy and ``unumpy`` in the same file
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Normally, one would import only one of ``unumpy`` or ``numpy``, as ``np`` for familiarity. However, there may be situations where one wishes to mix NumPy and the overrides, and there are a few ways to do this, depending on the user's style::
    from numpy import overridable as unp
    import numpy as np
or::
    import numpy as np
    # Use unumpy via np.overridable
Duck-array coercion
~~~~~~~~~~~~~~~~~~~
There are inherent problems with returning objects that are not NumPy arrays from ``numpy.array`` or ``numpy.asarray``, particularly in the context of C/C++ or Cython code that may get an object with a different memory layout than the one it expects. However, we believe this problem may apply not only to these two functions but to all functions that return NumPy arrays. For this reason, overrides are opt-in for the user, by using the submodule ``numpy.overridable`` rather than ``numpy``. NumPy will continue to work unaffected by anything in ``numpy.overridable``.
If the user wishes to obtain a NumPy array, there are two ways of doing it:
1. Use ``numpy.asarray`` (the non-overridable version).
2. Use ``numpy.overridable.asarray`` with the NumPy backend set and coercion enabled.
Aliases outside of the ``numpy.overridable`` namespace ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
All functionality in ``numpy.random``, ``numpy.linalg`` and ``numpy.fft`` will be aliased to their respective overridable versions inside ``numpy.overridable``. The reason for this is that there are alternative implementations of RNGs (``mkl-random``), linear algebra routines (``eigen``, ``blis``) and FFT routines (``mkl-fft``, ``pyFFTW``) that need to operate on ``numpy.ndarray`` inputs, but still need the ability to switch behaviour.
This is different from monkeypatching in a few ways:
* The caller-facing signature of the function is always the same, so there is at least the loose sense of an API contract. Monkeypatching does not provide this ability.
* The backend can be switched locally.
* It has been `suggested <http://numpy-discussion.10968.n7.nabble.com/NEP-31-Context-local-and-global-overrides-of-the-NumPy-API-tp47452p47472.html>`_ that the reason NumPy 1.17 hasn't landed in the Anaconda defaults channel is the incompatibility between monkeypatching and ``__array_function__``, as monkeypatching would bypass the protocol completely.
* Statements of the form ``from numpy import x; x`` and ``np.x`` would have different results depending on whether the import was made before or after monkeypatching happened.
All this isn't possible at all with ``__array_function__`` or ``__array_ufunc__``.
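The last point in the monkeypatching list can be reproduced without NumPy. This is a minimal sketch that uses a hypothetical stand-in module, ``fakelib``, created with ``types.ModuleType`` and registered in ``sys.modules`` so that it can be imported.

.. code:: python

    import sys
    import types

    # A stand-in module with one function, registered so it can be imported.
    fakelib = types.ModuleType("fakelib")
    fakelib.transform = lambda x: "original"
    sys.modules["fakelib"] = fakelib

    from fakelib import transform  # binds the function object now

    # Monkeypatch the module attribute afterwards.
    fakelib.transform = lambda x: "patched"

    early = transform(None)         # 'original': name bound before the patch
    late = fakelib.transform(None)  # 'patched': attribute looked up at call time

Backend dispatch avoids this problem because the function object that callers hold never changes. Only the backend it consults does.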
The need for a backend system for this has been formally recognised, at least in part, in the `NumPy roadmap <https://numpy.org/neps/roadmap.html#other-functionality>`_.
For ``numpy.random``, it's still necessary to make the C-API fit the one proposed in `NEP-19 <https://numpy.org/neps/nep-0019-rng-policy.html>`_. This is impossible for ``mkl-random``, because then it would need to be rewritten to fit that framework. The guarantees on stream compatibility will be the same as before, but if a backend that affects ``numpy.random`` is set, we make no guarantees about stream compatibility, and it is up to the backend author to provide their own guarantees.
Providing a way for implicit dispatch
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
It has been suggested that there is a need to dispatch methods that do not take a dispatchable, while inferring the backend from another dispatchable.
As a concrete example, consider the following:
.. code:: python
    with unumpy.determine_backend(array_like, np.ndarray):
        unumpy.arange(len(array_like))
While this does not exist yet in ``uarray``, it is trivial to add it. The need for this kind of code exists because one might want to have an alternative for the proposed ``*_like`` functions, or the ``like=`` keyword argument. The need for these exists because there are functions in the NumPy API that do not take a dispatchable argument, but there is still the need to select a backend based on a different dispatchable.
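Since ``determine_backend`` does not yet exist in ``uarray``, the following pure-Python sketch only shows the idea, under simplifying assumptions. The backend is looked up in a hypothetical type-to-backend registry, and the second argument from the example above (the dispatch type) is omitted. ``ListBackend``, ``TupleBackend`` and this ``determine_backend`` are illustrative only.

.. code:: python

    import contextlib

    _selected = []  # stack of backends chosen by ``determine_backend``


    class ListBackend:
        @staticmethod
        def arange(n):
            return list(range(n))


    class TupleBackend:
        @staticmethod
        def arange(n):
            return tuple(range(n))


    # Hypothetical registry mapping an array type to the backend that owns it.
    _backend_for_type = {list: ListBackend, tuple: TupleBackend}


    @contextlib.contextmanager
    def determine_backend(value):
        """Select the backend owning ``type(value)`` for the duration of the block."""
        _selected.append(_backend_for_type[type(value)])
        try:
            yield
        finally:
            _selected.pop()


    def arange(n):
        """Creation function without an array argument: uses the selected backend."""
        return _selected[-1].arange(n)


    array_like = (10, 20, 30)
    with determine_backend(array_like):
        result = arange(len(array_like))  # (0, 1, 2): same type as array_like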
The need for an opt-in module
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
An opt-in module is needed for a few reasons:
* There are parts of the API (like ``numpy.asarray``) that simply cannot be overridden due to incompatibility concerns with C/Cython extensions; however, one may want to coerce to a duck-array using ``asarray`` with a backend set.
* There are possible issues around an implicit option and monkeypatching, such as those mentioned above.
NEP 18 notes that this may require maintenance of two separate APIs. However, this burden may be lessened by, for example, parametrizing all tests over ``numpy.overridable`` separately via a fixture. This also has the side-effect of thoroughly testing it, unlike ``__array_function__``. We also feel that it provides an opportunity to separate the NumPy API contract properly from the implementation.
Benefits to end-users and mixing backends
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Mixing backends is easy in ``uarray``; one only has to do:
.. code:: python
    # Explicitly say which backends you want to mix
    ua.register_backend(backend1)
    ua.register_backend(backend2)
    ua.register_backend(backend3)
    # Freely use code that mixes backends here.
The benefits to end-users extend beyond just writing new code. Old code (usually in the form of scripts) can be easily ported to different backends by a simple import switch and a line adding the preferred backend. This way, users may find it easier to port existing code to GPU or distributed computing.
Related Work
------------
Other override mechanisms
~~~~~~~~~~~~~~~~~~~~~~~~~
* NEP-18, the ``__array_function__`` protocol. [2]_
* NEP-13, the ``__array_ufunc__`` protocol. [4]_
* NEP-30, the ``__duck_array__`` protocol. [9]_
Existing NumPy-like array implementations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
* Dask: https://dask.org/
* CuPy: https://cupy.chainer.org/
* PyData/Sparse: https://sparse.pydata.org/
* Xnd: https://xnd.readthedocs.io/
* Astropy's Quantity: https://docs.astropy.org/en/stable/units/
Existing and potential consumers of alternative arrays
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
* Dask: https://dask.org/
* scikit-learn: https://scikit-learn.org/
* xarray: https://xarray.pydata.org/
* TensorLy: http://tensorly.org/
Existing alternate dtype implementations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
* ``ndtypes``: https://ndtypes.readthedocs.io/en/latest/
* Datashape: https://datashape.readthedocs.io
* Plum: https://plum-py.readthedocs.io/
Alternate implementations of parts of the NumPy API
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
* ``mkl_random``: https://github.com/IntelPython/mkl_random
* ``mkl_fft``: https://github.com/IntelPython/mkl_fft
* ``bottleneck``: https://github.com/pydata/bottleneck
* ``opt_einsum``: https://github.com/dgasmith/opt_einsum
Implementation
--------------
The implementation of this NEP will require the following steps:
* Implementation of ``uarray`` multimethods corresponding to the NumPy API, including classes for overriding ``dtype``, ``ufunc`` and ``array`` objects, in the ``unumpy`` repository.
* Moving backends from ``unumpy`` into the respective array libraries.
``uarray`` Primer
~~~~~~~~~~~~~~~~~
**Note:** *This section will not attempt to go into too much detail about uarray, that is the purpose of the uarray documentation.* [1]_ *However, the NumPy community will have input into the design of uarray, via the issue tracker.*
``unumpy`` is the interface that defines a set of overridable functions (multimethods) compatible with the NumPy API. To do this, it uses the ``uarray`` library. ``uarray`` is a general purpose tool for creating multimethods that dispatch to one of multiple different possible backend implementations. In this sense, it is similar to the ``__array_function__`` protocol but with the key difference that the backend is explicitly installed by the end-user and not coupled to the array type.
Decoupling the backend from the array type gives much more flexibility to end-users and backend authors. For example, it is possible to:
* override functions not taking arrays as arguments
* create backends separately from the source of the array type
* install multiple backends for the same array type
This decoupling also means that ``uarray`` is not constrained to dispatching over array-like types. The backend is free to inspect the entire set of function arguments to determine if it can implement the function, e.g. by dispatching on the ``dtype`` parameter.
Defining backends
^^^^^^^^^^^^^^^^^
``uarray`` consists of two main protocols: ``__ua_convert__`` and ``__ua_function__``, called in that order, along with ``__ua_domain__``. ``__ua_convert__`` is for conversion and coercion. It has the signature ``(dispatchables, coerce)``, where ``dispatchables`` is an iterable of ``ua.Dispatchable`` objects and ``coerce`` is a boolean indicating whether or not to force the conversion. ``ua.Dispatchable`` is a simple class consisting of three values: ``type``, ``value``, and ``coercible``. ``__ua_convert__`` returns an iterable of the converted values, or ``NotImplemented`` in the case of failure.
``__ua_function__`` has the signature ``(func, args, kwargs)`` and defines the actual implementation of the function. It receives the function and its arguments. Returning ``NotImplemented`` will cause a move to the default implementation of the function if one exists, and failing that, the next backend.
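Here is a minimal sketch of a backend object that implements the three protocols just described. ``Dispatchable`` is modelled as a ``namedtuple`` with the three fields listed above. ``ListArray`` and ``full`` are hypothetical, and real ``uarray`` backends may differ in detail.

.. code:: python

    from collections import namedtuple

    # Stand-in for ``ua.Dispatchable``, with the three values described above.
    Dispatchable = namedtuple("Dispatchable", ["type", "value", "coercible"])


    class ListArray(list):
        """Hypothetical array type owned by this backend."""


    def full(shape, fill_value):
        """Hypothetical multimethod; backends recognise it by identity."""


    class ListBackend:
        __ua_domain__ = "numpy"

        @staticmethod
        def __ua_convert__(dispatchables, coerce):
            converted = []
            for d in dispatchables:
                if isinstance(d.value, ListArray):
                    converted.append(d.value)
                elif coerce and d.coercible:
                    # Convert foreign objects only when coercion was requested.
                    converted.append(ListArray(d.value))
                else:
                    return NotImplemented
            return converted

        @staticmethod
        def __ua_function__(func, args, kwargs):
            if func is full:
                shape, fill_value = args
                return ListArray([fill_value] * shape)
            # Unknown function: let dispatch try a default or the next backend.
            return NotImplemented


    plain = [Dispatchable(ListArray, [1, 2], True)]
    no_coerce = ListBackend.__ua_convert__(plain, coerce=False)  # NotImplemented
    coerced = ListBackend.__ua_convert__(plain, coerce=True)     # list holding one ListArray
    filled = ListBackend.__ua_function__(full, (3, 0), {})       # [0, 0, 0]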
Here is what will happen assuming a ``uarray`` multimethod is called:
1. We canonicalise the arguments so any arguments without a default are placed in ``*args`` and those with one are placed in ``**kwargs``.
2. We check the list of backends.
   a. If it is empty, we try the default implementation.
3. We check if the backend's ``__ua_convert__`` method exists. If it exists:
   a. We pass it the output of the dispatcher, which is an iterable of ``ua.Dispatchable`` objects.
   b. We feed this output, along with the arguments, to the argument replacer. ``NotImplemented`` means we move to 3 with the next backend.
   c. We store the replaced arguments as the new arguments.
4. We feed the arguments into ``__ua_function__`` and, if the output isn't ``NotImplemented``, return it and exit.
5. If the default implementation exists, we try it with the current backend.
6. On failure, we move to 3 with the next backend. If there are no more backends, we move to 7.
7. We raise a ``ua.BackendNotImplementedError``.
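Steps 2 to 7 can be sketched as a plain loop. This is a simplified, hypothetical rendering, not ``uarray``'s C++ implementation. Coercion is always off, and the default implementation is assumed to receive the current backend as its first argument so that it can delegate to that backend. ``dispatch``, ``twice``, ``Refuses`` and ``Doubler`` are illustrative names.

.. code:: python

    class BackendNotImplementedError(NotImplementedError):
        """Stand-in for ``ua.BackendNotImplementedError``."""


    def dispatch(func, args, kwargs, backends, dispatcher, replacer, default=None):
        """Simplified walk through steps 2-7; step 1 (canonicalisation) is omitted.

        ``default``, if given, is called as ``default(backend, *args, **kwargs)``
        so it can delegate to the current backend (``None`` if no backend is set).
        """
        if not backends and default is not None:
            # Step 2a: no backends, so only the default implementation can be tried.
            return default(None, *args, **kwargs)
        for backend in backends:
            cur_args, cur_kwargs = args, kwargs
            convert = getattr(backend, "__ua_convert__", None)
            if convert is not None:
                # Step 3: convert the dispatchables (coerce=False in this sketch),
                # then substitute them back into the arguments.
                converted = convert(dispatcher(*args, **kwargs), False)
                if converted is NotImplemented:
                    continue
                cur_args, cur_kwargs = replacer(args, kwargs, converted)
            # Step 4: ask the backend itself.
            result = backend.__ua_function__(func, cur_args, cur_kwargs)
            if result is not NotImplemented:
                return result
            # Step 5: try the default implementation with this backend.
            if default is not None:
                try:
                    return default(backend, *cur_args, **cur_kwargs)
                except BackendNotImplementedError:
                    pass  # Step 6: fall through to the next backend.
        # Step 7: nothing could handle the call.
        raise BackendNotImplementedError(f"no backend implements {func.__name__!r}")


    def twice(x):
        """Hypothetical multimethod; only its identity matters here."""


    class Refuses:
        @staticmethod
        def __ua_function__(func, args, kwargs):
            return NotImplemented


    class Doubler:
        @staticmethod
        def __ua_function__(func, args, kwargs):
            return 2 * args[0] if func is twice else NotImplemented


    first_success = dispatch(twice, (21,), {}, [Refuses, Doubler], None, None)  # 42
    via_default = dispatch(twice, (1,), {}, [Refuses], None, None,
                           default=lambda backend, x: x + 1)  # 2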
Defining overridable multimethods
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
To define an overridable function (a multimethod), one needs a few things:
1. A dispatcher that returns an iterable of ``ua.Dispatchable`` objects.
2. A reverse dispatcher that replaces dispatchable values with the supplied ones.
3. A domain.
4. Optionally, a default implementation, which can be provided in terms of other multimethods.
As an example, consider the following::
    import numpy as np
    import uarray as ua
    def full_argreplacer(args, kwargs, dispatchables):
        def full(shape, fill_value, dtype=None, order='C'):
            return (shape, fill_value), dict(
                dtype=dispatchables[0],
                order=order
            )
        return full(*args, **kwargs)
    @ua.create_multimethod(full_argreplacer, domain="numpy")
    def full(shape, fill_value, dtype=None, order='C'):
        # The dispatcher marks ``dtype`` as the only dispatchable argument.
        return (ua.Dispatchable(dtype, np.dtype),)
A large set of examples can be found in the ``unumpy`` repository [8]_. This simple act of overriding callables allows us to override:
* Methods
* Properties, via ``fget`` and ``fset``
* Entire objects, via ``__get__``.
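Overriding properties works because ``property`` accepts any callable as its getter. In this minimal sketch, a hypothetical ``ndim_multimethod`` with a dictionary-based registry stands in for a real ``uarray`` multimethod.

.. code:: python

    # Hypothetical registry of ``ndim`` implementations, keyed by array type.
    _ndim_impls = {}


    def ndim_multimethod(obj):
        """Stand-in for an overridable function that dispatches on ``type(obj)``."""
        return _ndim_impls[type(obj)](obj)


    class Array:
        def __init__(self, shape):
            self.shape = shape

        # ``property`` accepts any callable as ``fget``, including an overridable one.
        ndim = property(fget=ndim_multimethod)


    # A backend supplies the implementation; attribute access now goes through it.
    _ndim_impls[Array] = lambda a: len(a.shape)
    two_d = Array((2, 3)).ndim  # 2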
Examples for NumPy
^^^^^^^^^^^^^^^^^^
A library that implements a NumPy-like API will use it in the following manner (as an example)::
    import numpy.overridable as unp
    _ua_implementations = {}
    __ua_domain__ = "numpy"
    def __ua_function__(func, args, kwargs):
        fn = _ua_implementations.get(func, None)
        return fn(*args, **kwargs) if fn is not None else NotImplemented
    def implements(ua_func):
        def inner(func):
            _ua_implementations[ua_func] = func
            return func
        return inner
    @implements(unp.asarray)
    def asarray(a, dtype=None, order=None):
        # Code here
        # Either this method or __ua_convert__ must
        # return NotImplemented for unsupported types,
        # or they shouldn't be marked as dispatchable.
        ...
    # Provides a default implementation for ones and zeros.
    @implements(unp.full)
    def full(shape, fill_value, dtype=None, order='C'):
        # Code here
        ...
Backward compatibility
----------------------
There are no backward incompatible changes proposed in this NEP.
Alternatives
------------
The current alternative to this problem is a combination of NEP-18 [2]_, NEP-13 [4]_ and NEP-30 [9]_, plus additional protocols that are not yet specified. Even then, some parts of the NumPy API will remain non-overridable, so it's a partial alternative.
The main alternative to vendoring ``unumpy`` is to simply move it into NumPy completely and not distribute it as a separate package. This would also achieve the proposed goals; however, we prefer to keep it a separate package for now, for the reasons stated above.
The third alternative is to move ``unumpy`` into the NumPy organisation and develop it as a NumPy project. This would also achieve the stated goals, and is a possibility that can be considered by this NEP. However, the act of doing an extra ``pip install`` or ``conda install`` may discourage some users from adopting this method.
An alternative to requiring opt-in is mainly to *not* override ``np.asarray`` and ``np.array``, and to make the rest of the NumPy API surface overridable, instead providing ``np.duckarray`` and ``np.asduckarray`` as duck-array friendly alternatives that use the respective overrides. However, this has the downside of adding a minor overhead to NumPy calls.
Discussion
----------
* ``uarray`` blogpost: https://labs.quansight.org/blog/2019/07/uarray-update-api-changes-overhead-a...
* The discussion section of NEP-18: https://numpy.org/neps/nep-0018-array-function-protocol.html#discussion
* NEP-22: https://numpy.org/neps/nep-0022-ndarray-duck-typing-overview.html
* Dask issue #4462: https://github.com/dask/dask/issues/4462
* PR #13046: https://github.com/numpy/numpy/pull/13046
* Dask issue #4883: https://github.com/dask/dask/issues/4883
* Issue #13831: https://github.com/numpy/numpy/issues/13831
* Discussion PR 1: https://github.com/hameerabbasi/numpy/pull/3
* Discussion PR 2: https://github.com/hameerabbasi/numpy/pull/4
* Discussion PR 3: https://github.com/numpy/numpy/pull/14389
References and Footnotes
------------------------
.. [1] uarray, A general dispatch mechanism for Python: https://uarray.readthedocs.io
.. [2] NEP 18 — A dispatch mechanism for NumPy’s high level array functions: https://numpy.org/neps/nep-0018-array-function-protocol.html
.. [3] NEP 22 — Duck typing for NumPy arrays – high level overview: https://numpy.org/neps/nep-0022-ndarray-duck-typing-overview.html
.. [4] NEP 13 — A Mechanism for Overriding Ufuncs: https://numpy.org/neps/nep-0013-ufunc-overrides.html
.. [5] Reply to Adding to the non-dispatched implementation of NumPy methods: http://numpy-discussion.10968.n7.nabble.com/Adding-to-the-non-dispatched-imp...
.. [6] Custom Dtype/Units discussion: http://numpy-discussion.10968.n7.nabble.com/Custom-Dtype-Units-discussion-td...
.. [7] The epic dtype cleanup plan: https://github.com/numpy/numpy/issues/2899
.. [8] unumpy: NumPy, but implementation-independent: https://unumpy.readthedocs.io
.. [9] NEP 30 — Duck Typing for NumPy Arrays - Implementation: https://www.numpy.org/neps/nep-0030-duck-array-protocol.html
.. [10] http://scipy.github.io/devdocs/fft.html#backend-control
Copyright
---------
This document has been placed in the public domain.
From: NumPy-Discussion numpy-discussion-bounces+hameerabbasi=yahoo.com@python.org on behalf of Hameer Abbasi einstein.edison@gmail.com Reply to: Discussion of Numerical Python numpy-discussion@python.org Date: Thursday, 5. September 2019 at 17:12 To: numpy-discussion@python.org Subject: Re: [Numpy-discussion] NEP 31 — Context-local and global overrides of the NumPy API
Hello everyone;
Thanks to all the feedback from the community, in particular Sebastian Berg, we have a new draft of NEP-31.
Please find the full text quoted below for discussion and reference. Any feedback and discussion is welcome.
============================================================
NEP 31 — Context-local and global overrides of the NumPy API
============================================================
:Author: Hameer Abbasi <habbasi@quansight.com>
:Author: Ralf Gommers <rgommers@quansight.com>
:Author: Peter Bell <pbell@quansight.com>
:Status: Draft
:Type: Standards Track
:Created: 2019-08-22
Abstract
--------
This NEP proposes to make all of NumPy's public API overridable via an
extensible backend mechanism.
Acceptance of this NEP means NumPy would provide global and context-local
overrides, as well as a dispatch mechanism similar to NEP-18 [2]_. First
experiences with ``__array_function__`` show that it is necessary to be able
to override NumPy functions that *do not take an array-like argument*, and
hence aren't overridable via ``__array_function__``. The most pressing need is
array creation and coercion functions, such as ``numpy.zeros`` or
``numpy.asarray``; see e.g. NEP-30 [9]_.
This NEP proposes to allow, in an opt-in fashion, overriding any part of the
NumPy API. It is intended as a comprehensive resolution to NEP-22 [3]_, and
obviates the need to add an ever-growing list of new protocols for each new
type of function or object that needs to become overridable.
Motivation and Scope
--------------------
The motivation behind ``uarray`` is manifold: First, there have been several
attempts to allow dispatch of parts of the NumPy API, including, most
prominently, the ``__array_ufunc__`` protocol in NEP-13 [4]_ and the
``__array_function__`` protocol in NEP-18 [2]_, but this has shown the need
for further protocols to be developed, including a protocol for coercion (see
[5]_, [9]_). The reasons these overrides are needed have been extensively
discussed in the references, and this NEP will not attempt to go into the
details of why these are needed; in short, library authors need to be able to
coerce arbitrary objects into arrays of their own types. For example, CuPy
needs to coerce to a CuPy array instead of a NumPy array.
These kinds of overrides are useful for both end-users and library authors.
End-users may have written or wish to write code that they later speed up or
move to a different implementation, say PyData/Sparse. They can do this simply
by setting a backend. Library authors may also wish to write code that is
portable across array implementations; for example, ``sklearn`` may wish to
implement a machine learning algorithm that works with any array
implementation while still using array creation functions.
This NEP takes a holistic approach: It assumes that there are parts of
the API that need to be overridable, and that these will grow over time. It
provides a general framework and a mechanism that avoids designing a new
protocol each time this is required. This was the goal of ``uarray``: to
allow for overrides in an API without needing the design of a new protocol.
This NEP proposes the following: That ``unumpy`` [8]_ becomes the
recommended override mechanism for the parts of the NumPy API not yet covered
by ``__array_function__`` or ``__array_ufunc__``, and that ``uarray`` is
vendored into a new namespace within NumPy to give users and downstream
dependencies access to these overrides. This vendoring mechanism is similar
to what SciPy decided to do for making ``scipy.fft`` overridable (see [10]_).
Detailed description
--------------------
Using overrides
~~~~~~~~~~~~~~~
We propose that end users use the overrides as follows::
    # On the library side
    import numpy.overridable as unp
    def library_function(array):
        array = unp.asarray(array)
        # Code using unumpy as usual
        return array
    # On the user side:
    import numpy.overridable as unp
    import uarray as ua
    import dask.array as da
    ua.register_backend(da)
    # dask_array is an existing Dask array
    library_function(dask_array)  # works and returns dask_array
    with ua.set_backend(da):
        library_function([1, 2, 3, 4])  # actually returns a Dask array.
Here, the backend (``da`` above) can be any compatible object defined either
by NumPy or an external library, such as Dask or CuPy. Ideally, it should be
the module ``dask.array`` or ``cupy`` itself.
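A module can serve as a backend because, under the ``uarray`` protocols, a backend only needs to expose ``__ua_domain__`` and ``__ua_function__`` (and optionally ``__ua_convert__``). This minimal sketch uses a hypothetical stand-in module, built with ``types.ModuleType``, in place of a real array library.

.. code:: python

    import types

    # Hypothetical stand-in for a library module such as ``dask.array``.
    fake_array_lib = types.ModuleType("fake_array_lib")


    def asarray(a):
        """Hypothetical multimethod; the backend recognises it by identity."""


    def _ua_function(func, args, kwargs):
        if func is asarray:
            return ("fake array", list(args[0]))
        return NotImplemented


    # In a real module these would simply be module-level names.
    fake_array_lib.__ua_domain__ = "numpy"
    fake_array_lib.__ua_function__ = _ua_function

    result = fake_array_lib.__ua_function__(asarray, ([1, 2],), {})
    # result == ('fake array', [1, 2])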
Composing backends
~~~~~~~~~~~~~~~~~~
Some backends may depend on other backends. For example, xarray may depend on
``numpy.fft`` to transform a time axis into a frequency axis, or Dask/xarray
may hold an array other than a NumPy array inside it. This would be handled in
code as follows::
    with ua.set_backend(cupy), ua.set_backend(dask.array):
        # Code that has distributed GPU arrays here
        ...
Proposals
~~~~~~~~~
The only change this NEP proposes at its acceptance is to make ``unumpy`` the
officially recommended way to override NumPy. ``unumpy`` will remain a
separate repository/package, which we propose to vendor for the time being to
avoid a hard dependency, using the separate ``unumpy`` package only if it is
installed. In concrete terms, ``numpy.overridable`` becomes an alias for
``unumpy`` if it is available, with a fallback to a vendored version if not.
``uarray`` and ``unumpy`` will be developed primarily with the input of
duck-array authors and secondarily, custom dtype authors, via the usual
GitHub workflow. There are a few reasons for this:
* Faster iteration in the case of bugs or issues.
* Faster design changes, in the case of needed functionality.
* ``unumpy`` will work with older versions of NumPy as well.
* The user and library author opt in to the override process,
  rather than breakages happening when it is least expected.
  In simple terms, bugs in ``unumpy`` mean that ``numpy`` remains
  unaffected.
Advantages of ``unumpy`` over other solutions
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
``unumpy`` offers a number of advantages over the approach of defining a new
protocol for every problem encountered: Whenever there is something requiring
an override, ``unumpy`` will be able to offer a unified API with very minor
changes. For example:
* ``ufunc`` objects can be overridden via their ``__call__``, ``reduce`` and
  other methods.
* Other functions can be overridden in a similar fashion.
* ``np.asduckarray`` goes away, and becomes ``np.overridable.asarray`` with a
  backend set.
* The same holds for array creation functions such as ``np.zeros``,
  ``np.empty`` and so on.
This also holds for the future: Making something overridable would require only
minor changes to ``unumpy``.
Another promise ``unumpy`` holds is one of default implementations. Default
implementations can be provided for any multimethod, in terms of others. This
allows one to override a large part of the NumPy API by defining only a small
part of it. This is to ease the creation of new duck-arrays, by providing
default implementations of many functions that can be easily expressed in
terms of others, as well as a repository of utility functions that most
duck-array implementations would require.
It also allows one to override functions in a manner which
``__array_function__`` simply cannot, such as overriding ``np.einsum`` with the
version from the ``opt_einsum`` package, or Intel MKL overriding FFT, BLAS
or ``ufunc`` objects. They would define a backend with the appropriate
multimethods, and the user would select them via a ``with`` statement, or by
registering them as a backend.
The last benefit is a clear way to coerce to a given backend (via the
``coerce`` keyword in ``ua.set_backend``), and a protocol
for coercing not only arrays, but also ``dtype`` objects and ``ufunc`` objects
with similar ones from other libraries. This is motivated by the existence of
actual third-party dtype packages, and their desire to blend into the NumPy
ecosystem (see [6]_). This is a separate issue from the C-level dtype
redesign proposed in [7]_; it is about allowing third-party dtype
implementations to work with NumPy, much like third-party array
implementations. These can provide features such as units, jagged arrays or
others that are outside the scope of NumPy.
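Coercion is not limited to arrays. This minimal sketch shows how a ``__ua_convert__`` could coerce a plain dtype name into a third-party dtype object, but only when coercion is requested. ``DTypeLike`` (a marker standing in for ``np.dtype``), ``UnitDType`` and ``UnitBackend`` are all hypothetical, and ``Dispatchable`` is modelled as a ``namedtuple``.

.. code:: python

    from collections import namedtuple

    # Stand-in for ``ua.Dispatchable``.
    Dispatchable = namedtuple("Dispatchable", ["type", "value", "coercible"])


    class DTypeLike:
        """Marker meaning "this argument is a dtype" (stand-in for ``np.dtype``)."""


    class UnitDType:
        """Hypothetical third-party dtype that carries a physical unit."""

        def __init__(self, name, unit="dimensionless"):
            self.name = name
            self.unit = unit


    class UnitBackend:
        __ua_domain__ = "numpy"

        @staticmethod
        def __ua_convert__(dispatchables, coerce):
            out = []
            for d in dispatchables:
                if d.type is not DTypeLike:
                    return NotImplemented  # this sketch only handles dtypes
                if isinstance(d.value, UnitDType):
                    out.append(d.value)
                elif coerce and d.coercible and isinstance(d.value, str):
                    # Replace a plain dtype name with this library's dtype object.
                    out.append(UnitDType(d.value))
                else:
                    return NotImplemented
            return out


    request = [Dispatchable(DTypeLike, "float64", True)]
    refused = UnitBackend.__ua_convert__(request, coerce=False)   # NotImplemented
    converted = UnitBackend.__ua_convert__(request, coerce=True)  # [UnitDType('float64')]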
Mixing NumPy and ``unumpy`` in the same file
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Normally, one would import only one of ``unumpy`` or ``numpy``, as ``np`` for
familiarity. However, there may be situations where one wishes to mix NumPy
and the overrides, and there are a few ways to do this, depending on the
user's style::
    from numpy import overridable as unp
    import numpy as np
or::
    import numpy as np
    # Use unumpy via np.overridable
Duck-array coercion
~~~~~~~~~~~~~~~~~~~
There are inherent problems with returning objects that are not NumPy arrays
from ``numpy.array`` or ``numpy.asarray``, particularly in the context of C/C++
or Cython code that may get an object with a different memory layout than the
one it expects. However, we believe this problem may apply not only to these
two functions but to all functions that return NumPy arrays. For this reason,
overrides are opt-in for the user, by using the submodule ``numpy.overridable``
rather than ``numpy``. NumPy will continue to work unaffected by anything in
``numpy.overridable``.

If the user wishes to obtain a NumPy array, there are two ways of doing it:

1. Use ``numpy.asarray`` (the non-overridable version).
2. Use ``numpy.overridable.asarray`` with the NumPy backend set and coercion
   enabled.

Related Work
------------
Other override mechanisms
~~~~~~~~~~~~~~~~~~~~~~~~~
* NEP-18, the ``__array_function__`` protocol. [2]_
* NEP-13, the ``__array_ufunc__`` protocol. [4]_
* NEP-30, the ``__duck_array__`` protocol. [9]_
Existing NumPy-like array implementations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
* Dask: https://dask.org/
* CuPy: https://cupy.chainer.org/
* PyData/Sparse: https://sparse.pydata.org/
* Xnd: https://xnd.readthedocs.io/
* Astropy's Quantity: https://docs.astropy.org/en/stable/units/
Existing and potential consumers of alternative arrays
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
* Dask: https://dask.org/
* scikit-learn: https://scikit-learn.org/
* xarray: https://xarray.pydata.org/
* TensorLy: http://tensorly.org/
Existing alternate dtype implementations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
* ``ndtypes``: https://ndtypes.readthedocs.io/en/latest/
* Datashape: https://datashape.readthedocs.io
* Plum: https://plum-py.readthedocs.io/
Implementation
--------------
The implementation of this NEP will require the following steps:
* Implementation of ``uarray`` multimethods corresponding to the
  NumPy API, including classes for overriding ``dtype``, ``ufunc``
  and ``array`` objects, in the ``unumpy`` repository.
* Moving backends from ``unumpy`` into the respective array libraries.

``uarray`` Primer
~~~~~~~~~~~~~~~~~
**Note:** *This section will not attempt to go into too much detail about
uarray; that is the purpose of the uarray documentation* [1]_.
*However, the NumPy community will have input into the design of
uarray via the issue tracker.*

``unumpy`` is the interface that defines a set of overridable functions
(multimethods) compatible with the NumPy API. To do this, it uses the
``uarray`` library. ``uarray`` is a general-purpose tool for creating
multimethods that dispatch to one of multiple different possible backend
implementations. In this sense, it is similar to the ``__array_function__``
protocol, but with the key difference that the backend is explicitly installed
by the end-user and not coupled to the array type.

Decoupling the backend from the array type gives much more flexibility to
end-users and backend authors. For example, it is possible to:

* override functions not taking arrays as arguments
* create backends that live outside the source of the array type
* install multiple backends for the same array type

This decoupling also means that ``uarray`` is not constrained to dispatching
over array-like types. The backend is free to inspect the entire set of
function arguments to determine whether it can implement the function, e.g. by
dispatching on the ``dtype`` parameter.

Defining backends
^^^^^^^^^^^^^^^^^
``uarray`` consists of two main protocols, ``__ua_convert__`` and
``__ua_function__``, which are called in that order, along with
``__ua_domain__``, which names the domain a backend implements (e.g.
``"numpy"``).

``__ua_convert__`` is for conversion and coercion. It has the signature
``(dispatchables, coerce)``, where ``dispatchables`` is an iterable of
``ua.Dispatchable`` objects and ``coerce`` is a boolean indicating whether or
not to force the conversion. ``ua.Dispatchable`` is a simple class consisting
of three values: ``type``, ``value``, and ``coercible``.
``__ua_convert__`` returns an iterable of the converted values, or
``NotImplemented`` in the case of failure.

``__ua_function__`` has the signature ``(func, args, kwargs)`` and defines
the actual implementation of the function. It receives the function and its
arguments. Returning ``NotImplemented`` will cause a move to the default
implementation of the function if one exists, and failing that, the next
backend.

Here is what will happen assuming a ``uarray`` multimethod is called:

1. We canonicalise the arguments so any arguments without a default
   are placed in ``*args`` and those with one are placed in ``**kwargs``.
2. We check the list of backends.

   a. If it is empty, we try the default implementation.

3. We check if the backend's ``__ua_convert__`` method exists. If it exists:

   a. We pass it the output of the dispatcher,
      which is an iterable of ``ua.Dispatchable`` objects.
   b. We feed the converted values, along with the arguments,
      to the argument replacer. If ``__ua_convert__`` returned
      ``NotImplemented``, we instead move to 3 with the next backend.
   c. We store the replaced arguments as the new arguments.

4. We feed the arguments into ``__ua_function__``. If the output isn't
   ``NotImplemented``, we return it and exit.
5. If the default implementation exists, we try it with the current backend.
6. On failure, we move to 3 with the next backend. If there are no more
   backends, we move to 7.
7. We raise a ``ua.BackendNotImplementedError``.

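
Steps 1 to 7 can be traced in a compact toy model. This is a sketch written
for illustration, not ``uarray``'s implementation: the backend list is passed
explicitly as the first argument instead of being managed by
``ua.set_backend``, conversion never coerces, and ``multimethod``,
``ListBackend`` and ``StrictBackend`` are hypothetical names. ``zeros`` has a
default written in terms of ``full``, and ``ListBackend`` implements only
``full``.

```python
import inspect


class BackendNotImplementedError(NotImplementedError):
    """Toy stand-in for ``ua.BackendNotImplementedError``."""


def multimethod(extractor, replacer, default=None):
    """Build a toy multimethod from a dispatcher (``extractor``), an argument
    replacer and an optional ``default(backends, *args, **kwargs)``."""
    sig = inspect.signature(extractor)

    def call(backends, *args, **kwargs):
        # Step 1: parameters without a default go to args, the rest to kwargs.
        bound = sig.bind(*args, **kwargs)
        bound.apply_defaults()
        params = list(sig.parameters.values())
        args = tuple(bound.arguments[p.name] for p in params
                     if p.default is inspect.Parameter.empty)
        kwargs = {p.name: bound.arguments[p.name] for p in params
                  if p.default is not inspect.Parameter.empty}
        # Step 2a: with no backends, only the default can run.
        if not backends and default is not None:
            return default(backends, *args, **kwargs)
        for backend in backends:
            a, kw = args, kwargs
            # Step 3: optional conversion via __ua_convert__ (never coercing).
            convert = getattr(backend, "__ua_convert__", None)
            if convert is not None:
                converted = convert(extractor(*a, **kw), False)
                if converted is NotImplemented:
                    continue  # step 3 again, with the next backend
                a, kw = replacer(a, kw, converted)
            # Step 4: the backend's own implementation.
            result = backend.__ua_function__(call, a, kw)
            if result is not NotImplemented:
                return result
            # Step 5: the default, dispatching only to the current backend.
            if default is not None:
                try:
                    return default([backend], *a, **kw)
                except BackendNotImplementedError:
                    pass  # step 6: fall through to the next backend
        # Step 7: nothing worked.
        raise BackendNotImplementedError(extractor.__name__)

    return call


def _dtype_replacer(args, kwargs, dispatchables):
    # Put the converted dtype back into the keyword arguments.
    return args, dict(kwargs, dtype=dispatchables[0])


def _full_dispatcher(shape, fill_value, dtype=None):
    return (dtype,)  # ``dtype`` is the only dispatchable argument


def _zeros_dispatcher(shape, dtype=None):
    return (dtype,)


def _zeros_default(backends, shape, dtype=None):
    return full(backends, shape, 0, dtype=dtype)  # zeros in terms of full


full = multimethod(_full_dispatcher, _dtype_replacer)
zeros = multimethod(_zeros_dispatcher, _dtype_replacer, default=_zeros_default)


class ListBackend:
    """Hypothetical backend: flat lists; implements only ``full``."""

    @staticmethod
    def __ua_convert__(dispatchables, coerce):
        if all(d in (None, int, float) for d in dispatchables):
            return [int if d is None else d for d in dispatchables]
        return NotImplemented

    @staticmethod
    def __ua_function__(func, args, kwargs):
        if func is full:
            shape, fill_value = args
            return [kwargs["dtype"](fill_value)] * shape
        return NotImplemented


class StrictBackend:
    """Hypothetical backend whose conversion always fails."""

    @staticmethod
    def __ua_convert__(dispatchables, coerce):
        return NotImplemented

    @staticmethod
    def __ua_function__(func, args, kwargs):
        return NotImplemented


backends = [StrictBackend, ListBackend]
print(zeros(backends, 3))             # [0, 0, 0]
print(full(backends, 2, 1.5, float))  # [1.5, 1.5]
```

``StrictBackend`` is skipped at step 3; ``ListBackend`` then returns
``NotImplemented`` for ``zeros`` at step 4, so step 5 runs the default, whose
call to ``full`` is handled by ``ListBackend`` alone.
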
Defining overridable multimethods
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
To define an overridable function (a multimethod), one needs a few things:

1. A dispatcher that returns an iterable of ``ua.Dispatchable`` objects.
2. A reverse dispatcher (the argument replacer) that replaces dispatchable
   values with the supplied ones.
3. A domain.
4. Optionally, a default implementation, which can be provided in terms of
   other multimethods.

As an example, consider the following::

    import numpy as np
    import uarray as ua

    # Argument replacer: rebuilds (args, kwargs) with the converted dtype.
    def full_argreplacer(args, kwargs, dispatchables):
        def full(shape, fill_value, dtype=None, order='C'):
            return (shape, fill_value), dict(
                dtype=dispatchables[0],
                order=order
            )

        return full(*args, **kwargs)

    # Dispatcher: marks ``dtype`` as the only dispatchable argument.
    @ua.create_multimethod(full_argreplacer, domain="numpy")
    def full(shape, fill_value, dtype=None, order='C'):
        return (ua.Dispatchable(dtype, np.dtype),)

A large set of examples can be found in the ``unumpy`` repository [8]_.
This simple act of overriding callables allows us to override:
* Methods
* Properties, via ``fget`` and ``fset``
* Entire objects, via ``__get__``.
Examples for NumPy
^^^^^^^^^^^^^^^^^^
A library that implements a NumPy-like API will use it in the following
manner (as an example)::

    import numpy.overridable as unp

    _ua_implementations = {}

    __ua_domain__ = "numpy"

    def __ua_function__(func, args, kwargs):
        fn = _ua_implementations.get(func, None)
        return fn(*args, **kwargs) if fn is not None else NotImplemented

    def implements(ua_func):
        def inner(func):
            _ua_implementations[ua_func] = func
            return func

        return inner

    @implements(unp.asarray)
    def asarray(a, dtype=None, order=None):
        # Code here
        # Either this method or __ua_convert__ must
        # return NotImplemented for unsupported types,
        # or they shouldn't be marked as dispatchable.
        ...

    # Provides a default implementation for ones and zeros.
    @implements(unp.full)
    def full(shape, fill_value, dtype=None, order='C'):
        # Code here
        ...

Backward compatibility
----------------------
There are no backward incompatible changes proposed in this NEP.
Alternatives
------------
The current alternative to this problem is a combination of NEP-18 [2]_,
NEP-13 [4]_ and NEP-30 [9]_, plus more protocols (not yet specified). Even
then, some parts of the NumPy API will remain non-overridable, so it is only a
partial alternative.

The main alternative to vendoring ``unumpy`` is to simply move it into NumPy
completely and not distribute it as a separate package. This would also achieve
the proposed goals; however, we prefer to keep it a separate package for now,
for reasons already stated above.

The third alternative is to move ``unumpy`` into the NumPy organisation and
develop it as a NumPy project. This would also achieve the said goals, and is
a possibility that can be considered by this NEP. However, the act of
doing an extra ``pip install`` or ``conda install`` may discourage some users
from adopting this method.

Discussion
----------
* ``uarray`` blogpost: https://labs.quansight.org/blog/2019/07/uarray-update-api-changes-overhead-a...
* The discussion section of NEP-18: https://numpy.org/neps/nep-0018-array-function-protocol.html#discussion
* NEP-22: https://numpy.org/neps/nep-0022-ndarray-duck-typing-overview.html
* Dask issue #4462: https://github.com/dask/dask/issues/4462
* PR #13046: https://github.com/numpy/numpy/pull/13046
* Dask issue #4883: https://github.com/dask/dask/issues/4883
* Issue #13831: https://github.com/numpy/numpy/issues/13831
* Discussion PR 1: https://github.com/hameerabbasi/numpy/pull/3
* Discussion PR 2: https://github.com/hameerabbasi/numpy/pull/4
* Discussion PR 3: https://github.com/numpy/numpy/pull/14389
References and Footnotes
------------------------
.. [1] uarray, A general dispatch mechanism for Python: https://uarray.readthedocs.io
.. [2] NEP 18 — A dispatch mechanism for NumPy’s high level array functions: https://numpy.org/neps/nep-0018-array-function-protocol.html
.. [3] NEP 22 — Duck typing for NumPy arrays – high level overview: https://numpy.org/neps/nep-0022-ndarray-duck-typing-overview.html
.. [4] NEP 13 — A Mechanism for Overriding Ufuncs: https://numpy.org/neps/nep-0013-ufunc-overrides.html
.. [5] Reply to Adding to the non-dispatched implementation of NumPy methods: http://numpy-discussion.10968.n7.nabble.com/Adding-to-the-non-dispatched-imp...
.. [6] Custom Dtype/Units discussion: http://numpy-discussion.10968.n7.nabble.com/Custom-Dtype-Units-discussion-td...
.. [7] The epic dtype cleanup plan: https://github.com/numpy/numpy/issues/2899
.. [8] unumpy: NumPy, but implementation-independent: https://unumpy.readthedocs.io
.. [9] NEP 30 — Duck Typing for NumPy Arrays - Implementation: https://www.numpy.org/neps/nep-0030-duck-array-protocol.html
.. [10] SciPy ``scipy.fft`` backend control: http://scipy.github.io/devdocs/fft.html#backend-control
Copyright
---------
This document has been placed in the public domain.