[Numpy-discussion] NEP 31 — Context-local and global overrides of the NumPy API

Hameer Abbasi einstein.edison at gmail.com
Tue Sep 3 05:06:38 EDT 2019


Hi Nathaniel,


On 02.09.19 23:09, Nathaniel Smith wrote:
> On Mon, Sep 2, 2019 at 2:15 AM Hameer Abbasi <einstein.edison at gmail.com> wrote:
>> Me, Ralf Gommers and Peter Bell (both cc’d) have come up with a proposal on how to solve the array creation and duck array problems. The solution is outlined in NEP-31, currently in the form of a PR, [1]
> Thanks for putting this together! It'd be great to have more
> engagement between uarray and numpy.
>
>> ============================================================
>>
>> NEP 31 — Context-local and global overrides of the NumPy API
>>
>> ============================================================
> Now that I've read this over, my main feedback is that right now it
> seems too vague and high-level to give it a fair evaluation? The idea
> of a NEP is to lay out a problem and proposed solution in enough
> detail that it can be evaluated and critiqued, but this felt to me
> more like it was pointing at some other documents for all the details
> and then promising that uarray has solutions for all our problems.
>
>> This NEP takes a more holistic approach: It assumes that there are parts of the API that need to be
>> overridable, and that these will grow over time. It provides a general framework and a mechanism to
>> avoid a design of a new protocol each time this is required.
> The idea of a holistic approach makes me nervous, because I'm not sure
> we have holistic problems.

The fact that we're having to design more and more protocols for a lot 
of very similar things is, to me, an indicator that we do have holistic 
problems that ought to be solved by a single protocol.

> Sometimes a holistic approach is the right
> thing; other times it means sweeping the actual problems under the
> rug, so things *look* simple and clean but in fact nothing has been
> solved, and they just end up biting us later. And from the NEP as
> currently written, I can't tell whether this is the good kind of
> holistic or the bad kind of holistic.
>
> Now I'm writing vague handwavey things, so let me follow my own advice
> and make it more concrete with an example :-).
>
> When Stephan and I were writing NEP 22, the single thing we spent the
> most time discussing was the problem of duck-array coercion, and in
> particular what to do about existing code that does
> np.asarray(duck_array_obj).
>
> The reason this is challenging is that there's a lot of code written
> in Cython/C/C++ that calls np.asarray, and then blindly casts the
> return value to a PyArray struct and starts accessing the raw memory
> fields. If np.asarray starts returning anything besides a real-actual
> np.ndarray object, then this code will start corrupting random memory,
> leading to a segfault at best.
>
> Stephan felt strongly that this meant that existing np.asarray calls
> *must not* ever return anything besides an np.ndarray object, and
> therefore we needed to add a new function np.asduckarray(), or maybe
> an explicit opt-in flag like np.asarray(..., allow_duck_array=True).
>
> I agreed that this was a problem, but thought we might be able to get
> away with an "opt-out" system, where we add an allow_duck_array= flag,
> but make it *default* to True, and document that the Cython/C/C++
> users who want to work with a raw np.ndarray object should modify
> their code to explicitly call np.asarray(obj, allow_duck_array=False).
> This would mean that for a while people who tried to pass duck-arrays
> into a legacy library would get segfaults, but there would be a clear
> path for fixing these issues as they were discovered.
>
> Either way, there are also some other details to figure out: how does
> this affect the C version of asarray? What about np.asfortranarray –
> probably that should default to allow_duck_array=False, even if we did
> make np.asarray default to allow_duck_array=True, right?
>
> Now if I understand right, your proposal would be to make it so any
> code in any package could arbitrarily change the behavior of
> np.asarray for all inputs, e.g. I could just decide that
> np.asarray([1, 2, 3]) should return some arbitrary non-np.ndarray
> object. It seems like this has a much greater potential for breaking
> existing Cython/C/C++ code, and the NEP doesn't currently describe why
> this extra power is useful, and it doesn't currently describe how it
> plans to mitigate the downsides. (For example, if a caller needs a
> real np.ndarray, then is there some way to explicitly request one? The
> NEP doesn't say.) Maybe this is all fine and there are solutions to
> these issues, but any proposal to address duck array coercion needs to
> at least talk about these issues!
I believe I addressed this in a previous email, but the NEP doesn't 
suggest overriding numpy.asarray or numpy.array. It suggests overriding 
numpy.overridable.asarray and numpy.overridable.array, so existing code 
will continue to work as-is and overrides are opt-in rather than forced 
on you.

The argument about this kind of code could be applied to return values 
from other functions as well. That said, there is a way to request a 
NumPy array object explicitly:


with ua.set_backend(np):
     x = np.asarray(...)
>
> And that's just one example... array coercion is a particularly
> central and tricky problem, but the numpy API is big, and there are
> probably other problems like this. For another example, I don't
> understand what the NEP is proposing to do about dtypes at all.
Just as there are other kinds of arrays, there may be other kinds of 
dtypes that are not NumPy dtypes. They cannot be attached to a NumPy 
array object (as Sebastian pointed out to me in last week's Community 
meeting), but they can still provide other powerful features.
> That's why I think the NEP needs to be fleshed out a lot more before
> it will be possible to evaluate fairly.
>
> -n
>
I just pushed a new version of the NEP to my PR, the full-text of which 
is below.

============================================================
NEP 31 — Context-local and global overrides of the NumPy API
============================================================

:Author: Hameer Abbasi <habbasi at quansight.com>
:Author: Ralf Gommers <rgommers at quansight.com>
:Author: Peter Bell <peterbell10 at live.co.uk>
:Status: Draft
:Type: Standards Track
:Created: 2019-08-22


Abstract
--------

This NEP proposes to make all of NumPy's public API overridable via an
extensible backend mechanism, using a library called ``uarray`` `[1]`_.

``uarray`` provides global and context-local overrides, as well as a dispatch
mechanism similar to NEP-18 `[2]`_. First experiences with
``__array_function__`` show that it is necessary to be able to override NumPy
functions that *do not take an array-like argument*, and hence aren't
overridable via ``__array_function__``. The most pressing need is array
creation and coercion functions - see e.g. NEP-30 `[9]`_.

This NEP proposes to allow, in an opt-in fashion, overriding any part of the
NumPy API. It is intended as a comprehensive resolution to NEP-22 `[3]`_, and
obviates the need to add an ever-growing list of new protocols for each new
type of function or object that needs to become overridable.

Motivation and Scope
--------------------

The motivation behind ``uarray`` is manifold: First, there have been several
attempts to allow dispatch of parts of the NumPy API, including (most
prominently) the ``__array_ufunc__`` protocol in NEP-13 `[4]`_ and the
``__array_function__`` protocol in NEP-18 `[2]`_, but this has shown the need
for further protocols to be developed, including a protocol for coercion (see
`[5]`_). The reasons these overrides are needed have been extensively discussed
in the references, and this NEP will not attempt to go into the details of why
these are needed. Another pain point requiring yet another protocol is the
duck-array protocol (see `[9]`_).

This NEP takes a more holistic approach: It assumes that there are parts of
the API that need to be overridable, and that these will grow over time. It
provides a general framework and a mechanism to avoid a design of a new
protocol each time this is required.

This NEP proposes the following: That ``unumpy`` `[8]`_ becomes the
recommended override mechanism for the parts of the NumPy API not yet covered
by ``__array_function__`` or ``__array_ufunc__``, and that ``uarray`` is
vendored into a new namespace within NumPy to give users and downstream
dependencies access to these overrides.  This vendoring mechanism is similar
to what SciPy decided to do for making ``scipy.fft`` overridable (see `[10]`_).


Detailed description
--------------------

**Note:** This section will not attempt to go into too much detail about
``uarray``; that is the purpose of the ``uarray`` documentation `[1]`_.
However, the NumPy community will have input into the design of
``uarray`` via the issue tracker.

``uarray`` Primer
^^^^^^^^^^^^^^^^^

Defining backends
~~~~~~~~~~~~~~~~~

``uarray`` consists of two main protocols: ``__ua_convert__`` and
``__ua_function__``, called in that order, along with ``__ua_domain__``, which
is a string defining the domain of the backend. If either protocol returns
``NotImplemented``, we fall back to the next backend.

``__ua_convert__`` is for conversion and coercion. It has the signature
``(dispatchables, coerce)``, where ``dispatchables`` is an iterable of
``ua.Dispatchable`` objects and ``coerce`` is a boolean indicating whether or
not to force the conversion. ``ua.Dispatchable`` is a simple class consisting
of three values: ``type``, ``value``, and ``coercible``.
``__ua_convert__`` returns an iterable of the converted values, or
``NotImplemented`` in the case of failure. Returning ``NotImplemented``
here will cause ``uarray`` to move to the next available backend.

``__ua_function__`` has the signature ``(func, args, kwargs)`` and defines
the actual implementation of the function. It receives the function and its
arguments. Returning ``NotImplemented`` will cause a move to the default
implementation of the function if one exists, and failing that, the next
backend.

If all backends are exhausted, a ``ua.BackendNotImplementedError`` is raised.

Backends can be registered for permanent use if required.
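
The fallback rules described above can be illustrated with a toy dispatcher.
This is a minimal sketch in plain Python and not ``uarray``'s actual
implementation: ``call_multimethod``, ``ToyBackendError``, ``PickyBackend``
and ``ListBackend`` are hypothetical names, and dispatchables are simplified
to plain argument values.

```python
class ToyBackendError(NotImplementedError):
    """Stand-in for ua.BackendNotImplementedError in this sketch."""


def call_multimethod(func, default, backends, args, kwargs):
    """Try each backend in order, following the fallback rules in the text."""
    for backend in backends:
        # Step 1: conversion. NotImplemented means "try the next backend".
        converted = backend.__ua_convert__(args, coerce=False)
        if converted is NotImplemented:
            continue
        converted = tuple(converted)

        # Step 2: the backend's own implementation of the function.
        result = backend.__ua_function__(func, converted, kwargs)
        if result is not NotImplemented:
            return result

        # Step 3: the default implementation, if the multimethod has one.
        if default is not None:
            return default(*converted, **kwargs)
        # Otherwise fall through to the next backend.
    raise ToyBackendError("no backend could handle %r" % (func,))


class PickyBackend:
    __ua_domain__ = "numpy"

    @staticmethod
    def __ua_convert__(values, coerce):
        # Accepts only tuples and refuses everything else.
        if not all(isinstance(v, tuple) for v in values):
            return NotImplemented
        return values

    @staticmethod
    def __ua_function__(func, args, kwargs):
        return ("picky", func, args)


class ListBackend:
    __ua_domain__ = "numpy"

    @staticmethod
    def __ua_convert__(values, coerce):
        return [list(v) for v in values]

    @staticmethod
    def __ua_function__(func, args, kwargs):
        if func == "total":
            return sum(args[0])
        return NotImplemented


backends = [PickyBackend, ListBackend]
# PickyBackend rejects the list input, so ListBackend handles the call.
total = call_multimethod("total", None, backends, ([1, 2, 3],), {})
# ListBackend does not implement "mean", so the default is used instead.
mean = call_multimethod("mean", lambda xs: sum(xs) / len(xs), backends, ([2, 4],), {})
```

In this toy model the default implementation runs directly on the converted
arguments; a real dispatcher has more bookkeeping to do at that step.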

Defining overridable multimethods
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To define an overridable function (a multimethod), one needs a few things:

1. A dispatcher that returns an iterable of ``ua.Dispatchable`` objects.
2. A reverse dispatcher that replaces dispatchable values with the supplied
   ones.
3. A domain.
4. Optionally, a default implementation, which can be provided in terms of
   other multimethods.

As an example, consider the following::

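     import numpy as np
     import uarray as ua

     # Reverse dispatcher: rebuilds (args, kwargs) with the converted
     # dispatchables substituted for the original values.
     def full_argreplacer(args, kwargs, dispatchables):
         def full(shape, fill_value, dtype=None, order='C'):
             return (shape, fill_value), dict(
                 dtype=dispatchables[0],
                 order=order
             )

         return full(*args, **kwargs)

     # Dispatcher: marks ``dtype`` as the only dispatchable argument.
     @ua.create_multimethod(full_argreplacer, domain="numpy")
     def full(shape, fill_value, dtype=None, order='C'):
         return (ua.Dispatchable(dtype, np.dtype),)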
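
To make the roles of the dispatcher and the reverse dispatcher more concrete,
the following sketch walks through the round trip by hand, without ``uarray``:
the dispatcher marks ``dtype`` as dispatchable, a conversion step rewrites it,
and the reverse dispatcher splices the converted value back into the call.
``Dispatchable`` here is a hypothetical namedtuple stand-in, and the
string-to-type conversion table is purely illustrative.

```python
from collections import namedtuple

# Hypothetical stand-in for ua.Dispatchable (coercible omitted for brevity).
Dispatchable = namedtuple("Dispatchable", ["value", "type"])


def full_dispatcher(shape, fill_value, dtype=None, order="C"):
    # Only dtype is marked as something a backend may convert.
    return (Dispatchable(dtype, "dtype"),)


def full_argreplacer(args, kwargs, dispatchables):
    # Rebuild (args, kwargs) with the converted dtype substituted in.
    def full(shape, fill_value, dtype=None, order="C"):
        return (shape, fill_value), dict(dtype=dispatchables[0], order=order)

    return full(*args, **kwargs)


def convert(dispatchables):
    # Illustrative backend conversion: map dtype names to Python types.
    table = {"float": float, "int": int}
    return tuple(table.get(d.value, d.value) for d in dispatchables)


args, kwargs = ((2, 2), 0), {"dtype": "float"}
dispatchables = full_dispatcher(*args, **kwargs)
new_args, new_kwargs = full_argreplacer(args, kwargs, convert(dispatchables))
```

After the round trip, the positional arguments are unchanged and only the
``dtype`` keyword has been replaced by its converted value.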

A large set of examples can be found in the ``unumpy`` repository, `[8]`_.
This simple act of overriding callables allows us to override:

* Methods
* Properties, via ``fget`` and ``fset``
* Entire objects, via ``__get__``.

Using overrides
~~~~~~~~~~~~~~~

The way we propose the overrides will be used by end users is::

     import numpy.overridable as np
     with np.set_backend(backend):
         x = np.asarray(my_array, dtype=dtype)

And a library that implements a NumPy-like API will use it in the following
manner (as an example)::

     import numpy.overridable as np
     _ua_implementations = {}

     __ua_domain__ = "numpy"

     def __ua_function__(func, args, kwargs):
         fn = _ua_implementations.get(func, None)
         return fn(*args, **kwargs) if fn is not None else NotImplemented

     def implements(ua_func):
         def inner(func):
             _ua_implementations[ua_func] = func
             return func

         return inner

     @implements(np.asarray)
     def asarray(a, dtype=None, order=None):
         # Code here
         # Either this method or __ua_convert__ must
         # return NotImplemented for unsupported types,
         # or they shouldn't be marked as dispatchable.
         ...

     # Provides a default implementation for ones and zeros.
     @implements(np.full)
     def full(shape, fill_value, dtype=None, order='C'):
         # Code here
         ...
The only change this NEP proposes at its acceptance is to make ``unumpy`` the
officially recommended way to override NumPy. ``unumpy`` will remain a separate
repository/package (which we propose to vendor to avoid a hard dependency,
using the separate ``unumpy`` package only if it is installed, rather than
depending on it for the time being), and will be developed primarily with the
input of duck-array authors and secondarily, custom dtype authors, via the
usual GitHub workflow. There are a few reasons for this:

* Faster iteration in the case of bugs or issues.
* Faster design changes, in the case of needed functionality.
* ``unumpy`` will work with older versions of NumPy as well.
* The user and library author opt in to the override process,
  rather than breakages happening when it is least expected.
  In simple terms, bugs in ``unumpy`` mean that ``numpy`` remains
  unaffected.

Duck-array coercion
~~~~~~~~~~~~~~~~~~~

There are inherent problems with returning objects that are not NumPy arrays
from ``numpy.array`` or ``numpy.asarray``, particularly in the context of C/C++
or Cython code that may get an object with a different memory layout than the
one it expects. However, we believe this problem may apply not only to these
two functions but all functions that return NumPy arrays. For this reason,
overrides are opt-in for the user, by using the submodule ``numpy.overridable``
rather than ``numpy``. NumPy will continue to work unaffected by anything in
``numpy.overridable``.

If the user wishes to obtain a NumPy array, there are two ways of doing it:

1. Use ``numpy.asarray`` (the non-overridable version).
2. Use ``numpy.overridable.asarray`` with the NumPy backend set and coercion
   enabled::

     import uarray as ua
     import numpy.overridable as np

     with ua.set_backend(np, coerce=True):
         x = np.asarray(...)
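
The scoping that the second option relies on can be sketched with the standard
library's ``contextvars`` module. This is only an illustration of
context-local state under assumed names (``set_backend`` and
``active_backend`` are hypothetical here, and backends are plain strings); it
is not how ``uarray`` is implemented.

```python
import contextlib
import contextvars

# Context-local stack of active backends (innermost last).
_backends = contextvars.ContextVar("backends", default=())


@contextlib.contextmanager
def set_backend(backend, coerce=False):
    """Activate a backend for the duration of the with-block only."""
    token = _backends.set(_backends.get() + ((backend, coerce),))
    try:
        yield
    finally:
        # Restore whatever was active before entering the block.
        _backends.reset(token)


def active_backend():
    """Return the innermost (backend, coerce) pair, or None if unset."""
    stack = _backends.get()
    return stack[-1] if stack else None


before = active_backend()
with set_backend("numpy", coerce=True):
    inside = active_backend()
    with set_backend("other"):
        nested = active_backend()
    restored = active_backend()
after = active_backend()
```

The backend is visible only inside its ``with`` block, and nested blocks
restore the outer backend on exit, so code outside the block is unaffected.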

Advantages of ``unumpy`` over other solutions
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``unumpy`` offers a number of advantages over the approach of defining a new
protocol for every problem encountered: Whenever there is something requiring
an override, ``unumpy`` will be able to offer a unified API with very minor
changes. For example:

* ``ufunc`` objects can be overridden via their ``__call__``, ``reduce`` and
  other methods.
* Other functions can be overridden in a similar fashion.
* ``np.asduckarray`` goes away, and becomes ``np.asarray`` with a backend set.
* The same holds for array creation functions such as ``np.zeros``,
  ``np.empty`` and so on.

This also holds for the future: Making something overridable would require only
minor changes to ``unumpy``.

Another promise ``unumpy`` holds is one of default implementations. Default
implementations can be provided for any multimethod, in terms of others. This
allows one to override a large part of the NumPy API by defining only a small
part of it. This is to ease the creation of new duck-arrays, by providing
default implementations of many functions that can be easily expressed in
terms of others, as well as a repository of utility functions that help in the
implementation of duck-arrays that most duck-arrays would require.
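
As a rough illustration of default implementations, the sketch below defines
``ones`` and ``zeros`` in terms of ``full``, so a backend that implements only
``full`` gets the other two without extra work. The ``Multimethod`` class and
the dictionary-based backend are hypothetical and much simpler than what
``uarray`` does.

```python
class Multimethod:
    """Toy multimethod: use the backend's implementation, else the default."""

    def __init__(self, name, default=None):
        self.name = name
        self.default = default

    def __call__(self, backend, *args, **kwargs):
        impl = backend.get(self.name)
        if impl is not None:
            return impl(*args, **kwargs)
        if self.default is not None:
            # Defaults are written in terms of other multimethods.
            return self.default(backend, *args, **kwargs)
        raise NotImplementedError(self.name)


full = Multimethod("full")
ones = Multimethod("ones", default=lambda backend, n: full(backend, n, 1))
zeros = Multimethod("zeros", default=lambda backend, n: full(backend, n, 0))

# A backend that implements only "full", using plain Python lists.
list_backend = {"full": lambda n, fill_value: [fill_value] * n}

ones_result = ones(list_backend, 3)
zeros_result = zeros(list_backend, 2)
```

A backend that implements nothing at all still fails, because the defaults
eventually bottom out in ``full``, which has no default of its own.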

The last benefit is a clear way to coerce to a given backend (via the
``coerce`` keyword in ``ua.set_backend``), and a protocol
for coercing not only arrays, but also ``dtype`` objects and ``ufunc`` objects
with similar ones from other libraries. This is due to the existence of actual,
third party dtype packages, and their desire to blend into the NumPy ecosystem
(see `[6]`_). This is a separate issue from the C-level dtype redesign
proposed in `[7]`_; it's about allowing third-party dtype implementations to
work with NumPy, much like third-party array implementations. These can provide
features such as, for example, units, jagged arrays or other such features that
are outside the scope of NumPy.

Mixing NumPy and ``unumpy`` in the same file
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Normally, one would want to import only one of ``unumpy`` or ``numpy``, and
import it as ``np`` for familiarity. However, there may be situations
where one wishes to mix NumPy and the overrides, and there are a few ways to do
this, depending on the user's style::

     import numpy.overridable as unumpy
     import numpy as np

or::

     import numpy as np

     # Use unumpy via np.overridable

Related Work
------------

Previous override mechanisms
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

* NEP-18, the ``__array_function__`` protocol. `[2]`_
* NEP-13, the ``__array_ufunc__`` protocol. `[4]`_

Existing NumPy-like array implementations
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

* Dask: https://dask.org/
* CuPy: https://cupy.chainer.org/
* PyData/Sparse: https://sparse.pydata.org/
* Xnd: https://xnd.readthedocs.io/
* Astropy's Quantity: https://docs.astropy.org/en/stable/units/

Existing and potential consumers of alternative arrays
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

* Dask: https://dask.org/
* scikit-learn: https://scikit-learn.org/
* xarray: https://xarray.pydata.org/
* TensorLy: http://tensorly.org/

Existing alternate dtype implementations
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

* ``ndtypes``: https://ndtypes.readthedocs.io/en/latest/
* Datashape: https://datashape.readthedocs.io
* Plum: https://plum-py.readthedocs.io/

Implementation
--------------

The implementation of this NEP will require the following steps:

* Implementation of ``uarray`` multimethods corresponding to the
  NumPy API, including classes for overriding ``dtype``, ``ufunc``
  and ``array`` objects, in the ``unumpy`` repository.
* Moving backends from ``unumpy`` into the respective array libraries.

Backward compatibility
----------------------

There are no backward incompatible changes proposed in this NEP.


Alternatives
------------

The current alternative to this problem is NEP-30 plus adding more protocols
(not yet specified) in addition to it.  Even then, some parts of the NumPy
API will remain non-overridable, so it's a partial alternative.

The main alternative to vendoring ``unumpy`` is to simply move it into NumPy
completely and not distribute it as a separate package. This would also achieve
the proposed goals; however, we prefer to keep it a separate package for now,
for reasons already stated above.


Discussion
----------

* ``uarray`` blogpost: https://labs.quansight.org/blog/2019/07/uarray-update-api-changes-overhead-and-comparison-to-__array_function__/
* The discussion section of NEP-18: https://numpy.org/neps/nep-0018-array-function-protocol.html#discussion
* NEP-22: https://numpy.org/neps/nep-0022-ndarray-duck-typing-overview.html
* Dask issue #4462: https://github.com/dask/dask/issues/4462
* PR #13046: https://github.com/numpy/numpy/pull/13046
* Dask issue #4883: https://github.com/dask/dask/issues/4883
* Issue #13831: https://github.com/numpy/numpy/issues/13831
* Discussion PR 1: https://github.com/hameerabbasi/numpy/pull/3
* Discussion PR 2: https://github.com/hameerabbasi/numpy/pull/4


References and Footnotes
------------------------

.. _[1]:

[1] uarray, A general dispatch mechanism for Python: https://uarray.readthedocs.io

.. _[2]:

[2] NEP 18 — A dispatch mechanism for NumPy’s high level array functions: https://numpy.org/neps/nep-0018-array-function-protocol.html

.. _[3]:

[3] NEP 22 — Duck typing for NumPy arrays – high level overview: https://numpy.org/neps/nep-0022-ndarray-duck-typing-overview.html

.. _[4]:

[4] NEP 13 — A Mechanism for Overriding Ufuncs: https://numpy.org/neps/nep-0013-ufunc-overrides.html

.. _[5]:

[5] Reply to Adding to the non-dispatched implementation of NumPy methods: http://numpy-discussion.10968.n7.nabble.com/Adding-to-the-non-dispatched-implementation-of-NumPy-methods-tp46816p46874.html

.. _[6]:

[6] Custom Dtype/Units discussion: http://numpy-discussion.10968.n7.nabble.com/Custom-Dtype-Units-discussion-td43262.html

.. _[7]:

[7] The epic dtype cleanup plan: https://github.com/numpy/numpy/issues/2899

.. _[8]:

[8] unumpy: NumPy, but implementation-independent: https://unumpy.readthedocs.io

.. _[9]:

[9] NEP 30 — Duck Typing for NumPy Arrays - Implementation: https://www.numpy.org/neps/nep-0030-duck-array-protocol.html

.. _[10]:

[10] http://scipy.github.io/devdocs/fft.html#backend-control


Copyright
---------

This document has been placed in the public domain.

