[Numpy-discussion] NEP 37: A dispatch protocol for NumPy-like modules

Sebastian Berg sebastian at sipsolutions.net
Thu Feb 6 15:20:15 EST 2020


On Thu, 2020-02-06 at 09:35 -0800, Stephan Hoyer wrote:
> On Wed, Feb 5, 2020 at 8:02 AM Andreas Mueller <t3kcit at gmail.com>
> wrote:
<snip>
>  
> > - We use scipy.linalg in many places, and we would need to do a
> > separate dispatching to check whether we can use module.linalg
> > instead
> >  (that might be an issue for many libraries but I'm not sure).
> > 
> 
> This brings up a good question -- obviously the final decision here
> is up to SciPy maintainers, but how should we encourage SciPy to
> support dispatching?
> We could pretty easily make __array_function__ cover SciPy by simply
> exposing NumPy's internal utilities. SciPy could simply use the
> np.array_function_dispatch decorator internally and that would be
> enough.

Hmmm, in NumPy we can easily force basically 100% of (desired)
coverage, i.e. JAX can return a namespace that implements everything.
With SciPy that is already much less feasible, and as you go to domain
specific tools it seems implausible.

`get_array_module` solves the issue for a library that wants to support
all array likes, as long as:
  * most functions rely only on the NumPy API, and
  * the domain specific library is expected to implement support for
    specific array objects where necessary. E.g. sklearn can include
    special code for Dask support; Dask does not replace sklearn code.

> It is less clear how this could work for __array_module__, because
> __array_module__ and get_array_module() are not generic -- they
> refer explicitly to a NumPy-like module. If we want to extend it to
> SciPy (for which I agree there are good use-cases), what should that
> look like for `__array_module__`?

I suppose the question here is, where should the code reside? For
SciPy, I agree there is a good reason why you may want to "reverse" the
implementation: the code to support JAX arrays should live inside JAX.

One, probably silly, option is to return a "global" namespace, so that:

    np = get_array_module(*arrays).numpy
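A toy sketch of what such a "global" context object could look like
(purely illustrative; none of these names exist anywhere):

```python
import numpy


class ArrayContext:
    """Hypothetical object returned by __array_module__ in this scheme:
    per-library namespaces are exposed as attributes."""

    def __init__(self):
        self.numpy = numpy
        # A duck-array library could also expose e.g. a SciPy-compatible
        # namespace here: self.scipy = its_scipy_like_module


def get_array_module(*arrays):
    # Stand-in dispatch: this sketch always resolves to plain NumPy.
    return ArrayContext()


np = get_array_module().numpy  # the pattern from the snippet above
```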

We have two distinct issues: Where should e.g. SciPy put a generic
implementation (assuming they want to provide implementations that only
require NumPy-API support, so that overriding is not required)?
And, if a library provides generic support, should we define a
standard for how the context/namespace may be passed in/provided?

sklearn's main namespace is expected to support many array
objects/types, but it could be nice to pass in an already known
context/namespace (say scikit-image already found it, and then calls
scikit-learn internally). A "generic" namespace may even require this
to infer the correct output array object.
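A sketch of that hand-off (the `module=` keyword and both function
names are hypothetical, not something the NEP specifies; plain `np`
stands in for the result of the proposed `np.get_array_module`):

```python
import numpy as np


def sklearn_like_scale(x, module=None):
    # If the caller already resolved a namespace, reuse it; otherwise
    # resolve it here (np stands in for np.get_array_module(x)).
    if module is None:
        module = np
    return x / module.max(module.abs(x))


def skimage_like_pipeline(image):
    # Resolve the context once, then pass it down to other libraries.
    module = np  # stand-in for np.get_array_module(image)
    clipped = module.clip(image, 0.0, 1.0)
    return sklearn_like_scale(clipped, module=module)
```

Passing the context explicitly also gives a "generic" namespace the
information it may need to infer the correct output array object.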


Another thing about backward compatibility: What is our vision there
actually?
This NEP will *not* give the *end user* the option to opt in! Here,
opting in is really reserved for the *library user* (e.g. sklearn). (I
did not realize this clearly before.)

Thinking about that for a bit now, that seems like the right choice.
But it also means that the library needs an easy way to give a
FutureWarning, to notify the end user of the upcoming change. The end
user will easily be able to convert to a NumPy array to keep the old
behaviour.
Once this warning is given (maybe during `get_array_module()`), the
array module object/context would preferably be passed around,
hopefully even between libraries. That provides a reasonable way to
opt in to the new behaviour without a warning (mainly for library
users; end-users can silence the warning if they wish).
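For illustration, a library-side sketch of that warning (the API is
assumed; nothing like this exists in NumPy yet, and `resolve_module`
is a made-up helper name):

```python
import warnings

import numpy as np


def resolve_module(*arrays, module=None):
    # An explicitly passed module is the library user's opt-in:
    # no warning in that case.
    if module is not None:
        return module
    resolved = np  # stand-in for np.get_array_module(*arrays)
    if resolved is not np:
        # Notify the end user of the upcoming change; converting inputs
        # with np.asarray() keeps the old behaviour.
        warnings.warn(
            "this function will dispatch to a non-NumPy module in a "
            "future release; pass NumPy arrays to keep the old behaviour",
            FutureWarning,
        )
    return resolved
```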

- Sebastian



> The obvious choices would be to either add a new protocol, e.g.,
> __scipy_module__ (but then NumPy needs to know about SciPy), or to
> add some sort of "module request" parameter to np.get_array_module(),
> to indicate the requested API, e.g., np.get_array_module(*arrays,
> matching='scipy'). This is pretty similar to the "default" argument
> but would need to get passed into the __array_module__ protocol, too.
>  
> > - Some models have several possible optimization algorithms, some
> > of which are pure numpy and some which are Cython. If someone
> > provides a different array module,
> >  we might want to choose an algorithm that is actually supported by
> > that module. While this exact issue is maybe sklearn specific, a
> > similar issue could appear for most downstream libs that use Cython
> > in some places.
> >  Many Cython algorithms could be implemented in pure numpy with a
> > potential slowdown, but once we have NEP 37 there might be a
> > benefit to having a pure NumPy implementation as an alternative
> > code path.
> > 
> > 
> > Anyway, NEP 37 seems a great step in the right direction and would
> > enable sklearn to actually dispatch in some places. Dispatching
> > just based on __array_function__ seems not really feasible so far.
> > 
> > Best,
> > Andreas Mueller
> > 
> > 
> > On 1/6/20 11:29 PM, Stephan Hoyer wrote:
> > > I am pleased to present a new NumPy Enhancement Proposal for
> > > discussion: "NEP-37: A dispatch protocol for NumPy-like modules."
> > > Feedback would be very welcome!
> > > 
> > > The full text follows. The rendered proposal can also be found
> > > online at https://numpy.org/neps/nep-0037-array-module.html
> > > 
> > > Best,
> > > Stephan Hoyer
> > > 
> > > ===================================================
> > > NEP 37 — A dispatch protocol for NumPy-like modules
> > > ===================================================
> > > 
> > > :Author: Stephan Hoyer <shoyer at google.com>
> > > :Author: Hameer Abbasi
> > > :Author: Sebastian Berg
> > > :Status: Draft
> > > :Type: Standards Track
> > > :Created: 2019-12-29
> > > 
> > > Abstract
> > > --------
> > > 
> > > NEP-18's ``__array_function__`` has been a mixed success. Some
> > > projects (e.g.,
> > > dask, CuPy, xarray, sparse, Pint) have enthusiastically adopted
> > > it. Others
> > > (e.g., PyTorch, JAX, SciPy) have been more reluctant. Here we
> > > propose a new
> > > protocol, ``__array_module__``, that we expect could eventually
> > > subsume most
> > > use-cases for ``__array_function__``. The protocol requires
> > > explicit adoption
> > > by both users and library authors, which ensures backwards
> > > compatibility, and
> > > is also significantly simpler than ``__array_function__``, both
> > > of which we
> > > expect will make it easier to adopt.
> > > 
> > > Why ``__array_function__`` hasn't been enough
> > > ---------------------------------------------
> > > 
> > > There are two broad ways in which NEP-18 has fallen short of its
> > > goals:
> > > 
> > > 1. **Maintainability concerns**. `__array_function__` has
> > > significant
> > >    implications for libraries that use it:
> > > 
> > >    - Projects like `PyTorch
> > >      <https://github.com/pytorch/pytorch/issues/22402>`_, `JAX
> > >      <https://github.com/google/jax/issues/1565>`_ and even
> > > `scipy.sparse
> > >      <https://github.com/scipy/scipy/issues/10362>`_ have been
> > > reluctant to
> > >      implement `__array_function__` in part because they are
> > > concerned about
> > >      **breaking existing code**: users expect NumPy functions
> > > like
> > >      ``np.concatenate`` to return NumPy arrays. This is a
> > > fundamental
> > >      limitation of the ``__array_function__`` design, which we
> > > chose to allow
> > >      overriding the existing ``numpy`` namespace.
> > >    - ``__array_function__`` currently requires an "all or
> > > nothing" approach to
> > >      implementing NumPy's API. There is no good pathway for
> > > **incremental
> > >      adoption**, which is particularly problematic for
> > > established projects
> > >      for which adopting ``__array_function__`` would result in
> > > breaking
> > >      changes.
> > >    - It is no longer possible to use **aliases to NumPy
> > > functions** within
> > >      modules that support overrides. For example, both CuPy and
> > > JAX set
> > >      ``result_type = np.result_type``.
> > >    - Implementing **fall-back mechanisms** for unimplemented
> > > NumPy functions
> > >      by using NumPy's implementation is hard to get right (but
> > > see the
> > >      `version from dask <
> > > https://github.com/dask/dask/pull/5043>`_), because
> > >      ``__array_function__`` does not present a consistent
> > > interface.
> > >      Converting all arguments of array type requires recursing
> > > into generic
> > >      arguments of the form ``*args, **kwargs``.
> > > 
> > > 2. **Limitations on what can be overridden.**
> > > ``__array_function__`` has some
> > >    important gaps, most notably array creation and coercion
> > > functions:
> > > 
> > >    - **Array creation** routines (e.g., ``np.arange`` and those
> > > in
> > >      ``np.random``) need some other mechanism for indicating what
> > > type of
> > >      arrays to create. `NEP 36 <
> > > https://github.com/numpy/numpy/pull/14715>`_
> > >      proposed adding optional ``like=`` arguments to functions
> > > without
> > >      existing array arguments. However, we still lack any
> > > mechanism to
> > >      override methods on objects, such as those needed by
> > >      ``np.random.RandomState``.
> > >    - **Array conversion** can't reuse the existing coercion
> > > functions like
> > >      ``np.asarray``, because ``np.asarray`` sometimes means
> > > "convert to an
> > >      exact ``np.ndarray``" and other times means "convert to
> > > something _like_
> > >      a NumPy array." This led to the `NEP 30
> > >      <https://numpy.org/neps/nep-0030-duck-array-protocol.html>`_
> > >  proposal for
> > >      a separate ``np.duckarray`` function, but this still does
> > > not resolve how
> > >      to cast one duck array into a type matching another duck
> > > array.
> > > 
> > > ``get_array_module`` and the ``__array_module__`` protocol
> > > ----------------------------------------------------------
> > > 
> > > We propose a new user-facing mechanism for dispatching to a duck-
> > > array
> > > implementation, ``numpy.get_array_module``. ``get_array_module``
> > > performs the
> > > same type resolution as ``__array_function__`` and returns a
> > > module with an API
> > > promised to match the standard interface of ``numpy`` that can
> > > implement
> > > operations on all provided array types.
> > > 
> > > The protocol itself is both simpler and more powerful than
> > > ``__array_function__``, because it doesn't need to worry about
> > > actually
> > > implementing functions. We believe it resolves most of the
> > > maintainability and
> > > functionality limitations of ``__array_function__``.
> > > 
> > > The new protocol is opt-in, explicit and with local control; see
> > > :ref:`appendix-design-choices` for discussion on the importance
> > > of these design
> > > features.
> > > 
> > > The array module contract
> > > =========================
> > > 
> > > Modules returned by ``get_array_module``/``__array_module__``
> > > should make a
> > > best effort to implement NumPy's core functionality on the new
> > > array type(s).
> > > Unimplemented functionality should simply be omitted (e.g.,
> > > accessing an
> > > unimplemented function should raise ``AttributeError``). In the
> > > future, we
> > > anticipate codifying a protocol for requesting restricted subsets
> > > of ``numpy``;
> > > see :ref:`requesting-restricted-subsets` for more details.
> > > 
> > > How to use ``get_array_module``
> > > ===============================
> > > 
> > > Code that wants to support generic duck arrays should explicitly
> > > call
> > > ``get_array_module`` to determine an appropriate array module
> > > from which to
> > > call functions, rather than using the ``numpy`` namespace
> > > directly. For
> > > example:
> > > 
> > > .. code:: python
> > > 
> > >     # calls the appropriate version of np.something for x and y
> > >     module = np.get_array_module(x, y)
> > >     module.something(x, y)
> > > 
> > > Both array creation and array conversion are supported, because
> > > dispatching is
> > > handled by ``get_array_module`` rather than via the types of
> > > function
> > > arguments. For example, to use random number generation functions
> > > or methods,
> > > we can simply pull out the appropriate submodule:
> > > 
> > > .. code:: python
> > > 
> > >     def duckarray_add_random(array):
> > >         module = np.get_array_module(array)
> > >         noise = module.random.randn(*array.shape)
> > >         return array + noise
> > > 
> > > We can also write the duck-array ``stack`` function from `NEP 30
> > > <https://numpy.org/neps/nep-0030-duck-array-protocol.html>`_,
> > > without the need
> > > for a new ``np.duckarray`` function:
> > > 
> > > .. code:: python
> > > 
> > >     def duckarray_stack(arrays):
> > >         module = np.get_array_module(*arrays)
> > >         arrays = [module.asarray(arr) for arr in arrays]
> > >         shapes = {arr.shape for arr in arrays}
> > >         if len(shapes) != 1:
> > >             raise ValueError('all input arrays must have the same
> > > shape')
> > >         expanded_arrays = [arr[module.newaxis, ...] for arr in
> > > arrays]
> > >         return module.concatenate(expanded_arrays, axis=0)
> > > 
> > > By default, ``get_array_module`` will return the ``numpy`` module
> > > if no
> > > arguments are arrays. This fall-back can be explicitly controlled
> > > by providing
> > > the ``module`` keyword-only argument. It is also possible to
> > > indicate that an
> > > exception should be raised instead of returning a default array
> > > module by
> > > setting ``module=None``.
> > > 
> > > How to implement ``__array_module__``
> > > =====================================
> > > 
> > > Libraries implementing a duck array type that want to support
> > > ``get_array_module`` need to implement the corresponding
> > > protocol,
> > > ``__array_module__``. This new protocol is based on Python's
> > > dispatch protocol
> > > for arithmetic, and is essentially a simpler version of
> > > ``__array_function__``.
> > > 
> > > Only one argument is passed into ``__array_module__``, a Python
> > > collection of
> > > unique array types passed into ``get_array_module``, i.e., all
> > > arguments with
> > > an ``__array_module__`` attribute.
> > > 
> > > The special method should either return a namespace with an API
> > > matching
> > > ``numpy``, or ``NotImplemented``, indicating that it does not
> > > know how to
> > > handle the operation:
> > > 
> > > .. code:: python
> > > 
> > >     class MyArray:
> > >         def __array_module__(self, types):
> > >             if not all(issubclass(t, MyArray) for t in types):
> > >                 return NotImplemented
> > >             return my_array_module
> > > 
> > > Returning custom objects from ``__array_module__``
> > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > > 
> > > ``my_array_module`` will typically, but need not always, be a
> > > Python module.
> > > Returning custom objects (e.g., with functions implemented via
> > > ``__getattr__``) may be useful for some advanced use cases.
> > > 
> > > For example, custom objects could allow for partial
> > > implementations of duck
> > > array modules that fall-back to NumPy (although this is not
> > > recommended in
> > > general because such fall-back behavior can be error prone):
> > > 
> > > .. code:: python
> > > 
> > >     class MyArray:
> > >         def __array_module__(self, types):
> > >             if all(issubclass(t, MyArray) for t in types):
> > >                 return ArrayModule()
> > >             else:
> > >                 return NotImplemented
> > > 
> > >     class ArrayModule:
> > >         def __getattr__(self, name):
> > >             import base_module
> > >             return getattr(base_module, name, getattr(numpy,
> > > name))
> > > 
> > > Subclassing from ``numpy.ndarray``
> > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > > 
> > > All of the same guidance about well-defined type casting
> > > hierarchies from
> > > NEP-18 still applies. ``numpy.ndarray`` itself contains a
> > > matching
> > > implementation of ``__array_module__``,  which is convenient for
> > > subclasses:
> > > 
> > > .. code:: python
> > > 
> > >     class ndarray:
> > >         def __array_module__(self, types):
> > >             if all(issubclass(t, ndarray) for t in types):
> > >                 return numpy
> > >             else:
> > >                 return NotImplemented
> > > 
> > > NumPy's internal machinery
> > > ==========================
> > > 
> > > The type resolution rules of ``get_array_module`` follow the same
> > > model as
> > > Python and NumPy's existing dispatch protocols: subclasses are
> > > called before
> > > super-classes, and otherwise left to right. ``__array_module__``
> > > is guaranteed
> > > to be called only a single time on each unique type.
> > > 
> > > The actual implementation of `get_array_module` will be in C, but
> > > should be
> > > equivalent to this Python code:
> > > 
> > > .. code:: python
> > > 
> > >     def get_array_module(*arrays, default=numpy):
> > >         implementing_arrays, types =
> > > _implementing_arrays_and_types(arrays)
> > >         if not implementing_arrays and default is not None:
> > >             return default
> > >         for array in implementing_arrays:
> > >             module = array.__array_module__(types)
> > >             if module is not NotImplemented:
> > >                 return module
> > >         raise TypeError("no common array module found")
> > > 
> > >     def _implementing_arrays_and_types(relevant_arrays):
> > >         types = []
> > >         implementing_arrays = []
> > >         for array in relevant_arrays:
> > >             t = type(array)
> > >             if t not in types and hasattr(t, '__array_module__'):
> > >                 types.append(t)
> > >                 # Subclasses before superclasses, otherwise left
> > > to right
> > >                 index = len(implementing_arrays)
> > >                 for i, old_array in
> > > enumerate(implementing_arrays):
> > >                     if issubclass(t, type(old_array)):
> > >                         index = i
> > >                         break
> > >                 implementing_arrays.insert(index, array)
> > >         return implementing_arrays, types
> > > 
> > > Relationship with ``__array_ufunc__`` and ``__array_function__``
> > > ----------------------------------------------------------------
> > > 
> > > These older protocols have distinct use-cases and should remain
> > > ===============================================================
> > > 
> > > ``__array_module__`` is intended to resolve limitations of
> > > ``__array_function__``, so it is natural to consider whether it
> > > could entirely
> > > replace ``__array_function__``. This would offer dual benefits:
> > > (1) simplifying
> > > the user-story about how to override NumPy and (2) removing the
> > > slowdown
> > > associated with checking for dispatch when calling every NumPy
> > > function.
> > > 
> > > However, ``__array_module__`` and ``__array_function__`` are
> > > pretty different
> > > from a user perspective: it requires explicit calls to
> > > ``get_array_module``,
> > > rather than simply reusing original ``numpy`` functions. This is
> > > probably fine
> > > for *libraries* that rely on duck-arrays, but may be
> > > frustratingly verbose for
> > > interactive use.
> > > 
> > > Some of the dispatching use-cases for ``__array_ufunc__`` are
> > > also solved by
> > > ``__array_module__``, but not all of them. For example, it is
> > > still useful to
> > > be able to define non-NumPy ufuncs (e.g., from Numba or SciPy) in
> > > a generic way
> > > on non-NumPy arrays (e.g., with dask.array).
> > > 
> > > Given their existing adoption and distinct use cases, we don't
> > > think it makes
> > > sense to remove or deprecate ``__array_function__`` and
> > > ``__array_ufunc__`` at
> > > this time.
> > > 
> > > Mixin classes to implement ``__array_function__`` and
> > > ``__array_ufunc__``
> > > =================================================================
> > > ========
> > > 
> > > Despite the user-facing differences, ``__array_module__`` and a
> > > module
> > > implementing NumPy's API still contain sufficient functionality
> > > needed to
> > > implement dispatching with the existing duck array protocols.
> > > 
> > > For example, the following mixin classes would provide sensible
> > > defaults for
> > > these special methods in terms of ``get_array_module`` and
> > > ``__array_module__``:
> > > 
> > > .. code:: python
> > > 
> > >     class ArrayUfuncFromModuleMixin:
> > > 
> > >         def __array_ufunc__(self, ufunc, method, *inputs,
> > > **kwargs):
> > >             arrays = inputs + kwargs.get('out', ())
> > >             try:
> > >                 array_module = np.get_array_module(*arrays)
> > >             except TypeError:
> > >                 return NotImplemented
> > > 
> > >             try:
> > >                 # Note this may have false positive matches, if
> > > ufunc.__name__
> > >                 # matches the name of a ufunc defined by NumPy.
> > > Unfortunately
> > >                 # there is no way to determine in which module a
> > > ufunc was
> > >                 # defined.
> > >                 new_ufunc = getattr(array_module, ufunc.__name__)
> > >             except AttributeError:
> > >                 return NotImplemented
> > > 
> > >             try:
> > >                 callable = getattr(new_ufunc, method)
> > >             except AttributeError:
> > >                 return NotImplemented
> > > 
> > >             return callable(*inputs, **kwargs)
> > > 
> > >     class ArrayFunctionFromModuleMixin:
> > > 
> > >         def __array_function__(self, func, types, args, kwargs):
> > >             array_module = self.__array_module__(types)
> > >             if array_module is NotImplemented:
> > >                 return NotImplemented
> > > 
> > >             # Traverse submodules to find the appropriate
> > > function
> > >             module = array_module
> > >             modules = func.__module__.split('.')
> > >             assert modules[0] == 'numpy'
> > >             for submodule in modules[1:]:
> > >                 module = getattr(module, submodule, None)
> > >             new_func = getattr(module, func.__name__, None)
> > >             if new_func is None:
> > >                 return NotImplemented
> > > 
> > >             return new_func(*args, **kwargs)
> > > 
> > > To make it easier to write duck arrays, we could also add these
> > > mixin classes
> > > into ``numpy.lib.mixins`` (but the examples above may suffice).
> > > 
> > > Alternatives considered
> > > -----------------------
> > > 
> > > Naming
> > > ======
> > > 
> > > We like the name ``__array_module__`` because it mirrors the
> > > existing
> > > ``__array_function__`` and ``__array_ufunc__`` protocols. Another
> > > reasonable
> > > choice could be ``__array_namespace__``.
> > > 
> > > It is less clear what the NumPy function that calls this protocol
> > > should be
> > > called (``get_array_module`` in this proposal). Some possible
> > > alternatives:
> > > ``array_module``, ``common_array_module``,
> > > ``resolve_array_module``,
> > > ``get_namespace``, ``get_numpy``, ``get_numpylike_module``,
> > > ``get_duck_array_module``.
> > > 
> > > .. _requesting-restricted-subsets:
> > > 
> > > Requesting restricted subsets of NumPy's API
> > > ============================================
> > > 
> > > Over time, NumPy has accumulated a very large API surface, with
> > > over 600
> > > attributes in the top level ``numpy`` module alone. It is
> > > unlikely that any
> > > duck array library could or would want to implement all of these
> > > functions and
> > > classes, because the frequently used subset of NumPy is much
> > > smaller.
> > > 
> > > We think it would be a useful exercise to define "minimal"
> > > subset(s) of NumPy's
> > > API, omitting rarely used or non-recommended functionality. For
> > > example,
> > > minimal NumPy might include ``stack``, but not the other stacking
> > > functions
> > > ``column_stack``, ``dstack``, ``hstack`` and ``vstack``. This
> > > could clearly
> > > indicate to duck array authors and users what functionality is
> > > core and what
> > > functionality they can skip.
> > > 
> > > Support for requesting a restricted subset of NumPy's API would
> > > be a natural
> > > feature to include in ``get_array_module`` and
> > > ``__array_module__``, e.g.,
> > > 
> > > .. code:: python
> > > 
> > >     # array_module is only guaranteed to contain "minimal" NumPy
> > >     array_module = np.get_array_module(*arrays,
> > > request='minimal')
> > > 
> > > To facilitate testing with NumPy and use with any valid duck
> > > array library,
> > > NumPy itself would return restricted versions of the ``numpy``
> > > module when
> > > ``get_array_module`` is called only on NumPy arrays. Omitted
> > > functions would
> > > simply not exist.
> > > 
> > > Unfortunately, we have not yet figured out what these restricted
> > > subsets should
> > > be, so it doesn't make sense to do this yet. When/if we do, we
> > > could either add
> > > new keyword arguments to ``get_array_module`` or add new top
> > > level functions,
> > > e.g., ``get_minimal_array_module``. We would also need to add
> > > either a new
> > > protocol patterned off of ``__array_module__`` (e.g.,
> > > ``__array_module_minimal__``), or could add an optional second
> > > argument to
> > > ``__array_module__`` (catching errors with ``try``/``except``).
> > > 
> > > A new namespace for implicit dispatch
> > > =====================================
> > > 
> > > Instead of supporting overrides in the main `numpy` namespace
> > > with
> > > ``__array_function__``, we could create a new opt-in namespace,
> > > e.g.,
> > > ``numpy.api``, with versions of NumPy functions that support
> > > dispatching. These
> > > overrides would need new opt-in protocols, e.g.,
> > > ``__array_function_api__``
> > > patterned off of ``__array_function__``.
> > > 
> > > This would resolve the biggest limitations of
> > > ``__array_function__`` by being
> > > opt-in and would also allow for unambiguously overriding
> > > functions like
> > > ``asarray``, because ``np.api.asarray`` would always mean
> > > "convert an
> > > array-like object."  But it wouldn't solve all the dispatching
> > > needs met by
> > > ``__array_module__``, and would leave us with supporting a
> > > considerably more
> > > complex protocol both for array users and implementors.
> > > 
> > > We could potentially implement such a new namespace *via* the
> > > ``__array_module__`` protocol. Certainly some users would find
> > > this convenient,
> > > because it is slightly less boilerplate. But this would leave
> > > users with a
> > > confusing choice: when should they use `get_array_module` vs.
> > > `np.api.something`? Also, we would have to add and maintain a
> > > whole new module,
> > > which is considerably more expensive than merely adding a
> > > function.
> > > 
> > > Dispatching on both types and arrays instead of only types
> > > ==========================================================
> > > 
> > > Instead of supporting dispatch only via unique array types, we
> > > could also
> > > support dispatch via array objects, e.g., by passing an
> > > ``arrays`` argument as
> > > part of the ``__array_module__`` protocol. This could potentially
> > > be useful for
> > > dispatch for arrays with metadata, such as provided by Dask and
> > > Pint, but would
> > > impose costs in terms of type safety and complexity.
> > > 
> > > For example, a library that supports arrays on both CPUs and GPUs
> > > might decide
> > > on which device to create new arrays from functions like
> > > ``ones`` based on
> > > input arguments:
> > > 
> > > .. code:: python
> > > 
> > >     class Array:
> > >         def __array_module__(self, types, arrays):
> > >             useful_arrays = tuple(
> > >                 a for a in arrays if isinstance(a, Array))
> > >             if not useful_arrays:
> > >                 return NotImplemented
> > >             prefer_gpu = any(a.prefer_gpu for a in useful_arrays)
> > >             return ArrayModule(prefer_gpu)
> > > 
> > >     class ArrayModule:
> > >         def __init__(self, prefer_gpu):
> > >             self.prefer_gpu = prefer_gpu
> > >         
> > >         def __getattr__(self, name):
> > >             import base_module
> > >             base_func = getattr(base_module, name)
> > >             return functools.partial(base_func,
> > > prefer_gpu=self.prefer_gpu)
> > > 
> > > This might be useful, but it's not clear if we really need it.
> > > Pint seems to
> > > get along OK without any explicit array creation routines
> > > (favoring
> > > multiplication by units, e.g., ``np.ones(5) * ureg.m``), and for
> > > the most part
> > > Dask is also OK with existing ``__array_function__`` style
> > > overides (e.g.,
> > > favoring ``np.ones_like`` over ``np.ones``). Choosing whether to
> > > place an array
> > > on the CPU or GPU could be solved by `making array creation lazy
> > > <https://github.com/google/jax/pull/1668>`_.
> > > 
> > > .. _appendix-design-choices:
> > > 
> > > Appendix: design choices for API overrides
> > > ------------------------------------------
> > > 
> > > There is a large range of possible design choices for overriding
> > > NumPy's API.
> > > Here we discuss three major axes of the design decision that
> > > guided our design
> > > for ``__array_module__``.
> > > 
> > > Opt-in vs. opt-out for users
> > > ============================
> > > 
> > > The ``__array_ufunc__`` and ``__array_function__`` protocols
> > > provide a
> > > mechanism for overriding NumPy functions *within NumPy's existing
> > > namespace*.
> > > This means that users need to explicitly opt-out if they do not
> > > want any
> > > overridden behavior, e.g., by casting arrays with
> > > ``np.asarray()``.
> > > 
> > > In theory, this approach lowers the barrier for adopting these
> > > protocols in
> > > user code and libraries, because code that uses the standard
> > > NumPy namespace is
> > > automatically compatible. But in practice, this hasn't worked
> > > out. For example,
> > > most well-maintained libraries that use NumPy follow the best
> > > practice of
> > > casting all inputs with ``np.asarray()``, which they would have
> > > to explicitly
> > > relax to use ``__array_function__``. Our experience has been that
> > > making a
> > > library compatible with a new duck array type typically requires
> > > at least a
> > > small amount of work to accommodate differences in the data model
> > > and operations
> > > that can be implemented efficiently.
> > > 
> > > These opt-out approaches also considerably complicate backwards
> > > compatibility
> > > for libraries that adopt these protocols, because by opting in as
> > > a library
> > > they also opt-in their users, whether they expect it or not. For
> > > winning over
> > > libraries that have been unable to adopt ``__array_function__``,
> > > an opt-in
> > > approach seems like a must.
> > > 
> > > Explicit vs. implicit choice of implementation
> > > ==============================================
> > > 
> > > Both ``__array_ufunc__`` and ``__array_function__`` have implicit
> > > control over
> > > dispatching: the dispatched functions are determined via the
> > > appropriate
> > > protocols in every function call. This generalizes well to
> > > handling many
> > > different types of objects, as evidenced by its use for
> > > implementing arithmetic
> > > operators in Python, but it has two downsides:
> > > 
> > > 1. *Speed*: it imposes additional overhead in every function
> > > call, because each
> > >    function call needs to inspect each of its arguments for
> > > overrides. This is
> > >    why arithmetic on builtin Python numbers is slow.
> > > 2. *Readability*: it is no longer immediately evident to readers
> > > of code what
> > >    happens when a function is called, because the function's
> > > implementation
> > >    could be overridden by any of its arguments.
> > > 
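The per-call inspection cost can be sketched with a toy protocol. All names here (``dispatched``, ``__my_protocol__``, ``LoudList``) are hypothetical; NumPy's actual ``__array_function__`` machinery is considerably more involved, but the shape is the same: every call scans its arguments for an override before doing any work.

```python
# Toy sketch of implicit, protocol-based dispatch (hypothetical names).
def dispatched(func):
    """Wrap func so each call first scans its arguments for an override."""
    def wrapper(*args):
        for arg in args:  # per-call inspection: this is the speed cost
            override = getattr(type(arg), "__my_protocol__", None)
            if override is not None:
                return override(arg, func, args)
        return func(*args)  # no override found: use the default
    return wrapper

@dispatched
def total(values):
    return sum(values)

class LoudList:
    """A duck type that intercepts `total` via the protocol."""
    def __init__(self, data):
        self.data = data
    def __my_protocol__(self, func, args):
        return ("intercepted", sum(self.data))

assert total([1, 2, 3]) == 6
assert total(LoudList([1, 2, 3])) == ("intercepted", 6)
```

The readability problem is visible in the last line: nothing at the call site of ``total`` reveals that its implementation was replaced.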
> > > In contrast, importing a new library (e.g., ``import dask.array
> > > as da``) with
> > > an API matching NumPy is entirely explicit. There is no overhead
> > > from dispatch
> > > or ambiguity about which implementation is being used.
> > > 
> > > Explicit and implicit choice of implementations are not mutually
> > > exclusive
> > > options. Indeed, most implementations of NumPy API overrides via
> > > ``__array_function__`` that we are familiar with (namely, dask,
> > > CuPy and
> > > sparse, but not Pint) also include an explicit way to use their
> > > version of
> > > NumPy's API by importing a module directly (``dask.array``,
> > > ``cupy`` or
> > > ``sparse``, respectively).
> > > 
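A minimal sketch of the explicit style, with two toy stand-in namespaces in place of ``numpy`` and ``dask.array`` (all names here are hypothetical). The caller picks the implementation by passing a namespace, so no argument inspection happens at call time and the choice is visible in the code:

```python
import math

# Toy stand-ins for two modules with matching APIs.
class eager_like:
    @staticmethod
    def sqrt(x):
        return math.sqrt(x)

class lazy_like:
    @staticmethod
    def sqrt(x):
        return ("lazy-sqrt", x)  # pretend this builds a task graph

def hypot(a, b, xp=eager_like):
    # The namespace `xp` is chosen explicitly by the caller; there is
    # no per-call dispatch overhead and no ambiguity about which
    # implementation runs.
    return xp.sqrt(a * a + b * b)

assert hypot(3, 4) == 5.0
assert hypot(3, 4, xp=lazy_like) == ("lazy-sqrt", 25)
```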
> > > Local vs. non-local vs. global control
> > > ======================================
> > > 
> > > The final design axis is how users control the choice of API:
> > > 
> > > - **Local control**, as exemplified by multiple dispatch and
> > > Python protocols for
> > >   arithmetic, determines which implementation to use either by
> > > checking types
> > >   or calling methods on the direct arguments of a function.
> > > - **Non-local control** such as `np.errstate
> > >   <
> > > https://docs.scipy.org/doc/numpy/reference/generated/numpy.errstate.html>`_
> > >   overrides behavior with global state via function decorators or
> > >   context managers. Control is determined hierarchically, via the
> > > inner-most
> > >   context.
> > > - **Global control** provides a mechanism for users to set
> > > default behavior,
> > >   either via function calls or configuration files. For example,
> > > matplotlib
> > >   allows setting a global choice of plotting backend.
> > > 
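The standard library's ``decimal`` module offers a runnable analogue of this kind of non-local, context-manager control (``np.errstate`` works similarly for floating-point error handling). Note that nothing at the call site of ``third`` reveals which precision is in effect; the innermost context wins:

```python
from decimal import Decimal, localcontext

def third(x):
    # Precision comes from the ambient context, not from any argument.
    return x / Decimal(3)

with localcontext() as ctx:
    ctx.prec = 4
    low = third(Decimal(1))

with localcontext() as ctx:
    ctx.prec = 10
    high = third(Decimal(1))

assert str(low) == "0.3333"
assert str(high) == "0.3333333333"
```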
> > > Local control is generally considered a best practice for API
> > > design, because
> > > control flow is entirely explicit, which makes it the easiest to
> > > understand.
> > > Non-local and global control are occasionally used, but generally
> > > either due to
> > > ignorance or a lack of better alternatives.
> > > 
> > > In the case of duck typing for NumPy's public API, we think non-
> > > local or global
> > > control would be mistakes, mostly because they **don't compose
> > > well**. If one
> > > library sets/needs one set of overrides and then internally calls
> > > a routine
> > > that expects another set of overrides, the resulting behavior may
> > > be very
> > > surprising. Higher order functions are especially problematic,
> > > because the
> > > context in which functions are evaluated may not be the context
> > > in which they
> > > are defined.
> > > 
> > > One class of override use cases where we think non-local and
> > > global control are
> > > appropriate is for choosing a backend system that is guaranteed
> > > to have an
> > > entirely consistent interface, such as a faster alternative
> > > implementation of
> > > ``numpy.fft`` on NumPy arrays. However, these are out of scope
> > > for the current
> > > proposal, which is focused on duck arrays.
> > > 
> > > 
> > > _______________________________________________
> > > NumPy-Discussion mailing list
> > > NumPy-Discussion at python.org
> > > https://mail.python.org/mailman/listinfo/numpy-discussion