Revised NEP-18, __array_function__ protocol
After much discussion (and the addition of three new co-authors!), I'm pleased to present a significantly revised version of NumPy Enhancement Proposal 18: A dispatch mechanism for NumPy's high level array functions: http://www.numpy.org/neps/nep-0018-array-function-protocol.html

The full text is also included below.

Best,
Stephan

===========================================================
A dispatch mechanism for NumPy's high level array functions
===========================================================

:Author: Stephan Hoyer <shoyer@google.com>
:Author: Matthew Rocklin <mrocklin@gmail.com>
:Author: Marten van Kerkwijk <mhvk@astro.utoronto.ca>
:Author: Hameer Abbasi <hameerabbasi@yahoo.com>
:Author: Eric Wieser <wieser.eric@gmail.com>
:Status: Draft
:Type: Standards Track
:Created: 2018-05-29

Abstract
--------

We propose the ``__array_function__`` protocol, to allow arguments of NumPy functions to define how that function operates on them. This will allow using NumPy as a high level API for efficient multi-dimensional array operations, even with array implementations that differ greatly from ``numpy.ndarray``.

Detailed description
--------------------

NumPy's high level ndarray API has been implemented several times outside of NumPy itself for different architectures, such as for GPU arrays (CuPy), sparse arrays (scipy.sparse, pydata/sparse) and parallel arrays (Dask array), as well as various NumPy-like implementations in deep learning frameworks like TensorFlow and PyTorch. Similarly, there are many projects that build on top of the NumPy API for labeled and indexed arrays (XArray), automatic differentiation (Autograd, Tangent), masked arrays (numpy.ma), physical units (astropy.units, pint, unyt), etc., that add additional functionality on top of the NumPy API. Most of these projects also implement a close variation of NumPy's high level API.

We would like to be able to use these libraries together, for example to place a CuPy array within XArray, or to perform automatic differentiation on Dask array code. This would be easier to accomplish if code written for NumPy ndarrays could also be used by other NumPy-like projects. For example, we would like for the following code example to work equally well with any NumPy-like array object:

.. code:: python

    def f(x):
        y = np.tensordot(x, x.T)
        return np.mean(np.exp(y))

Some of this is possible today with various protocol mechanisms within NumPy.

- The ``np.exp`` function checks the ``__array_ufunc__`` protocol
- The ``.T`` method works using Python's method dispatch
- The ``np.mean`` function explicitly checks for a ``.mean`` method on the argument

However, other functions, like ``np.tensordot``, do not dispatch, and instead are likely to coerce to a NumPy array (using the ``__array__`` protocol) or raise an error outright. To achieve enough coverage of the NumPy API to support downstream projects like XArray and autograd, we want to support *almost all* functions within NumPy, which calls for a more far-reaching protocol than just ``__array_ufunc__``. We would like a protocol that allows arguments of a NumPy function to take control and divert execution to another function (for example a GPU or parallel implementation) in a way that is safe and consistent across projects.

Implementation
--------------

We propose adding support for a new protocol in NumPy, ``__array_function__``. This protocol is intended to be a catch-all for NumPy functionality that is not covered by the ``__array_ufunc__`` protocol for universal functions (like ``np.exp``).
The semantics are very similar to ``__array_ufunc__``, except the operation is specified by an arbitrary callable object rather than a ufunc instance and method. A prototype implementation can be found in `this notebook <https://nbviewer.jupyter.org/gist/shoyer/1f0a308a06cd96df20879a1ddb8f0006>`_.
The interface
~~~~~~~~~~~~~

We propose the following signature for implementations of ``__array_function__``:

.. code-block:: python

    def __array_function__(self, func, types, args, kwargs)

- ``func`` is an arbitrary callable exposed by NumPy's public API, which was called in the form ``func(*args, **kwargs)``.
- ``types`` is a ``frozenset`` of unique argument types from the original NumPy function call that implement ``__array_function__``.
- The tuple ``args`` and dict ``kwargs`` are directly passed on from the original call.

Unlike ``__array_ufunc__``, there are no high-level guarantees about the type of ``func``, or about which of ``args`` and ``kwargs`` may contain objects implementing the array API. As a convenience for ``__array_function__`` implementors, ``types`` provides all argument types with an ``'__array_function__'`` attribute. This allows downstream implementations to quickly determine if they are likely able to support the operation. A ``frozenset`` is used to ensure that ``__array_function__`` implementations cannot rely on the iteration order of ``types``, which would facilitate violating the well-defined "Type casting hierarchy" described in `NEP-13 <https://www.numpy.org/neps/nep-0013-ufunc-overrides.html>`_.

Example for a project implementing the NumPy API
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Most implementations of ``__array_function__`` will start with two checks:

1. Is the given function something that we know how to overload?
2. Are all arguments of a type that we know how to handle?

If these conditions hold, ``__array_function__`` should return the result from calling its implementation for ``func(*args, **kwargs)``. Otherwise, it should return the sentinel value ``NotImplemented``, indicating that the function is not implemented by these types. This is preferable to raising ``TypeError`` directly, because it gives *other* arguments the opportunity to define the operations.

There are no general requirements on the return value from ``__array_function__``, although most sensible implementations should probably return array(s) with the same type as one of the function's arguments. If/when Python gains `typing support for protocols <https://www.python.org/dev/peps/pep-0544/>`_ and NumPy adds static type annotations, the ``@overload`` implementation for ``SupportsArrayFunction`` will indicate a return type of ``Any``.

It may also be convenient to define a custom decorator (``implements`` below) for registering ``__array_function__`` implementations.

.. code:: python

    HANDLED_FUNCTIONS = {}

    class MyArray:
        def __array_function__(self, func, types, args, kwargs):
            if func not in HANDLED_FUNCTIONS:
                return NotImplemented
            # Note: this allows subclasses that don't override
            # __array_function__ to handle MyArray objects
            if not all(issubclass(t, MyArray) for t in types):
                return NotImplemented
            return HANDLED_FUNCTIONS[func](*args, **kwargs)

    def implements(numpy_function):
        """Register an __array_function__ implementation for MyArray objects."""
        def decorator(func):
            HANDLED_FUNCTIONS[numpy_function] = func
            return func
        return decorator

    @implements(np.concatenate)
    def concatenate(arrays, axis=0, out=None):
        ...  # implementation of concatenate for MyArray objects

    @implements(np.broadcast_to)
    def broadcast_to(array, shape):
        ...  # implementation of broadcast_to for MyArray objects

Note that it is not required for ``__array_function__`` implementations to include *all* of the corresponding NumPy function's optional arguments (e.g., ``broadcast_to`` above omits the irrelevant ``subok`` argument). Optional arguments are only passed in to ``__array_function__`` if they were explicitly used in the NumPy function call.

Necessary changes within the NumPy codebase itself
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This will require two changes within the NumPy codebase:

1. A function to inspect available inputs, look for the ``__array_function__`` attribute on those inputs, and call those methods appropriately until one succeeds. This needs to be fast in the common all-NumPy case, and have acceptable performance (no worse than linear time) even if the number of overloaded inputs is large (e.g., as might be the case for ``np.concatenate``). This is one additional function of moderate complexity.
2. Calling this function within all relevant NumPy functions. This affects many parts of the NumPy codebase, although with very low complexity.

Finding and calling the right ``__array_function__``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Given a NumPy function and ``*args`` and ``**kwargs`` inputs, we need to search through ``*args`` and ``**kwargs`` for all appropriate inputs that might have the ``__array_function__`` attribute. Then we need to select among those possible methods and execute the right one. Negotiating between several possible implementations can be complex.

Finding arguments
'''''''''''''''''

Valid arguments may be directly in the ``*args`` and ``**kwargs``, such as in the case of ``np.tensordot(left, right, out=out)``, or they may be nested within lists or dictionaries, such as in the case of ``np.concatenate([x, y, z])``. This can be problematic for two reasons:

1. Some functions are given long lists of values, and traversing them might be prohibitively expensive.
2. Some functions may have arguments that we don't want to inspect, even if they have the ``__array_function__`` method.

To resolve these issues, NumPy functions should explicitly indicate which of their arguments may be overloaded, and how these arguments should be checked. As a rule, this should include all arguments documented as either ``array_like`` or ``ndarray``. We propose to do so by writing "dispatcher" functions for each overloaded NumPy function:

- These functions will be called with the exact same arguments that were passed into the NumPy function (i.e., ``dispatcher(*args, **kwargs)``), and should return an iterable of arguments to check for overrides.
- Dispatcher functions are required to share the exact same positional, optional and keyword-only arguments as their corresponding NumPy functions. Otherwise, valid invocations of a NumPy function could result in an error when calling its dispatcher.
- Because default *values* for keyword arguments do not have ``__array_function__`` attributes, by convention we set all default argument values to ``None``. This reduces the likelihood of signatures falling out of sync, and minimizes extraneous information in the dispatcher. The only exception should be cases where the argument value in some way affects dispatching, which should be rare.

An example of the dispatcher for ``np.concatenate`` may be instructive:
.. code:: python

    def _concatenate_dispatcher(arrays, axis=None, out=None):
        for array in arrays:
            yield array
        if out is not None:
            yield out

The concatenate dispatcher is written as a generator function, which allows it to potentially include the value of the optional ``out`` argument without needing to create a new sequence with the (potentially long) list of objects to be concatenated.

Trying ``__array_function__`` methods until the right one works
'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''

Many arguments may implement the ``__array_function__`` protocol. Some of these may decide that, given the available inputs, they are unable to determine the correct result. How do we call the right one? If several are valid, then which has precedence?

For the most part, the rules for dispatch with ``__array_function__`` match those for ``__array_ufunc__`` (see `NEP-13 <https://www.numpy.org/neps/nep-0013-ufunc-overrides.html>`_). In particular:

- NumPy will gather implementations of ``__array_function__`` from all specified inputs and call them in order: subclasses before superclasses, and otherwise left to right. Note that in some edge cases involving subclasses, this differs slightly from the `current behavior <https://bugs.python.org/issue30140>`_ of Python.
- Implementations of ``__array_function__`` indicate that they can handle the operation by returning any value other than ``NotImplemented``.
- If all ``__array_function__`` methods return ``NotImplemented``, NumPy will raise ``TypeError``.

One deviation from the current behavior of ``__array_ufunc__`` is that NumPy will only call ``__array_function__`` on the *first* argument of each unique type. This matches Python's `rule for calling reflected methods <https://docs.python.org/3/reference/datamodel.html#object.__ror__>`_, and this ensures that checking overloads has acceptable performance even when there are a large number of overloaded arguments. To avoid long-term divergence between these two dispatch protocols, we should `also update <https://github.com/numpy/numpy/issues/11306>`_ ``__array_ufunc__`` to match this behavior.

Special handling of ``numpy.ndarray``
'''''''''''''''''''''''''''''''''''''

The use cases for subclasses with ``__array_function__`` are the same as those with ``__array_ufunc__``, so ``numpy.ndarray`` should also define a ``__array_function__`` method mirroring ``ndarray.__array_ufunc__``:

.. code:: python

    def __array_function__(self, func, types, args, kwargs):
        # Cannot handle items that have __array_function__ other than our own.
        for t in types:
            if (hasattr(t, '__array_function__') and
                    t.__array_function__ is not ndarray.__array_function__):
                return NotImplemented
        # Arguments contain no overrides, so we can safely call the
        # overloaded function again.
        return func(*args, **kwargs)

To avoid infinite recursion, the dispatch rules for ``__array_function__`` also need the same special case they have for ``__array_ufunc__``: any arguments with an ``__array_function__`` method that is identical to ``numpy.ndarray.__array_function__`` are not called as ``__array_function__`` implementations.

Changes within NumPy functions
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Given a function defining the above behavior, for now call it ``try_array_function_override``, we now need to call that function from within every relevant NumPy function. This is a pervasive change, but of fairly simple and innocuous code that should complete quickly and without effect if no arguments implement the ``__array_function__`` protocol.
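For concreteness, here is a minimal sketch of what such a function might look like, assembled from the dispatch rules above; the actual implementation (see the prototype notebook) may well differ in detail:

.. code:: python

    import numpy as np

    def try_array_function_override(func, relevant_arguments, args, kwargs):
        # ndarray's own __array_function__ (once it exists) is skipped,
        # per the special case above.
        ndarray_impl = getattr(np.ndarray, '__array_function__', None)

        # Collect one overloaded argument per unique type, ordered with
        # subclasses before superclasses, otherwise left to right.
        overloaded_types = []
        overloaded_args = []
        for arg in relevant_arguments:
            arg_type = type(arg)
            if (hasattr(arg_type, '__array_function__') and
                    arg_type.__array_function__ is not ndarray_impl and
                    arg_type not in overloaded_types):
                overloaded_types.append(arg_type)
                index = len(overloaded_args)
                for i, old_arg in enumerate(overloaded_args):
                    if issubclass(arg_type, type(old_arg)):
                        index = i
                        break
                overloaded_args.insert(index, arg)

        if not overloaded_args:
            return False, None  # the common all-NumPy fast path

        types = frozenset(overloaded_types)
        for overloaded_arg in overloaded_args:
            result = overloaded_arg.__array_function__(func, types, args, kwargs)
            if result is not NotImplemented:
                return True, result

        raise TypeError('no implementation found for {} on types that '
                        'implement __array_function__: {}'
                        .format(func, list(types)))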
In most cases, these functions should be written using the ``array_function_dispatch`` decorator, which also associates dispatcher functions:

.. code:: python

    def array_function_dispatch(dispatcher):
        """Wrap a function for dispatch with the __array_function__ protocol."""
        def decorator(func):
            @functools.wraps(func)
            def new_func(*args, **kwargs):
                relevant_arguments = dispatcher(*args, **kwargs)
                success, value = try_array_function_override(
                    new_func, relevant_arguments, args, kwargs)
                if success:
                    return value
                return func(*args, **kwargs)
            return new_func
        return decorator

    # example usage
    def _broadcast_to_dispatcher(array, shape, subok=None, **ignored_kwargs):
        return (array,)

    @array_function_dispatch(_broadcast_to_dispatcher)
    def broadcast_to(array, shape, subok=False):
        ...  # existing definition of np.broadcast_to

Using a decorator is great! We don't need to change the definitions of existing NumPy functions, and only need to write a few additional lines for the dispatcher function. We could even reuse a single dispatcher for families of functions with the same signature (e.g., ``sum`` and ``prod``). For such functions, the largest change could be adding a few lines to the docstring to note which arguments are checked for overloads.

It's particularly worth calling out the decorator's use of ``functools.wraps``:

- This ensures that the wrapped function has the same name and docstring as the wrapped NumPy function.
- On Python 3, it also ensures that the decorator function copies the original function signature, which is important for introspection-based tools such as auto-complete. If we care about preserving function signatures on Python 2, for the `short while longer <http://www.numpy.org/neps/nep-0014-dropping-python2.7-proposal.html>`_ that NumPy supports Python 2.7, we could do so by adding a vendored dependency on the (single-file, BSD licensed) `decorator library <https://github.com/micheles/decorator>`_.
- Finally, it ensures that the wrapped function `can be pickled <http://gael-varoquaux.info/programming/decoration-in-python-done-right-decor...>`_.
In a few cases, it would not make sense to use the ``array_function_dispatch`` decorator directly, but overriding implementations in terms of ``try_array_function_override`` should still be straightforward.

- Functions written entirely in C (e.g., ``np.concatenate``) can't use decorators, but they could still use a C equivalent of ``try_array_function_override``. If performance is not a concern, they could also be easily wrapped with a small Python wrapper.
- The ``__call__`` method of ``np.vectorize`` can't be decorated with ``array_function_dispatch``.
On 27. Jun 2018 at 07:48, Stephan Hoyer <shoyer@gmail.com> wrote:

<snip of the quoted NEP text above>

I would like to propose that we use `__array_function__` in the following manner for functions that create arrays:

- `array_reference` for indicating the "reference array" whose `__array_function__` implementation will be called. For example, `np.arange(5, array_reference=some_dask_array)`.
- I use a reference in the design rather than a type because for some arrays (such as Dask), chunk sizes or other reference data is needed to make this work.

I realise that this is a big design decision, so I welcome any input!

Best Regards,
Hameer Abbasi
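For illustration, a toy sketch of how a reference array could carry metadata that a dtype cannot; `ChunkedArray` and its chunk handling are hypothetical stand-ins for something like a Dask array:

```python
import numpy as np

class ChunkedArray:
    """Toy duck array that remembers a chunk size (stand-in for Dask)."""
    def __init__(self, data, chunk_size):
        self.data = data
        self.chunk_size = chunk_size

    def __array_function__(self, func, types, args, kwargs):
        if func is np.arange:
            # The reference array (self) supplies metadata -- here the
            # chunk size -- that the result should inherit.
            return ChunkedArray(np.arange(*args, **kwargs), self.chunk_size)
        return NotImplemented

# Under the proposal, np.arange(5, array_reference=ref) would call
# ref.__array_function__(np.arange, frozenset({type(ref)}), (5,), {}).
```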
Hi Marten,

> I'm confused: Isn't your reference array just `self`?

It is. The point of the proposed feature was to handle array generation mechanisms that don't take an array as input in the standard NumPy API. Giving them a reference handles both the dispatch and the decision about which implementation to call.
Hi Hameer,

> It is. The point of the proposed feature was to handle array generation mechanisms that don't take an array as input in the standard NumPy API.

Sorry, I had clearly misunderstood. It would indeed be nice for overrides to work on functions like `zeros` or `arange` as well, but it seems strange to change the signature just for that. As a possible alternative, should we perhaps generally check for overrides on `dtype`?

All the best,

Marten
Hi Marten,

> Sorry, I had clearly misunderstood. It would indeed be nice for overrides to work on functions like `zeros` or `arange` as well, but it seems strange to change the signature just for that. As a possible alternative, should we perhaps generally check for overrides on `dtype`?

While this very clearly makes sense for something like astropy, it has a few drawbacks:

- Other duck arrays such as Dask need more information than just the dtype. For example, Dask needs chunk sizes, XArray needs axis labels, and pydata/sparse needs to know the type of the reference array in order to make one of the same type. The information in a reference array is a strict superset of the information in the dtype.
- There's a need for a separate protocol, which might be a lot harder to work with for both NumPy and library authors.
- Some things, like numpy.random.RandomState, don't accept a dtype argument.

As for your concern about changing the signature, it's easy enough with a decorator. We'll need a separate decorator for array generation functions. Something like:

```python
import functools
import numpy as np

def array_generation_function(func):
    @functools.wraps(func)
    def wrapped(*args, array_reference=np._NoValue, **kwargs):
        # array_reference is keyword-only, so existing call
        # signatures are unaffected.
        if array_reference is not np._NoValue:
            success, result = try_array_function_override(
                wrapped, [array_reference], args, kwargs)
            if success:
                return result
        return func(*args, **kwargs)
    return wrapped
```

Hameer Abbasi
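A hypothetical application of such a decorator (assuming `try_array_function_override` from the NEP) might look like:

```python
@array_generation_function
def arange(start, stop=None, step=None, dtype=None):
    ...  # existing NumPy implementation

# arange(5, array_reference=some_duck_array) would then dispatch to
# some_duck_array.__array_function__(arange, ...), while a plain
# arange(5) takes the normal path.
```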
Hi Hameer,

I think the override on `dtype` would work - after all, the override is checked before anything is done, so one can just pass in `self` if one wishes (or some helper class that contains both `self` and any desired further information). But, as you note, it would not cover everything, and your `array_reference` idea definitely makes things more uniform. Indeed, it would allow one to implement things like `np.zeros_like` using `np.zeros`, which seems quite nice.

Still, I'm not sure whether this should be included in the present NEP or is best done separately after, with a few concrete examples of where it would be useful.

All the best,

Marten

On Sat, Jun 30, 2018 at 10:40 AM, Hameer Abbasi <einstein.edison@gmail.com> wrote:
Hi Marten,

> Still, I'm not sure whether this should be included in the present NEP or is best done separately after, with a few concrete examples of where it would be useful.

There already are concrete examples from Dask and CuPy, and this is currently a blocker for them, which is part of the reason I'm pushing so hard for it. See #11074 <https://github.com/numpy/numpy/issues/11074> for context; I think it was part of the reason that inspired Matt and Stephan to write this protocol in the first place.

Best Regards,

Hameer Abbasi
On Sat, Jun 30, 2018 at 11:59 AM Hameer Abbasi <einstein.edison@gmail.com> wrote:
Overloading np.ones_like() is definitely in scope already. I'd love to see a generic way of doing random number generation, but I agree with Marten that I don't see it fitting naturally into this NEP. An invasive change to add an array_reference argument to a bunch of functions might indeed be worthy of its own NEP, but again I'm not convinced that's actually the right approach. I'd rather add a few new functions like random_like, which is a small enough change that consensus on the list might be enough.
On Sat, Jun 30, 2018 at 12:14 PM Stephan Hoyer <shoyer@gmail.com> wrote:
random_like() seems very weird to me. It doesn't seem like a function that anyone actually wants. It seems like what people actually want is to be able to draw random numbers from any distribution as a specified array-like type and shape, not just sample U(0, 1) with the shape of an existing array.

The most workable way to do this is to modify RandomGenerator (i.e. the new RandomState design)[1] to accept the array-like type in the class constructor, and modify its internals to do the right thing. Because the intrusion on the API is so small, that doesn't require a NEP, just a PR (a long, complicated, and tedious PR, to be sure)[2].

There are a bunch of technical issues (if you want to avoid memory copies) because the Cython implementation requires direct memory access, but that's intrinsic to any solution to this problem, regardless of the API choices. random_like() would have the same issues.

[1] https://github.com/bashtage/randomgen
[2] Sorry, Kevin.

-- Robert Kern
On Tue, Jun 26, 2018 at 11:27 PM Hameer Abbasi <einstein.edison@gmail.com> wrote:
These are somewhat similar to the existing ones_like, zeros_like and full_like. My inclination would be to consider adding new functions/methods for these rather than a new argument, e.g., arange_like() and random_like(), which could then use the standard __array_function__ dispatching mechanism. But this is pretty orthogonal to the design of __array_function__ either way, so I think we could safely defer this to another NEP (which could be pretty short!).

One concern this does raise is how to handle methods like those on RandomState, even though methods like random_like() don't currently exist. Distribution objects from scipy.stats could have similar use cases. So perhaps it's worth "future proofing" the interface by passing `obj` and `method` to __array_function__ rather than only `func`. It is slower to call a func via func.__call__ than func, but only very marginally (~100 ns in my tests).
On Wed, Jun 27, 2018 at 3:50 PM, Stephan Hoyer <shoyer@gmail.com> wrote: <snip>
That would make it more similar yet to `__array_ufunc__`, but I'm not sure how useful it is, as you cannot generically assume the methods have the same arguments and hence they need their own dispatcher. Once you're there you might as well pass them on directly (since any callable can be used as the function). Indeed, for `__array_ufunc__`, this might not have been a bad idea either... -- Marten
I think this feature is actually needed. Consider `np.random.RandomState`. If we were to add what I proposed, the two could work very nicely to (for example) do things like creating Dask random arrays from RandomState objects. For reproducibility, Dask could generate multiple RandomState objects with a seed sequential in the job numbers.

Looping in Matt Rocklin for this: he might have some input about the design.

Best Regards,

Hameer Abbasi

On 28. Jun 2018 at 14:37, Marten van Kerkwijk <m.h.vankerkwijk@gmail.com> wrote:

<snip>
On Wed, Jun 27, 2018 at 12:50 PM Stephan Hoyer <shoyer@gmail.com> wrote:
I did a little more digging, and turned up the __self__ and __func__ attributes of bound methods: https://stackoverflow.com/questions/4679592/how-to-find-instance-of-a-bound-...

So we might need another decorator function, but it seems that the current interface would actually suffice just fine for overriding methods. I'll update the NEP with some examples. It will look something like:

```python
import types as pytypes  # aliased: the `types` argument shadows the module

def __array_function__(self, func, types, args, kwargs):
    ...
    if isinstance(func, pytypes.MethodType):
        obj = func.__self__
        unbound_func = func.__func__
    ...
```

Given that functions are the most common case, I think it's best to keep with `func` as the main interface, but it's good to know that this does not preclude overriding methods.
For C classes like the ufuncs, it seems `__self__` is defined for methods as well (at least, `np.add.reduce.__self__` gives `np.add`), but not `__func__`. There is a `__name__` (= "reduce"), though, which means that I think one can still retrieve what is needed. (Obviously, this also means `__array_ufunc__` could have been simpler...)

-- Marten
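For illustration, an override could recover the ufunc and method name along these lines; `describe_callable` is a hypothetical helper, not part of the NEP:

```python
import numpy as np

def describe_callable(func):
    """Recover (ufunc, method name) from a bound ufunc method, if any."""
    if isinstance(getattr(func, '__self__', None), np.ufunc):
        return func.__self__, func.__name__  # e.g. (np.add, 'reduce')
    return func, '__call__'

print(describe_callable(np.add.reduce))  # (<ufunc 'add'>, 'reduce')
print(describe_callable(np.exp))         # (<ufunc 'exp'>, '__call__')
```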
On Thu, Jun 28, 2018 at 5:36 PM Eric Wieser <wieser.eric+numpy@gmail.com> wrote:
```python
>>> np.add.reduce == np.add.reduce  # OK
True
>>> np.add.reduce is np.add.reduce  # what?!?
False
```
Maybe this is a bug? There's been some somewhat related discussion recently on python-dev: https://mail.python.org/pipermail/python-dev/2018-June/153959.html
Good catch. I think the latter failing is because np.add.reduce ends up calling np.ufunc.reduce.__get__(np.add), and builtin_function.__get__ doesn't appear to do any caching. I suppose caching bound methods would just be a waste of time. == would work just fine in my suggestion above, it seems, irrespective of the resolution of the discussion on python-dev.

Eric

On Fri, 29 Jun 2018 at 18:24 Stephan Hoyer <shoyer@gmail.com> wrote:
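A quick demonstration of the behavior being discussed: each attribute access goes through the descriptor protocol and builds a fresh bound object, so only `==` (not `is`) can be relied upon:

```python
import numpy as np

a = np.add.reduce
b = np.add.reduce

print(a is b)  # False: each lookup constructs a new bound method object
print(a == b)  # True: equality compares the underlying function and self
print(a.__self__ is np.add)  # True
```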
On Fri, Jun 29, 2018 at 9:54 PM, Eric Wieser <wieser.eric+numpy@gmail.com> wrote:
I think for implementers it might work easiest anyway to look up the ufunc itself in a dict or so and then check the name of the method. (At least, for my implementations of `__array_ufunc__`, it made a lot of sense to use the method in that way; possibly less so for the larger variety of other numpy functions.)

-- Marten
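A sketch of the dict-plus-method-name pattern Marten describes; the handler names here are hypothetical:

```python
import numpy as np

def my_add(x, y, **kwargs): ...         # hypothetical handlers
def my_add_reduce(x, **kwargs): ...

HANDLED_UFUNCS = {
    np.add: {'__call__': my_add, 'reduce': my_add_reduce},
}

class MyArray:
    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Look up the ufunc first, then dispatch on the method name.
        handlers = HANDLED_UFUNCS.get(ufunc)
        if handlers is None or method not in handlers:
            return NotImplemented
        return handlers[method](*inputs, **kwargs)
```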
On 27. Jun 2018 at 07:48, Stephan Hoyer <shoyer@gmail.com> wrote: After much discussion (and the addition of three new co-authors!), I’m pleased to present a significantly revision of NumPy Enhancement Proposal 18: A dispatch mechanism for NumPy's high level array functions: http://www.numpy.org/neps/nep-0018-array-function-protocol.html The full text is also included below. Best, Stephan =========================================================== A dispatch mechanism for NumPy's high level array functions =========================================================== :Author: Stephan Hoyer <shoyer@google.com> :Author: Matthew Rocklin <mrocklin@gmail.com> :Author: Marten van Kerkwijk <mhvk@astro.utoronto.ca> :Author: Hameer Abbasi <hameerabbasi@yahoo.com> :Author: Eric Wieser <wieser.eric@gmail.com> :Status: Draft :Type: Standards Track :Created: 2018-05-29 Abstact ------- We propose the ``__array_function__`` protocol, to allow arguments of NumPy functions to define how that function operates on them. This will allow using NumPy as a high level API for efficient multi-dimensional array operations, even with array implementations that differ greatly from ``numpy.ndarray``. Detailed description -------------------- NumPy's high level ndarray API has been implemented several times outside of NumPy itself for different architectures, such as for GPU arrays (CuPy), Sparse arrays (scipy.sparse, pydata/sparse) and parallel arrays (Dask array) as well as various NumPy-like implementations in the deep learning frameworks, like TensorFlow and PyTorch. Similarly there are many projects that build on top of the NumPy API for labeled and indexed arrays (XArray), automatic differentiation (Autograd, Tangent), masked arrays (numpy.ma), physical units (astropy.units, pint, unyt), etc. that add additional functionality on top of the NumPy API. Most of these project also implement a close variation of NumPy's level high API. We would like to be able to use these libraries together, for example we would like to be able to place a CuPy array within XArray, or perform automatic differentiation on Dask array code. This would be easier to accomplish if code written for NumPy ndarrays could also be used by other NumPy-like projects. For example, we would like for the following code example to work equally well with any NumPy-like array object: .. code:: python def f(x): y = np.tensordot(x, x.T) return np.mean(np.exp(y)) Some of this is possible today with various protocol mechanisms within NumPy. - The ``np.exp`` function checks the ``__array_ufunc__`` protocol - The ``.T`` method works using Python's method dispatch - The ``np.mean`` function explicitly checks for a ``.mean`` method on the argument However other functions, like ``np.tensordot`` do not dispatch, and instead are likely to coerce to a NumPy array (using the ``__array__``) protocol, or err outright. To achieve enough coverage of the NumPy API to support downstream projects like XArray and autograd we want to support *almost all* functions within NumPy, which calls for a more reaching protocol than just ``__array_ufunc__``. We would like a protocol that allows arguments of a NumPy function to take control and divert execution to another function (for example a GPU or parallel implementation) in a way that is safe and consistent across projects. Implementation -------------- We propose adding support for a new protocol in NumPy, ``__array_function__``. 
This protocol is intended to be a catch-all for NumPy functionality that is not covered by the ``__array_ufunc__`` protocol for universal functions (like ``np.exp``). The semantics are very similar to ``__array_ufunc__``, except the operation is specified by an arbitrary callable object rather than a ufunc instance and method. A prototype implementation can be found in `this notebook < https://nbviewer.jupyter.org/gist/shoyer/1f0a308a06cd96df20879a1ddb8f0006
`_.
The interface ~~~~~~~~~~~~~ We propose the following signature for implementations of ``__array_function__``: .. code-block:: python def __array_function__(self, func, types, args, kwargs) - ``func`` is an arbitrary callable exposed by NumPy's public API, which was called in the form ``func(*args, **kwargs)``. - ``types`` is a ``frozenset`` of unique argument types from the original NumPy function call that implement ``__array_function__``. - The tuple ``args`` and dict ``kwargs`` are directly passed on from the original call. Unlike ``__array_ufunc__``, there are no high-level guarantees about the type of ``func``, or about which of ``args`` and ``kwargs`` may contain objects implementing the array API. As a convenience for ``__array_function__`` implementors, ``types`` provides all argument types with an ``'__array_function__'`` attribute. This allows downstream implementations to quickly determine if they are likely able to support the operation. A ``frozenset`` is used to ensure that ``__array_function__`` implementations cannot rely on the iteration order of ``types``, which would facilitate violating the well-defined "Type casting hierarchy" described in `NEP-13 <https://www.numpy.org/neps/nep-0013-ufunc-overrides.html>`_. Example for a project implementing the NumPy API ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Most implementations of ``__array_function__`` will start with two checks: 1. Is the given function something that we know how to overload? 2. Are all arguments of a type that we know how to handle? If these conditions hold, ``__array_function__`` should return the result from calling its implementation for ``func(*args, **kwargs)``. Otherwise, it should return the sentinel value ``NotImplemented``, indicating that the function is not implemented by these types. This is preferable to raising ``TypeError`` directly, because it gives *other* arguments the opportunity to define the operations. There are no general requirements on the return value from ``__array_function__``, although most sensible implementations should probably return array(s) with the same type as one of the function's arguments. If/when Python gains `typing support for protocols <https://www.python.org/dev/peps/pep-0544/>`_ and NumPy adds static type annotations, the ``@overload`` implementation for ``SupportsArrayFunction`` will indicate a return type of ``Any``. It may also be convenient to define a custom decorators (``implements`` below) for registering ``__array_function__`` implementations. .. code:: python HANDLED_FUNCTIONS = {} class MyArray: def __array_function__(self, func, types, args, kwargs): if func not in HANDLED_FUNCTIONS: return NotImplemented # Note: this allows subclasses that don't override # __array_function__ to handle MyArray objects if not all(issubclass(t, MyArray) for t in types): return NotImplemented return HANDLED_FUNCTIONS[func](*args, **kwargs) def implements(numpy_function): """Register an __array_function__ implementation for MyArray objects.""" def decorator(func): HANDLED_FUNCTIONS[numpy_function] = func return func return decorator @implements(np.concatenate) def concatenate(arrays, axis=0, out=None): ... # implementation of concatenate for MyArray objects @implements(np.broadcast_to) def broadcast_to(array, shape): ... 
# implementation of broadcast_to for MyArray objects Note that it is not required for ``__array_function__`` implementations to include *all* of the corresponding NumPy function's optional arguments (e.g., ``broadcast_to`` above omits the irrelevant ``subok`` argument). Optional arguments are only passed in to ``__array_function__`` if they were explicitly used in the NumPy function call. Necessary changes within the NumPy codebase itself ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ This will require two changes within the NumPy codebase: 1. A function to inspect available inputs, look for the ``__array_function__`` attribute on those inputs, and call those methods appropriately until one succeeds. This needs to be fast in the common all-NumPy case, and have acceptable performance (no worse than linear time) even if the number of overloaded inputs is large (e.g., as might be the case for `np.concatenate`). This is one additional function of moderate complexity. 2. Calling this function within all relevant NumPy functions. This affects many parts of the NumPy codebase, although with very low complexity. Finding and calling the right ``__array_function__`` ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Given a NumPy function, ``*args`` and ``**kwargs`` inputs, we need to search through ``*args`` and ``**kwargs`` for all appropriate inputs that might have the ``__array_function__`` attribute. Then we need to select among those possible methods and execute the right one. Negotiating between several possible implementations can be complex. Finding arguments ''''''''''''''''' Valid arguments may be directly in the ``*args`` and ``**kwargs``, such as in the case for ``np.tensordot(left, right, out=out)``, or they may be nested within lists or dictionaries, such as in the case of ``np.concatenate([x, y, z])``. This can be problematic for two reasons: 1. Some functions are given long lists of values, and traversing them might be prohibitively expensive. 2. Some functions may have arguments that we don't want to inspect, even if they have the ``__array_function__`` method. To resolve these issues, NumPy functions should explicitly indicate which of their arguments may be overloaded, and how these arguments should be checked. As a rule, this should include all arguments documented as either ``array_like`` or ``ndarray``. We propose to do so by writing "dispatcher" functions for each overloaded NumPy function: - These functions will be called with the exact same arguments that were passed into the NumPy function (i.e., ``dispatcher(*args, **kwargs)``), and should return an iterable of arguments to check for overrides. - Dispatcher functions are required to share the exact same positional, optional and keyword-only arguments as their corresponding NumPy functions. Otherwise, valid invocations of a NumPy function could result in an error when calling its dispatcher. - Because default *values* for keyword arguments do not have ``__array_function__`` attributes, by convention we set all default argument values to ``None``. This reduces the likelihood of signatures falling out of sync, and minimizes extraneous information in the dispatcher. The only exception should be cases where the argument value in some way effects dispatching, which should be rare. An example of the dispatcher for ``np.concatenate`` may be instructive: .. 
code:: python def _concatenate_dispatcher(arrays, axis=None, out=None): for array in arrays: yield array if out is not None: yield out The concatenate dispatcher is written as generator function, which allows it to potentially include the value of the optional ``out`` argument without needing to create a new sequence with the (potentially long) list of objects to be concatenated. Trying ``__array_function__`` methods until the right one works ''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''' Many arguments may implement the ``__array_function__`` protocol. Some of these may decide that, given the available inputs, they are unable to determine the correct result. How do we call the right one? If several are valid then which has precedence? For the most part, the rules for dispatch with ``__array_function__`` match those for ``__array_ufunc__`` (see `NEP-13 <https://www.numpy.org/neps/nep-0013-ufunc-overrides.html>`_). In particular: - NumPy will gather implementations of ``__array_function__`` from all specified inputs and call them in order: subclasses before superclasses, and otherwise left to right. Note that in some edge cases involving subclasses, this differs slightly from the `current behavior <https://bugs.python.org/issue30140>`_ of Python. - Implementations of ``__array_function__`` indicate that they can handle the operation by returning any value other than ``NotImplemented``. - If all ``__array_function__`` methods return ``NotImplemented``, NumPy will raise ``TypeError``. One deviation from the current behavior of ``__array_ufunc__`` is that NumPy will only call ``__array_function__`` on the *first* argument of each unique type. This matches Python's `rule for calling reflected methods < https://docs.python.org/3/reference/datamodel.html#object.__ror__>`_, and this ensures that checking overloads has acceptable performance even when there are a large number of overloaded arguments. To avoid long-term divergence between these two dispatch protocols, we should `also update <https://github.com/numpy/numpy/issues/11306>`_ ``__array_ufunc__`` to match this behavior. Special handling of ``numpy.ndarray`` ''''''''''''''''''''''''''''''''''''' The use cases for subclasses with ``__array_function__`` are the same as those with ``__array_ufunc__``, so ``numpy.ndarray`` should also define a ``__array_function__`` method mirroring ``ndarray.__array_ufunc__``: .. code:: python def __array_function__(self, func, types, args, kwargs): # Cannot handle items that have __array_function__ other than our own. for t in types: if (hasattr(t, '__array_function__') and t.__array_function__ is not ndarray.__array_function__): return NotImplemented # Arguments contain no overrides, so we can safely call the # overloaded function again. return func(*args, **kwargs) To avoid infinite recursion, the dispatch rules for ``__array_function__`` need also the same special case they have for ``__array_ufunc__``: any arguments with an ``__array_function__`` method that is identical to ``numpy.ndarray.__array_function__`` are not be called as ``__array_function__`` implementations. Changes within NumPy functions ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Given a function defining the above behavior, for now call it ``try_array_function_override``, we now need to call that function from within every relevant NumPy function. This is a pervasive change, but of fairly simple and innocuous code that should complete quickly and without effect if no arguments implement the ``__array_function__`` protocol. 
In most cases, the relevant NumPy functions should be written using the
``array_function_dispatch`` decorator, which also associates dispatcher
functions:

.. code:: python

    import functools

    def array_function_dispatch(dispatcher):
        """Wrap a function for dispatch with the __array_function__ protocol."""
        def decorator(func):
            @functools.wraps(func)
            def new_func(*args, **kwargs):
                relevant_arguments = dispatcher(*args, **kwargs)
                success, value = try_array_function_override(
                    new_func, relevant_arguments, args, kwargs)
                if success:
                    return value
                return func(*args, **kwargs)
            return new_func
        return decorator

    # example usage
    def _broadcast_to_dispatcher(array, shape, subok=None, **ignored_kwargs):
        return (array,)

    @array_function_dispatch(_broadcast_to_dispatcher)
    def broadcast_to(array, shape, subok=False):
        ...  # existing definition of np.broadcast_to

Using a decorator is great! We don't need to change the definitions of
existing NumPy functions, and only need to write a few additional lines
for the dispatcher function. We could even reuse a single dispatcher for
families of functions with the same signature (e.g., ``sum`` and
``prod``). For such functions, the largest change could be adding a few
lines to the docstring to note which arguments are checked for overloads.

It's particularly worth calling out the decorator's use of
``functools.wraps``:

- This ensures that the wrapped function has the same name and docstring
  as the original NumPy function.
- On Python 3, it also ensures that the decorator copies the original
  function signature, which is important for introspection-based tools
  such as auto-complete. If we care about preserving function signatures
  on Python 2, for the `short while longer
  <http://www.numpy.org/neps/nep-0014-dropping-python2.7-proposal.html>`_
  that NumPy supports Python 2.7, we could do so by adding a vendored
  dependency on the (single-file, BSD licensed)
  `decorator library <https://github.com/micheles/decorator>`_.
- Finally, it ensures that the wrapped function `can be pickled
  <http://gael-varoquaux.info/programming/decoration-in-python-done-right-decor...>`_.
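To illustrate reusing a single dispatcher for a family of functions, a
shared dispatcher for ``sum`` and ``prod`` might look like the following
sketch (with simplified signatures, not the full NumPy ones):

.. code:: python

    def _reduction_dispatcher(a, axis=None, dtype=None, out=None,
                              keepdims=None):
        if out is None:
            return (a,)
        return (a, out)

    @array_function_dispatch(_reduction_dispatcher)
    def sum(a, axis=None, dtype=None, out=None, keepdims=False):
        ...  # existing definition of np.sum

    @array_function_dispatch(_reduction_dispatcher)
    def prod(a, axis=None, dtype=None, out=None, keepdims=False):
        ...  # existing definition of np.prod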
In a few cases, it would not make sense to use the
``array_function_dispatch`` decorator directly, but implementing an
override in terms of ``try_array_function_override`` should still be
straightforward.

- Functions written entirely in C (e.g., ``np.concatenate``) can't use
  decorators, but they could still use a C equivalent of
  ``try_array_function_override``. If performance is not a concern, they
  could also be easily wrapped with a small Python wrapper.
- The ``__call__`` method of ``np.vectorize`` can't be decorated with
  ``array_function_dispatch``.

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion

I would like to propose that we use `__array_function__` in the following
manner for functions that create arrays:

- `array_reference` for indicating the "reference array" whose
  `__array_function__` implementation will be called. For example,
  `np.arange(5, array_reference=some_dask_array)`.
- I use a reference in the design rather than a type because for some
  arrays (such as Dask), chunk sizes or other reference data is needed to
  make this work.

I realise that this is a big design decision, so I welcome any input!

Best Regards,
Hameer Abbasi
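As a sketch of the receiving side (with hypothetical class and helper
names), a duck array could handle a creation function dispatched through
`array_reference` like this:

    import numpy as np

    class MyDuckArray:
        # hypothetical duck array; only the protocol method is sketched
        def __array_function__(self, func, types, args, kwargs):
            if func is np.arange:
                # use self's metadata (chunk sizes, axis labels, ...) to
                # build an arange of the same array type
                return self._arange_impl(*args, **kwargs)  # hypothetical
            return NotImplemented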
Hi Marten,

> I'm confused: Isn't your reference array just `self`?

It is. The point of the proposed feature was to handle array generation
mechanisms that don't take an array as input in the standard NumPy API.
Giving them a reference handles both the dispatch and the decision about
which implementation to call.
Hi Hameer,

> It is. The point of the proposed feature was to handle array generation
Sorry, I had clearly misunderstood. It would indeed be nice for overrides to work on functions like `zeros` or `arange` as well, but it seems strange to change the signature just for that. As a possible alternative, should we perhaps generally check for overrides on `dtype`? All the best, Marten
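One possible reading of this suggestion, sketched in terms of the NEP's
own dispatcher machinery (hypothetical, with a simplified `zeros`
signature): the dispatcher could simply yield the `dtype` argument, so
that dtype-like objects carrying `__array_function__` get checked for
overrides:

    # Hypothetical: treat `dtype` as an overloadable argument, assuming
    # the NEP's array_function_dispatch decorator from above.
    def _zeros_dispatcher(shape, dtype=None, order=None):
        if dtype is not None:
            yield dtype  # dtype-like objects get checked for overrides

    @array_function_dispatch(_zeros_dispatcher)
    def zeros(shape, dtype=float, order='C'):
        ...  # existing definition of np.zeros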
Hi Marten,

> Sorry, I had clearly misunderstood. It would indeed be nice for
> overrides to work on functions like `zeros` or `arange` as well, but it
> seems strange to change the signature just for that. As a possible
> alternative, should we perhaps generally check for overrides on `dtype`?

While this very clearly makes sense for something like astropy, it has a
few drawbacks:

- Other duck arrays such as Dask need more information than just the
  dtype. For example, Dask needs chunk sizes, XArray needs axis labels,
  and pydata/sparse needs to know the type of the reference array in
  order to make one of the same type. The information in a reference
  array is a strict superset of the information in the dtype.
- There's a need for a separate protocol, which might be a lot harder to
  work with for both NumPy and library authors.
- Some things, like numpy.random.RandomState, don't accept a dtype
  argument.

As for your concern about changing the signature, it's easy enough with a
decorator. We'll need a separate decorator for array generation
functions. Something like:

    import functools
    import numpy as np

    def array_generation_function(func):
        @functools.wraps(func)
        def wrapped(*args, array_reference=np._NoValue, **kwargs):
            if array_reference is not np._NoValue:
                success, result = try_array_function_override(
                    wrapped, [array_reference], args, kwargs)
                if success:
                    return result
            return func(*args, **kwargs)
        return wrapped

Hameer Abbasi
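Applied to a creation function, that decorator might be used like this (a
hypothetical sketch; the real `np.arange` signature is more involved):

    @array_generation_function
    def arange(start, stop=None, step=None, dtype=None):
        ...  # existing definition of np.arange

    # np.arange(5, array_reference=some_dask_array) would then divert to
    # some_dask_array.__array_function__ rather than returning an ndarray.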
Hi Hameer,

I think the override on `dtype` would work - after all, the override is
checked before anything is done, so one can just pass in `self` if one
wishes (or some helper class that contains both `self` and any desired
further information). But, as you note, it would not cover everything,
and your `array_reference` idea definitely makes things more uniform.
Indeed, it would allow one to implement things like `np.zeros_like`
using `np.zeros`, which seems quite nice.

Still, I'm not sure whether this should be included in the present NEP
or is best done separately after, with a few concrete examples of where
it would be useful.

All the best,

Marten

On Sat, Jun 30, 2018 at 10:40 AM, Hameer Abbasi
<einstein.edison@gmail.com> wrote:
Hi Marten,

> Still, I'm not sure whether this should be included in the present NEP
> or is best done separately after, with a few concrete examples of where
> it would be useful.

There already are concrete examples from Dask and CuPy, and this is
currently a blocker for them, which is part of the reason I'm pushing so
hard for it. See #11074 <https://github.com/numpy/numpy/issues/11074>
for context; I think it was part of the reason that inspired Matt and
Stephan to write this protocol in the first place.

Best Regards,
Hameer Abbasi
On Sat, Jun 30, 2018 at 11:59 AM Hameer Abbasi <einstein.edison@gmail.com> wrote:
Overloading np.ones_like() is definitely in scope already. I'd love to
see a generic way of doing random number generation, but I agree with
Marten that I don't see it fitting naturally into this NEP. An invasive
change to add an array_reference argument to a bunch of functions might
indeed be worthy of its own NEP, but again I'm not convinced that's
actually the right approach. I'd rather add a few new functions like
random_like, which is a small enough change that consensus on the list
might be enough.
On Sat, Jun 30, 2018 at 12:14 PM Stephan Hoyer <shoyer@gmail.com> wrote:
random_like() seems very weird to me. It doesn't seem like a function
that anyone actually wants. It seems like what people actually want is
to be able to draw random numbers from any distribution as a specified
array-like type and shape, not just sample U(0, 1) with the shape of an
existing array.

The most workable way to do this is to modify RandomGenerator (i.e. the
new RandomState design)[1] to accept the array-like type in the class
constructor, and modify its internals to do the right thing. Because the
intrusion on the API is so small, that doesn't require a NEP, just a PR
(a long, complicated, and tedious PR, to be sure)[2]. There are a bunch
of technical issues (if you want to avoid memory copies) because the
Cython implementation requires direct memory access, but that's
intrinsic to any solution to this problem, regardless of the API
choices. random_like() would have the same issues.

[1] https://github.com/bashtage/randomgen
[2] Sorry, Kevin.

--
Robert Kern
On Tue, Jun 26, 2018 at 11:27 PM Hameer Abbasi <einstein.edison@gmail.com> wrote:
These are somewhat similar to the existing ones_like, zeros_like and
full_like. My inclination would be to consider adding new
functions/methods for these rather than a new argument, e.g.,
arange_like() and random_like(), which could then use the standard
__array_function__ dispatching mechanism. But this is pretty orthogonal
to the design of __array_function__ either way, so I think we could
safely defer this to another NEP (which could be pretty short!).

One concern this does raise is how to handle methods like those on
RandomState, even though methods like random_like() don't currently
exist. Distribution objects from scipy.stats could have similar use
cases. So perhaps it's worth "future proofing" the interface by passing
`obj` and `method` to __array_function__ rather than only `func`. It is
slower to call a func via func.__call__ than func, but only very
marginally (~100 ns in my tests).
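A sketch of what that "future proofed" signature might look like
(hypothetical; the NEP as drafted passes only `func`):

    # Hypothetical variant: pass the object and method name instead of a
    # bare function, so bound methods like RandomState.randn can dispatch.
    def __array_function__(self, obj, method, types, args, kwargs):
        # For a plain function: obj is the function, method is '__call__',
        # and getattr(obj, method)(*args, **kwargs) performs the call.
        # For RandomState.randn: obj is the RandomState instance and
        # method is 'randn'.
        ...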
On Wed, Jun 27, 2018 at 3:50 PM, Stephan Hoyer <shoyer@gmail.com> wrote: <snip>
That would make it more similar yet to `__array_ufunc__`, but I'm not sure how useful it is, as you cannot generically assume the methods have the same arguments and hence they need their own dispatcher. Once you're there you might as well pass them on directly (since any callable can be used as the function). Indeed, for `__array_ufunc__`, this might not have been a bad idea either... -- Marten
I think this feature is actually needed. Consider
`np.random.RandomState`. If we were to add what I proposed, the two
could work very nicely to (for example) do things like creating Dask
random arrays from RandomState objects. For reproducibility, Dask could
generate multiple RandomState objects with seeds sequential in the job
numbers.

Looping in Matt Rocklin for this; he might have some input about the
design.

Best Regards,
Hameer Abbasi
On Wed, Jun 27, 2018 at 12:50 PM Stephan Hoyer <shoyer@gmail.com> wrote:
I did a little more digging, and turned up the __self__ and __func__
attributes of bound methods:
https://stackoverflow.com/questions/4679592/how-to-find-instance-of-a-bound-...

So we might need another decorator function, but it seems that the
current interface would actually suffice just fine for overriding
methods. I'll update the NEP with some examples. It will look something
like:

    from types import MethodType  # the protocol's `types` argument
                                  # shadows the stdlib `types` module

    def __array_function__(self, func, types, args, kwargs):
        ...
        if isinstance(func, MethodType):
            obj = func.__self__
            unbound_func = func.__func__
        ...

Given that functions are the most common case, I think it's best to keep
with `func` as the main interface, but it's good to know that this does
not preclude overriding methods.
For C classes like the ufuncs, it seems `__self__` is defined for methods as well (at least, `np.add.reduce.__self__` gives `np.add`), but not a `__func__`. There is a `__name__` (="reduce"), though, which means that I think one can still retrieve what is needed (obviously, this also means `__array_ufunc__` could have been simpler...) -- Marten
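Concretely, the attributes Marten describes look like this in an
interactive session (exact reprs may vary with the NumPy version):

    >>> import numpy as np
    >>> np.add.reduce.__self__
    <ufunc 'add'>
    >>> np.add.reduce.__name__
    'reduce'
    >>> hasattr(np.add.reduce, '__func__')
    False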
On Thu, Jun 28, 2018 at 5:36 PM Eric Wieser <wieser.eric+numpy@gmail.com> wrote:
    >>> np.add.reduce == np.add.reduce  # OK
    True
    >>> np.add.reduce is np.add.reduce  # what?!?
    False
Maybe this is a bug? There's been some somewhat related discussion recently on python-dev: https://mail.python.org/pipermail/python-dev/2018-June/153959.html
Good catch, I think the latter failing is because `np.add.reduce` ends up
calling `np.ufunc.reduce.__get__(np.add)`, and `builtin_function.__get__`
doesn't appear to do any caching. I suppose caching bound methods would
just be a waste of time. `==` would work just fine in my suggestion
above, it seems - irrespective of the resolution of the discussion on
python-dev.

Eric

On Fri, 29 Jun 2018 at 18:24 Stephan Hoyer <shoyer@gmail.com> wrote:
On Fri, Jun 29, 2018 at 9:54 PM, Eric Wieser <wieser.eric+numpy@gmail.com> wrote:
I think for implementers it might work easiest anyway to look up the
ufunc itself in a dict or so and then check the name of the method. (At
least, for my implementations of `__array_ufunc__`, it made a lot of
sense to use the method in that way; possibly less so for the larger
variety with other numpy functions).

-- Marten
participants (6)

- Eric Wieser
- Hameer Abbasi
- Marten van Kerkwijk
- Matti Picus
- Robert Kern
- Stephan Hoyer