__skip_array_function__ discussion summary
Hi all,
This is an attempt from me to wrap up the discussion a bit so that others can chime in if they want to.
NumPy 1.17 will ship with `__array_function__`, a way for array-like projects (dask, cupy) to override almost all NumPy functions [0]. This addition is uncontroversial. NumPy 1.17 will _not_ ship with `__skip_array_function__`, following a longer discussion. For those interested, I have tried to give a very short overview of the topic below.
The discussion here is around the addition of `__skip_array_function__` which would allow code to use:
np.ones_like.__skip_array_function__(*args)
to reuse the current implementation in numpy (i.e. directly call the current code). This can simplify things drastically for some array likes, since they do not have to provide an alternative implementation. However, PR13585 [1] sparked a more detailed discussion, since it was going to add the use of `__skip_array_function__` internally in numpy [2].
The issue is exposure of implementation details. If we do not use it internally, a user may implement their own `np.empty_like` and rely on `np.ones_like` to use `np.empty_like` [3] internally. Thus, `np.ones_like(my_array_like)` can work without `my_array_like` having any special code for `np.ones_like`.
The PR exposes the issue that if `np.ones_like` is changed to call `np.empty_like.__skip_array_function__` internally, this will break the user's `my_array_like` (it will no longer call their own `np.empty_like` implementation).
We could expect users to fix up such breaking changes, but it exposes how fragile the interaction of user types using `__skip_array_function__` and changes in the specific implementation used by numpy can be in some cases.
The second option would be to make sure we use `__skip_array_function__` internally, so that users cannot expect `np.ones_like` to work because they made `np.empty_like` work in the above example (does not increase the "API surface" of NumPy).
A downside of this option is that the numpy code itself becomes less readable if we use `__skip_array_function__` internally in many/all places.
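To make the two options concrete, here is a toy model in plain Python. It is only a sketch: `dispatchable`, `ones_like_strict`, `_fill_ones` and this `MyArray` are hypothetical names invented for illustration, lists stand in for arrays, and none of it is NumPy's actual dispatch code.

```python
import functools


def dispatchable(impl):
    """Toy stand-in for NumPy's override machinery (not the real code)."""

    @functools.wraps(impl)
    def public(*args, **kwargs):
        # Hand the call to the first argument that defines an override.
        for arg in args:
            override = getattr(type(arg), "__array_function__", None)
            if override is not None:
                return override(arg, public, (type(arg),), args, kwargs)
        return impl(*args, **kwargs)

    # The proposed attribute: the implementation without the override check.
    public.__skip_array_function__ = impl
    return public


def _fill_ones(res):
    for i in range(len(res)):
        res[i] = 1
    return res


@dispatchable
def empty_like(x):
    return [None] * len(x)


@dispatchable
def ones_like(x):
    # Option 1: the internal call dispatches, so a user's empty_like is used.
    return _fill_ones(empty_like(x))


@dispatchable
def ones_like_strict(x):
    # Option 2: the internal call skips dispatch and hides this detail.
    return _fill_ones(empty_like.__skip_array_function__(x))


class MyArray:
    """Hypothetical array-like that only knows how to do empty_like."""

    def __init__(self, data):
        self.data = list(data)

    def __len__(self):
        return len(self.data)

    def __setitem__(self, index, value):
        self.data[index] = value

    def __array_function__(self, func, types, args, kwargs):
        if func is empty_like:
            return MyArray([None] * len(self))
        # Everything else: reuse the generic implementation.
        return func.__skip_array_function__(*args, **kwargs)


print(type(ones_like(MyArray([5, 6, 7]))).__name__)         # MyArray
print(type(ones_like_strict(MyArray([5, 6, 7]))).__name__)  # list
```

In this model the first variant returns a `MyArray` only because of how `ones_like` happens to be written, which is exactly the implementation detail a user could come to depend on. The second variant never reaches the user's `empty_like`.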
Those two options also have very different goals in mind for the final usage of the protocol. So right now the solution is to step back, not include the addition, and rather gain experience with the NumPy 1.17 release, which includes `__array_function__` but not `__skip_array_function__`.
I hope this may help those interested who did not follow the full discussion, can't say I feel I am very good at summarizing. For details I encourage you to have a look at the PR discussion and the recent mails to the list.
Best,
Sebastian
[0] http://www.numpy.org/neps/nep0018arrayfunctionprotocol.html#implementati...
[1] https://github.com/numpy/numpy/pull/13585
[2] Mostly for slight optimization.
[3] It also uses `np.copyto` which can be overridden as well.
On Thu, 23 May 2019 14:33:17 0700, Sebastian Berg wrote:
Those two options also have very different goals in mind for the final usage of the protocol. So right now the solution is to step back, not include the addition, and rather gain experience with the NumPy 1.17 release, which includes `__array_function__` but not `__skip_array_function__`.
To emphasize how this solves the API exposure problem:
If `__skip_array_function__` is being made available, the user can implement `ones_like` for their custom class as:
    class MyArray:
        def __array_function__(self, func, types, args, kwargs):
            if func == np.ones_like:
                # Reuse NumPy's own implementation, bypassing dispatch.
                return np.ones_like.__skip_array_function__(*args, **kwargs)
Without it, they are forced to reimplement `ones_like` from scratch. This ensures that they never rely on any internal behavior of `np.ones_like`, which may change at any time to break for their custom array class.
Here's a concrete example:
The user wants to override `ones_like` and `zeros_like` for their custom array. They implement it as follows:
    class MyArray:
        def __array_function__(self, func, types, args, kwargs):
            if func == np.ones_like:
                return np.ones_like.__skip_array_function__(*args, **kwargs)
            elif func == np.zeros_like:
                return MyArray(...)
Would this work? Well, it depends on how NumPy implements `ones_like` internally. If NumPy used `__skip_array_function__` consistently throughout, it would not work:
    def ones_like(x):  # sketch of np.ones_like inside NumPy
        y = np.zeros_like.__skip_array_function__(x)
        y.fill(1)
        return y
If, instead, the implementation was
    def ones_like(x):  # sketch of np.ones_like inside NumPy
        y = np.zeros_like(x)
        y.fill(1)
        return y
it would work. *BUT*, it would be brittle, because our internal implementation may easily change to:
    def ones_like(x):  # sketch of np.ones_like inside NumPy
        y = np.empty_like(x)
        y.fill(1)
        return y
And if `empty_like` isn't implemented by MyArray, this would break.
The workaround that Stephan Hoyer mentioned (and that will have to be used in 1.17) is that you can still use the NumPy machinery to operate on pure arrays:
    class MyArray:
        def __array_function__(self, func, types, args, kwargs):
            if func == np.ones_like:
                x = args[0]
                x_arr = np.asarray(x)
                ones = np.ones_like(x_arr)
                return MyArray.from_array(ones)
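A fuller, runnable version of this workaround might look as follows. It is a sketch under assumptions: this `MyArray` is a hypothetical wrapper that keeps its values in a `.data` ndarray, `from_array` is simply its constructor, and a NumPy version with `__array_function__` enabled (1.17 or later) is assumed.

```python
import numpy as np


class MyArray:
    """Hypothetical wrapper around an ndarray, for illustration only."""

    def __init__(self, data):
        self.data = np.asarray(data)

    @classmethod
    def from_array(cls, arr):
        return cls(arr)

    def __array_function__(self, func, types, args, kwargs):
        if func is np.ones_like:
            # Unwrap to a plain ndarray, let NumPy do the work, wrap again.
            x_arr = args[0].data
            ones = np.ones_like(x_arr, *args[1:], **kwargs)
            return MyArray.from_array(ones)
        # Anything not handled explicitly is declined.
        return NotImplemented


x = MyArray([[1.5, 2.5], [3.5, 4.5]])
y = np.ones_like(x)
print(type(y).__name__, y.data.shape)
```

Because `np.ones_like` only ever sees a plain ndarray here, the result does not depend on which helper functions NumPy calls internally. Functions that the class declines by returning `NotImplemented`, such as `np.zeros_like(x)` in this sketch, raise a `TypeError`.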
Stéfan
Hi Sebastian, Stéfan,
Thanks for the very good summaries!
An additional item worth mentioning is that by using `__skip_array_function__` everywhere inside, one minimizes the performance penalty of checking for `__array_function__`. It would obviously be worth trying to do that, but ideally in a way that is much less intrusive.
Furthermore, it became clear that there were different pictures of the final goal, with quite a bit of discussion about the relative benefits of trying to limit exposure of the internal API and of, conversely, trying to (incrementally) move to implementations that are maximally reusable (using duck typing), which are themselves based around a smaller core (more in line with Nathaniel's NEP 22).
In the latter respect, Stéfan's example is instructive. The real implementation of `ones_like` is:
```
def ones_like(a, dtype=None, order='K', subok=True, shape=None):
    res = empty_like(a, dtype=dtype, order=order, subok=subok, shape=shape)
    multiarray.copyto(res, 1, casting='unsafe')
    return res
```
The first step here seems obvious: an "empty_like" function would seem to belong in the core. The second step less so: Stéfan's `res.fill(1)` seems more logical, as surely a class's method is the optimal way to do something. Though I do feel `.fill` itself breaks "There should be one, and preferably only one, obvious way to do it." So, I'd want to replace it with `res[...] = 1`, so that one relies on the more obvious `__setitem__`. (Note that all are equally fast even now.)
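For reference, the three ways of filling the result can be compared side by side. This is a small sketch, and `ones_like_duck` is a hypothetical name for a variant that needs only `empty_like` and `__setitem__` from its input type, not something NumPy provides.

```python
import numpy as np


def ones_like_duck(a, **kwargs):
    """Hypothetical duck-typed variant: empty_like plus __setitem__."""
    res = np.empty_like(a, **kwargs)
    res[...] = 1
    return res


a = np.arange(6, dtype=np.int16).reshape(2, 3)

# The copyto route, as in the implementation quoted above.
via_copyto = np.empty_like(a)
np.copyto(via_copyto, 1, casting="unsafe")

# The ndarray method from Stéfan's example.
via_fill = np.empty_like(a)
via_fill.fill(1)

# Item assignment through __setitem__.
via_setitem = ones_like_duck(a)

print(np.array_equal(via_copyto, via_fill), np.array_equal(via_fill, via_setitem))
```

For a plain ndarray all three produce the same array of ones with the dtype and shape of `a`. The difference lies only in which hook an array-like would have to provide.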
Of course, in this idealized future, there would be little reason to even allow `ones_like` to be overridden with __array_function__...
All the best,
Marten
Sebastian, Stefan and Marten, thanks for the excellent summaries of the discussion.
In line with this consensus, I have drafted a revision of the NEP without __skip_array_function__: https://github.com/numpy/numpy/pull/13624
On Thu, May 23, 2019 at 5:28 PM Marten van Kerkwijk <m.h.vankerkwijk@gmail.com> wrote:
[snip]
participants (4)

Marten van Kerkwijk

Sebastian Berg

Stefan van der Walt

Stephan Hoyer