Implementations of ndarray.__array_function__ (and ndarray.__array_ufunc__)
Hi again, Another thought about __array_function__, this time about the implementation for ndarray. In it, we currently check whether any of the types define a (different) __array_function__, and, if so, give up. This seems too strict: I think that, at least in principle, subclasses should be allowed through even if they override __array_function__. This thought was triggered by Travis pointing to the Liskov substitution principle [1], that code written for a given type should just work on a (properly written) subclass. This suggests `ndarray` should not exclude subclasses even if they override __array_function__, since if the subclass does not work that way, it can already ensure an error is raised since it knows it is called first. Indeed, this is also how python itself works: if, eg., I subclass list as follows: ``` class MyList(list): def __radd__(self, other): return NotImplemented ``` then any `list + mylist` will just concatenate the lists, even though `MyList.__radd__` explicitly tells it cannot do it (it returning `NotImplemented` means that `list.__add__` gets a change). The reason that we do not already follow this logic may be that currently `ndarray.__array_function__` ends by calling the public function, which will lead to infinite recursion if there is a subclass that overrides __array_function__ and returns NotImplemented. However, inside ndarray.__array_function__, there is no real reason to call the public function - one might as well just call the implementation, in which case this is not a problem. Does the above make sense? I realize that the same would be true for `__array_ufunc__`, though there the situation is slightly trickier since it is not as easy to bypass any further override checks. Nevertheless, it does seem like it would be correct to do the same there. (And if we agree this is the case, I'd quite happily implement it -- with the merger of multiarray and umath it has become much easier to do.) All the best, Marten [1] https://en.wikipedia.org/wiki/Liskov_substitution_principle
On Sun, Nov 4, 2018 at 8:45 AM Marten van Kerkwijk < m.h.vankerkwijk@gmail.com> wrote:
Does the above make sense? I realize that the same would be true for `__array_ufunc__`, though there the situation is slightly trickier since it is not as easy to bypass any further override checks. Nevertheless, it does seem like it would be correct to do the same there. (And if we agree this is the case, I'd quite happily implement it -- with the merger of multiarray and umath it has become much easier to do.)
Marten actually implemented a draft version of this already in https://github.com/numpy/numpy/pull/12328 :). I found reading over the PR helpful for understand this proposal. I guess the practical import of this change is that it makes it (much?) easier to write __array_function__ for ndarray subclasses: if there's a function where NumPy's default function works fine, you don't need to bother with returning anything other than NotImplemented from __array_function__. It's sort of like NotImplementedButCoercible, but only for ndarray subclasses. One minor downside is that this might make it harder to eventually deprecate and/or contemplate removing checks for 'mean' methods in functions like np.mean(), because __array_function__ implementers might still be relying on this. But so far, I think this makes sense. The PR includes additional changes to np.core.overrides, but I'm not sure if those are actually required here (or rather only possible due to this change). I guess they are needed if you want to be able to count on ndarray.__array_function__ being called after subclass __array_function__ methods. I'm not sure I like this part: it means that ndarray.__array_function__ actually gets called when other arguments implement __array_function__. For interactions with objects that aren't ndarray subclasses this is entirely pointless and would unnecessarily slow things down, since ndarray._array_function__ will always return NotImplemented.
On Sun, Nov 4, 2018 at 8:57 PM Stephan Hoyer <shoyer@gmail.com> wrote:
On Sun, Nov 4, 2018 at 8:45 AM Marten van Kerkwijk < m.h.vankerkwijk@gmail.com> wrote:
Does the above make sense? I realize that the same would be true for `__array_ufunc__`, though there the situation is slightly trickier since it is not as easy to bypass any further override checks. Nevertheless, it does seem like it would be correct to do the same there. (And if we agree this is the case, I'd quite happily implement it -- with the merger of multiarray and umath it has become much easier to do.)
Marten actually implemented a draft version of this already in https://github.com/numpy/numpy/pull/12328 :). I found reading over the PR helpful for understand this proposal.
I guess the practical import of this change is that it makes it (much?) easier to write __array_function__ for ndarray subclasses: if there's a function where NumPy's default function works fine, you don't need to bother with returning anything other than NotImplemented from __array_function__. It's sort of like NotImplementedButCoercible, but only for ndarray subclasses.
Yes, return NotImplemented if there is another array, or, even simpler, just call super. Note that it is not quite like `NotImplementedButCoercible`, since no actual coercion to ndarray would necessarily be needed - with adherence to the Liskov substitution principle, the subclass might stay intact (if only partially initialized).
One minor downside is that this might make it harder to eventually deprecate and/or contemplate removing checks for 'mean' methods in functions like np.mean(), because __array_function__ implementers might still be relying on this.
I think this is somewhat minor indeed, since we can (and should) insist that subclasses here properly behave as subclasses, so if an ndarray-specific implementation breaks a subclass, that might well indicate that the subclass is not quite good enough (and we can now point out there is a way to override the function). It might also indicate that the code itself could be better - that would be a win. But so far, I think this makes sense.
The PR includes additional changes to np.core.overrides, but I'm not sure if those are actually required here (or rather only possible due to this change). I guess they are needed if you want to be able to count on ndarray.__array_function__ being called after subclass __array_function__ methods.
It is mostly a transfer of functionality from `get_override_types_and_args` to the place where the implementation is decided upon. Perhaps more logical even if we do not pursue this.
I'm not sure I like this part: it means that ndarray.__array_function__ actually gets called when other arguments implement __array_function__. For interactions with objects that aren't ndarray subclasses this is entirely pointless and would unnecessarily slow things down, since ndarray._array_function__ will always return NotImplemented.
Agreed here. I did in fact think about it, but wasn't sure (and didn't have time to think how to check) that the gain in time for cases where an ndarray comes before the relevant array mimic (and there thus a needless call to ndarray.__array_function__ can be prevented) was worth it compared to the cost of attempting to do the removal for cases where the array mimic came first or where there was no regular ndarray in the first place. But I think this is an implementation detail; for now, let me add a note to the PR about it. All the best, Marten
participants (2)
-
Marten van Kerkwijk
-
Stephan Hoyer