Adding an alias to the non-dispatched implementation of NumPy methods
Hi everyone,

We have proposed a revision to NEP 18 (__array_function__). The proposal is to add an alias to the non-dispatched version of NumPy array functions in the __numpy_implementation__ function attribute: https://github.com/numpy/numpy/pull/13305

I believe this attribute improves the protocol in three ways:

1. It provides a hook that __array_function__ methods can use to call the implementation intended for NumPy arrays. This allows for "partial implementations" of NumPy's API, which turns out to be useful even for some array libraries that reimplement nearly everything (namely, for CuPy and JAX).
2. It allows for fast access to the non-dispatching version of NumPy functions, e.g., np.concatenate.__numpy_implementation__(list_of_all_numpy_arrays).
3. Internally, the implementation of numpy.ndarray.__array_function__ now looks identical to how we encourage outside developers to write their own __array_function__ methods. The dispatching logic no longer includes a special case for NumPy arrays.

Feedback would be greatly welcomed!

Best,
Stephan
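To make point 1 concrete, here is a toy sketch of the mechanism. The `array_function_dispatch` decorator and `WrappedArray` class below are illustrative stand-ins written for this example, not NumPy's actual machinery; only the `__numpy_implementation__` attribute itself is what the PR proposes.

```python
import numpy as np

def array_function_dispatch(dispatcher):
    # Toy stand-in for NumPy's NEP 18 decorator.
    def decorator(implementation):
        def public_api(*args, **kwargs):
            for arg in dispatcher(*args, **kwargs):
                if (hasattr(type(arg), '__array_function__')
                        and not isinstance(arg, np.ndarray)):
                    result = arg.__array_function__(
                        public_api, (type(arg),), args, kwargs)
                    if result is not NotImplemented:
                        return result
            # No override took the call: use NumPy's own implementation.
            return implementation(*args, **kwargs)
        # The proposed alias: expose the non-dispatched implementation.
        public_api.__numpy_implementation__ = implementation
        return public_api
    return decorator

@array_function_dispatch(lambda arrays: arrays)
def concatenate(arrays):
    return np.concatenate([np.asarray(a) for a in arrays])

class WrappedArray:
    # A "partial implementation": this duck array overrides nothing
    # itself and defers every function to NumPy's implementation.
    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_function__(self, func, types, args, kwargs):
        unwrapped = [[a.data if isinstance(a, WrappedArray) else a
                      for a in args[0]]]
        return func.__numpy_implementation__(*unwrapped, **kwargs)

result = concatenate([WrappedArray([1, 2]), np.array([3])])  # → array([1, 2, 3])
```

Without the alias, `WrappedArray.__array_function__` would have no clean way to reach the plain NumPy behavior for functions it does not reimplement.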
On Mon, 15 Apr 2019 08:30:06 -0700, Stephan Hoyer wrote:
We have proposed a revision to NEP 18 (__array_function__). The proposal is to add an alias to the non-dispatched version of NumPy array functions in the __numpy_implementation__ function attribute: https://github.com/numpy/numpy/pull/13305
To help others parsing through the comments on GitHub: this mailing list post is already a summary of all the comments up to https://github.com/numpy/numpy/pull/13305#issuecomment-483301211

I'm generally in favor: it makes sense that you should easily be able to access the original function when overriding it.

Stéfan
What's the difference between

np.concatenate.__numpy_implementation__(...)

and

np.ndarray.__array_function__(np.concatenate, ...)

?

More generally, I guess I'm not quite clear on how to think about what the "no dispatch" version does, because obviously it doesn't make sense to have *no* dispatch. Instead it's something like "the legacy hard-coded dispatch"?

On Mon, Apr 15, 2019, 08:30 Stephan Hoyer <shoyer@gmail.com> wrote:
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion
On Mon, Apr 15, 2019 at 1:21 PM Nathaniel Smith <njs@pobox.com> wrote:
What's the difference between
np.concatenate.__numpy_implementation__(...)
and
np.ndarray.__array_function__(np.concatenate, ...)
?
I can answer this technically, though this doesn't seem to be quite what you're looking for: the former always succeeds at dispatch, because it coerces all arguments to NumPy arrays. The second will either return NotImplemented (if a non-NumPy array implements __array_function__), or give the same result as the former.
More generally, I guess I'm not quite clear on how to think about what the "no dispatch" version does, because obviously it doesn't make sense to have *no* dispatch. Instead it's something like "the legacy hardcoded dispatch"?
__numpy_implementation__ means you skip __array_function__ dispatch and call the original NumPy function. In practice, this means you get legacy hard-coded dispatch behavior in most cases, e.g., the result will always be in the form of NumPy array(s).

It doesn't mean that the implementation always coerces all arguments to NumPy arrays. For example, np.result_type() will pull .dtype attributes off of its arguments, without necessarily coercing its arguments to NumPy arrays. This strange version of "the implementation for NumPy arrays" turns out to be something that several libraries that want to implement __array_function__ want to be able to continue to use on their own array objects (namely, JAX and CuPy).
On Mon, Apr 15, 2019 at 4:39 PM Stephan Hoyer <shoyer@gmail.com> wrote:
Microsoft's "open standard" [1] document format, OOXML, famously contains tags like "autoSpaceLikeWord95" and "useWord97LineBreakRules". If you want to correctly interpret a Word document, you have to know what these mean. (Unfortunately, the standard doesn't say.)

Mostly I would like the definition for numpy 1.17's semantics to be internally coherent and self-contained. If the documentation for __numpy_implementation__ is just "it does whatever numpy 1.14 did", then that seems not so great. Is there any way to define __numpy_implementation__'s semantics without incorporating previous versions of numpy by reference?

-n

[1] https://en.wikipedia.org/wiki/Standardization_of_Office_Open_XML

-- Nathaniel J. Smith -- https://vorpus.org
I thought this was simply a slot to store the NumPy version of the dispatched method, so that you could easily call through to it and extend it. Stephan, was there a deeper intent here that I missed?

Best regards,
Stéfan

On April 15, 2019 20:32:35 Nathaniel Smith <njs@pobox.com> wrote:
I somewhat share Nathaniel's worry that by providing `__numpy_implementation__` we essentially get stuck with the implementations we have currently, rather than having the hoped-for freedom to remove all the `np.asarray` coercion. In that respect, an advantage of using `_wrapped` is that it is clearly a private method, so anybody is automatically forewarned that this can change.

In principle, ndarray.__array_function__ would be more logical, but as noted in the PR, the problem is that it is non-trivial for a regular __array_function__ implementation to coerce all the arguments to ndarray itself.

Which suggests that perhaps what is missing is a general routine that does that, i.e., one that reuses the dispatcher.

-- Marten
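One possible shape for such a routine, sketched here as a hypothetical helper (the name and signature are invented for illustration, not anything in NumPy): reuse the function's dispatcher to identify the array-like arguments, then coerce exactly those.

```python
import numpy as np

def coerce_dispatched_args(dispatcher, args, kwargs):
    # Hypothetical helper: ask the dispatcher which arguments are
    # array-like, then coerce only those to plain ndarrays.
    relevant_ids = {id(a) for a in dispatcher(*args, **kwargs)}

    def coerce(a):
        return np.asarray(a) if id(a) in relevant_ids else a

    return (tuple(coerce(a) for a in args),
            {k: coerce(v) for k, v in kwargs.items()})

# Example with a np.where-style dispatcher:
def _where_dispatcher(condition, x=None, y=None):
    return (condition, x, y)

coerced_args, coerced_kwargs = coerce_dispatched_args(
    _where_dispatcher, ([True, False], [1, 2], [3, 4]), {})
# all three positional arguments are now ndarrays
```

An __array_function__ method could call such a helper first and then defer to the plain implementation, without re-implementing the coercion logic per function.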
__numpy_implementation__ is indeed simply a slot for third parties to access NumPy's implementation. It should be considered "NumPy's current implementation", not "NumPy's implementation as of 1.14". Of course, in practice these will remain very similar, because we are already very conservative about how we change NumPy.

I would love to have clean, well-defined coercion semantics for every NumPy function, which would be implicitly adopted by `__numpy_implementation__` (e.g., we could say that every function always coerces its arguments with `np.asarray()`). But I think that's an orthogonal issue. We have been supporting some ad-hoc duck typing in NumPy for a long time (e.g., the `.sum()` method which is called by `np.sum()`). Removing that would require a deprecation cycle, which may indeed be warranted once we're sure we're happy with __array_function__. But I don't think the deprecation cycle will be any worse if the implementation is also exposed via `__numpy_implementation__`.

We should definitely still think about a cleaner "core" implementation of NumPy functions in terms of a minimal core. One recent example of this can be found in JAX (see https://github.com/google/jax/blob/04b45e4086249bad691a33438e8bb6fcf639d001/...). This would be something appropriate to put into a more generic function attribute on NumPy functions, perhaps `__array_implementation__`. But I don't think formalizing `__numpy_implementation__` as a way to get access to NumPy's default implementation will limit our future options here.

Cheers,
Stephan

On Tue, Apr 16, 2019 at 6:44 AM Marten van Kerkwijk <m.h.vankerkwijk@gmail.com> wrote:
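The ad-hoc `.sum()` duck typing mentioned above can be seen with a minimal object. The class below is illustrative only; the relevant NumPy behavior is that `np.sum` falls back to calling a `.sum()` method on non-ndarray arguments, passing `axis` and `out` keywords.

```python
import numpy as np

class DuckSum:
    def __init__(self, values):
        self.values = values

    # np.sum calls this on non-ndarray arguments, passing axis/out.
    def sum(self, axis=None, out=None):
        total = 0
        for v in self.values:
            total += v
        return total

print(np.sum(DuckSum([1, 2, 3])))  # prints 6
```

No `np.asarray` coercion happens here at all, which is exactly why the coercion semantics of the existing implementations are hard to pin down.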
Are there still concerns here? If not, I would love to move ahead with these changes so we can get this into NumPy 1.17.

On Tue, Apr 16, 2019 at 10:23 AM Stephan Hoyer <shoyer@gmail.com> wrote:
Your last email didn't really clarify anything for me. I get that np.func.__numpy_implementation__ is intended to have the semantics of numpy's implementation of func, but that doesn't tell me much :). And also, that's exactly the definition of np.func, isn't it?

You're talking about ~doubling the size of numpy's API, and don't seem able to even articulate what the new API's commitments are. This still makes me nervous. Maybe it should have a NEP? What's your testing strategy for all the new functions?

On Mon, Apr 22, 2019, 09:22 Stephan Hoyer <shoyer@gmail.com> wrote:
On Mon, Apr 22, 2019 at 9:26 PM Nathaniel Smith <njs@pobox.com> wrote:
Your last email didn't really clarify anything for me. I get that np.func.__numpy_implementation__ is intended to have the semantics of numpy's implementation of func, but that doesn't tell me much :). And also, that's exactly the definition of np.func, isn't it?
You're talking about ~doubling the size of numpy's API,
I think we can already get both the NEP 18 wrapped functions and their underlying implementations today, based on the value of NUMPY_EXPERIMENTAL_ARRAY_FUNCTION. It looks to me like all this proposed change does is bypass a do-very-little wrapper.

and don't seem able to even articulate what the new API's commitments are.
This still makes me nervous. Maybe it should have a NEP? What's your testing strategy for all the new functions?
The current decorator mechanism already checks that the signatures match, so it shouldn't be possible to get a mismatch. So probably not much is needed beyond some assert_equal(np.func(...), np.func.__numpy_implementation__(...)) checks.

@Stephan the PR for the NEP change is very hard to parse. Maybe easier to just open a PR with an implementation for one or a few functions + associated tests?

Cheers,
Ralf
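A sketch of what such checks might look like; the helper name is invented, and the `getattr` fallback makes this runnable even on NumPy versions where the proposed attribute does not (yet) exist:

```python
import numpy as np
from numpy.testing import assert_array_equal

def check_matches_implementation(func, *args, **kwargs):
    # Fall back to the public function where the proposed attribute is
    # absent, so the check degrades gracefully on older NumPy.
    impl = getattr(func, '__numpy_implementation__', func)
    assert_array_equal(impl(*args, **kwargs), func(*args, **kwargs))

check_matches_implementation(np.concatenate, [np.array([1, 2]), np.array([3])])
check_matches_implementation(np.sum, np.arange(5))
```

For pure NumPy-array inputs the dispatched function and its implementation must agree, so a handful of such spot checks covers the wrapper mechanics.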
On Mon, Apr 22, 2019 at 2:20 PM Ralf Gommers <ralf.gommers@gmail.com> wrote:
On Mon, Apr 22, 2019 at 9:26 PM Nathaniel Smith <njs@pobox.com> wrote:
Your last email didn't really clarify anything for me. I get that np.func.__numpy_implementation__ is intended to have the semantics of numpy's implementation of func, but that doesn't tell me much :). And also, that's exactly the definition of np.func, isn't it?
My understanding of the protocol we came up with in NEP 18 is that every NumPy function (that takes array-like arguments) now has two parts to its implementation:

1. The NEP 18 part involving calling the dispatcher function, and checking for/calling __array_function__ attributes on array-like arguments. This part is documented in NEP 18.
2. The original function definition, which is called if either (a) no __array_function__ attributes exist, or (b) the only __array_function__ attribute is numpy.ndarray.__array_function__. This part is documented in the docstring of the NumPy function.

"__numpy_implementation__" provides a shortcut to (2) without (1). That's it.

OK, thinking about this a little bit more, there is one other (rare) difference: in cases where a function has deprecated arguments, we are currently only issuing the deprecation warnings in the dispatcher function, rather than in both the dispatcher and the implementation. This is all the more reason to discourage users from calling __numpy_implementation__ directly (I'll update the NEP), but it's still fine to call __numpy_implementation__ from within __array_function__ methods themselves.

I guess the other option would be to make it programmatically impossible to access implementations outside of __array_function__, by making numpy_implementation an argument used to call __array_function__() rather than making it an attribute on NumPy functions. I don't like this as much, for two reasons:

1. It would break every existing implementation of __array_function__ before it launches. We did reserve the right to do this, but it's still a little unfriendly to our early adopters.
2. There are still cases where users will prefer to call np.concatenate.__numpy_implementation__ for extra performance, even knowing that they will miss any hypothetical deprecation warnings and removed/renamed function arguments.

You're talking about ~doubling the size of numpy's API,
I think we can already get both the NEP 18 wrapped functions and their underlying implementations today, based on the value of NUMPY_EXPERIMENTAL_ARRAY_FUNCTION.
It looks to me like all this proposed change does is bypass a do-very-little wrapper.
This is how I think of it.

and don't seem able to even articulate what the new API's commitments are.
This still makes me nervous. Maybe it should have a NEP? What's your testing strategy for all the new functions?
The current decorator mechanism already checks that the signatures match, so it shouldn't be possible to get a mismatch. So probably not much is needed beyond some assert_equal(np.func(...), np.func.__numpy_implementation__(...)) checks.
@Stephan the PR for the NEP change is very hard to parse. Maybe easier to just open a PR with an implementation for one or a few functions + associated tests?
Sure, here's a full implementation (with tests): https://github.com/numpy/numpy/pull/13389

I have not included tests on every numpy function, but we didn't write those for each NumPy function with __array_function__ overrides, either -- the judgment was that the changes are mechanistic enough that writing a unit test for each function would not be worthwhile.

Also you'll note that my PR includes only a single change to np.ndarray.__array_function__ (swapping out __wrapped__ -> __numpy_implementation__). This is because we had actually already changed the implementation of ndarray.__array_function__ without updating the NEP, per prior discussion on the mailing list [1]. The existing use of the __wrapped__ attribute is an undocumented optimization / implementation detail.

[1] https://mail.python.org/pipermail/numpy-discussion/2018-November/078912.html
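The deprecation-warning subtlety Stephan mentions earlier in the thread (warnings issued only in the dispatching wrapper, so __numpy_implementation__ skips them) can be sketched with a toy wrapper; nothing below is NumPy's actual code, and `stack`/`reverse` are invented names.

```python
import warnings

def _stack_impl(x, reverse=False):
    # Stand-in for a NumPy function's original implementation.
    return list(reversed(x)) if reverse else list(x)

def stack(x, reverse=False):
    # Stand-in for the public, dispatching wrapper: the deprecation
    # warning lives only here, not in the implementation.
    if reverse:
        warnings.warn("'reverse' is deprecated", DeprecationWarning,
                      stacklevel=2)
    return _stack_impl(x, reverse=reverse)

stack.__numpy_implementation__ = _stack_impl

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    stack([1, 2], reverse=True)                           # warns
    stack.__numpy_implementation__([1, 2], reverse=True)  # does not warn

print(len(caught))  # prints 1
```

This is why calling the alias directly is discouraged for end users but remains fine inside __array_function__ methods, which are already opting in to the raw implementation.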
On Mon, Apr 22, 2019 at 11:13 PM Stephan Hoyer <shoyer@gmail.com> wrote:
On Mon, Apr 22, 2019 at 9:26 PM Nathaniel Smith <njs@pobox.com> wrote:
Your last email didn't really clarify anything for me. I get that np.func.__numpy_implementation__ is intended to have the semantics of numpy's implementation of func, but that doesn't tell me much :). And also, that's exactly the definition of np.func, isn't it?
My understanding of the protocol we came up with in NEP-18 is that every NumPy function (that takes array-like arguments) now has two parts to its implementation:

1. The NEP-18 part: calling the dispatcher function, and checking for/calling __array_function__ attributes on array-like arguments. This part is documented in NEP-18.
2. The original function definition, which is called if either (a) no __array_function__ attributes exist, or (b) the only __array_function__ attribute is numpy.ndarray.__array_function__. This part is documented in the docstring of the NumPy function.
"__numpy_implementation__" provides a shortcut to (2) without (1). That's it.
OK, so the semantics are: the same as the normal function, except we pretend that none of the arguments have an __array_function__ attribute? That's much clearer to me than how you were phrasing it before :).

Though now the name "__numpy_implementation__" doesn't seem very evocative of what it does... numpy's dispatch sequence has changed a lot in the past (mostly adding new coercion rules), and will probably change in the future, and "__numpy_implementation__" doesn't give much guidance about which parts of the dispatch sequence should be skipped as "dispatch" and which should be included as "implementation". Maybe something like __skipping_array_function__?

-n

--
Nathaniel J. Smith -- https://vorpus.org
Hi All,

Reading the discussion again, I've gotten somewhat unsure that it is helpful to formalize a way to call an implementation that we can and hopefully will change. Why not just leave it at __wrapped__? I think the name is no worse, and it is more obvious that one relies on something private.

I ask in part since I could see a good case for having a special method that is available only for functions that do no explicit casting to array, i.e., that are ready to accept array mimics (and for which we're willing to guarantee that would not change). For instance, functions like np.sinc that really have no business using more than ufuncs under the hood, i.e., which any class that has __array_ufunc__ can call safely. Or (eventually) all those functions that just end up calling `concatenate` - those again could easily be made safe for a class that just overrides `np.concatenate` using __array_function__. In essence, this would be any function that does not do `np.as(any)array` but just relies on array attributes.

But the above obviously presumes a vision of where this is headed, which I'm not sure is shared...

All the best,

Marten
On Tue, Apr 23, 2019 at 4:31 PM Marten van Kerkwijk < m.h.vankerkwijk@gmail.com> wrote:
Hi All,
Reading the discussion again, I've gotten somewhat unsure that it is helpful to formalize a way to call an implementation that we can and hopefully will change. Why not just leave it at __wrapped__? I think the name is no worse and it is more obvious that one relies on something private.
I'm not convinced about the name either. NEP 18 also suggests adopting the protocol in other libraries, so for SciPy would we have to name it __scipy_implementation__? Not sure that's better or worse than a generic __wrapped__.

I don't see why the NumPy implementation must be considered private, though. It's public today, and there's little wrong with keeping it public. The "it can change" argument doesn't really apply: it has the same backwards-compatibility guarantees going forward that we have now.
I ask in part since I could see a good case for having a special method that is available only for functions that do no explicit casting to array, i.e., that are ready to accept array mimics (and for which we're willing to guarantee that would not change). For instance, functions like np.sinc that really have no business using more than ufuncs under the hood, i.e., which any class that has __array_ufunc__ can call safely. Or (eventually) all those functions that just end up calling `concatenate` - those again could easily be made safe for a class that just overrides `np.concatenate` using __array_function__. In essence, this would be any function that does not do `np.as(any)array` but just relies on array attributes.
But the above obviously presumes a vision of where this is headed, which I'm not sure is shared...
This is an orthogonal topic, I think - you want multiple implementations, "safe" and "unsafe" (or fast vs. checking for invalids vs. robust for subclasses, etc. - lots of options here).

Cheers,
Ralf
On Tue, Apr 23, 2019 at 12:27 AM Nathaniel Smith <njs@pobox.com> wrote:
On Mon, Apr 22, 2019 at 11:13 PM Stephan Hoyer <shoyer@gmail.com> wrote:
On Mon, Apr 22, 2019 at 9:26 PM Nathaniel Smith <njs@pobox.com> wrote:
Your last email didn't really clarify anything for me. I get that np.func.__numpy_implementation__ is intended to have the semantics of numpy's implementation of func, but that doesn't tell me much :). And also, that's exactly the definition of np.func, isn't it?

My understanding of the protocol we came up with in NEP-18 is that every NumPy function (that takes array-like arguments) now has two parts to its implementation:

1. The NEP-18 part: calling the dispatcher function, and checking for/calling __array_function__ attributes on array-like arguments. This part is documented in NEP-18.
2. The original function definition, which is called if either (a) no __array_function__ attributes exist, or (b) the only __array_function__ attribute is numpy.ndarray.__array_function__. This part is documented in the docstring of the NumPy function.

"__numpy_implementation__" provides a shortcut to (2) without (1). That's it.
OK, so the semantics are: the same as the normal function, except we pretend that none of the arguments have an __array_function__ attribute?
That's much clearer to me than how you were phrasing it before :).
OK, I will make sure something like this ends up in the NEP :)
Though now the name "__numpy_implementation__" doesn't seem very evocative of what it does... numpy's dispatch sequence has changed a lot in the past (mostly adding new coercion rules), and will probably change in the future, and "__numpy_implementation__" doesn't give much guidance about which parts of the dispatch sequence should be skipped as "dispatch" and which should be included as "implementation". Maybe something like __skipping_array_function__?
With "__numpy_implementation__" I was hoping to evoke "the implementation used by numpy.ndarray.__array_function__" and "the implementation for NumPy arrays" rather than "the implementation found in the NumPy library." So it would still be appropriate to use on functions defined in SciPy, as long as they are defined on NumPy arrays. That said, this is clearly going to remain a source of confusion. So let's see if we can do better. Taking a step back, there will be three generic parts to NumPy functions after NEP18: 1. Dispatching with __array_function__ 2. Coercion to NumPy arrays (sometimes skipped if an object has the necessary ducktyping methods) 3. Implementation (either in C or is terms of other NumPy functions/methods) Currently, NumPy functions do steps (2) and (3) together. What we're asking for here is a way to continue this behavior in the future, by optionally skipping step (1). But in the future, as Marten notes below, we should not rule out cases where we also want to skip straight to step (3), without step (2). "__skipping_array_function__" would be a reasonable choice, though it does not evoke the "numpy array specific" aspect that I want to emphasis. Also, it has the unfortunate aspect of being named after what it doesn't do, rather than what it does. "__numpy_ndarray_implementation__" and "__numpy_array_implementation__" are a bit verbose, but maybe they would be better? The generic "__wrapped__" seems like a pretty bad choice to me, both because it's not at all descriptive and because it's generically used by functools.wraps  which means np.ndarray.__array_function__ could inadvertently succeed when called with nonNumPy functions. Let's at least stick to unique names for our protocols :).
On Wed, Apr 24, 2019 at 9:45 PM Stephan Hoyer <shoyer@gmail.com> wrote:
With "__numpy_implementation__" I was hoping to evoke "the implementation used by numpy.ndarray.__array_function__" and "the implementation for NumPy arrays" rather than "the implementation found in the NumPy library." So it would still be appropriate to use on functions defined in SciPy, as long as they are defined on NumPy arrays.
That said, this is clearly going to remain a source of confusion. So let's see if we can do better.
Taking a step back, there will be three generic parts to NumPy functions after NEP-18: 1. Dispatching with __array_function__ 2. Coercion to NumPy arrays (sometimes skipped if an object has the necessary duck-typing methods) 3. Implementation (either in C or in terms of other NumPy functions/methods)
Currently, NumPy functions do steps (2) and (3) together. What we're asking for here is a way to continue this behavior in the future, by optionally skipping step (1). But in the future, as Marten notes below, we should not rule out cases where we also want to skip straight to step (3), without step (2).
"__skipping_array_function__" would be a reasonable choice, though it does not evoke the "numpy array specific" aspect that I want to emphasis. Also, it has the unfortunate aspect of being named after what it doesn't do, rather than what it does.
"__numpy_ndarray_implementation__" and "__numpy_array_implementation__" are a bit verbose, but maybe they would be better?
When you say "NumPy array specific" and "__numpy_(nd)array_implementation__", that sounds to me like you're trying to say "just step 3, skipping steps 1 and 2"? Step 3 is the one that operates on ndarrays...

When we have some kind of __asduckarray__ coercion, then that will complicate things too, because presumably we'll do something like:

1. __array_function__ dispatch
2. __asduckarray__ coercion
3. __array_function__ dispatch again
4. ndarray coercion
5. [either "the implementation", or __array_function__ dispatch again, depending on how you want to think about it]

-n

--
Nathaniel J. Smith -- https://vorpus.org
On Wed, Apr 24, 2019 at 9:56 PM Nathaniel Smith <njs@pobox.com> wrote:
When you say "numpy array specific" and "__numpy_(nd)array_implementation__", that sounds to me like you're trying to say "just step 3, skipping steps 1 and 2"? Step 3 is the one that operates on ndarrays...
My thinking was that if we implement NumPy functions with duck typing (e.g., `np.stack()` in terms of `.shape` + `np.concatenate()`), then step (3) could in some sense be the generic "array implementation", not only for NumPy arrays.
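For example, np.stack can be expressed against duck-typed operations only - here indexing with None to add an axis, plus np.concatenate (a sketch, not NumPy's actual code):

```python
import numpy as np

def duck_stack(arrays, axis=0):
    # np.stack via np.concatenate plus None-indexing, so any array type
    # that overrides np.concatenate (and supports basic indexing) can
    # reuse it; illustrative only.
    arrays = list(arrays)
    expander = (slice(None),) * axis + (None,)  # insert new axis at `axis`
    return np.concatenate([a[expander] for a in arrays], axis=axis)
```

On plain ndarrays this matches np.stack, but nothing in the body requires an ndarray specifically.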
When we have some kind of __asduckarray__ coercion, then that will complicate things too, because presumably we'll do something like
1. __array_function__ dispatch
2. __asduckarray__ coercion
3. __array_function__ dispatch again
4. ndarray coercion
5. [either "the implementation", or __array_function__ dispatch again, depending on how you want to think about it]
I was thinking of something a little simpler: do __asduckarray__ rather than numpy.ndarray coercion inside the implementation of NumPy functions. Then making use of NumPy's implementations would be a matter of calling the NumPy implementation without ndarray coercion from inside __array_function__, e.g.,

```
class MyArray:
    def __duck_array__(self):
        return self

    def __array_function__(self, func, types, args, kwargs):
        ...
        if func in {np.stack, np.atleast_1d, ...}:
            # use NumPy's "duck typing" implementations for these functions
            return func.__duck_array_implementation__(*args, **kwargs)
        elif func == np.concatenate:
            # write my own version of np.concatenate
            ...
```

This would let you make use of duck typing in a controlled way if you use __array_function__. np.stack.__duck_array_implementation__ would look exactly like np.stack, except np.asanyarray() would be replaced by np.asduckarray(). The reason why we need the separate __duck_array_implementation__ and __numpy_array_implementation__/__skipping_array_function__ is that there are also use cases where you *don't* want to worry about how np.stack is implemented under the hood (i.e., in terms of np.concatenate), and want to go straight to the coercive numpy.ndarray implementation. This lets you avoid both the complexity and the overhead associated with further dispatch checks.

I don't think we want repeated dispatching with __array_function__. That seems like a recipe for slow performance and confusion.
It seems we are adding to the wish list! I see four so far:

1. Exposed in API, can be overridden with __array_ufunc__
2. One that converts everything to ndarray (or subclass); essentially the current implementation;
3. One that does asduckarray
4. One that assumes all arguments are arrays.

Maybe handiest would be if there is a method to coerce all relevant arguments with a function of one's choice? I.e., in the example of Stephan, one would have

```
if function in JUST_COERCE:
    coerced_args, coerced_kwargs = function.__coerce__(np.asanyarray, *args, **kwargs)
    return function.__implementation__(*coerced_args, **coerced_kwargs)
```

Actually, this might in fact work with the plan proposed here, if we allow for an extra, optional kwarg that contains the coercion function, that is

```
return function.__implementation__(*args, coercion_function=np.asanyarray, **kwargs)
```

The possible advantage of this over yet more dunder methods is that one can fine-tune the extent to which something has to mimic an array properly (e.g., run `asanyarray` only if `shape` is not present).

It would be nice, though, if we could end up with option 4 also being available, if only because code that can just assume ndarray will be easiest to read.

All the best,

Marten
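As a concrete sketch of the coercion_function idea on a single function (using np.sinc's simple body; the keyword itself is hypothetical):

```python
import numpy as np

def sinc_implementation(x, coercion_function=np.asanyarray):
    # Hypothetical __implementation__ with an optional coercion_function
    # keyword: pass np.asanyarray (the default), a duck-array coercion,
    # or None to assume the input is already an array (option 4 above).
    if coercion_function is not None:
        x = coercion_function(x)
    y = np.pi * np.where(x == 0, 1.0e-20, x)  # same trick np.sinc uses
    return np.sin(y) / y
```

The caller, not the function, then decides how strict the coercion should be.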
On Thursday, Apr 25, 2019 at 9:45 PM, Marten van Kerkwijk <m.h.vankerkwijk@gmail.com> wrote:

It seems we are adding to the wish list! I see four so far:

1. Exposed in API, can be overridden with __array_ufunc__
2. One that converts everything to ndarray (or subclass); essentially the current implementation;
3. One that does asduckarray
4. One that assumes all arguments are arrays.
Maybe handiest would be if there is a method to coerce all relevant arguments with a function of one's choice? I.e., in the example of Stephan, one would have

```
if function in JUST_COERCE:
    coerced_args, coerced_kwargs = function.__coerce__(np.asanyarray, *args, **kwargs)
    return function.__implementation__(*coerced_args, **coerced_kwargs)
```

Actually, this might in fact work with the plan proposed here, if we allow for an extra, optional kwarg that contains the coercion function, that is

```
return function.__implementation__(*args, coercion_function=np.asanyarray, **kwargs)
```
The possible advantage of this over yet more dunder methods is that one can fine-tune the extent to which something has to mimic an array properly (e.g., run `asanyarray` only if `shape` is not present).
It would be nice, though, if we could end up with option 4 also being available, if only because code that can just assume ndarray will be easiest to read.
All the best,
Marten
Hi everyone,

Although, in general, I agree with Stephan's design goals, I agree with Marten that the number of protocols is getting larger and may get out of hand if not handled properly. There's even one Marten forgot to mention: __array_dtype__.

I have been working on a project that I consider to have all the essential features that Marten proposes, mostly within one framework. It's called uarray (for universal array) and can be found over at:

Source: https://github.com/Quansight-Labs/uarray
Documentation: https://uarray.readthedocs.io/en/latest/

It adopts the "separation of implementation from interface" principle from the beginning. Here's how it works: there are MultiMethods and Backends. A Backend registers implementations for a given MultiMethod. A MultiMethod defines the signature, along with the elements that can be dispatched over, along with their types. To it, NumPy is (and I realise this is going to be controversial, since this is the NumPy mailing list) just another backend.

Here's how it addresses Marten's concerns:

- Everything is made into a MultiMethod. Then, the multimethod marks objects it'd like to dispatch over. For the status quo, this is arrays. But thinking long-term, we could dispatch over abstract ufuncs and dtypes as well. For ufuncs, ufunc.__call__ and ufunc.reduce are also MultiMethods.
- Coercion works by extracting marked dispatchables, converting them into native library equivalents and then passing them back into the function. For example, it would convert lists (or anything marked as an array) to arrays. What it could also do is convert dtype='int64' to an actual dtype, and so on.
- __asduckarray__ is rendered unnecessary - coercion handles that.

You can check out the usage examples in the tests:

Core backend infrastructure: https://github.com/Quansight-Labs/uarray/blob/master/uarray/tests/test_backe...
Backend infrastructure: https://github.com/Quansight-Labs/uarray/blob/master/unumpy/tests/test_numpy...
Examples of how to write NumPy MultiMethods are here: https://github.com/Quansight-Labs/uarray/blob/master/unumpy/multimethods.py, along with the accompanying Backends in https://github.com/Quansight-Labs/uarray/tree/master/unumpy.

Best Regards,
Hameer Abbasi
On Thu, Apr 25, 2019 at 1:30 PM Hameer Abbasi <einstein.edison@gmail.com> wrote:
Although, in general, I agree with Stephan's design goals, I agree with Marten that the number of protocols is getting larger and may get out of hand if not handled properly. There's even one Marten forgot to mention: __array_dtype__.
What's __array_dtype__? That string doesn't seem to appear in the numpy source, and google has no hits...

-n

--
Nathaniel J. Smith -- https://vorpus.org
On Thu, Apr 25, 2019 at 12:46 PM Marten van Kerkwijk < m.h.vankerkwijk@gmail.com> wrote:
It seems we are adding to the wish list! I see four so far:

1. Exposed in API, can be overridden with __array_ufunc__
2. One that converts everything to ndarray (or subclass); essentially the current implementation;
3. One that does asduckarray
4. One that assumes all arguments are arrays.

Maybe handiest would be if there is a method to coerce all relevant arguments with a function of one's choice? I.e., in the example of Stephan, one would have

```
if function in JUST_COERCE:
    coerced_args, coerced_kwargs = function.__coerce__(np.asanyarray, *args, **kwargs)
    return function.__implementation__(*coerced_args, **coerced_kwargs)
```

Actually, this might in fact work with the plan proposed here, if we allow for an extra, optional kwarg that contains the coercion function, that is

```
return function.__implementation__(*args, coercion_function=np.asanyarray, **kwargs)
```

The possible advantage of this over yet more dunder methods is that one can fine-tune the extent to which something has to mimic an array properly (e.g., run `asanyarray` only if `shape` is not present).
I do like the look of this, but keep in mind that there is a downside to exposing the implementation of NumPy functions - now the implementation details become part of NumPy's API. I suspect we do not want to commit ourselves to never changing the implementation of NumPy functions, so at the least this will need careful disclaimers about non-guarantees of backwards compatibility.

But for now, I would love to pick a name for "essentially the current implementation", which is something that would make a big difference for near-term NEP-18 use cases. Some options:

__skipping_array_function__
__coercive_implementation__
__asarray_implementation__

The last two are not quite right, since there is some legacy dispatching to methods. Maybe __skipping_array_function__ is the best? Whatever we pick, we can always make it an alias later, e.g., for func.__implementation__(*args, coercion_function=np.asanyarray, **kwargs).
It would be nice, though, if we could end up with option 4 also being available, if only because code that can just assume ndarray will be easiest to read.
This could perhaps just be coercion_function=None? Or maybe we want to keep around coercion_function=None for "do whatever ad hoc coercion NumPy currently does"?
All the best,
Marten
On Fri, Apr 26, 2019 at 12:04 AM Stephan Hoyer <shoyer@gmail.com> wrote:
On Thu, Apr 25, 2019 at 12:46 PM Marten van Kerkwijk < m.h.vankerkwijk@gmail.com> wrote:
It seems we are adding to the wish list! I see four so far:

1. Exposed in API, can be overridden with __array_ufunc__
2. One that converts everything to ndarray (or subclass); essentially the current implementation;
3. One that does asduckarray
4. One that assumes all arguments are arrays.

Maybe handiest would be if there is a method to coerce all relevant arguments with a function of one's choice? I.e., in the example of Stephan, one would have

```
if function in JUST_COERCE:
    coerced_args, coerced_kwargs = function.__coerce__(np.asanyarray, *args, **kwargs)
    return function.__implementation__(*coerced_args, **coerced_kwargs)
```

Actually, this might in fact work with the plan proposed here, if we allow for an extra, optional kwarg that contains the coercion function, that is

```
return function.__implementation__(*args, coercion_function=np.asanyarray, **kwargs)
```

The possible advantage of this over yet more dunder methods is that one can fine-tune the extent to which something has to mimic an array properly (e.g., run `asanyarray` only if `shape` is not present).
I do like the look of this, but keep in mind that there is a downside to exposing the implementation of NumPy functions - now the implementation details become part of NumPy's API. I suspect we do not want to commit ourselves to never changing the implementation of NumPy functions, so at the least this will need careful disclaimers about non-guarantees of backwards compatibility.
I honestly still am missing the point of claiming this. There is no change either way to what we've done for the last decade. If we change anything in the numpy implementation of any function, we use deprecation warnings etc. What am I missing here?
But for now, I would love to pick a name for "essentially the current implementation", which is something that would make a big difference for near-term NEP-18 use cases. Some options: __skipping_array_function__ __coercive_implementation__ __asarray_implementation__
The last two are not quite right, since there is some legacy dispatching to methods. Maybe __skipping_array_function__ is the best?
imho those names are all worse than __wrapped__ or __numpy_implementation__.

Ralf
Whatever we pick, we can always make it an alias later, e.g., for func.__implementation__(*args, coercion_function=np.asanyarray, **kwargs).
It would be nice, though, if we could end up with option 4 also being available, if only because code that can just assume ndarray will be easiest to read.
This could perhaps just be coercion_function=None? Or maybe we want to keep around coercion_function=None for "do whatever ad hoc coercion NumPy currently does"?
All the best,
Marten
On Thu, Apr 25, 2019 at 3:39 PM Ralf Gommers <ralf.gommers@gmail.com> wrote:
On Fri, Apr 26, 2019 at 12:04 AM Stephan Hoyer <shoyer@gmail.com> wrote:
I do like the look of this, but keep in mind that there is a downside to exposing the implementation of NumPy functions - now the implementation details become part of NumPy's API. I suspect we do not want to commit ourselves to never changing the implementation of NumPy functions, so at the least this will need careful disclaimers about non-guarantees of backwards compatibility.
I honestly still am missing the point of claiming this. There is no change either way to what we've done for the last decade. If we change anything in the numpy implementation of any function, we use deprecation warnings etc. What am I missing here?
Hypothetically, suppose we rewrite np.stack() in terms of np.block() instead of np.concatenate(), because it turns out it is faster. As long as we're coercing with np.asarray(), users don't notice any material difference - their code just gets a little faster. But this could be problematic if we support duck typing. For example, I suspect dask arrays rely on NumPy's definition of np.stack in terms of np.concatenate, but they never bothered to implement np.block. Now upgrading NumPy breaks dask. This is basically the same reason why subclass support has been hard to maintain in NumPy. Apparently safe internal changes to NumPy functions can break other array types in surprising ways, even if they do not intentionally deviate from NumPy's semantics.
On Fri, Apr 26, 2019 at 1:02 AM Stephan Hoyer <shoyer@gmail.com> wrote:
On Thu, Apr 25, 2019 at 3:39 PM Ralf Gommers <ralf.gommers@gmail.com> wrote:
On Fri, Apr 26, 2019 at 12:04 AM Stephan Hoyer <shoyer@gmail.com> wrote:
I do like the look of this, but keep in mind that there is a downside to exposing the implementation of NumPy functions - now the implementation details become part of NumPy's API. I suspect we do not want to commit ourselves to never changing the implementation of NumPy functions, so at the least this will need careful disclaimers about non-guarantees of backwards compatibility.
I honestly still am missing the point of claiming this. There is no change either way to what we've done for the last decade. If we change anything in the numpy implementation of any function, we use deprecation warnings etc. What am I missing here?
Hypothetically, suppose we rewrite np.stack() in terms of np.block() instead of np.concatenate(), because it turns out it is faster.
As long as we're coercing with np.asarray(), users don't notice any material difference - their code just gets a little faster.
But this could be problematic if we support duck typing. For example, I suspect dask arrays rely on NumPy's definition of np.stack in terms of np.concatenate, but they never bothered to implement np.block. Now upgrading NumPy breaks dask.
Thanks, this helped clarify what's going on here. This example is clear. The problem seems to be that there are two separate discussions in this thread:

1. your original proposal, __numpy_implementation__. It does not have the problem of your np.concatenate example, as the "numpy implementation" is exactly the same as it is today.
2. splitting up the current numpy implementation into *multiple* entry points. This can be with and without coercion, with and without checking for invalid values, etc.

So far NEP 18 does (1). Your proposed __numpy_implementation__ addition to NEP 18 is still (1). Claiming that this affects the situation with respect to backwards compatibility is incorrect. (2) is actually a much more invasive change, and one that does much more to increase the size of the NumPy API surface. And yes, it affects our backwards compatibility situation as well.

Also note that these have very different purposes:
(1) was to (quoting from the NEP) "allow using NumPy as a high level API for efficient multidimensional array operations, even with array implementations that differ greatly from numpy.ndarray."
(2) is for making duck arrays work with NumPy implementations of functions (not just with the NumPy API)

I think (1) is mostly achieved, and I'm +1 on your NEP addition for that. (2) is quickly becoming a mess, and I agree with Nathaniel's sentiment above "I shouldn't expect __array_function__ to be useful for duck arrays?". For (2) we really need to go back and have a well thought out design. Hameer's mention of uarray could be that. Growing more __array_*__ protocols in a band-aid fashion seems unlikely to get us there.
This is basically the same reason why subclass support has been hard to maintain in NumPy. Apparently safe internal changes to NumPy functions can break other array types in surprising ways, even if they do not intentionally deviate from NumPy's semantics.
Agreed. Therefore optionally skipping asarray & co is a separate discussion. That's part of the problem caused by numpy trying to be both a library and an end user interface - and often those goals conflict.

Cheers,
Ralf
Here's my take on it:

The goal is basically "separation of interface from implementation": the NumPy reference becomes just one (reference) implementation (kind of like CPython is today). The idea is that unumpy/NumPy drive the interface, while there can be many implementations.

To make duck arrays work with the same code: this is achieved by `__array_function__`, other than for cases where we're creating an array.

Composability, and traversing backend boundaries. Coercion to native library objects: this requires the "reverse dispatcher" I kept mentioning, to take the args/kwargs and "put back" the coerced arrays into them. This is impossible in the current framework, but can be made possible using the proposals by Stephan and Marten.

Dispatch over arbitrary objects, such as dtypes or ufuncs, from other libraries: we are far from this goal, and it will require repetitions of protocols already available for arrays...

Here's how `uarray` solves each of these issues:

- Backends. There is no default implementation. This is handled by (thread-safe) context managers, which make switching easy.
- There's one coercion function per type of object. Libraries are only asked to dispatch over objects they know how to convert, so there's no backwards-incompatible break when we add dtypes or ufuncs. Conversion can be as simple as `lambda x: x`.
- There's a generic dispatcher and reverse dispatcher per function, with "marks" to indicate the type of object. Arrays are just one "type" of object you can dispatch over, so there's no repetition by definition.

Best Regards,
Hameer Abbasi
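To make the MultiMethod/Backend split concrete, here is a toy illustration of the idea (this is not uarray's actual API, just the shape of the design):

```python
import contextlib
import numpy as np

class Backend:
    # A Backend maps multimethod names to concrete implementations.
    def __init__(self, name):
        self.name = name
        self.implementations = {}

    def register(self, method_name, func):
        self.implementations[method_name] = func

class MultiMethod:
    # A MultiMethod is pure interface: it only forwards to whichever
    # backend is currently active.
    def __init__(self, name):
        self.name = name

    def __call__(self, *args, **kwargs):
        return _backend_stack[-1].implementations[self.name](*args, **kwargs)

numpy_backend = Backend('numpy')
_backend_stack = [numpy_backend]

@contextlib.contextmanager
def set_backend(backend):
    # Toy (not thread-safe) backend-switching context manager.
    _backend_stack.append(backend)
    try:
        yield
    finally:
        _backend_stack.pop()

# NumPy is "just another backend" for this multimethod:
duck_sum = MultiMethod('sum')
numpy_backend.register('sum', np.sum)

list_backend = Backend('list')
list_backend.register('sum', lambda x: ('list', sum(x)))
```

Inside a `with set_backend(list_backend):` block, duck_sum uses the list backend; outside it, the NumPy backend.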
On Friday, Apr 26, 2019 at 10:31 AM, Ralf Gommers <ralf.gommers@gmail.com> wrote:
On Fri, Apr 26, 2019 at 1:02 AM Stephan Hoyer <shoyer@gmail.com> wrote:
On Thu, Apr 25, 2019 at 3:39 PM Ralf Gommers <ralf.gommers@gmail.com> wrote:
On Fri, Apr 26, 2019 at 12:04 AM Stephan Hoyer <shoyer@gmail.com> wrote:
I do like the look of this, but keep in mind that there is a downside to exposing the implementation of NumPy functions - now the implementation details become part of NumPy's API. I suspect we do not want to commit ourselves to never changing the implementation of NumPy functions, so at the least this will need careful disclaimers about non-guarantees of backwards compatibility.
I honestly still am missing the point of claiming this. There is no change either way to what we've done for the last decade. If we change anything in the numpy implementation of any function, we use deprecation warnings etc. What am I missing here?
Hypothetically, suppose we rewrite np.stack() in terms of np.block() instead of np.concatenate(), because it turns out to be faster.
As long as we're coercing with np.asarray(), users don't notice any material difference – their code just gets a little faster.
But this could be problematic if we support duck typing. For example, suppose dask arrays rely on NumPy's definition of np.stack in terms of np.concatenate, but they never bothered to implement np.block. Now upgrading NumPy breaks dask.
Thanks, this helped clarify what's going on here. This example is clear. The problem seems to be that there are two separate discussions in this thread:
1. Your original proposal, __numpy_implementation__. It does not have the problem of your np.concatenate example, as the "numpy implementation" is exactly the same as it is today.
2. Splitting up the current numpy implementation into *multiple* entry points. This can be with and without coercion, with and without checking for invalid values, etc.
So far NEP 18 does (1). Your proposed __numpy_implementation__ addition to NEP 18 is still (1). Claiming that this affects the situation with respect to backwards compatibility is incorrect.
(2) is actually a much more invasive change, and one that does much more to increase the size of the NumPy API surface. And yes, affects our backwards compatibility situation as well.
Also note that these have very different purposes: (1) was to (quoting from the NEP) "allow using NumPy as a high level API for efficient multidimensional array operations, even with array implementations that differ greatly from numpy.ndarray." (2) is for making duck arrays work with numpy implementations of functions (not just with the NumPy API)
I think (1) is mostly achieved, and I'm +1 on your NEP addition for that. (2) is quickly becoming a mess, and I agree with Nathaniel's sentiment above "I shouldn't expect __array_function__ to be useful for duck arrays?". For (2) we really need to go back and have a well thought out design. Hameer's mention of uarray could be that. Growing more __array_*__ protocols in a band-aid fashion seems unlikely to get us there.
This is basically the same reason why subclass support has been hard to maintain in NumPy. Apparently safe internal changes to NumPy functions can break other array types in surprising ways, even if they do not intentionally deviate from NumPy's semantics.
Agreed. Therefore optionally skipping asarray & co is a separate discussion. That's part of the problem caused by numpy trying to be both a library and an end user interface – and often those goals conflict.
Cheers, Ralf
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion
On Fri, Apr 26, 2019 at 3:10 AM Hameer Abbasi <einstein.edison@gmail.com> wrote:
Here’s how `uarray` solves each of these issues:
1. Backends… There is no default implementation.
2. This is handled by (thread-safe) context managers, which make switching easy.
3. There's one coercion function per type of object:
   - Libraries are only asked to dispatch over objects they know how to convert, so there's no backwards-incompatible break when we add dtypes or ufuncs.
   - Conversion can be as simple as lambda x: x.
   - There's a generic dispatcher and reverse dispatcher per function, with "marks" to indicate the type of object.
4. Arrays are just one "type" of object you can dispatch over, so there's no repetition by definition.
Hameer, it's great that you are exploring these problems with a fresh approach! I'm excited to see how dispatching problems could be solved without the constraint of compatibility with NumPy's legacy approaches.
When you have a prototype and/or design documents ready for review, please do share them with the numpy-discussion list. I would be very glad to review them and share my perspective. That said, please save it for a separate discussion thread, given that the design of uarray is (wisely) orthogonal to NEP18.
Hi Stephan,
Hameer, it's great that you are exploring these problems with a fresh approach! I'm excited to see how dispatching problems could be solved without the constraint of compatibility with NumPy's legacy approaches.
When you have a prototype and/or design documents ready for review, please do share them with the numpy-discussion list. I would be very glad to review them and share my perspective.
That’s a great idea! I’ll get those ready, perhaps a NEP.
That said, please save it for a separate discussion thread, given that the design of uarray is (wisely) orthogonal to NEP18.

I disagree, I don't consider it orthogonal: I'm presenting a way to avoid the very protocols being discussed, and I'd like to avoid duplicate work, or making NumPy itself unmaintainable. Please note the text of NEP18:
The __array_function__ protocol, and its use on particular functions, is experimental. We plan to retain an interface that makes it possible to override NumPy functions, but the way to do so for particular functions can and will change with little warning. If such reduced backwards compatibility guarantees are not accepted to you, do not rely upon overrides of NumPy functions for non-NumPy arrays. See "Non-goals" below for more details.
What I’m presenting is within scope, as it’s an alternative method. Best Regards, Hameer Abbasi
On Fri, Apr 26, 2019 at 9:16 AM Hameer Abbasi <einstein.edison@gmail.com> wrote:
That said, please save it for a separate discussion thread, given that the design of uarray is (wisely) orthogonal to NEP18.
I disagree, I don’t consider it orthogonal: I’m presenting a way to avoid the very protocols being discussed, and I’d like to avoid duplicate work, or making NumPy itself unmaintainable. Please note the text of NEP18:
The __array_function__ protocol, and its use on particular functions, is *experimental*. We plan to retain an interface that makes it possible to override NumPy functions, but the way to do so for particular functions *can and will change* with little warning. If such reduced backwards compatibility guarantees are not accepted to you, do not rely upon overrides of NumPy functions for non-NumPy arrays. See "Non-goals" below for more details.
What I’m presenting is within scope, as it’s an alternative method.
Best Regards, Hameer Abbasi
Are there aspects of your uarray proposal that are relevant to the current proposed revisions to NEP 18? If so, please restate them :). Thanks, Stephan
Hi Stephan,
On Saturday, Apr 27, 2019 at 6:21 PM, Stephan Hoyer <shoyer@gmail.com> wrote: On Fri, Apr 26, 2019 at 9:16 AM Hameer Abbasi <einstein.edison@gmail.com> wrote:
That said, please save it for a separate discussion thread, given that the design of uarray is (wisely) orthogonal to NEP18.

I disagree, I don't consider it orthogonal: I'm presenting a way to avoid the very protocols being discussed, and I'd like to avoid duplicate work, or making NumPy itself unmaintainable. Please note the text of NEP18:
The __array_function__ protocol, and its use on particular functions, is experimental. We plan to retain an interface that makes it possible to override NumPy functions, but the way to do so for particular functions can and will change with little warning. If such reduced backwards compatibility guarantees are not accepted to you, do not rely upon overrides of NumPy functions for non-NumPy arrays. See "Non-goals" below for more details.
What I’m presenting is within scope, as it’s an alternative method.
Best Regards, Hameer Abbasi
Are there aspects of your uarray proposal that are relevant to the current proposed revisions to NEP 18? If so, please restate them :).
Of course, here’s my proposal: We leave NEP18 as-is for now, and instead of writing separate protocols for coercion, dtypes and ufuncs (which will be needed somewhere down the line), we have a discussion about uarray and see if it can help there. :) Ralf and I have discussed the possibility of a dedicated call, with all important participants.
Thanks, Stephan
Best Regards, Hameer Abbasi
On Sat, Apr 27, 2019 at 4:38 PM Hameer Abbasi <einstein.edison@gmail.com> wrote:
Of course, here’s my proposal:
We leave NEP18 as-is for now, and instead of writing separate protocols for coercion, dtypes and ufuncs (which will be needed somewhere down the line), we have a discussion about uarray and see if it can help there. :)
At a very high level, I don't understand yet how uarray is the kind of thing that could even potentially help, so maybe that's something that would be helpful to dig into. To me, the major challenges in supporting duck arrays in numpy are all about the economics of compatibility – how can third-party libraries support:

- as broad a range of functionality as possible,
- with the highest possible amount of compatibility,
- at a reasonable implementation cost,
- and maintain compatibility over time,
- given numpy's complex preexisting API and backwards compatibility commitments.

My impression so far is that uarray has a generic multimethod dispatch system, and an independent implementation of some of numpy's functionality. Those are both cool things, but I don't see how they're relevant to the list of challenges above. For the simple strategy of simply letting third-party libraries "take over" dispatch and insert their own implementations, __array_function__ basically covers that. A multimethod system could do the same thing, but not in a materially different way – it's basically two different coats of paint on the same underlying idea.

The challenge for __array_function__ is that because it's a simple black-box dispatch system wrapped around the whole library, there's no simple way for third parties to reuse numpy's code, which creates challenges for compatibility, cost, maintenance, etc. Maybe that will turn out to be a showstopper, maybe not – we can each make our guesses, but since we're trying the experiment then we'll know soon enough :). If it does turn out to be a showstopper, then the question will become: how do we provide finer-grained protocols that are deeply integrated into numpy's semantics?
This can help address those questions above, because (a) it's a narrower set of APIs for implementors to worry about, so implementing will require less resources, (b) they're more precisely defined, so getting the details right is easier, (c) you get more reuse of numpy code in between calls to the protocols, so you automatically get numpy's bug fixes. But... doing this is hard because it really requires us to dig into the details of numpy's semantics.

You mention a protocol for ufuncs – we already have that? And it took years, and an immense amount of discussion, because the integration details were genuinely super complicated (e.g., the famous thread about how to handle dispatch when there were both __add__ and __array_ufunc__ methods on the same object). Just saying "we'll use multimethods" doesn't tell you how + and np.add should interoperate.

Similarly, an array coercion protocol itself is trivial – "it's called __asduckarray__, it works like __array__ but can return a duckarray", there, done! The hard part is stuff like: okay, but which functions invoke this – array, asarray, implicit coercion in ufuncs? Under which circumstances, and what are the consequences for compatibility and deployability? What does asfortranarray do, do fortran arrays even make sense for duck arrays? Well, probably not, but compatibility means we need to do something; what does existing code expect? Etc., etc. How does uarray help us solve these problems?

I don't know what a dtype protocol is. I don't think we want to support dispatching over dtype objects, at least in any of the senses I'm thinking of. But that could mean a lot of things so maybe I'm missing something.

-n

Nathaniel J. Smith – https://vorpus.org
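The "trivial part" of the coercion protocol Nathaniel sketches could look roughly like this. Purely hypothetical: `__asduckarray__` is only a name floated in this thread, not an existing NumPy protocol, and `asduckarray`/`MyDuck` are invented for illustration.

```python
import numpy as np

def asduckarray(obj):
    """Like np.asarray, but lets duck arrays pass through uncoerced.

    Hypothetical sketch: __asduckarray__ is a name from this thread,
    not a real NumPy protocol.
    """
    if hasattr(obj, "__asduckarray__"):
        return obj.__asduckarray__()
    return np.asarray(obj)

class MyDuck:
    def __asduckarray__(self):
        return self  # already duck-array-like: no coercion needed

duck = MyDuck()
print(asduckarray(duck) is duck)           # True: duck typing preserved
print(type(asduckarray([1, 2])).__name__)  # ndarray: plain objects coerced
```

As the message says, this part is easy; the hard questions are which NumPy entry points would call it and what existing code expects.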
On Sat, Apr 27, 2019 at 4:39 PM Hameer Abbasi <einstein.edison@gmail.com> wrote:
On Saturday, Apr 27, 2019 at 6:21 PM, Stephan Hoyer <shoyer@gmail.com> wrote: Are there aspects of your uarray proposal that are relevant to the current proposed revisions to NEP 18? If so, please restate them :).
Of course, here’s my proposal:
We leave NEP18 as-is for now, and instead of writing separate protocols for coercion, dtypes and ufuncs (which will be needed somewhere down the line), we have a discussion about uarray and see if it can help there. :)
I don't want to add separate protocols for coercion, dtypes or ufuncs as part of NEP18. Whatever form these should take, they should definitely be separate proposals.

__array_function__ is not the end of the story about duck array support in NumPy, but I think it's a valuable incremental step, as evidenced by the projects that are already eager to adopt it. I would really, really like to try to get a usable and near-final version of it released in NumPy 1.17. That doesn't leave us much time.

I'm very interested in your work on uarray, but as far as I can tell, it would not directly interact with NumPy's implementation of __array_function__, so discussing it doesn't feel immediately urgent to me. Rather, it's an alternative and possibly more complete solution for some of the same problems. That's fantastic – but please, let us finish __array_function__ first.
Hey Stephan,

After some discussion with Ralf, I feel that the best way forward would be to add __numpy_implementation__ (which is my preferred name for it). While I consider the interface final (or at least to the point where we would only add functionality and not remove it), I would prefer to keep the experimental tag, for this very reason: avoiding having to write a new NEP for adding functionality. However, I'm open to declaring it non-experimental as well.

Best Regards, Hameer Abbasi
On Sunday, Apr 28, 2019 at 5:50 AM, Stephan Hoyer <shoyer@gmail.com> wrote: On Sat, Apr 27, 2019 at 4:39 PM Hameer Abbasi <einstein.edison@gmail.com> wrote:
On Saturday, Apr 27, 2019 at 6:21 PM, Stephan Hoyer <shoyer@gmail.com> wrote: Are there aspects of your uarray proposal that are relevant to the current proposed revisions to NEP 18? If so, please restate them :).
Of course, here’s my proposal:
We leave NEP18 as-is for now, and instead of writing separate protocols for coercion, dtypes and ufuncs (which will be needed somewhere down the line), we have a discussion about uarray and see if it can help there. :)
I don't want to add separate protocols for coercion, dtypes or ufuncs as part of NEP18. Whatever form these should take, they should definitely be separate proposals.
__array_function__ is not the end of the story about duck array support in NumPy, but I think it's a valuable incremental step, as evidenced by the projects that are already eager to adopt it. I would really, really like to try to get a usable and near-final version of it released in NumPy 1.17. That doesn't leave us much time.
I'm very interested in your work on uarray, but as far as I can tell, it would not directly interact with NumPy's implementation of __array_function__, so discussing it doesn't feel immediately urgent to me. Rather, it's an alternative and possibly more complete solution for some of the same problems. That's fantastic – but please, let us finish __array_function__ first.
Hi All,

I agree with Ralf that there are two discussions going on, but also with Hameer that they are related, in that part of the very purpose of __array_function__ was to gain freedom to experiment with implementations. And in particular the freedom to *assume* that inputs are arrays, so that we can stop worrying about breaking subclasses and duck arrays: with __array_function__, we can now (well, eventually) tell people to just override the function and make it do what they want.

In that respect, if having `__numpy_implementation__` means it would not change the existing situation with backwards compatibility, then that is bad: we do want to change that! The point is to keep the exposed API stable, not the implementation.

Of course, the idea proposed here is not to keep the implementation stable, but rather to help people implement __array_function__ who are daunted by the many functions that can be overridden, especially for those classes for which many numpy functions work out of the box: they can have a default of just calling the old implementation.

In the end, the proposed goal for __numpy_implementation__ really seems simple: to provide a better name for __wrapped__. But to me the discussion has proven that this is in fact not a good idea. It does suggest stability where there should be none, and the name itself is contentious. Maybe it is best to just stick with __wrapped__? If so, the only change would be that we mention it in the NEP, making again explicit that the wrapped implementation can and will be changed.

All the best,

Marten
On Sat, Apr 27, 2019 at 7:05 PM Marten van Kerkwijk < m.h.vankerkwijk@gmail.com> wrote:
Hi All,
I agree with Ralf that there are two discussions going on, but also with Hameer that they are related, in that part of the very purpose of __array_function__ was to gain freedom to experiment with implementations.
This may have been your motivation, but I don't see that in NEP 18 anywhere. On the contrary – the single sentence under "backwards compatibility" in the NEP reads: "This proposal does not change existing semantics, except for those arguments that currently have __array_function__ methods, which should be rare." If I'm missing something that's actually in NEP 18, can you please point out the actual text?

By the way, I suspect that Hameer may have meant replacing __array_function__ completely (not sure though). Technically that may even be a good idea, but despite the "experimental" label I'm pretty sure that that ship has sailed given the review process we went through for NEP 18 and it starting to get uptake in other libraries.

And in particular the freedom to *assume* that inputs are arrays so that we
can stop worrying about breaking subclasses and duck arrays: with __array_function__, we can now (well, eventually) tell people to just override the function and make it do what they want.
This is not in NEP 18, and against our backwards compatibility policy. In that respect, if having `__numpy_implementation__` means it would not
change the existing situation with backwards compatibility, then that is bad: we do want to change that! The point is to keep the exposed API stable, not the implementation.
API and semantics. The NEP really is very clear on this.
Of course, the idea proposed here is not to keep the implementation stable, but rather to help people implement __array_function__ who are daunted by the many functions that can be overridden, especially for those classes for which many numpy functions work out of the box: they can have a default of just calling the old implementation.
This is not feasible. All the things you talk about are related to NEP 22 (duck typing for NumPy arrays). NEP 18 is about overrides. It's not about skipping parts of the implementation of functions (like asarray). For code like:
```
import numpy as np
import mylib
x = mylib.customarray([1, 2, 3])  # duck array or subclass
np.somefunc(x)
```
you simply cannot bypass asarray in the future if somefunc has asarray now. To get somefunc-without-asarray, you need a new opt-in interface. Telling our whole user base to use __array_ufunc__ or __array_function__ to keep their code working is just not a good idea.
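A runnable illustration of Ralf's point, with an invented `CustomArray` standing in for `mylib.customarray`: once `asarray` runs inside a NumPy function, the custom type is silently gone, which is why skipping coercion would need a new opt-in interface rather than a change to existing functions.

```python
import numpy as np

class CustomArray:
    """Invented stand-in for mylib.customarray (duck array or subclass)."""
    def __init__(self, data):
        self.data = data
    def __array__(self, dtype=None, copy=None):
        # np.asarray triggers this, silently converting to ndarray
        return np.asarray(self.data, dtype=dtype)

x = CustomArray([1, 2, 3])
coerced = np.asarray(x)  # what somefunc's internals do today
print(type(coerced).__name__)  # ndarray: the custom type is gone
```

Any code downstream of that `asarray` call sees a plain ndarray, so existing semantics depend on the coercion happening.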
In the end, the proposed goal for __numpy_implementation__ really seems simple: to provide a better name for __wrapped__. But to me the discussion has proven that this is in fact not a good idea. It does suggest stability where there should be none, and the name itself is contentious. Maybe it is best to just stick with __wrapped__? If so, the only change would be that we mention it in the NEP,
Sticking with __wrapped__ is fine with me too. It's a reasonable name.

Marten, I understand where you're coming from and why you want to bypass array coercion, but this insistence on being able to break the world isn't helpful. I like the goal and would like to see a design for how to achieve it. It just needs that: a new design. It could be more protocols, or be based on uarray (what Hameer proposes, probably the better option because it's a coherent design that can do what you want), or yet something else.

It seems like we all have a different mental model of what NEP 18 actually does. I'm going to try to put mine on a few slides with diagrams/examples to see if that helps, since mailing list threads are hard to process.

Cheers, Ralf
Hi Ralf, [snip]
If I'm missing something that's actually in NEP 18, can you please point out the actual text?
NEP 22 is the high-level overview of the goals, but NEP 18 is the concrete proposal for __array_function__. Quoting that NEP, right under "Implementation":
The __array_function__ protocol, and its use on particular functions, is experimental. We plan to retain an interface that makes it possible to override NumPy functions, but the way to do so for particular functions can and will change with little warning. If such reduced backwards compatibility guarantees are not accepted to you, do not rely upon overrides of NumPy functions for non-NumPy arrays. See "Non-goals" below for more details.
[snip] Best Regards, Hameer Abbasi
On Sat, Apr 27, 2019 at 11:44 AM Ralf Gommers <ralf.gommers@gmail.com> wrote:
On Sat, Apr 27, 2019 at 7:05 PM Marten van Kerkwijk < m.h.vankerkwijk@gmail.com> wrote:
Hi All,
I agree with Ralf that there are two discussions going on, but also with Hameer that they are related, in that part of the very purpose of __array_function__ was to gain freedom to experiment with implementations.
This may have been your motivation, but I don't see that in NEP 18 anywhere. On the contrary  the single sentence under "backwards compatibility" in the NEP reads: "This proposal does not change existing semantics, except for those arguments that currently have __array_function__ methods, which should be rare."
I also have a hard time seeing how we could ever make such major backwards incompatible changes to NumPy.

I do think it's useful to consider possible future changes to duck-typing support in NumPy, to the extent that they interact with the current design of NEP18. It seems quite plausible to me that we might want to expose some version of "NumPy's implementation" of its functions. This adds to the possible confusion of a name like "numpy implementation", regardless of whether or not we will break backwards compatibility.

Whatever we call it, the proposed attribute is just a way to say "skip dispatch with __array_function__" when calling a NumPy function. This is an API that we are already committed to maintaining, so like the rest of NEP18 I don't think the functionality itself has any implications for backwards compatibility.
In the end, the proposed goal for __numpy_implementation__ really seems
simple: to provide a better name for __wrapped__. But to me the discussion has proven that this is in fact not a good idea. It does suggest stability where there should be none, and the name itself is contentious. Maybe it is best to just stick with __wrapped__? If so, the only change would be that we mention it in the NEP,
Sticking with __wrapped__ is fine with me too. It's a reasonable name.
I don't care exactly what name we pick, but as I said before I think "__wrapped__" is a bad name for this functionality, because it is neither self-descriptive nor searchable. For example: what does "np.something.__wrapped__" mean? It tells you that this is a "wrapped" function, but it doesn't say *what* is wrapped. The name has all the same issues as a generic name like "__implementation__". The way that NumPy uses functools.wraps() internally is an implementation detail, not something that users should need to know.

Worse, "__wrapped__" would be difficult to search for, because it already means something in Python (referring to functools.wraps). At least "__numpy_implementation__" and "__skipping_array_function__" are both unique tokens without any existing meaning.
On Sat, Apr 27, 2019 at 7:46 PM Stephan Hoyer <shoyer@gmail.com> wrote:
Worse, "__wrapped__" would be difficult to search for, because it already means something in Python (referring to functools.wraps). At least "__numpy_implementation__" and "__skipping_array_function__" are both unique tokens without any existing meaning.
It's not just functools.wraps – there's definitely other code out there that reads/writes __wrapped__ attributes on arbitrary callables and tries to do something clever with it. Debian apparently has 182 packages that contain the token __wrapped__ in their source code [1]. There's a real risk that some of this code will think it knows what to do with numpy's __wrapped__ and be wrong, or that new code that's trying to skip __array_function__ dispatch will accidentally call someone else's __wrapped__ without realizing.

-n

[1] https://codesearch.debian.net/search?q=__wrapped__&perpkg=1

Nathaniel J. Smith – https://vorpus.org
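For reference, this is the existing Python meaning of `__wrapped__` that such code relies on. This is standard `functools` behavior, nothing NumPy-specific; the function names are invented for the example.

```python
import functools
import inspect

def implementation(a, b):
    return a + b

# functools.wraps already sets __wrapped__ on the wrapper it decorates,
# and generic introspection tools like inspect.unwrap follow that chain.
@functools.wraps(implementation)
def public(*args, **kwargs):
    return implementation(*args, **kwargs)

print(public.__wrapped__ is implementation)      # True
print(inspect.unwrap(public) is implementation)  # True
```

So any third-party tool that calls `inspect.unwrap` or reads `__wrapped__` on a NumPy function would silently pick up the nondispatched implementation, whether or not that was intended.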
On Sun, Apr 28, 2019 at 5:02 AM Nathaniel Smith <njs@pobox.com> wrote:
On Sat, Apr 27, 2019 at 7:46 PM Stephan Hoyer <shoyer@gmail.com> wrote:
Worse, "__wrapped__" would be difficult to search for, because it already means something in Python (referring to functools.wraps). At least "__numpy_implementation__" and "__skipping_array_function__" are both unique tokens without any existing meaning.
It's not just functools.wraps – there's definitely other code out there that reads/writes __wrapped__ attributes on arbitrary callables and tries to do something clever with it. Debian apparently has 182 packages that contain the token __wrapped__ in their source code [1]. There's a real risk that some of this code will think it knows what to do with numpy's __wrapped__ and be wrong, or that new code that's trying to skip __array_function__ dispatch will accidentally call someone else's __wrapped__ without realizing.
Good point, I didn't think about that.

One other thought: the proposal in this thread is about skipping the override mechanism for numpy functions. NEP 18 reserves the freedom to swap out __array_function__ with __array_ufunc__ if we make something a ufunc. So __skipping_array_function__ is too limited a name; __skipping_override__ or similar would be better. And then make __array_ufunc__ respect it as well.

Cheers, Ralf
On Sun, Apr 28, 2019, 02:22 Ralf Gommers <ralf.gommers@gmail.com> wrote:
One other thought: the proposal in this thread is about skipping the override mechanism for numpy functions. NEP 18 reserves the freedom to swap out __array_function__ with __array_ufunc__ if we make something a ufunc. So __skipping_array_function__ is too limited a name, __skipping_override__ or similar would be better. And then make __array_ufunc__ respect it as well.
Heh, that's opening a can of worms.

Stephan, you're the one who's been working with the folks requesting this skipping feature. Can you comment on how this would fit in to what they're trying to do? Do they need the ability to skip __array_ufunc__ too?

Up thread, I argued that if we end up with both __asduckarray__ and __skipping_array_function__, then that means we can't call __array_function__ on the result from __asduckarray__, because that would break the layering that we're trying to use to keep this pile of features factored out enough to understand. But we definitely want to call __array_ufunc__ on the result from __asduckarray__. So if __skipping_array_function__ also applies to __array_ufunc__, then that creates the same problem.

One possibility is to leave __array_ufunc__ out of this for now, and if it later becomes an issue then we can debate the semantics of __skipping_array_ufunc__ as a separate issue.

-n
On Sat, Apr 27, 2019 at 8:10 PM Ralf Gommers <ralf.gommers@gmail.com> wrote:
It seems like we all have a different mental model of what NEP 18 actually does. I'm going to try to put mine on a few slides with diagrams/examples to see if that helps, since mailing list threads are hard to process.
Here is my attempt: https://www.slideshare.net/RalfGommers/arrayfunctionconceptualdesignrelat...

Slides 6-7 of https://www.slideshare.net/RalfGommers/numpyroadmappresentationatnumfocu... are Stephan's figures for the key NEP 18 concept (I just reused them for a presentation last year).

Cheers, Ralf
Hi Ralf,

Thanks for the comments and summary slides. I think you're over-interpreting my wish to break people's code! I certainly believe – and think we all agree – that we remain as committed as ever to ensure that
```
np.function(inputs)
```
continues to work just as before. My main comment is that I want to ensure that no similar guarantee will exist for
```
np.function.__wrapped__(inputs)
```
(or whatever we call it). I think that is quite consistent with NEP18, since as originally written there was not even the possibility to access the implementation directly (which was after long discussions about whether to allow it, including ideas like `import numpy.internal_api as np`). In this respect, the current proposal is a large deviation from the original intent, so we need to be clear about what we are promising.

In summary, I think the guarantees should be as follows:
1. If you call np.function and
   - do not define __array_function__, changes happen only via the usual cycle.
   - define __array_function__, you take responsibility for returning the result.
2. If you call np.function.__wrapped__ and
   - input only ndarray, changes happen only via the usual cycle;
   - input anything but ndarray, changes can happen in any release.

On the larger picture, in your slides, the further split that happens is that if no override is present, the first thing that actually gets called is not the function implementation but rather `ndarray.__array_function__`. I think it is important to add this to your mental image (and the slides), since it means that generic parts of the implementations (like coercion ;), can be moved there. For ufuncs, this is relatively easy; for other functions less so, since they differ quite a bit in what coercion they do.

All the best,

Marten
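The dispatch cycle Marten describes – public function, then `__array_function__`, then the wrapped implementation – can be modeled with a toy sketch. This is hypothetical and heavily simplified (invented names, no subclass ordering, no kwargs-only dispatch); real NumPy's machinery differs in many details, but the shape of the flow is the same.

```python
import functools

def array_function_dispatch(implementation):
    """Toy stand-in for NumPy's dispatch decorator (illustrative only)."""
    @functools.wraps(implementation)  # sets public.__wrapped__
    def public(*args, **kwargs):
        relevant = [a for a in args if hasattr(type(a), "__array_function__")]
        types = tuple(dict.fromkeys(type(a) for a in relevant))
        for arg in relevant:
            result = type(arg).__array_function__(
                arg, public, types, args, kwargs)
            if result is not NotImplemented:
                return result
        raise TypeError("no implementation found for "
                        + implementation.__name__)
    return public

class FakeNDArray:
    """Plays the role of ndarray: its __array_function__ is the same
    generic 'call the wrapped implementation' default that outside
    libraries are encouraged to write, with no special casing."""
    def __init__(self, value):
        self.value = value
    def __array_function__(self, func, types, args, kwargs):
        if not all(issubclass(t, FakeNDArray) for t in types):
            return NotImplemented
        return func.__wrapped__(*args, **kwargs)  # the nondispatched path

@array_function_dispatch
def total(x, y):
    return x.value + y.value

print(total(FakeNDArray(2), FakeNDArray(3)))  # 5
```

In this picture, anything generic that should happen for plain arrays (coercion, validation) could in principle live in the `FakeNDArray.__array_function__` step rather than in each wrapped implementation, which is exactly the freedom being debated.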
On Sun, Apr 28, 2019 at 5:41 PM Marten van Kerkwijk < m.h.vankerkwijk@gmail.com> wrote:
Hi Ralf,
Thanks for the comments and summary slides. I think you're over-interpreting my wish to break people's code! I certainly believe – and think we all agree – that we remain as committed as ever to ensure that
```
np.function(inputs)
```
continues to work just as before. My main comment is that I want to ensure that no similar guarantee will exist for
```
np.function.__wrapped__(inputs)
```
(or whatever we call it). I think that is quite consistent with NEP18, since as originally written there was not even the possibility to access the implementation directly (which was after long discussions about whether to allow it, including ideas like `import numpy.internal_api as np`). In this respect, the current proposal is a large deviation from the original intent, so we need to be clear about what we are promising.
In summary, I think the guarantees should be as follows:
1. If you call np.function and
   - do not define __array_function__, changes happen only via the usual cycle.
   - define __array_function__, you take responsibility for returning the result.
2. If you call np.function.__wrapped__ and
   - input only ndarray, changes happen only via the usual cycle;
   - input anything but ndarray, changes can happen in any release.
Thanks. These guarantees make sense I think. __wrapped__ is new, so specifying that it should be called only with ndarrays seems reasonable.
On the larger picture: in your slides, the further split that happens is that if no override is present, the first thing that actually gets called is not the function implementation but rather `ndarray.__array_function__`. I think it is important to add this to your mental image (and the slides),
This is tricky. Yes, that's how it's implemented, but making __array_function__ do anything at all is potentially hazardous. Conceptually I think it's meant to be a do-nothing wrapper that just forwards directly to the actual wrapped function. Although if we could get it consistent, it may work.

since it means that generic parts of the implementations (like coercion
;), can be moved there.
That works only if all wrapped functions have identical generic parts, because there's just one ndarray.__array_function__. Are you sure that all functions have identical generic parts? I'm not; it's a bit of work to check, but I'd be surprised if there is even one line of code that's common between all functions. And if you have to special-case functions within ndarray.__array_function__ to let some have asarray and some asanyarray etc., there's probably also a significant amount of overhead added.

There's also the thought of exposing the dispatching mechanism itself in the NEP: https://www.numpy.org/neps/nep-0018-array-function-protocol.html#use-outside.... That may also get more complicated.

By the way, can you think of any other generic part besides coercion? There are other things that could be nice to skip (input validation for one) but nothing is generic.

Cheers,
Ralf

For ufuncs, this is relatively easy, for other functions less so since they
differ quite a bit in what coercion they do.
All the best,
Marten

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion
Hi Ralf,

Agreed that the coercion right now is *not* generic, with some doing `asarray`, others `asanyarray` and yet others nothing. There are multiple possible solutions, with one indeed being that for each function one moves the coercion bits out to an associated intermediate function. In principle, as I mentioned above, one could then think of letting that intermediate function take a coercion function (i.e., `asarray`, `asanyarray` or even anyone's favourite coercion function), which might make it possible to generate them semi-automatically.

Anyway, as said, mostly I want to be sure we leave ourselves the freedom to experiment with that as well, and not get bound by `__wrapped__` or `__numpy_implementation__` becoming effectively a second layer of API. But for actual experiments, it may well be better to try `__array_ufunc__` first, as for ufuncs coercion is uniform.

All the best,

Marten

p.s. Good point also about checking of non-array inputs.
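The idea of handing the intermediate function a coercion function could be sketched roughly like this (a hypothetical decorator, not actual NumPy code; only `np.asarray`/`np.asanyarray` are real):

```python
import numpy as np

def with_coercion(coerce):
    """Hypothetical factory: build the intermediate function from a core
    implementation plus a per-function coercion choice (np.asarray,
    np.asanyarray, or anyone's favourite coercion function)."""
    def decorator(core):
        def intermediate(*arrays, **kwargs):
            # the generic part: coerce every positional argument
            return core(*[coerce(a) for a in arrays], **kwargs)
        intermediate.__name__ = core.__name__
        return intermediate
    return decorator

@with_coercion(np.asanyarray)  # ndarray subclasses pass through unchanged
def add_pair(a, b):
    # the core may assume array (or subclass) inputs
    return a + b
```

`add_pair([1, 2], [3, 4])` coerces both lists before adding; generating such intermediates semi-automatically would amount to applying the decorator over the existing implementations.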
On Sun, Apr 28, 2019 at 6:57 PM Marten van Kerkwijk < m.h.vankerkwijk@gmail.com> wrote:
Hi Ralf,
Agreed that the coercion right now is *not* generic, with some doing `asarray`, others `asanyarray` and yet others nothing. There are multiple possible solutions, with one indeed being that for each function one moves the coercion bits out to an associated intermediate function. In principle, as I mentioned above, one could then think of letting that intermediate function take a coercion function (i.e., `asarray`, `asanyarray` or even anyone's favourite coercion function), which might make it possible to generate them semi-automatically.
Anyway, as said, mostly I want to be sure we leave ourselves the freedom to experiment with that as well, and not get bound by `__wrapped__` or `__numpy_implementation__` becoming effectively a second layer of API.
Well, it is becoming a second layer of API :) Just with clearly articulated guarantees. I think we're on the same wavelength now.

Thanks,
Ralf

But for actual experiments, it may well be better to try `__array_ufunc__`
first, as for ufuncs coercion is uniform.
All the best,
Marten
p.s. Good point also about checking of non-array inputs.
On Sun, Apr 28, 2019, 08:41 Marten van Kerkwijk <m.h.vankerkwijk@gmail.com> wrote:
Hi Ralf,
Thanks for the comments and summary slides. I think you're overinterpreting my wish to break people's code! I certainly believe, and think we all agree, that we remain as committed as ever to ensuring that ``` np.function(inputs) ``` continues to work just as before. My main comment is that I want to ensure that no similar guarantee will exist for ``` np.function.__wrapped__(inputs) ``` (or whatever we call it). I think that is quite consistent with NEP 18, since as originally written there was not even the possibility to access the implementation directly (which was after long discussions about whether to allow it, including ideas like `import numpy.internal_api as np`). In this respect, the current proposal is a large deviation from the original intent, so we need to be clear about what we are promising.
In summary, I think the guarantees should be as follows:

1. If you call np.function and
   - do not define __array_function__, changes happen only via the usual cycle;
   - define __array_function__, you take responsibility for returning the result.

2. If you call np.function.__wrapped__ and
   - input only ndarray, changes happen only via the usual cycle;
   - input anything but ndarray, changes can happen in any release.
Let's just say that __skip_array_function__ is provisional, the same as __array_function__ itself.
On the larger picture: in your slides, the further split that happens is that if no override is present, the first thing that actually gets called is not the function implementation but rather `ndarray.__array_function__`.
This is tricky. I've definitely wanted to figure out some way the conceptual model could be simplified by integrating __array_function__ into regular dispatch to reduce special cases. (It's possible I suggested adding ndarray.__array_dispatch__ in the first place?) But, on further consideration, I don't think there's actually any way to pretend that ndarray is just another duck array with an __array_function__ method.

The big problem is:

np.concatenate([1, 2], [3, 4])

Here none of the arguments have __array_function__ methods. So the implementation *has* to start by doing coercion. The coercion can't happen inside __array_function__, because there is no __array_function__ until after coercion. So ndarray coercion and everything after it has to remain a special case; ndarray.__array_function__ isn't fooling anyone.

Also: if we add Stephan's __skipping_array_function__ (or whatever we call it), then that's also incompatible with the idea that ndarray.__array_function__ is where the real work happens.

I'm starting to think ndarray.__array_function__ is a mistake: it was supposed to simplify the conceptual model, by letting us handle the fallback logic and the override logic using the same unified framework. But it fails.

n
Hi Nathaniel,

I'm a bit confused why `np.concatenate([1, 2], [3, 4])` would be a problem. In the current model, all (numpy) functions fall back to `ndarray.__array_function__`, which does know what to do with anything that doesn't have `__array_function__`: it just coerces it to array. Am I missing something?

All the best,

Marten
On Sun, Apr 28, 2019 at 1:38 PM Marten van Kerkwijk <m.h.vankerkwijk@gmail.com> wrote:
Hi Nathaniel,
I'm a bit confused why `np.concatenate([1, 2], [3, 4])` would be a problem. In the current model, all (numpy) functions fall back to `ndarray.__array_function__`, which does know what to do with anything that doesn't have `__array_function__`: it just coerces it to array. Am I missing something?
IMO, the reason that having ndarray.__array_function__ was attractive in the first place was that we were hoping it would let you pretend that there's nothing special about ndarray. Like, when you call np.concatenate, it just looks for __array_function__ methods and dispatches to them; sometimes that means calling a third-party object's __array_function__, and sometimes it means calling ndarray.__array_function__, but as far as np.concatenate is concerned those are interchangeable and treated in the same way.

But in fact ndarray.__array_function__ *is* special. I guess you could write down the semantics so that np.concatenate([1, 2], [3, 4]) still calls ndarray.__array_function__, by defining special dispatch rules just for ndarray.__array_function__. But if ndarray.__array_function__ isn't going to follow the same dispatch rule, then why should it exist and be called "__array_function__"? A special method like "__array_function__" is nothing except a name for a dispatch rule.

And if we add __skipping_array_function__, it makes this even worse. In a model where dispatch always goes through *some* object's __array_function__, then __skipping_array_function__ makes no sense: if you skip __array_function__ then there's nothing left.

You might try to save it by saying, ok, we'll only skip third-party __array_function__, but still dispatch to ndarray.__array_function__. But this doesn't work either. np.concatenate.__skipping_array_function__(...) is different from ndarray.__array_function__(np.concatenate, ...), because they treat arguments with __array_function__ methods differently. (The former coerces them to ndarray; the latter returns NotImplemented.) Neither can be implemented in terms of the other (!).

ndarray.__array_function__ was a nice idea, but I don't think there's any way to fit it into a coherent system.

n

--
Nathaniel J. Smith -- https://vorpus.org
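The asymmetry described above can be made concrete with a small pure-Python model (all names hypothetical, none of NumPy's real machinery): the skipping variant coerces unconditionally, while the ndarray-style method defers to foreign overrides.

```python
# Pure-Python model of the asymmetry (hypothetical names).
def _impl(seqs):
    """Coercing implementation: turns every argument into a plain list."""
    out = []
    for s in seqs:
        out.extend(list(s))
    return out

def concatenate_skipping(seqs):
    # like func.__skipping_array_function__: coerce unconditionally,
    # even arguments whose type defines its own __array_function__
    return _impl(seqs)

class DuckArray(list):
    def __array_function__(self, func, types, seqs):
        return "duck result"

def ndarray_array_function(func, types, seqs):
    # like ndarray.__array_function__: return NotImplemented as soon as
    # a foreign type defines its own __array_function__
    if any(hasattr(t, "__array_function__") for t in types):
        return NotImplemented
    return _impl(seqs)
```

With a DuckArray argument, `concatenate_skipping` still returns the coerced result while `ndarray_array_function` returns NotImplemented, so neither behavior can be expressed in terms of the other.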
Hi Nathaniel,

Thanks, I now see your point. I think I can weasel my way partially out: the default *output* from `np.concatenate` is an ndarray, so in that respect it is not that strange that when no input defines __array_function__, one would call `ndarray.__array_function__` (I realize this is sophistry and that it breaks down with all-scalar functions, but still feel it is defensible...).

Your point about `__skipping_array_function__` is well taken, though: it is not very logical, since suddenly one again ignores items that define __array_function__. Its real purpose is to be a useful crutch if one wants to start to define __array_function__ on one's class. But arguably this is yet more reason to just stick with __wrapped__, i.e., be explicit that it is an implementation detail.

All the best,

Marten

On Sun, Apr 28, 2019 at 6:50 PM Nathaniel Smith <njs@pobox.com> wrote:
On Sun, Apr 28, 2019 at 1:38 PM Marten van Kerkwijk <m.h.vankerkwijk@gmail.com> wrote:
Hi Nathaniel,
I'm a bit confused why `np.concatenate([1, 2], [3, 4])` would be a
problem. In the current model, all (numpy) functions fall back to `ndarray.__array_function__`, which does know what to do with anything that doesn't have `__array_function__`: it just coerces it to array. Am I missing something?
IMO, the reason that having ndarray.__array_function__ was attractive in the first place was that we were hoping it would let you pretend that there's nothing special about ndarray. Like, when you call np.concatenate, it just looks for __array_function__ methods and dispatches to them; sometimes that means calling a third-party object's __array_function__, and sometimes it means calling ndarray.__array_function__, but as far as np.concatenate is concerned those are interchangeable and treated in the same way.
But in fact ndarray.__array_function__ *is* special. I guess you could write down the semantics so that np.concatenate([1, 2], [3, 4]) still calls ndarray.__array_function__, by defining special dispatch rules just for ndarray.__array_function__. But if ndarray.__array_function__ isn't going to follow the same dispatch rule, then why should it exist and be called "__array_function__"? A special method like "__array_function__" is nothing except a name for a dispatch rule.
And if we add __skipping_array_function__, it makes this even worse. In a model where dispatch always goes through *some* object's __array_function__, then __skipping_array_function__ makes no sense: if you skip __array_function__ then there's nothing left.
You might try to save it by saying, ok, we'll only skip third-party __array_function__, but still dispatch to ndarray.__array_function__. But this doesn't work either. np.concatenate.__skipping_array_function__(...) is different from ndarray.__array_function__(np.concatenate, ...), because they treat arguments with __array_function__ methods differently. (The former coerces them to ndarray; the latter returns NotImplemented.) Neither can be implemented in terms of the other (!).
ndarray.__array_function__ was a nice idea, but I don't think there's any way to fit it into a coherent system.
n
--
Nathaniel J. Smith -- https://vorpus.org
On Sun, Apr 28, 2019 at 8:42 AM Marten van Kerkwijk < m.h.vankerkwijk@gmail.com> wrote:
In summary, I think the guarantees should be as follows:

1. If you call np.function and
   - do not define __array_function__, changes happen only via the usual cycle;
   - define __array_function__, you take responsibility for returning the result.

2. If you call np.function.__wrapped__ and
   - input only ndarray, changes happen only via the usual cycle;
   - input anything but ndarray, changes can happen in any release.
The uses that I've seen so far (in CuPy and JAX) involve a handful of functions that are directly re-exported from NumPy, e.g., jax.numpy.array_repr is the exact same object as numpy.array_repr:
https://github.com/cupy/cupy/blob/c3f1be602bf6951b007beaae644a5662f910048b/c...
https://github.com/google/jax/blob/5edb23679f2605654949156da84e330205840695/...

I suspect this will be less common in the future if __array_function__ takes off, but for now it's convenient because users don't need to know exactly which functions have been reimplemented. They can just use "import jax.numpy as np" and everything works.

These libraries are indeed passing CuPy or JAX arrays into NumPy functions, which currently happen to have the desired behavior, thanks to accidental details about how NumPy currently supports duck typing and/or coercion.

To this end, it would be really nice to have an alias that *is* guaranteed to work exactly as if __array_function__ didn't exist, and not only for numpy.ndarray arrays.

Ralf raises a good point about the name. We don't need to add this attribute for ufuncs and __array_ufunc__ yet, but (1) we might want this in the future, just for consistency in the design of __array_function__ and __array_ufunc__, and (2) we definitely don't want to rule out converting functions into ufuncs. So we might as well pick a name that works for both, e.g., __skip_array_overrides__ rather than __skip_array_function__. This would let us save our users a bit of pain by not requiring them to make changes like np.where.__skip_array_function__ -> np.where.__skip_array_ufunc__.
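The re-export pattern described here can be sketched as a namespace that overlays a few reimplemented functions on top of plain NumPy (a hypothetical helper, loosely modeled on what cupy and jax.numpy do; `make_numpy_like` and `fake_np` are made-up names):

```python
import types
import numpy as np

def make_numpy_like(overrides):
    """Hypothetical helper: build a numpy-like namespace where reimplemented
    functions win and every other public name is re-exported from NumPy
    unchanged, so users don't need to know what was reimplemented."""
    names = {name: getattr(np, name)
             for name in dir(np) if not name.startswith("_")}
    names.update(overrides)
    return types.SimpleNamespace(**names)

# pretend we reimplemented only `concatenate`
fake_np = make_numpy_like({"concatenate": lambda seqs: ("mine", list(seqs))})
```

Here `fake_np.concatenate` is the override, while `fake_np.array_repr` is the exact same object as `numpy.array_repr`, mirroring the JAX example above.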
On Sun, Apr 28, 2019 at 9:20 PM Stephan Hoyer <shoyer@gmail.com> wrote:
On Sun, Apr 28, 2019 at 8:42 AM Marten van Kerkwijk < m.h.vankerkwijk@gmail.com> wrote:
In summary, I think the guarantees should be as follows:

1. If you call np.function and
   - do not define __array_function__, changes happen only via the usual cycle;
   - define __array_function__, you take responsibility for returning the result.

2. If you call np.function.__wrapped__ and
   - input only ndarray, changes happen only via the usual cycle;
   - input anything but ndarray, changes can happen in any release.
The uses that I've seen so far (in CuPy and JAX) involve a handful of functions that are directly re-exported from NumPy, e.g., jax.numpy.array_repr is the exact same object as numpy.array_repr:
https://github.com/cupy/cupy/blob/c3f1be602bf6951b007beaae644a5662f910048b/c...
https://github.com/google/jax/blob/5edb23679f2605654949156da84e330205840695/...
I suspect this will be less common in the future if __array_function__ takes off, but for now it's convenient because users don't need to know exactly which functions have been reimplemented. They can just use "import jax.numpy as np" and everything works.
These libraries are indeed passing CuPy or JAX arrays into NumPy functions, which currently happen to have the desired behavior, thanks to accidental details about how NumPy currently supports duck typing and/or coercion.
To this end, it would be really nice to have an alias that *is* guaranteed to work exactly as if __array_function__ didn't exist, and not only for numpy.ndarray arrays.
Just to be clear: for this purpose, being able to call the implementation is still mostly a convenient crutch, correct? For classes that define __array_function__, would you expect more than the guarantee I wrote above, that the wrapped version will continue to work as advertised for ndarray input only? In particular, suppose we change an implementation to use different other numpy functions inside (which are of course overridden using __array_function__). I could imagine situations where that would work fine for everything that does not define __array_ufunc__, but where it would not for classes that do define it. Is that then a problem for numpy or for the project that has a class that defines __array_function__?
Ralf raises a good point about the name. We don't need to add this attribute for ufuncs and __array_ufunc__ yet, but (1) we might want this in the future, just for consistency in the design of __array_function__ and __array_ufunc__, and (2) we definitely don't want to rule out converting functions into ufuncs.
So we might as well pick a name that works for both, e.g., __skip_array_overrides__ rather than __skip_array_function__. This would let us save our users a bit of pain by not requiring them to make changes like np.where.__skip_array_function__ -> np.where.__skip_array_ufunc__.
Note that for ufuncs it is not currently possible to skip the override. I don't think it is super hard to do it, but I'm not sure I see the need to add a crutch where none has been needed so far. More generally, it is not obvious there is any C code where skipping the override is useful, since the C code relies much more directly on inputs being ndarray.

All the best,

Marten
On Mon, Apr 29, 2019 at 5:49 AM Marten van Kerkwijk < m.h.vankerkwijk@gmail.com> wrote:
The uses that I've seen so far (in CuPy and JAX), involve a handful of
functions that are directly reexported from NumPy, e.g., jax.numpy.array_repr is the exact same object as numpy.array_repr:
https://github.com/cupy/cupy/blob/c3f1be602bf6951b007beaae644a5662f910048b/c...
https://github.com/google/jax/blob/5edb23679f2605654949156da84e330205840695/...
I suspect this will be less common in the future if __array_function__ takes off, but for now it's convenient because users don't need to know exactly which functions have been reimplemented. They can just use "import jax.numpy as np" and everything works.
These libraries are indeed passing CuPy or JAX arrays into NumPy functions, which currently happen to have the desired behavior, thanks to accidental details about how NumPy currently supports duck typing and/or coercion.
To this end, it would be really nice to have an alias that *is* guaranteed to work exactly as if __array_function__ didn't exist, and not only for numpy.ndarray arrays.
Just to be clear: for this purpose, being able to call the implementation is still mostly a convenient crutch, correct? For classes that define __array_function__, would you expect more than the guarantee I wrote above, that the wrapped version will continue to work as advertised for ndarray input only?
I'm not sure I agree; what would be the more principled alternative here?

Modules that emulate NumPy's public API for a new array type are both pretty common (cupy, jax.numpy, autograd, dask.array, pydata/sparse, etc.) and also the best early candidates for adopting NEP 18, because they don't need to do much extra work to write an __array_function__ method. I want to make it as easy as possible for these early adopters, because their success will make or break the entire __array_function__ protocol.

In the long term, I agree that the importance of these numpy-like namespaces will diminish, because it will be possible to use the original NumPy namespace instead. Possibly new projects will decide that they don't need to bother with them at all. But there are still lots of plausible reasons for keeping them around even for a project that implements __array_function__, e.g.:
(a) to avoid the overhead of NumPy's dispatching
(b) to access functions like np.ones that return a different array type
(c) to make use of optional duck-array-specific arguments, e.g., the split_every argument to dask.array.sum()
(d) if they care about supporting versions of NumPy older than 1.17

In practice, I suspect we'll see these modules continue to exist for a long time. And they really do rely upon the exact behavior of NumPy today, whatever that happens to be (e.g., the undocumented fact that np.result_type supports duck typing with the .dtype attribute rather than coercing arguments to NumPy arrays).

In particular, suppose we change an implementation to use different other
numpy functions inside (which are of course overridden using __array_function__). I could imagine situations where that would work fine for everything that does not define __array_ufunc__, but where it would not for classes that do define it. Is that then a problem for numpy or for the project that has a class that defines __array_function__?
If we change an existing NumPy function to start calling ufuncs directly on input arguments, rather than calling np.asarray() on its inputs, that will already (potentially) be a breaking change. We lost the ability to do these sorts of refactors without breaking backwards compatibility when we added __array_ufunc__. So I think it's already our problem, unless we're willing to risk breaking __array_ufunc__ users.

That said, I doubt this would actually be a major issue in practice. The projects for which __array_function__ makes the most sense are "full duck arrays," and all these projects are going to implement __array_ufunc__, too, in a mostly compatible way.

I'm a little puzzled by why you are concerned about retaining the flexibility to reuse the attribute I'm asking for here for a function that works differently. What I want is a special attribute that is guaranteed to work like the public version of a NumPy function, but without checking for an __array_function__ attribute.

If we later decide we want to expose an attribute that provides a non-coercing function that calls ufuncs directly instead of np.asarray, what do we lose by giving it a new name so users don't need to worry about changed behavior? There is plenty of room for special attributes on NumPy functions. We can have both np.something.__skip_array_overrides__ and np.something.__array_implementation__.

So we might as well pick a name that works for both, e.g.,
__skip_array_overrides__ rather than __skip_array_function__. This would let us save our users a bit of pain by not requiring them to make changes like np.where.__skip_array_function__ -> np.where.__skip_array_ufunc__.
Note that for ufuncs it is not currently possible to skip the override. I don't think it is super hard to do it, but I'm not sure I see the need to add a crutch where none has been needed so far. More generally, it is not obvious there is any C code where skipping the override is useful, since the C code relies much more directly on inputs being ndarray.
To be entirely clear: I was thinking of ufunc.method.__skip_array_overrides__() as "equivalent to ufunc.method() except not checking for __array_ufunc__ attributes".

I think the use cases would be for Python code that calls ufuncs, in much the same way that there are use cases for Python code that calls other NumPy functions, e.g.:
- np.sin.__skip_array_overrides__() could be slightly faster than np.sin(), because it avoids checking for __array_ufunc__ attributes.
- np.add.__skip_array_overrides__(x, y) is definitely going to be faster than np.add(np.asarray(x), np.asarray(y)), because it avoids the overhead of two Python function calls.

The use cases here are certainly not as compelling as those for __array_function__, because __array_ufunc__'s arguments are in a standardized form, but I think they're still meaningful. Not to mention, we can refactor np.ndarray.__array_ufunc__ to work exactly like np.ndarray.__array_function__, eliminating the special case in NEP 13's dispatch rules.

I agree that it wouldn't make sense to call the "generic duck-array implementation" of a ufunc (these don't exist), but that wasn't what I was proposing here.
We seem to have run out of steam a bit here.

On Tue, Apr 30, 2019 at 7:24 AM Stephan Hoyer <shoyer@gmail.com> wrote:
On Mon, Apr 29, 2019 at 5:49 AM Marten van Kerkwijk < m.h.vankerkwijk@gmail.com> wrote:
The uses that I've seen so far (in CuPy and JAX), involve a handful of
functions that are directly reexported from NumPy, e.g., jax.numpy.array_repr is the exact same object as numpy.array_repr:
https://github.com/cupy/cupy/blob/c3f1be602bf6951b007beaae644a5662f910048b/c...
https://github.com/google/jax/blob/5edb23679f2605654949156da84e330205840695/...
I suspect this will be less common in the future if __array_function__ takes off, but for now it's convenient because users don't need to know exactly which functions have been reimplemented. They can just use "import jax.numpy as np" and everything works.
These libraries are indeed passing CuPy or JAX arrays into NumPy functions, which currently happen to have the desired behavior, thanks to accidental details about how NumPy currently supports ducktyping and/or coercions.
To this end, it would be really nice to have an alias that *is* guaranteed to work exactly as if __array_function__ didn't exist, and not only for numpy.ndarray arrays.
Just to be clear: for this purpose, being able to call the implementation is still mostly a convenient crutch, correct? For classes that define __array_function__, would you expect more than the guarantee I wrote above, that the wrapped version will continue to work as advertised for ndarray input only?
I'm not sure I agree; what would be the more principled alternative here?
Modules that emulate NumPy's public API for a new array type are both pretty common (cupy, jax.numpy, autograd, dask.array, pydata/sparse, etc.) and also the best early candidates for adopting NEP 18, because they don't need to do much extra work to write an __array_function__ method. I want to make it as easy as possible for these early adopters, because their success will make or break the entire __array_function__ protocol.
In the long term, I agree that the importance of these numpy-like namespaces will diminish, because it will be possible to use the original NumPy namespace instead. Possibly new projects will decide that they don't need to bother with them at all. But there are still lots of plausible reasons for keeping them around even for a project that implements __array_function__, e.g.:
(a) to avoid the overhead of NumPy's dispatching
(b) to access functions like np.ones that return a different array type
(c) to make use of optional duck-array-specific arguments, e.g., the split_every argument to dask.array.sum()
(d) if they care about supporting versions of NumPy older than 1.17
In practice, I suspect we'll see these modules continue to exist for a long time. And they really do rely upon the exact behavior of NumPy today, whatever that happens to be (e.g., the undocumented fact that np.result_type supports duck typing with the .dtype attribute rather than coercing arguments to NumPy arrays).
In particular, suppose we change an implementation to use different other
numpy functions inside (which are of course overridden using __array_function__). I could imagine situations where that would work fine for everything that does not define __array_ufunc__, but where it would not for classes that do define it. Is that then a problem for numpy or for the project that has a class that defines __array_function__?
If we change an existing NumPy function to start calling ufuncs directly on input arguments, rather than calling np.asarray() on its inputs,
This wasn't really the question, I believe. More like: if numpy function A now calls B under the hood, and we replace it with C (in a way that's fully backwards compatible for users of A), then will that be a problem in the future? I think that in practice this doesn't happen a lot, and is quite unlikely to be a problem.

that will already (potentially) be a breaking change. We lost the ability
to do these sorts of refactors without breaking backwards compatibility when we added __array_ufunc__. So I think it's already our problem, unless we're willing to risk breaking __array_ufunc__ users.
That said, I doubt this would actually be a major issue in practice. The projects for which __array_function__ makes the most sense are "full duck arrays," and all these projects are going to implement __array_ufunc__, too, in a mostly compatible way.
I'm a little puzzled by why you are concerned about retaining the flexibility to reuse the attribute I'm asking for here for a function that works differently. What I want is a special attribute that is guaranteed to work like the public version of a NumPy function, but without checking for an __array_function__ attribute.
If we later decide we want to expose an attribute that provides a non-coercing function that calls ufuncs directly instead of np.asarray, what do we lose by giving it a new name so users don't need to worry about changed behavior? There is plenty of room for special attributes on NumPy functions. We can have both np.something.__skip_array_overrides__ and np.something.__array_implementation__.
That's a good argument I think. Ralf
So we might as well pick a name that works for both, e.g.,
__skip_array_overrides__ rather than __skip_array_function__. This would let us save our users a bit of pain by not requiring them to make changes like np.where.__skip_array_function__ -> np.where.__skip_array_ufunc__.
Note that for ufuncs it is not currently possible to skip the override. I don't think it is super hard to do it, but I'm not sure I see the need to add a crutch where none has been needed so far. More generally, it is not obvious there is any C code where skipping the override is useful, since the C code relies much more directly on inputs being ndarray.
To be entirely clear: I was thinking of ufunc.method.__skip_array_overrides__() as "equivalent to ufunc.method() except not checking for __array_ufunc__ attributes".
I think the use cases would be for Python code that calls ufuncs, in much the same way that there are use cases for Python code that calls other NumPy functions, e.g.:
- np.sin.__skip_array_overrides__() could be slightly faster than np.sin(), because it avoids checking for __array_ufunc__ attributes.
- np.add.__skip_array_overrides__(x, y) is definitely going to be faster than np.add(np.asarray(x), np.asarray(y)), because it avoids the overhead of two Python function calls.
The use cases here are certainly not as compelling as those for __array_function__, because __array_ufunc__'s arguments are in a standardized form, but I think they're still meaningful. Not to mention, we can refactor np.ndarray.__array_ufunc__ to work exactly like np.ndarray.__array_function__, eliminating the special case in NEP 13's dispatch rules.
I agree that it wouldn't make sense to call the "generic duck-array implementation" of a ufunc (these don't exist), but that wasn't what I was proposing here.
On Sat, May 4, 2019 at 12:29 PM Ralf Gommers <ralf.gommers@gmail.com> wrote:
We seem to have run out of steam a bit here.
We discussed this today in person at the NumPy sprint. The consensus was to go for a name like __skip_array_function__. Ufuncs don't have very good use cases for a function that skips dispatch:

1. The overhead of the ufunc dispatch machinery is much smaller, especially in the case where all arguments are NumPy arrays, because there is no need for a wrapper function in Python.
2. Inside __array_ufunc__ it's possible to cast arguments into NumPy arrays explicitly and then call the ufunc again. There's no need to explicitly skip overrides.

We also don't really care about supporting the use case where a function gets changed into a ufunc. We already warn users not to call __skip_array_function__ directly (without using getattr) outside __array_function__.

Given all this, it seems best to stick with a name that mirrors __array_function__ as closely as possible. I picked "skip" instead of "skipping" just because it's slightly shorter, but otherwise don't have a strong preference.

I've edited the NEP [1] and implementation [2] pull requests to use this new name, and clarify the use cases. If there are no serious objections, I'd love to merge these soon, in time for the NumPy 1.17 release candidate.

[1] https://github.com/numpy/numpy/pull/13305
[2] https://github.com/numpy/numpy/pull/13389
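The "getattr" guidance above can be illustrated with a short sketch. This is a hypothetical override helper that falls back to NumPy's implementation via the proposed __skip_array_function__ attribute; the getattr default keeps it working on NumPy versions where the attribute does not exist:

```python
import numpy as np

def call_numpy_implementation(func, args, kwargs):
    # Hypothetical helper for use inside an __array_function__ method.
    # getattr with a default is the recommended spelling: if this NumPy
    # version does not provide __skip_array_function__, fall back to the
    # public (dispatching) function instead of raising AttributeError.
    impl = getattr(func, "__skip_array_function__", func)
    return impl(*args, **kwargs)

result = call_numpy_implementation(
    np.concatenate, ([np.ones(2), np.zeros(2)],), {})
```

With plain ndarray arguments both code paths produce the same answer, which is what makes the fallback safe.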
On Sat, May 11, 2019 at 4:04 AM Stephan Hoyer <shoyer@gmail.com> wrote:
Thanks for the update Stephan, that all sounds good to me. Looks like it was a productive sprint! Cheers, Ralf
Hi Ralf,

Would it be much hassle for you to duplicate your slides somewhere else, too?

Cheers,
Evgeni

Sun, 28 Apr 2019, 15:38 Ralf Gommers <ralf.gommers@gmail.com>:
On Sat, Apr 27, 2019 at 8:10 PM Ralf Gommers <ralf.gommers@gmail.com> wrote:
It seems like we all have a different mental model of what NEP 18 actually does. I'm going to try to put mine on a few slides with diagrams/examples to see if that helps, since mailing list threads are hard to process.
Here is my attempt: https://www.slideshare.net/RalfGommers/arrayfunctionconceptualdesignrelat...
Slides 6-7 of https://www.slideshare.net/RalfGommers/numpyroadmappresentationatnumfocu... are Stephan's figures for the key NEP 18 concept (I just reused them for a presentation last year).
Cheers, Ralf
On Sun, Apr 28, 2019 at 7:43 PM Evgeni Burovski <evgeny.burovskiy@gmail.com> wrote:
Hi Ralf,
Would it be much hassle for you to duplicate your slides somewhere else, too?
Oh fun, SlideShare is blocked in Russia I see. So is https://speakerdeck.com/. I just sent you the slides, will think about a more structural solution later. Cheers, Ralf
On Fri, Apr 26, 2019 at 1:24 AM Ralf Gommers <ralf.gommers@gmail.com> wrote:
Thanks, this helped clarify what's going on here. This example is clear. The problem seems to be that there are two separate discussions in this thread:

1. Your original proposal, __numpy_implementation__. It does not have the problem of your np.concatenate example, as the "numpy implementation" is exactly the same as it is today.
2. Splitting up the current numpy implementation into *multiple* entry points. This can be with and without coercion, with and without checking for invalid values, etc.
So far NEP 18 does (1). Your proposed __numpy_implementation__ addition to NEP 18 is still (1). Claiming that this affects the situation with respect to backwards compatibility is incorrect.
(2) is actually a much more invasive change, and one that does much more to increase the size of the NumPy API surface. And yes, affects our backwards compatibility situation as well.
Also note that these have very different purposes: (1) was to (quoting from the NEP) "allow using NumPy as a high level API for efficient multidimensional array operations, even with array implementations that differ greatly from numpy.ndarray." (2) is for making duck arrays work with numpy implementations of functions (not just with the NumPy API)
I think (1) is mostly achieved, and I'm +1 on your NEP addition for that. (2) is quickly becoming a mess, and I agree with Nathaniel's sentiment above: "I shouldn't expect __array_function__ to be useful for duck arrays?". For (2) we really need to go back and have a well thought out design. Hameer's mention of uarray could be that. Growing more __array_*__ protocols in a band-aid fashion seems unlikely to get us there.
Yes, very well put. I agree, let's try to keep focused on (1) for now. (2) got brought up because of potential confusion about what "__numpy_implementation__" means, but certainly we don't want to figure out those issues now. To that end, I'd love to hear more suggestions for naming what I tentatively called "__numpy_implementation__". I suppose we could always go for "__implementation_used_by_numpy_ndarray_array_function__" ;)
On Thu, Apr 25, 2019 at 6:04 PM Stephan Hoyer <shoyer@gmail.com> wrote:
On Thu, Apr 25, 2019 at 12:46 PM Marten van Kerkwijk < m.h.vankerkwijk@gmail.com> wrote:
<snip>
It would be nice, though, if we could also end up with option 4 being available, if only because code that can just assume ndarray will be easiest to read.
This could perhaps just be coercion_function=None? Or maybe we want to keep around coercion_function=None for "do whatever ad hoc coercion NumPy currently does"?
I think `None` had better mean no coercion... But the default doesn't have to be `None`, of course.

Marten
On Thu, Apr 25, 2019 at 10:10 AM Stephan Hoyer <shoyer@gmail.com> wrote:
On Wed, Apr 24, 2019 at 9:56 PM Nathaniel Smith <njs@pobox.com> wrote:
When you say "numpy array specific" and "__numpy_(nd)array_implementation__", that sounds to me like you're trying to say "just step 3, skipping steps 1 and 2"? Step 3 is the one that operates on ndarrays...
My thinking was that if we implement NumPy functions with duck typing (e.g., `np.stack()` in terms of `.shape` + `np.concatenate()`), then step (3) could in some sense be the generic "array implementation", not only for NumPy arrays.
Okay right, so roughly speaking there are two different types of functions that support __array_function__:

* "Core" numpy functions that typically do implicit coercion and then iterate over raw memory
* "Derived" functions, the kind of thing that could just as well be implemented in another library or end-user code, and often are... but since these ones happen to be in the numpy package namespace, they support __array_function__.

There are probably some weird cases that don't fall neatly into either category, but I think the distinction is at least useful for organizing our thoughts.
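As an illustration of the "derived" category, here is a sketch (not NumPy's actual source; stack_like is a hypothetical name) of how a stack-style function can be written purely in terms of other public NumPy calls, which is exactly what makes it implementable outside the numpy namespace:

```python
import numpy as np

def stack_like(arrays, axis=0):
    # A "derived" function: no raw-memory iteration, just composition
    # of public NumPy operations (expand_dims + concatenate).
    expanded = [np.expand_dims(np.asanyarray(a), axis) for a in arrays]
    return np.concatenate(expanded, axis=axis)
```

Because it only calls other public functions, such a function has no coercion or dispatch logic of its own; it inherits whatever np.concatenate does.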
When we have some kind of __asduckarray__ coercion, then that will complicate things too, because presumably we'll do something like
1. __array_function__ dispatch
2. __asduckarray__ coercion
3. __array_function__ dispatch again
4. ndarray coercion
5. [either "the implementation", or __array_function__ dispatch again, depending on how you want to think about it]
I was thinking of something a little simpler: do __asduckarray__ rather than numpy.ndarray coercion inside the implementation of NumPy functions. Then making use of NumPy's implementations would be a matter of calling the NumPy implementation without ndarray coercion from inside __array_function__.
e.g.,
class MyArray:
    def __duck_array__(self):
        return self

    def __array_function__(self, func, types, args, kwargs):
        ...
        if func in {np.stack, np.atleast_1d, ...}:
            # use NumPy's "duck typing" implementations for these functions
            return func.__duck_array_implementation__(*args, **kwargs)
        elif func == np.concatenate:
            # write my own version of np.concatenate
            ...
This would let you make use of duck typing in a controlled way if you use __array_function__. np.stack.__duck_array_implementation__ would look exactly like np.stack, except np.asanyarray() would be replaced by np.asduckarray().
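To make the preceding paragraph concrete, here is a hedged sketch: np.asduckarray does not exist, so asduckarray below is a stand-in that passes through objects advertising a __duck_array__ method and coerces everything else. A duck-typed atleast_1d would then differ from the real one only in that coercion call:

```python
import numpy as np

def asduckarray(obj):
    # Hypothetical stand-in for the proposed np.asduckarray: objects
    # declaring __duck_array__ are passed through uncoerced.
    if hasattr(obj, "__duck_array__"):
        return obj.__duck_array__()
    return np.asarray(obj)

def duck_atleast_1d(a):
    # Same shape logic as np.atleast_1d (single-argument case only),
    # with asduckarray in place of the coercive np.asanyarray.
    a = asduckarray(a)
    return a if a.ndim >= 1 else a.reshape(1)
```

For plain ndarrays and scalars this behaves like the coercive version; for duck arrays it would keep the original type alive through the computation.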
The reason why we need the separate __duck_array_implementation__ and __numpy_array_implementation__/__skipping_array_function__ is because there are also use cases where you *don't* want to worry about how np.stack is implemented under the hood (i.e., in terms of np.concatenate), and want to go straight to the coercive numpy.ndarray implementation. This lets you avoid both the complexity and overhead associated with further dispatch checks.
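The "go straight to the coercive numpy.ndarray implementation" path is essentially what many wrapper types do today by unwrapping their arguments. A minimal sketch (Wrapped is a hypothetical class; nested containers of wrapped values are deliberately not handled):

```python
import numpy as np

class Wrapped:
    # Hypothetical duck array that opts out of further duck-typed
    # dispatch by coercing its data and calling the plain function.
    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_function__(self, func, types, args, kwargs):
        unwrapped = tuple(
            a.data if isinstance(a, Wrapped) else a for a in args)
        # With only ndarrays left, the call runs NumPy's own
        # implementation rather than dispatching back here.
        return func(*unwrapped, **kwargs)

total = np.sum(Wrapped([1, 2, 3]))
```

This avoids any further dispatch checks inside the implementation, which is the complexity/overhead tradeoff described above.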
I don't think we want repeated dispatching with __array_function__. That seems like a recipe for slow performance and confusion.
I don't understand this part, but it makes me worry that instead of designing something that fits together based on some underlying logical framework, you're hoping to just keep throwing more and more hooks at things and hoping that if 3rd party libraries have enough hooks they'll be able to somehow monkeypatch things into working most of the time if you don't look too hard :/. I hope that's wrong.

Stepping back a bit: My objection to the phrase "numpy implementation" has been that "implementation" is one of those words like "low level", whose meaning completely changes depending on which part of the system you happen to be thinking about when you say it. I think I see what you're getting at now, though; you've been working on adding __array_function__ dispatch, and from the perspective of a wrapper function implementing __array_function__ dispatch, there's a clear distinction between the caller, the dispatch, and then the fallback "implementation" that it delegates to if no __array_function__ methods were found. The wrapper treats the fallback function like a black box.

That's an internally consistent approach, and if you want __array_function__ to work on "derived" functions like np.stack... well, they're just arbitrary Python functions, so you *have* to treat the fallback like a black box, and __array_function__ dispatch as a cleanly decoupled step. And if that's the model for __array_function__, then it makes perfect sense to talk about skipping the __array_function__ dispatch step. I think the word "implementation" is too vague, but the idea makes sense.

The thing I didn't realize until these last few posts, though, is that if this is the model for __array_function__, then it means you *have* to treat the fallback as a black box. Which means that __array_function__ cannot be integrated into numpy's coercion rules, which are inside the black box.
And duck arrays need to be integrated into numpy's coercion rules, because you have to be able to coerce to a duck array before calling whatever special methods it has. So therefore... duck arrays cannot use __array_function__? That seems like an unfortunate conclusion but I don't see any way around it.

Like, for a concrete example: if obj1 has an __asduckarray__ method, and that returns obj2 with __array_ufunc__, then I would absolutely expect np.sin(obj1) to end up calling obj2.__array_ufunc__. But if __array_function__ is a decoupled step applicable to arbitrary functions, then np.sin(obj1) can't call obj2.__array_function__.

Alternatively, we could make __array_function__ part of numpy's standard coercion/dispatch sequence, but then it doesn't make much sense for np.stack to do __array_function__ dispatch.

I guess this is just another manifestation of the tradeoff we accepted when we decided to implement __array_function__, instead of more finer-grained, semantically-integrated hooks like __array_concatenate__, and I shouldn't expect __array_function__ to be useful for duck arrays?

I don't have a conclusion but I'd like to know what you think about the above :).

-n

--
Nathaniel J. Smith -- https://vorpus.org
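Nathaniel's obj1/obj2 example can be sketched in code, with the caveat that __asduckarray__ is purely hypothetical: NumPy never calls it today, so the coercion step has to be invoked by hand before np.sin can reach obj2's __array_ufunc__:

```python
import numpy as np

class Obj2:
    # Duck array implementing the NEP 13 __array_ufunc__ protocol.
    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        inputs = tuple(
            i.data if isinstance(i, Obj2) else i for i in inputs)
        return Obj2(getattr(ufunc, method)(*inputs, **kwargs))

class Obj1:
    # Holder of the hypothetical __asduckarray__ hook from the thread.
    def __init__(self, data):
        self.data = data

    def __asduckarray__(self):
        return Obj2(self.data)

# NumPy has no __asduckarray__ coercion step, so it is done manually
# here; with it in place, np.sin does reach Obj2.__array_ufunc__.
result = np.sin(Obj1([0.0]).__asduckarray__())
```

Calling np.sin(Obj1([0.0])) directly would fail, which is precisely the gap being discussed: without a coercion step inside the dispatch sequence, obj2's protocol methods are never reached.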
Participants (8)
- Evgeni Burovski
- Hameer Abbasi
- Marten van Kerkwijk
- Matti Picus
- Nathaniel Smith
- Ralf Gommers
- Stefan van der Walt
- Stephan Hoyer