Experimental `like=` attribute for array creation functions
Hi all,

as a heads-up: Peter Entschev has a PR open to add `like=` to most array creation functions. My current plan is to merge it soon as a preliminary API and bring it up again before the actual release (in a few months). This allows overriding for array-likes, e.g. it will allow:

    arr = np.asarray([3], like=dask_array)
    type(arr) is dask.array.Array

This was proposed in NEP 35, which has not been accepted as of now:

https://numpy.org/neps/nep-0035-array-creation-dispatch-with-array-function....

The PR is:

https://github.com/numpy/numpy/pull/16935

This was discussed in a smaller group, and is an attempt to see how we can make the array-function protocol viable to allow packages such as sklearn to work with non-NumPy arrays.

As of now, this would be experimental, and we can revisit it before the actual NumPy release. We should probably discuss accepting NEP 35 more. At this time, I hope that we can put in the functionality to facilitate this discussion, rather than the other way around.

If anyone feels nervous about this step, I would be happy to document that we will not include it in the next release unless the NEP is accepted first, or at least hide it behind an environment variable.

Cheers,

Sebastian
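For readers without Dask installed, the behaviour described above can be approximated with a toy type. This is only a sketch: `ToyArray` is a hypothetical stand-in (not part of NumPy or any library), and it assumes a NumPy version that ships the `like=` argument (1.20 or later). The handler matches on the function name rather than identity, which is an illustrative choice.

```python
import numpy as np


class ToyArray:
    """Hypothetical stand-in for a downstream array type such as a Dask array."""

    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_function__(self, func, types, args, kwargs):
        # Only np.asarray is handled in this sketch; everything else is declined.
        if func.__name__ == "asarray":
            # NumPy strips like= before calling us, so this builds a plain
            # ndarray, which is then wrapped in the toy type.
            return ToyArray(np.asarray(*args, **kwargs))
        return NotImplemented


ref = ToyArray([1, 2])
arr = np.asarray([3], like=ref)
print(type(arr) is ToyArray)
```

The point of the sketch is that the type of the result follows the `like=` reference object, not the input list.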
Hi,

We should have a higher-bandwidth meeting/communication for all stakeholders, and particularly some library authors, to see what would be good for them.

We should definitely have language in the NEP that says it won’t be in a release unless the NEP is accepted.

Best regards,
Hameer Abbasi
On Mon, 2020-08-10 at 17:35 +0200, Hameer Abbasi wrote:
Hi,
We should have a higher-bandwidth meeting/communication for all stakeholders, and particularly some library authors, to see what would be good for them.
We should definitely have language in the NEP that says it won’t be in a release unless the NEP is accepted.
In that case, I think the important part is to have language right now in the implementation, although that can refer to the NEP itself, of course. You can't expect everyone who may be tempted to use it to actually read the NEP draft, at least not without pointing it out.

I will say that I think it is not very high risk because, annoying or not, the argument could be deprecated again with a short transition phase. Admittedly, that argument only works if we have a replacement solution.

Cheers,

Sebastian
On Mon, Aug 10, 2020 at 8:37 PM Sebastian Berg <sebastian@sipsolutions.net> wrote:
On Mon, 2020-08-10 at 17:35 +0200, Hameer Abbasi wrote:
Hi,
We should have a higher-bandwidth meeting/communication for all stakeholders, and particularly some library authors, to see what would be good for them.
I'm not sure that helps. At this point there's little progress since the last meeting, and I think the plan is unchanged: we need implementations of all the options on offer, and then to try them out in PRs for scikit-learn, SciPy and perhaps another package whose maintainers are interested, to test like= and __array_module__ in realistic situations.
We should definitely have language in the NEP that says it won’t be in a release unless the NEP is accepted.
In that case, I think the important part is to have language right now in the implementation, although that can refer to the NEP itself of course. You can't expect everyone who may be tempted to use it to actually read the NEP draft, at least not without pointing it out.
Agreed. I think the decision is on this list, not in the NEP, and to make sure we won't forget, we need an issue opened with the 1.20 milestone.

Cheers,
Ralf
For what it's worth, as a potential consumer in SciPy: neither the NEP nor the PR really says anything about how regular users of NumPy will benefit from this. If only 3rd parties are going to benefit from it, I am not sure adding a new keyword to an already confusing function is the right thing to do.

Let me clarify,

- This is already a very (I mean extremely very) easy keyword name to confuse with ones_like, zeros_like and, by its nature, any other interpretation. It does not signal anything about the functionality that is being discussed. I would seriously consider reserving such obvious names for really obvious tasks. You would also expect the shape and ndim to be mimicked by the "like"d argument, but it turns out it is acting more like "typeof=" and not "like=" at all. If we follow the semantics, it reads as "make your argument asarray like the other thing", but what it actually does is "make your argument an array with the other thing's type", which might not be an array after all.

- Again, if this is meant for downstream libraries (because that's what I got out of the PR discussion; cupy, dask, and JAX were the only examples I could read) then hiding it in another function and writing in capital letters "this is not meant for numpy users" would be a much more convenient way to separate the target audience from regular users. numpy.astypedarray([[some data], [...]], type_of=x), or whatever else it may be, would be quite clean and to the point, with no ambiguous keywords.

I think arriving at an agreement would be much faster if there were an executive summary of who this is intended for and what the regular usage is. Because, with no offense, all I see is "dispatch", "__array_function__" and a lot of technical details of which I am absolutely ignorant.

Finally, as a minor point, I know we are mostly (ex-)academics, but this necessity of formal language on NEPs is self-imposed (probably PEPs are to blame) and not quite helping.
It can be a bit more descriptive in my external opinion.

best,
ilhan
I’ve generally been on the “let the NumPy devs worry about it” side of things, but I do agree with Ilhan that `like=` is confusing and `typeof=` would be a much more appropriate name for that parameter.

I do think library writers are NumPy users and so I wouldn’t really make that distinction, though. Users writing their own analysis code could very well be interested in writing code using numpy functions that will transparently work when the input is a CuPy array or whatever.

I also share Ilhan’s concern (and I mentioned this in a previous NEP discussion) that NEPs are getting pretty inaccessible. In a sense these are difficult topics and readers should be expected to have *some* familiarity with the topics being discussed, but perhaps more effort should be put into the context/motivation/background of a NEP before accepting it. One way to ensure this might be to require a final proofreading step by someone who has not been involved at all in the discussions, like peer review does for papers.

Food for thought.

Juan.
I am not sure adding a new keyword to an already confusing function is the right thing to do.
Could you clarify what is the confusing function in question?
This is already a very (I mean extremely very) easy keyword name to confuse with ones_like, zeros_like and by its nature any other interpretation.
To be fair, the usage is the same. Therefore empty_like(downstream_array, ...) and empty(downstream_array, ..., like=downstream_array) should have the exact same behavior, which is arguably redundant now.
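As a rough comparison of the two spellings, here is a sketch using plain NumPy arrays only (assuming NumPy 1.20 or later, where `like=` exists). It illustrates the naming concern raised in this thread: `empty_like` takes shape and dtype from the reference, while `like=` only selects which array type is created, so the shape is still passed explicitly.

```python
import numpy as np

ref = np.ones((2, 3), dtype=np.float32)

# empty_like: shape and dtype are both taken from the reference array.
a = np.empty_like(ref)

# like=: only the array type follows the reference; shape is given explicitly
# and dtype falls back to the function's usual default.
b = np.empty((4,), like=ref)

print(a.shape, a.dtype)
print(b.shape, b.dtype)
```

With a downstream array passed as `like=`, the same call would be dispatched to that library's `empty` instead of NumPy's.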
It is not signalling anything about the functionality that is being discussed. I would seriously consider reserving such obvious names for really obvious tasks. Because you would also expect the shape and ndim would be mimicked by the "like"d argument but it turns out it is acting more like "typeof=" and not "like=" at all.
I understand this can be confusing, and naming was one of the hardest discussions, as there's no clear, unambiguous name to use for this keyword; "like=" was simply the name that came closest to converging during discussions. While I think "typeof=" is perhaps a better name than "like=", it could very easily be confused with "dtype=", and that would possibly just shift the confusion.
Again, if this is meant for downstream libraries (because that's what I got out of the PR discussion, cupy, dask, and JAX were the only examples I could read) then hiding it in another function and writing with capital letters "this is not meant for numpy users" would be a much more convenient way to separate the target audience and regular users.
The problem with this approach is that the __array_function__ protocol relies on downstream libraries implementing functions with the same signature (for example, Dask and CuPy both implement an "array" function that matches NumPy's). The purpose of __array_function__ and NEP-35 is to introduce only minimal changes to both NumPy's API and downstream libraries. Of course, adding new functions for such cases would work, but IMO it would defeat the purpose of __array_function__ in general, as it would require a considerable amount of work in downstream libraries. We discussed this previously and decided that an argument is better than many new functions [1].
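The reliance on matching names and signatures can be sketched as follows. Everything prefixed with `toy` or `Toy` is hypothetical and exists only for illustration; real libraries have their own dispatch logic. The sketch assumes NumPy 1.20 or later.

```python
import numpy as np


class ToyArray:
    """Hypothetical downstream array type, for illustration only."""

    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_function__(self, func, types, args, kwargs):
        # Find a toy function with the same name as the NumPy function.
        impl = TOY_API.get(func.__name__)
        if impl is None:
            return NotImplemented
        return impl(*args, **kwargs)


def toy_array(obj, dtype=None, **kwargs):
    # Mirrors the leading part of np.array's signature; extras are ignored.
    return ToyArray(np.array(obj, dtype=dtype))


def toy_zeros(shape, dtype=float, **kwargs):
    # Mirrors the leading part of np.zeros's signature; extras are ignored.
    return ToyArray(np.zeros(shape, dtype=dtype))


# Hypothetical "library namespace": existing functions are reused as-is.
TOY_API = {"array": toy_array, "zeros": toy_zeros}

ref = ToyArray([0])
a = np.array([1, 2, 3], like=ref)
z = np.zeros(2, like=ref)
print(type(a).__name__, type(z).__name__)
```

Because the toy library already has `array` and `zeros` with compatible signatures, supporting `like=` needs no new functions on its side, whereas a new NumPy-only function would require a new counterpart in every downstream library.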
I think, arriving to an agreement would be much faster if there is an executive summary of who this is intended for and what the regular usage is. Because with no offense, all I see is "dispatch", "_array_function_" and a lot of technical details of which I am absolutely ignorant.
This is what I intended to do in the Usage Guidance [2] section. Could you elaborate on what more information you'd want to see there? Or is it just a matter of reorganizing the NEP a bit to try and summarize such things right at the top?
Finally as a minor point, I know we are mostly (ex-)academics but this necessity of formal language on NEPs is self-imposed (probably PEPs are to blame) and not quite helping. It can be a bit more descriptive in my external opinion.
TBH, I don't really know how to solve that point, so if you have any specific suggestions, they're certainly welcome. I understand the frustration of a reader trying to understand all the details, many of which are only described in NEP-18 [3], but we also strive to avoid rewriting things that are written elsewhere, which would also overburden those who are aware of what's being discussed.
I’ve generally been on the “let the NumPy devs worry about it” side of things, but I do agree with Ilhan that `like=` is confusing and `typeof=` would be a much more appropriate name for that parameter.
To be clear, I have no strong opinion on renaming it. I'm fine either way, but I think it's unrealistic to expect that we find a single name that is short, unambiguous and properly descriptive. If the preference now shifts towards the "typeof=" name, we can change it, but "like=" was really named after "empty_like" and similar functions.
I do think library writers are NumPy users and so I wouldn’t really make that distinction, though. Users writing their own analysis code could very well be interested in writing code using numpy functions that will transparently work when the input is a CuPy array or whatever.
I'm guessing this is a somewhat loose definition of "library". To some extent, if you really need "like=", it means that you're writing your own functions around the NumPy API (and that IMO is a library, even if you call it something else), rather than just writing your application on top of the existing NumPy API. I'm also happy to rephrase that in the NEP if people feel it should be done.
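A sketch of the kind of "function around the NumPy API" meant here might look like the following. `add_ones_column` is a hypothetical helper invented for this illustration; the example assumes NumPy 1.20 or later and is only exercised with a plain NumPy array.

```python
import numpy as np


def add_ones_column(x):
    """Append a column of ones to a 2-D array (hypothetical helper).

    The new column is created with like=x so that, for inputs whose type
    implements __array_function__, it is built by the same library as x.
    """
    ones = np.ones((x.shape[0], 1), dtype=x.dtype, like=x)
    return np.concatenate([x, ones], axis=1)


x = np.array([[1.0, 2.0], [3.0, 4.0]])
out = add_ones_column(x)
print(out.shape)
```

Without `like=x`, the helper would always create a NumPy array for the new column, regardless of the type of its input.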
I also share Ilhan’s concern (and I mentioned this in a previous NEP discussion) that NEPs are getting pretty inaccessible. In a sense these are difficult topics and readers should be expected to have *some* familiarity with the topics being discussed, but perhaps more effort should be put into the context/motivation/background of a NEP before accepting it. One way to ensure this might be to require a final proofreading step by someone who has not been involved at all in the discussions, like peer review does for papers.
This is a good point, and we do always notify people over the mailing list of new NEPs as per NEP-0 [4], which was done for NEP-35 [5] (originally NEP-33, but renamed due to other open NEPs at that time), unfortunately not many concerns were raised about that back then. Best, Peter [1] https://github.com/numpy/numpy/issues/14441#issuecomment-529969572 [2] https://numpy.org/neps/nep-0035-array-creation-dispatch-with-array-function.... [3] https://numpy.org/neps/nep-0018-array-function-protocol.html [4] https://numpy.org/neps/nep-0000.html#nep-workflow [5] https://mail.python.org/pipermail/numpy-discussion/2019-October/080176.html On Thu, Aug 13, 2020 at 3:44 AM Juan Nunez-Iglesias <jni@fastmail.com> wrote:
I’ve generally been on the “let the NumPy devs worry about it” side of things, but I do agree with Ilhan that `like=` is confusing and `typeof=` would be a much more appropriate name for that parameter.
I do think library writers are NumPy users and so I wouldn’t really make that distinction, though. Users writing their own analysis code could very well be interested in writing code using numpy functions that will transparently work when the input is a CuPy array or whatever.
I also share Ilhan’s concern (and I mentioned this in a previous NEP discussion) that NEPs are getting pretty inaccessible. In a sense these are difficult topics and readers should be expected to have *some* familiarity with the topics being discussed, but perhaps more effort should be put into the context/motivation/background of a NEP before accepting it. One way to ensure this might be to require a final proofreading step by someone who has not been involved at all in the discussions, like peer review does for papers.
Food for thought.
Juan.
On 13 Aug 2020, at 9:24 am, Ilhan Polat <ilhanpolat@gmail.com> wrote:
For what is worth, as a potential consumer in SciPy, it really doesn't say anything (both in NEP and the PR) about how the regular users of NumPy will benefit from this. If only and only 3rd parties are going to benefit from it, I am not sure adding a new keyword to an already confusing function is the right thing to do.
Let me clarify,
- This is already a very (I mean extremely very) easy keyword name to confuse with ones_like, zeros_like and by its nature any other interpretation. It is not signalling anything about the functionality that is being discussed. I would seriously consider reserving such obvious names for really obvious tasks. Because you would also expect the shape and ndim would be mimicked by the "like"d argument but it turns out it is acting more like "typeof=" and not "like=" at all. Because if we follow the semantics it reads as "make your argument asarray like the other thing" but it is actually doing, "make your argument an array with the other thing's type" which might not be an array after all.
- Again, if this is meant for downstream libraries (because that's what I got out of the PR discussion, cupy, dask, and JAX were the only examples I could read) then hiding it in another function and writing with capital letters "this is not meant for numpy users" would be a much more convenient way to separate the target audience and regular users. numpy.astypedarray([[some data], [...]], type_of=x) or whatever else it may be would be quite clean and to the point with no ambiguous keywords.
I think, arriving to an agreement would be much faster if there is an executive summary of who this is intended for and what the regular usage is. Because with no offense, all I see is "dispatch", "_array_function_" and a lot of technical details of which I am absolutely ignorant.
Finally as a minor point, I know we are mostly (ex-)academics but this necessity of formal language on NEPs is self-imposed (probably PEPs are to blame) and not quite helping. It can be a bit more descriptive in my external opinion.
best, ilhan
On Tue, Aug 11, 2020 at 12:18 AM Ralf Gommers <ralf.gommers@gmail.com> wrote:
On Mon, Aug 10, 2020 at 8:37 PM Sebastian Berg <sebastian@sipsolutions.net> wrote:
On Mon, 2020-08-10 at 17:35 +0200, Hameer Abbasi wrote:
Hi,
We should have a higher-bandwidth meeting/communication for all stakeholders, and particularly some library authors, to see what would be good for them.
I'm not sure that helps. At this point there's little progress since the last meeting, I think the plan is unchanged: we need implementations of all the options on offer, and then try them out in PRs for scikit-learn, SciPy and perhaps another package who's maintainers are interested, to test like=, __array_module__ in realistic situations.
We should definitely have language in the NEP that says it won’t be in a release unless the NEP is accepted.
In that case, I think the important part is to have language right now in the implementation, although that can refer to the NEP itself of course. You can't expect everyone who may be tempted to use it to actually read the NEP draft, at least not without pointing it out.
Agreed, I think the decision is on this list not in the NEP, and to make sure we won't forget we need an issue opened with the 1.20 milestone.
Cheers, Ralf
I will say that I think it is not very high risk, because, annoying or not, the argument could be deprecated again with a short transition phase. Admittedly, that argument only works if we have a replacement solution.
Cheers,
Sebastian
Best regards, Hameer Abbasi
-- Sent from Canary (https://canarymail.io)
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion
Thanks for raising these concerns Ilhan and Juan, and for answering Peter. Let me give my perspective as well.
To start with, this is not specifically about Peter's NEP and PR. NEP 35 simply follows the pattern set by previous PRs, and given its tight scope is less difficult to understand than other NEPs on such technical topics. Peter has done a lot of things right, and is close to the finish line.
On Thu, Aug 13, 2020 at 12:02 PM Peter Andreas Entschev <peter@entschev.com> wrote:
I think, arriving to an agreement would be much faster if there is an executive summary of who this is intended for and what the regular usage is. Because with no offense, all I see is "dispatch", "_array_function_" and a lot of technical details of which I am absolutely ignorant.
This is what I intended to do in the Usage Guidance [2] section. Could you elaborate on what more information you'd want to see there? Or is it just a matter of reorganizing the NEP a bit to try and summarize such things right at the top?
We adapted the NEP template [6] several times last year to try and improve this. And specified in there as well that NEP content sent to the mailing list should only contain the sections: Abstract, Motivation and Scope, Usage and Impact, and Backwards compatibility. This to ensure we fully understand the "why" and "what" before the "how". Unfortunately that template and procedure hasn't been exercised much yet, only in NEP 38 [7] and partially in NEP 41 [8].
If we have long-time maintainers of SciPy (Ilhan and myself), scikit-image (Juan) and CuPy (Leo, on the PR review) all saying they don't understand the goals, relevance, target audience, or how they're supposed to use a new feature, that indicates that the people doing the writing and having the discussion are doing something wrong at a very fundamental level.
At this point I'm pretty disappointed in and tired of how we write and discuss NEPs on technical topics like dispatching, dtypes and the like. People literally refuse to write down concrete motivations, goals and non-goals, code that's problematic now and will be better/working post-NEP and usage examples before launching into extensive discussion of the gory details of the internals. I'm not sure what to do about it. Completely separate API and behavior proposals from implementation proposals? Make separate "API" and "internals" teams with the likes of Juan, Ilhan and Leo on the API team which then needs to approve every API change in new NEPs? Offer to co-write NEPs if someone is willing but doesn't understand how to go about it? Keep the current structure/process but veto further approvals until NEP authors get it right?
I want to make an exception for merging the current NEP, for which the plan is to merge it as experimental to try in downstream PRs and get more experience. That does mean that master will be in an unreleasable state by the way, which is unusual and it'd be nice to get Chuck's explicit OK for that. But after that, I think we need a change here. I would like to hear what everyone thinks is the shape that change should take - any of my above suggestions, or something else?
Finally as a minor point, I know we are mostly (ex-)academics but this necessity of formal language on NEPs is self-imposed (probably PEPs are to blame) and not quite helping. It can be a bit more descriptive in my external opinion.
TBH, I don't really know how to solve that point, so if you have any specific suggestions, that's certainly welcome. I understand the frustration for a reader trying to understand all the details, with many being only described in NEP-18 [3], but we also strive to avoid rewriting things that are written elsewhere, which would also overburden those who are aware of what's being discussed.
I also share Ilhan’s concern (and I mentioned this in a previous NEP discussion) that NEPs are getting pretty inaccessible. In a sense these are difficult topics and readers should be expected to have *some* familiarity with the topics being discussed, but perhaps more effort should be put into the context/motivation/background of a NEP before accepting it. One way to ensure this might be to require a final proofreading step by someone who has not been involved at all in the discussions, like peer review does for papers.
Some variant of this proposal would be my preference.
Cheers, Ralf
[1] https://github.com/numpy/numpy/issues/14441#issuecomment-529969572
[2] https://numpy.org/neps/nep-0035-array-creation-dispatch-with-array-function....
[3] https://numpy.org/neps/nep-0018-array-function-protocol.html
[4] https://numpy.org/neps/nep-0000.html#nep-workflow
[5] https://mail.python.org/pipermail/numpy-discussion/2019-October/080176.html
[6] https://github.com/numpy/numpy/blob/master/doc/neps/nep-template.rst
[7] https://github.com/numpy/numpy/blob/master/doc/neps/nep-0038-SIMD-optimizati...
[8] https://github.com/numpy/numpy/blob/master/doc/neps/nep-0041-improved-dtype-...
On Thu, Aug 13, 2020 at 3:44 AM Juan Nunez-Iglesias <jni@fastmail.com> wrote:
I’ve generally been on the “let the NumPy devs worry about it” side of things, but I do agree with Ilhan that `like=` is confusing and `typeof=` would be a much more appropriate name for that parameter.
I do think library writers are NumPy users and so I wouldn’t really make that distinction, though. Users writing their own analysis code could very well be interested in writing code using numpy functions that will transparently work when the input is a CuPy array or whatever.
I also share Ilhan’s concern (and I mentioned this in a previous NEP discussion) that NEPs are getting pretty inaccessible. In a sense these are difficult topics and readers should be expected to have *some* familiarity with the topics being discussed, but perhaps more effort should be put into the context/motivation/background of a NEP before accepting it. One way to ensure this might be to require a final proofreading step by someone who has not been involved at all in the discussions, like peer review does for papers.
Food for thought.
Juan.
On 13 Aug 2020, at 9:24 am, Ilhan Polat <ilhanpolat@gmail.com> wrote:
For what is worth, as a potential consumer in SciPy, it really doesn't say anything (both in NEP and the PR) about how the regular users of NumPy will benefit from this. If only and only 3rd parties are going to benefit from it, I am not sure adding a new keyword to an already confusing function is the right thing to do.
Let me clarify,
- This is already a very (I mean extremely very) easy keyword name to confuse with ones_like, zeros_like and by its nature any other interpretation. It is not signalling anything about the functionality that is being discussed. I would seriously consider reserving such obvious names for really obvious tasks. Because you would also expect the shape and ndim would be mimicked by the "like"d argument but it turns out it is acting more like "typeof=" and not "like=" at all. Because if we follow the semantics it reads as "make your argument asarray like the other thing" but it is actually doing, "make your argument an array with the other thing's type" which might not be an array after all.
- Again, if this is meant for downstream libraries (because that's what I got out of the PR discussion, cupy, dask, and JAX were the only examples I could read) then hiding it in another function and writing with capital letters "this is not meant for numpy users" would be a much more convenient way to separate the target audience and regular users. numpy.astypedarray([[some data], [...]], type_of=x) or whatever else it may be would be quite clean and to the point with no ambiguous keywords.
I think, arriving to an agreement would be much faster if there is an executive summary of who this is intended for and what the regular usage is. Because with no offense, all I see is "dispatch", "_array_function_" and a lot of technical details of which I am absolutely ignorant.
Finally as a minor point, I know we are mostly (ex-)academics but this necessity of formal language on NEPs is self-imposed (probably PEPs are to blame) and not quite helping. It can be a bit more descriptive in my external opinion.
To maybe lighten up the discussion a bit and to make my outsider confusion more tangible, let me start by apologizing for diving head first without weighing the past luggage :-) I always forget how much effort goes into these things and for outsiders like me, it's a matter of dipping the finger and tasting it just before starting to complain how much salt is missing etc. What I was mentioning about NEPs wasn't only related specifically to this one by the way. It's the generic feeling that I have.
First let me start what I mean by NumPy users and downstreamers distinction. This is very much related to how data-science and huge-array users are magnetizing every tool out there in the Python world which is fine though the majority of number-crunchers have nothing to do with any of GPU/Parallelism/ClusterUsage etc. Hence when I mention NumPy users, think of people who use NumPy as its own right with no duck-typing and nothing related to subclassing. Just straightforward array creation and lots of ops on these arrays. For those people (I'm one of them), this option brings in a keyword that we would never use. And it gets into many major functions (linspace and others mentioned somewhere). So it has a very appealing name but has nothing to do with me in an already very crowded namespace and keyword catalogue. That's basically a UX issue to be addressed (under the assumption that users like me are the majority). Either making its name as esoteric as possible so I naturally stay away from it or I don't see it. This has absolutely nothing to do with looking down on the downstream libraries. They are flat-out amazing and the more we can support them the merrier.
Using yet another metaphor, I was hoping that NumPy would have a loading dock for heavy duty deliveries for downstream projects or specialized array creations and won't disturb the regular customer entrance. Because if I look at this page https://numpy.org/doc/stable/referenc/routines.array-creation.html, there are a lot of functions and I think most of them are candidates to gain this keyword. I wish I can comment on a viable alternative but I really cannot understand the _array_xxxx_ discussions since they fly way over my head no matter how many times I tried. So that's why I naively mentioned the "np.astypedarray" or "np.asarray_but_not_numpy_array" or whatever. Now I see that it is even more complicated and I generated extra noise. So you can just ignore my previous suggestions. Except that I want to draw attention to the UX problem and I'd like to leave it at that.
The other point is about the NEP stuff. I think I need to elaborate. If the NEPs are meant for internal NumPy discussions, then by all means, crank up the pointer*-meter to 11 and dive into it, totally fine with me. But if you also want to get feedback from outside, then probably a few lines of code examples for mere mortals would go a long way. Also it would make the discussion much more streamlined in my humble opinion. What I was trying to get at was that almost all NEPs read like a legal document that I want to agree as soon as possible. Because they often come without any or minimal amount of code in it. In NEP35 for example, there are nice code blocks in function dispatching but I guess it's not meant for me. Because it is only decorating asarray with some black magic happening there somehow (I guess). So I can't even comprehend what the proposition would mean for the regular, friendly, anti-duck users. But I am pretty sure it is about dispatching something because the word is repeated ~20 times :-) Thus the feedback would be limited. That was also what I meant there. But again I totally understand the complexity of these issues. So I'm not expecting to understand all details of NumPy machinery in a single NEP.
But anyways, hope this clarifies a few things that I failed to convey in my previous mail.
ilhan
On Thu, Aug 13, 2020 at 2:23 PM Ralf Gommers <ralf.gommers@gmail.com> wrote:
Thanks for raising these concerns Ilhan and Juan, and for answering Peter. Let me give my perspective as well.
Ilhan,
Thanks, that does clarify things. I think the main point -- and correct me here if I'm still wrong -- is that we want the NEP to have some very clear example of when/why/how to use it, preferably as early in the text as possible, maybe just below the Abstract, in a Motivation and Scope section, as the NEP Template [6] pointed out by Ralf earlier suggests. That is a totally valid ask, and I'll try to address it as soon as possible (hopefully today or tomorrow).
To the point of whether NEPs are to be read by users, I normally don't expect users to be required to read and understand those NEPs other than out of pure curiosity. If we need them to do so, then there's definitely a big problem in the API. This may sound at odds with what I said before about the "like=" name, but that's really the piece of the NumPy API that I, with a somewhat reasonable understanding of arrays, don't quite get or like: for instance, "asarray" and "like" sound like exactly the same thing, but they're not in the NumPy context, and on the other hand it's quite difficult to find a reasonable name to clarify that. And once more, I do like the "typeof=" suggestion more than "like=" to be perfectly honest, I'm just afraid it could be mistaken for the "dtype=" keyword somehow and thus still not solve the clarity problem.
Going back to users reading NEPs or not, I would really expect the docstring of the function to be sufficiently clear to keep users off of it, but still give them an understanding of why it exists. The current docstring is in [9], please do comment on it if you have ideas of how to make it more accessible to users.
You also mentioned you'd like the name to be as esoteric as possible, do you have any suggestions for an esoteric name that is hopefully unambiguous too? Naming has definitely been very much on the table since the NEP was written, but the consensus was more that "like=" is reasonably similar in both application and the name itself to "empty_like" and derived functions, that's why we just stuck to it.
Best, Peter
[9] https://github.com/numpy/numpy/pull/16935/files#diff-e5969453e399f2d32519d30...
On Thu, Aug 13, 2020 at 3:43 PM Ilhan Polat <ilhanpolat@gmail.com> wrote:
To maybe lighten up the discussion a bit and to make my outsider confusion more tangible, let me start by apologizing for diving head first without weighing the past luggage :-) I always forget how much effort goes into these things and for outsiders like me, it's a matter of dipping the finger and tasting it just before starting to complain how much salt is missing etc. What I was mentioning about NEPs wasn't only related specifically to this one by the way. It's the generic feeling that I have.
[6] https://github.com/numpy/numpy/blob/master/doc/neps/nep-template.rst [7] https://github.com/numpy/numpy/blob/master/doc/neps/nep-0038-SIMD-optimizati... [8] https://github.com/numpy/numpy/blob/master/doc/neps/nep-0041-improved-dtype-...
On Thu, Aug 13, 2020 at 3:44 AM Juan Nunez-Iglesias <jni@fastmail.com> wrote:
I’ve generally been on the “let the NumPy devs worry about it” side of things, but I do agree with Ilhan that `like=` is confusing and `typeof=` would be a much more appropriate name for that parameter.
I do think library writers are NumPy users and so I wouldn’t really make that distinction, though. Users writing their own analysis code could very well be interested in writing code using numpy functions that will transparently work when the input is a CuPy array or whatever.
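A rough sketch of the use case described above: analysis code that creates a helper array with `like=` and therefore keeps working when handed a different array type. Everything in it is a hypothetical stand-in written in plain Python (`PlainArray` plays the role of a NumPy array, `GpuLikeArray` the role of a CuPy-style array, and `asarray` is a toy function, not NumPy's implementation); it only assumes that dispatch is triggered by the reference object defining `__array_function__`.

```python
class PlainArray:
    """Hypothetical default array type; stands in for numpy.ndarray."""

    def __init__(self, values):
        self.values = [float(v) for v in values]

    def __mul__(self, other):
        return PlainArray(a * b for a, b in zip(self.values, other.values))


class GpuLikeArray:
    """Hypothetical downstream array type; stands in for e.g. a CuPy array."""

    def __init__(self, values):
        self.values = [float(v) for v in values]

    def __array_function__(self, func, types, args, kwargs):
        # Build the requested array, but as this type instead of the default.
        return GpuLikeArray(*args, **kwargs)

    def __mul__(self, other):
        return GpuLikeArray(a * b for a, b in zip(self.values, other.values))


def asarray(values, like=None):
    """Toy creation function with a `like=` keyword (assumed semantics)."""
    if like is not None and hasattr(like, "__array_function__"):
        return like.__array_function__(asarray, (type(like),), (values,), {})
    return PlainArray(values)


def apply_weights(x):
    """User analysis code: it never names a concrete array type."""
    weights = asarray([0.5, 2.0], like=x)
    return x * weights


cpu = apply_weights(PlainArray([2, 3]))
gpu = apply_weights(GpuLikeArray([2, 3]))
print(type(cpu).__name__, cpu.values)  # PlainArray [1.0, 6.0]
print(type(gpu).__name__, gpu.values)  # GpuLikeArray [1.0, 6.0]
```

The point of the sketch is that `apply_weights` is ordinary user code, not library internals, yet it benefits from the keyword in the same way a library function would.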
I also share Ilhan’s concern (and I mentioned this in a previous NEP discussion) that NEPs are getting pretty inaccessible. In a sense these are difficult topics and readers should be expected to have *some* familiarity with the topics being discussed, but perhaps more effort should be put into the context/motivation/background of a NEP before accepting it. One way to ensure this might be to require a final proofreading step by someone who has not been involved at all in the discussions, like peer review does for papers.
Food for thought.
Juan.
On 13 Aug 2020, at 9:24 am, Ilhan Polat <ilhanpolat@gmail.com> wrote:
For what it's worth, as a potential consumer in SciPy, it really doesn't say anything (both in the NEP and the PR) about how the regular users of NumPy will benefit from this. If only third parties are going to benefit from it, I am not sure adding a new keyword to an already confusing function is the right thing to do.
Let me clarify,
- This is already a very (I mean extremely very) easy keyword name to confuse with ones_like, zeros_like and by its nature any other interpretation. It is not signalling anything about the functionality that is being discussed. I would seriously consider reserving such obvious names for really obvious tasks. Because you would also expect the shape and ndim would be mimicked by the "like"d argument but it turns out it is acting more like "typeof=" and not "like=" at all. Because if we follow the semantics it reads as "make your argument asarray like the other thing" but it is actually doing, "make your argument an array with the other thing's type" which might not be an array after all.
- Again, if this is meant for downstream libraries (because that's what I got out of the PR discussion, cupy, dask, and JAX were the only examples I could read) then hiding it in another function and writing with capital letters "this is not meant for numpy users" would be a much more convenient way to separate the target audience and regular users. numpy.astypedarray([[some data], [...]], type_of=x) or whatever else it may be would be quite clean and to the point with no ambiguous keywords.
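To make the naming concern in the first point concrete, here is a minimal sketch of the difference between a `*_like` function and the `like=` keyword. It is a toy model in plain Python, not NumPy's implementation: `DuckArray`, `ones_like` and `asarray` below are hypothetical stand-ins, and the only assumption carried over from the discussion is that `like=` hands array creation over to the reference object's type without looking at its shape.

```python
class DuckArray:
    """Hypothetical downstream array type; stands in for e.g. a Dask array."""

    def __init__(self, data):
        self.data = list(data)

    @property
    def shape(self):
        return (len(self.data),)

    def __array_function__(self, func, types, args, kwargs):
        # Only the *type* of the reference matters; its data and shape are ignored.
        return DuckArray(*args, **kwargs)


def ones_like(ref):
    """Toy `*_like` function: mimics the reference's shape and type."""
    return type(ref)([1] * len(ref.data))


def asarray(obj, like=None):
    """Toy creation function: `like=` only selects who builds the result."""
    if like is not None and hasattr(like, "__array_function__"):
        return like.__array_function__(asarray, (type(like),), (obj,), {})
    return list(obj)


ref = DuckArray([0, 0, 0, 0, 0])
same_shape = ones_like(ref)
same_type = asarray([1, 2], like=ref)
print(same_shape.shape)                           # (5,)
print(type(same_type).__name__, same_type.shape)  # DuckArray (2,)
```

In this model `ones_like(ref)` copies the five-element shape, while `asarray([1, 2], like=ref)` keeps the two-element input and only borrows the type, which is the "typeof" reading described above.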
I think arriving at an agreement would be much faster if there were an executive summary of who this is intended for and what the regular usage is. Because, no offense, all I see is "dispatch", "__array_function__" and a lot of technical details of which I am absolutely ignorant.
Finally as a minor point, I know we are mostly (ex-)academics but this necessity of formal language on NEPs is self-imposed (probably PEPs are to blame) and not quite helping. It can be a bit more descriptive in my external opinion.
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion
Yes, the underlying gory details should be spelled out of course, but if it is also modifying/adding to the API then it is best to sound the horn and invite zombies to take a stab at it. Often people arrive with interesting use-cases that you wouldn't have thought about. And I am very familiar with the pushback feeling you are having right now, probably internally shouting "where have you been all this time, you slackers?". As you might have seen me asking questions here and on the Cython lists, when I am done with some new feature over at SciPy, it is also going to be a very, very long and tiring process. I am really not looking forward to it :-) but I guess it is part of the deal. Maybe I can give some comfort that if more people start to flock over, that means it has morphed into a finished product so people can shoot. But I honestly thought this was a new NEP; that's a mistake on my part.
For the like, typeof and other candidates: by esoteric I mean foreign enough to most users. We already have a nice candidate I think; ehm... "dispatch" or "dispatch_like" or something like that, nobody sober enough would confuse this with any other. And since this won't be typed in daily usage, or so I understood, I guess it is ok to make it verbose. But still, take it as an initial guess and feel free to dismiss. I would still be in platonic love with a "numpy.DIY" or "numpy.hermes" namespace with a nice "bring your own __array_function__" service.
On Thu, Aug 13, 2020 at 4:16 PM Peter Andreas Entschev <peter@entschev.com> wrote:
Ilhan,
Thanks, that does clarify things.
I think the main point -- and correct me here if I'm still wrong -- is that we want the NEP to have some very clear example of when/why/how to use it, preferably as early in the text as possible, maybe just below the Abstract, in a Motivation and Scope section, as the NEP Template [6] pointed out by Ralf earlier suggests. That is a totally valid ask, and I'll try to address it as soon as possible (hopefully today or tomorrow).
To the point of whether NEPs are to be read by users: I normally don't expect users to be required to read and understand those NEPs other than out of pure curiosity. If we need them to do so, then there's definitely a big problem in the API. This may sound at odds with what I said before about the "like=" name, but that's really the piece of the NumPy API that I, with a somewhat reasonable understanding of arrays, don't quite get or like. For instance, "asarray" and "like" sound like exactly the same thing, but they're not in the NumPy context, and on the other hand it's quite difficult to find a reasonable name to clarify that. And once more, I do like the "typeof=" suggestion more than "like=", to be perfectly honest; I'm just afraid it could be mistaken for the "dtype=" keyword somehow and thus still not solve the clarity problem. Going back to users reading NEPs or not, I would really expect the function's docstring to be sufficiently clear to keep users away from it, while still giving them an understanding of why it exists. The current docstring is in [9]; please do comment on it if you have ideas on how to make it more accessible to users.
You also mentioned you'd like the name to be as esoteric as possible. Do you have any suggestions for an esoteric name that is hopefully unambiguous too? Naming has definitely been very much on the table since the NEP was written, but the consensus was more that "like=" is reasonably similar in both application and the name itself to "empty_like" and derived functions; that's why we just stuck to it.
Best, Peter
[9] https://github.com/numpy/numpy/pull/16935/files#diff-e5969453e399f2d32519d30...
On Thu, Aug 13, 2020 at 3:43 PM Ilhan Polat <ilhanpolat@gmail.com> wrote:
To maybe lighten up the discussion a bit and to make my outsider confusion more tangible, let me start by apologizing for diving head first without weighing the past luggage :-) I always forget how much effort goes into these things and for outsiders like me, it's a matter of dipping the finger and tasting it just before starting to complain how much salt is missing etc. What I was mentioning about NEPs wasn't only related specifically to this one by the way. It's the generic feeling that I have.
First let me start with what I mean by the NumPy users and downstreamers distinction. This is very much related to how data-science and huge-array users are magnetizing every tool out there in the Python world, which is fine, though the majority of number-crunchers have nothing to do with any of GPU/Parallelism/ClusterUsage etc. Hence when I mention NumPy users, think of people who use NumPy in its own right with no duck-typing and nothing related to subclassing. Just straightforward array creation and lots of ops on these arrays. For those people (I'm one of them), this option brings in a keyword that we would never use. And it gets into many major functions (linspace and others mentioned somewhere). So it has a very appealing name but has nothing to do with me in an already very crowded namespace and keyword catalogue. That's basically a UX issue to be addressed (under the assumption that users like me are the majority). Either make its name as esoteric as possible so I naturally stay away from it, or make sure I don't see it. This has absolutely nothing to do with looking down on the downstream libraries. They are flat-out amazing and the more we can support them the merrier.
Using yet another metaphor, I was hoping that NumPy would have a loading dock for heavy duty deliveries for downstream projects or specialized array creations and won't disturb the regular customer entrance. Because if I look at this page https://numpy.org/doc/stable/reference/routines.array-creation.html, there are a lot of functions and I think most of them are candidates to gain this keyword. I wish I could comment on a viable alternative but I really cannot understand the __array_xxxx__ discussions since they fly way over my head no matter how many times I tried. So that's why I naively mentioned the "np.astypedarray" or "np.asarray_but_not_numpy_array" or whatever. Now I see that it is even more complicated and I generated extra noise. So you can just ignore my previous suggestions. Except that I want to draw attention to the UX problem and I'd like to leave it at that.
The other point is about the NEP stuff. I think I need to elaborate. If the NEPs are meant for internal NumPy discussions, then by all means, crank up the pointer*-meter to 11 and dive into it, totally fine with me. But if you also want to get feedback from outside, then probably a few lines of code examples for mere mortals would go a long way. Also it would make the discussion much more streamlined in my humble opinion. What I was trying to get at was that almost all NEPs read like a legal document that I want to agree to as soon as possible. Because they often come without any or with a minimal amount of code in them. In NEP 35 for example, there are nice code blocks in function dispatching but I guess it's not meant for me. Because it is only decorating asarray with some black magic happening there somehow (I guess). So I can't even comprehend what the proposition would mean for the regular, friendly, anti-duck users. But I am pretty sure it is about dispatching something because the word is repeated ~20 times :-) Thus the feedback would be limited. That was also what I meant there. But again I totally understand the complexity of these issues. So I'm not expecting to understand all details of NumPy machinery in a single NEP.
But anyways, hope this clarifies a few things that I failed to convey in my previous mail.
ilhan
We adapted the NEP template [6] several times last year to try and improve this, and specified in there as well that NEP content sent to the mailing list should only contain the sections: Abstract, Motivation and Scope, Usage and Impact, and Backwards compatibility. This is to ensure we fully understand the "why" and "what" before the "how". Unfortunately that template and procedure haven't been exercised much yet, only in NEP 38 [7] and partially in NEP 41 [8].
If we have long-time maintainers of SciPy (Ilhan and myself), scikit-image (Juan) and CuPy (Leo, on the PR review) all saying they don't understand the goals, relevance, target audience, or how they're supposed to use a new feature, that indicates that the people doing the writing and having the discussion are doing something wrong at a very fundamental level.
I'm more than happy to edit the NEP and try to clarify all the concerns. However, it gets pretty difficult to do so when I as an author don't understand where the difficulty is. Ilhan, Juan and Ralf now pointed out things that are missing/unclear, but no comment was made in that regard when I sent the NEP, my point being: I couldn't fix what I didn't know was a problem to others.
At this point I'm pretty disappointed in and tired of how we write and discuss NEPs on technical topics like dispatching, dtypes and the like. People literally refuse to write down concrete motivations, goals and non-goals, code that's problematic now and will be better/working post-NEP and usage examples before launching into extensive discussion of the gory details of the internals. I'm not sure what to do about it.
Honestly, I don't really understand this. From my perspective, there are two ways to deal with such things:
1. Templates are to be taken mainly as _guidelines_ rather than _hardlines_, and the current text of NEP-35 definitely falls in the first category;
2. Templates are _hardlines_ and to be guided/enforced by maintainers at some point (maybe before merging the PR?).
If 2 is the desired case for NumPy, which sounds a lot like what is wanted from NEP-35 and other NEPs generally, maintainers should let the authors know as early as possible that something isn't following the template's hardlines and that it should be corrected. I don't mean any of this to absolve myself of any responsibility, but I would like to express my frustration that a 10-month-old NEP is only now getting so much pushback for being unclear, when its implementation is nearing completion.
I want to make an exception for merging the current NEP, for which the plan is to merge it as experimental to try in downstream PRs and get more experience. That does mean that master will be in an unreleasable state by the way, which is unusual and it'd be nice to get Chuck's explicit OK for that.
I don't quite understand this either, why would that leave master in an unreleasable state?
Best, Peter
On Thu, Aug 13, 2020 at 2:21 PM Ralf Gommers <ralf.gommers@gmail.com> wrote:
Thanks for raising these concerns Ilhan and Juan, and for answering Peter. Let me give my perspective as well.
To start with, this is not specifically about Peter's NEP and PR. NEP 35 simply follows the pattern set by previous PRs, and given its tight scope is less difficult to understand than other NEPs on such technical topics. Peter has done a lot of things right, and is close to the finish line.
On Thu, Aug 13, 2020 at 12:02 PM Peter Andreas Entschev <peter@entschev.com> wrote:
I think arriving at an agreement would be much faster if there were an executive summary of who this is intended for and what the regular usage is. Because, no offense, all I see is "dispatch", "__array_function__" and a lot of technical details of which I am absolutely ignorant.
This is what I intended to do in the Usage Guidance [2] section. Could you elaborate on what more information you'd want to see there? Or is it just a matter of reorganizing the NEP a bit to try and summarize such things right at the top?
We adapted the NEP template [6] several times last year to try and improve this, and specified in there as well that NEP content sent to the mailing list should only contain the sections: Abstract, Motivation and Scope, Usage and Impact, and Backwards compatibility. This is to ensure we fully understand the "why" and "what" before the "how". Unfortunately that template and procedure haven't been exercised much yet, only in NEP 38 [7] and partially in NEP 41 [8].
If we have long-time maintainers of SciPy (Ilhan and myself), scikit-image (Juan) and CuPy (Leo, on the PR review) all saying they don't understand the goals, relevance, target audience, or how they're supposed to use a new feature, that indicates that the people doing the writing and having the discussion are doing something wrong at a very fundamental level.
At this point I'm pretty disappointed in and tired of how we write and discuss NEPs on technical topics like dispatching, dtypes and the like. People literally refuse to write down concrete motivations, goals and non-goals, code that's problematic now and will be better/working post-NEP and usage examples before launching into extensive discussion of the gory details of the internals. I'm not sure what to do about it. Completely separate API and behavior proposals from implementation proposals? Make separate "API" and "internals" teams with the likes of Juan, Ilhan and Leo on the API team which then needs to approve every API change in new NEPs? Offer to co-write NEPs if someone is willing but doesn't understand how to go about it? Keep the current structure/process but veto further approvals until NEP authors get it right?
I want to make an exception for merging the current NEP, for which the plan is to merge it as experimental to try in downstream PRs and get more experience. That does mean that master will be in an unreleasable state by the way, which is unusual and it'd be nice to get Chuck's explicit OK for that. But after that, I think we need a change here. I would like to hear what everyone thinks is the shape that change should take - any of my above suggestions, or something else?
Finally as a minor point, I know we are mostly (ex-)academics but this necessity of formal language on NEPs is self-imposed (probably PEPs are to blame) and not quite helping. It can be a bit more descriptive in my external opinion.
TBH, I don't really know how to solve that point, so if you have any specific suggestions, that's certainly welcome. I understand the frustration for a reader trying to understand all the details, with many being only described in NEP-18 [3], but we also strive to avoid rewriting things that are written elsewhere, which would also overburden those who are aware of what's being discussed.
I also share Ilhan’s concern (and I mentioned this in a previous NEP discussion) that NEPs are getting pretty inaccessible. In a sense these are difficult topics and readers should be expected to have *some* familiarity with the topics being discussed, but perhaps more effort should be put into the context/motivation/background of a NEP before accepting it. One way to ensure this might be to require a final proofreading step by someone who has not been involved at all in the discussions, like peer review does for papers.
Some variant of this proposal would be my preference.
Cheers, Ralf
[1] https://github.com/numpy/numpy/issues/14441#issuecomment-529969572 [2] https://numpy.org/neps/nep-0035-array-creation-dispatch-with-array-function.... [3] https://numpy.org/neps/nep-0018-array-function-protocol.html [4] https://numpy.org/neps/nep-0000.html#nep-workflow [5] https://mail.python.org/pipermail/numpy-discussion/2019-October/080176.html
[6] https://github.com/numpy/numpy/blob/master/doc/neps/nep-template.rst [7] https://github.com/numpy/numpy/blob/master/doc/neps/nep-0038-SIMD-optimizati... [8] https://github.com/numpy/numpy/blob/master/doc/neps/nep-0041-improved-dtype-...
On Thu, Aug 13, 2020 at 3:44 AM Juan Nunez-Iglesias <jni@fastmail.com> wrote:
I’ve generally been on the “let the NumPy devs worry about it” side of things, but I do agree with Ilhan that `like=` is confusing and `typeof=` would be a much more appropriate name for that parameter.
I do think library writers are NumPy users and so I wouldn’t really make that distinction, though. Users writing their own analysis code could very well be interested in writing code using numpy functions that will transparently work when the input is a CuPy array or whatever.
I also share Ilhan’s concern (and I mentioned this in a previous NEP discussion) that NEPs are getting pretty inaccessible. In a sense these are difficult topics and readers should be expected to have *some* familiarity with the topics being discussed, but perhaps more effort should be put into the context/motivation/background of a NEP before accepting it. One way to ensure this might be to require a final proofreading step by someone who has not been involved at all in the discussions, like peer review does for papers.
Food for thought.
Juan.
On 13 Aug 2020, at 9:24 am, Ilhan Polat <ilhanpolat@gmail.com> wrote:
For what it's worth, as a potential consumer in SciPy, it really doesn't say anything (both in the NEP and the PR) about how the regular users of NumPy will benefit from this. If only third parties are going to benefit from it, I am not sure adding a new keyword to an already confusing function is the right thing to do.
Let me clarify,
- This is already a very (I mean extremely) easy keyword name to confuse with ones_like, zeros_like and, by its nature, with any other interpretation. It does not signal anything about the functionality being discussed. I would seriously consider reserving such obvious names for really obvious tasks. You would also expect the shape and ndim to be mimicked by the "like"d argument, but it turns out it acts more like "typeof=" and not like "like=" at all. If we follow the semantics, it reads as "make your argument asarray like the other thing", but what it actually does is "make your argument an array with the other thing's type", which might not be an array after all.
- Again, if this is meant for downstream libraries (that's what I got out of the PR discussion; cupy, dask, and JAX were the only examples I could find), then hiding it in another function and writing in capital letters "this is not meant for numpy users" would be a much more convenient way to separate the target audience from regular users. numpy.astypedarray([[some data], [...]], type_of=x), or whatever else it may be, would be quite clean and to the point, with no ambiguous keywords.
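To make the naming concern concrete, here is a minimal sketch of the difference between a `*_like` function (which copies the prototype's shape) and a `like=` argument (which only selects the array type to create). It is a pure-Python toy model of the dispatch idea, not NumPy's implementation: `ToyArray`, `ToyDeviceArray`, and the simplified `asarray` and `ones_like` below are hypothetical stand-ins invented for illustration.

```python
# Toy model of the dispatch idea only; this is NOT NumPy's implementation.
# All class and function names here are hypothetical stand-ins.

class ToyArray:
    """Stand-in for a plain array: just wraps a list."""

    def __init__(self, data):
        self.data = list(data)


class ToyDeviceArray(ToyArray):
    """Stand-in for a third-party array type (think CuPy or Dask)."""

    def __array_function__(self, func, types, args, kwargs):
        # The third-party type decides how to build its own array.
        if func is asarray:
            return ToyDeviceArray(args[0])
        return NotImplemented


def asarray(data, like=None):
    """Create an array; if `like` is given, let its type do the creation."""
    if like is not None and hasattr(type(like), "__array_function__"):
        result = like.__array_function__(asarray, (type(like),), (data,), {})
        if result is not NotImplemented:
            return result
    return ToyArray(data)


def ones_like(prototype):
    """Mimic the *shape* of the prototype, which `like=` does not do."""
    return ToyArray([1] * len(prototype.data))


ref = ToyDeviceArray([0, 0, 0, 0])
a = asarray([3], like=ref)  # takes ref's type, keeps its own length of 1
b = ones_like(ref)          # takes ref's length of 4
```

In this sketch `a` has the reference object's type but not its shape, while `b` has its shape, which is the gap between the two meanings of "like" described above.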
I think arriving at an agreement would be much faster if there were an executive summary of who this is intended for and what the regular usage is. No offense, but all I see is "dispatch", "_array_function_" and a lot of technical details of which I am absolutely ignorant.
Finally as a minor point, I know we are mostly (ex-)academics but this necessity of formal language on NEPs is self-imposed (probably PEPs are to blame) and not quite helping. It can be a bit more descriptive in my external opinion.
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion
On Thu, Aug 13, 2020 at 2:47 PM Peter Andreas Entschev <peter@entschev.com> wrote:
We adapted the NEP template [6] several times last year to try and improve this, and specified in there as well that NEP content sent to the mailing list should only contain the sections Abstract, Motivation and Scope, Usage and Impact, and Backwards compatibility. This is to ensure we fully understand the "why" and "what" before the "how". Unfortunately, that template and procedure haven't been exercised much yet: only in NEP 38 [7] and partially in NEP 41 [8].
If we have long-time maintainers of SciPy (Ilhan and myself), scikit-image (Juan) and CuPy (Leo, on the PR review) all saying they don't understand the goals, relevance, target audience, or how they're supposed to use a new feature, that indicates that the people doing the writing and having the discussion are doing something wrong at a very fundamental level.
I'm more than happy to edit the NEP and try to clarify all the concerns.
Thanks Peter. Let me reiterate, you did a lot of things right, have been happy to adapt when given feedback, and your willingness to go back and fix things up now is much appreciated (and I'm happy to help). No criticism of your work or attitude intended, on the contrary.
However, it gets pretty difficult to do so when I as an author don't understand where the difficulty is. Ilhan, Juan and Ralf now pointed out things that are missing/unclear, but no comment was made in that regard when I sent the NEP, my point being: I couldn't fix what I didn't know was a problem to others.
Yes of course, I totally understand that.
At this point I'm pretty disappointed in and tired of how we write and discuss NEPs on technical topics like dispatching, dtypes and the like. People literally refuse to write down concrete motivations, goals and non-goals, code that's problematic now and will be better/working post-NEP and usage examples before launching into extensive discussion of the gory details of the internals. I'm not sure what to do about it.
Honestly, I don't really understand this. From my perspective, there are two ways to deal with such things:
1. Templates are to be taken mainly as _guidelines_ rather than _hardlines_, and the current text of NEP-35 definitely falls in the first category;
2. Templates are _hardlines_ and to be guided/enforced by maintainers at some point (maybe before merging the PR?).
If 2 is the desired case for NumPy, which sounds a lot like what is wanted from NEP-35 and other NEPs generally, maintainers should let the authors know as early as possible that something isn't following the template's hardlines and it should be corrected.
Yes agreed, maintainers should do this. It was always meant as something in between, "please follow but deviate if needed". If essential elements are missing, I think that should be flagged earlier going forward.
As a concrete example: Stephan (the main author of __array_function__) was still fuzzy on the functions covered and whether it solves array coercion, in the last 24 hours*. You answered by pointing to concrete code in Dask and Xarray. That code, why it doesn't work well now but will work with like=, should be at the top of the NEP as concrete problem statement / code examples. It's quite unfortunate that no maintainer explicitly requested this many months ago.
* https://github.com/numpy/numpy/pull/16935#issuecomment-673379038
I don't mean any of this to remove myself of any responsibility, but would like to express my frustration that a 10 month-old NEP is only now getting so much pushback for being unclear after its implementation is nearing completion.
Totally understandable. I think part of the problem is that people only weigh in when they see concrete "this part is for you, and here's how you use it to solve problem X".
As for me personally, if I'm saying things now that I didn't manage to respond to earlier (specific to your NEP), I apologize. 10 months ago I was in the middle of an intercontinental move and a new-ish job getting busier fast. Again, apologies and no criticism of your work.
I want to make an exception for merging the current NEP, for which the plan is to merge it as experimental to try in downstream PRs and get more experience. That does mean that master will be in an unreleasable state by the way, which is unusual and it'd be nice to get Chuck's explicit OK for that.
I don't quite understand this either, why would that leave master in an unreleasable state?
That's what Sebastian proposed yesterday: let's merge right now, open issues for all the things being brought up right now, and deal with them pre-1.20-release. I'm saying I'm fine with that, but then we actually need to go back and finalize the discussions before the next release.
Cheers, Ralf
Best, Peter
Ralf,
I know none of it is a criticism of my work or directly of anybody else's work. I was just making a couple of general points (or questions really):
1. What is accepted as a reasonably clear NEP? The discussion seems to suggest that a NEP _must_ follow the template.
2. Should the NEP template be followed as a hardline? Personally, I think that would be fine in general; diverging seems to be an option only when additional information is necessary, but less should not be acceptable.
And to be perfectly clear, none of what I said is a criticism of anybody in particular. It is frustration about the process itself seemingly not being clear to either authors or maintainers, hence my two points above. I apologize if any of what I said so far has been taken as personal criticism of someone; it was definitely not meant that way.
Finally, I like Juan's previous suggestion that someone not involved in the discussion should proofread the NEP, though I'm not sure that's achievable in practice. However, I think that discussion is a bit out of context here, so I'll try to address the unclear parts of this NEP in a PR, and we could continue the general discussion of the NEP process in a different thread if people wish to do so.
Best, Peter
On Thu, Aug 13, 2020 at 4:13 PM Ralf Gommers <ralf.gommers@gmail.com> wrote:
On Thu, Aug 13, 2020 at 2:47 PM Peter Andreas Entschev <peter@entschev.com> wrote:
We adapted the NEP template [6] several times last year to try and improve this, and specified in there as well that NEP content sent to the mailing list should only contain the sections Abstract, Motivation and Scope, Usage and Impact, and Backwards compatibility. This is to ensure we fully understand the "why" and "what" before the "how". Unfortunately, that template and procedure haven't been exercised much yet: only in NEP 38 [7] and partially in NEP 41 [8].
If we have long-time maintainers of SciPy (Ilhan and myself), scikit-image (Juan) and CuPy (Leo, on the PR review) all saying they don't understand the goals, relevance, target audience, or how they're supposed to use a new feature, that indicates that the people doing the writing and having the discussion are doing something wrong at a very fundamental level.
I'm more than happy to edit the NEP and try to clarify all the concerns.
Thanks Peter. Let me reiterate, you did a lot of things right, have been happy to adapt when given feedback, and your willingness to go back and fix things up now is much appreciated (and I'm happy to help). No criticism of your work or attitude intended, on the contrary.
However, it gets pretty difficult to do so when I as an author don't understand where the difficulty is. Ilhan, Juan and Ralf now pointed out things that are missing/unclear, but no comment was made in that regard when I sent the NEP, my point being: I couldn't fix what I didn't know was a problem to others.
Yes of course, I totally understand that.
At this point I'm pretty disappointed in and tired of how we write and discuss NEPs on technical topics like dispatching, dtypes and the like. People literally refuse to write down concrete motivations, goals and non-goals, code that's problematic now and will be better/working post-NEP and usage examples before launching into extensive discussion of the gory details of the internals. I'm not sure what to do about it.
Honestly, I don't really understand this. From my perspective, there are two ways to deal with such things:
1. Templates are to be taken mainly as _guidelines_ rather than _hardlines_, and the current text of NEP-35 definitely falls in the first category;
2. Templates are _hardlines_ and to be guided/enforced by maintainers at some point (maybe before merging the PR?).
If 2 is the desired case for NumPy, which sounds a lot like what is wanted from NEP-35 and other NEPs generally, maintainers should let the authors know as early as possible that something isn't following the template's hardlines and it should be corrected.
Yes agreed, maintainers should do this. It was always meant as something in between, "please follow but deviate if needed". If essential elements are missing, I think that should be flagged earlier going forward.
As a concrete example: Stephan (the main author of __array_function__) was still fuzzy on the functions covered and whether it solves array coercion, in the last 24 hours*. You answered by pointing to concrete code in Dask and Xarray. That code, why it doesn't work well now but will work with like=, should be at the top of the NEP as concrete problem statement / code examples. It's quite unfortunate that no maintainer explicitly requested this many months ago.
* https://github.com/numpy/numpy/pull/16935#issuecomment-673379038
I don't mean any of this to remove myself of any responsibility, but would like to express my frustration that a 10 month-old NEP is only now getting so much pushback for being unclear after its implementation is nearing completion.
Totally understandable. I think part of the problem is that people only weigh in when they see concrete "this part is for you, and here's how you use it to solve problem X".
As for me personally, if I'm saying things now that I didn't manage to respond to earlier (specific to your NEP), I apologize. 10 months ago I was in the middle of an intercontinental move and a new-ish job getting busier fast. Again, apologies and no criticism of your work.
I want to make an exception for merging the current NEP, for which the plan is to merge it as experimental to try in downstream PRs and get more experience. That does mean that master will be in an unreleasable state by the way, which is unusual and it'd be nice to get Chuck's explicit OK for that.
I don't quite understand this either, why would that leave master in an unreleasable state?
That's what Sebastian proposed yesterday: let's merge right now, open issues for all the things being brought up right now, and deal with them pre-1.20-release. I'm saying I'm fine with that, but then we actually need to go back and finalize the discussions before the next release.
Cheers, Ralf
Best, Peter
On Thu, Aug 13, 2020 at 2:21 PM Ralf Gommers <ralf.gommers@gmail.com> wrote:
Thanks for raising these concerns Ilhan and Juan, and for answering Peter. Let me give my perspective as well.
To start with, this is not specifically about Peter's NEP and PR. NEP 35 simply follows the pattern set by previous PRs, and given its tight scope is less difficult to understand than other NEPs on such technical topics. Peter has done a lot of things right, and is close to the finish line.
On Thu, Aug 13, 2020 at 12:02 PM Peter Andreas Entschev <peter@entschev.com> wrote:
I think arriving at an agreement would be much faster if there were an executive summary of who this is intended for and what the regular usage is. No offense, but all I see is "dispatch", "_array_function_" and a lot of technical details of which I am absolutely ignorant.
This is what I intended to do in the Usage Guidance [2] section. Could you elaborate on what more information you'd want to see there? Or is it just a matter of reorganizing the NEP a bit to try and summarize such things right at the top?
We adapted the NEP template [6] several times last year to try and improve this, and specified in there as well that NEP content sent to the mailing list should only contain the sections Abstract, Motivation and Scope, Usage and Impact, and Backwards compatibility. This is to ensure we fully understand the "why" and "what" before the "how". Unfortunately, that template and procedure haven't been exercised much yet: only in NEP 38 [7] and partially in NEP 41 [8].
If we have long-time maintainers of SciPy (Ilhan and myself), scikit-image (Juan) and CuPy (Leo, on the PR review) all saying they don't understand the goals, relevance, target audience, or how they're supposed to use a new feature, that indicates that the people doing the writing and having the discussion are doing something wrong at a very fundamental level.
At this point I'm pretty disappointed in and tired of how we write and discuss NEPs on technical topics like dispatching, dtypes and the like. People literally refuse to write down concrete motivations, goals and non-goals, code that's problematic now and will be better/working post-NEP and usage examples before launching into extensive discussion of the gory details of the internals. I'm not sure what to do about it. Completely separate API and behavior proposals from implementation proposals? Make separate "API" and "internals" teams with the likes of Juan, Ilhan and Leo on the API team which then needs to approve every API change in new NEPs? Offer to co-write NEPs if someone is willing but doesn't understand how to go about it? Keep the current structure/process but veto further approvals until NEP authors get it right?
I want to make an exception for merging the current NEP, for which the plan is to merge it as experimental to try in downstream PRs and get more experience. That does mean that master will be in an unreleasable state by the way, which is unusual and it'd be nice to get Chuck's explicit OK for that. But after that, I think we need a change here. I would like to hear what everyone thinks is the shape that change should take - any of my above suggestions, or something else?
Finally as a minor point, I know we are mostly (ex-)academics but this necessity of formal language on NEPs is self-imposed (probably PEPs are to blame) and not quite helping. It can be a bit more descriptive in my external opinion.
TBH, I don't really know how to solve that point, so if you have any specific suggestions, that's certainly welcome. I understand the frustration for a reader trying to understand all the details, with many being only described in NEP-18 [3], but we also strive to avoid rewriting things that are written elsewhere, which would also overburden those who are aware of what's being discussed.
I also share Ilhan’s concern (and I mentioned this in a previous NEP discussion) that NEPs are getting pretty inaccessible. In a sense these are difficult topics and readers should be expected to have *some* familiarity with the topics being discussed, but perhaps more effort should be put into the context/motivation/background of a NEP before accepting it. One way to ensure this might be to require a final proofreading step by someone who has not been involved at all in the discussions, like peer review does for papers.
Some variant of this proposal would be my preference.
Cheers, Ralf
[1] https://github.com/numpy/numpy/issues/14441#issuecomment-529969572 [2] https://numpy.org/neps/nep-0035-array-creation-dispatch-with-array-function.... [3] https://numpy.org/neps/nep-0018-array-function-protocol.html [4] https://numpy.org/neps/nep-0000.html#nep-workflow [5] https://mail.python.org/pipermail/numpy-discussion/2019-October/080176.html
[6] https://github.com/numpy/numpy/blob/master/doc/neps/nep-template.rst [7] https://github.com/numpy/numpy/blob/master/doc/neps/nep-0038-SIMD-optimizati... [8] https://github.com/numpy/numpy/blob/master/doc/neps/nep-0041-improved-dtype-...
On Thu, Aug 13, 2020 at 3:44 AM Juan Nunez-Iglesias <jni@fastmail.com> wrote:
I’ve generally been on the “let the NumPy devs worry about it” side of things, but I do agree with Ilhan that `like=` is confusing and `typeof=` would be a much more appropriate name for that parameter.
I do think library writers are NumPy users and so I wouldn’t really make that distinction, though. Users writing their own analysis code could very well be interested in writing code using numpy functions that will transparently work when the input is a CuPy array or whatever.
I also share Ilhan’s concern (and I mentioned this in a previous NEP discussion) that NEPs are getting pretty inaccessible. In a sense these are difficult topics and readers should be expected to have *some* familiarity with the topics being discussed, but perhaps more effort should be put into the context/motivation/background of a NEP before accepting it. One way to ensure this might be to require a final proofreading step by someone who has not been involved at all in the discussions, like peer review does for papers.
Food for thought.
Juan.
On 13 Aug 2020, at 9:24 am, Ilhan Polat <ilhanpolat@gmail.com> wrote:
For what it's worth, as a potential consumer in SciPy: neither the NEP nor the PR really says anything about how regular users of NumPy will benefit from this. If only third parties are going to benefit from it, I am not sure adding a new keyword to an already confusing function is the right thing to do.
Let me clarify,
- This is already a very (I mean extremely) easy keyword name to confuse with ones_like, zeros_like and, by its nature, with any other interpretation. It does not signal anything about the functionality being discussed. I would seriously consider reserving such obvious names for really obvious tasks. You would also expect the shape and ndim to be mimicked by the "like"d argument, but it turns out it acts more like "typeof=" and not like "like=" at all. If we follow the semantics, it reads as "make your argument asarray like the other thing", but what it actually does is "make your argument an array with the other thing's type", which might not be an array after all.
- Again, if this is meant for downstream libraries (that's what I got out of the PR discussion; cupy, dask, and JAX were the only examples I could find), then hiding it in another function and writing in capital letters "this is not meant for numpy users" would be a much more convenient way to separate the target audience from regular users. numpy.astypedarray([[some data], [...]], type_of=x), or whatever else it may be, would be quite clean and to the point, with no ambiguous keywords.
I think arriving at an agreement would be much faster if there were an executive summary of who this is intended for and what the regular usage is. No offense, but all I see is "dispatch", "_array_function_" and a lot of technical details of which I am absolutely ignorant.
Finally as a minor point, I know we are mostly (ex-)academics but this necessity of formal language on NEPs is self-imposed (probably PEPs are to blame) and not quite helping. It can be a bit more descriptive in my external opinion.
On Thu, 2020-08-13 at 15:47 +0200, Peter Andreas Entschev wrote:
We adapted the NEP template [6] several times last year to try and improve this, and specified in there as well that NEP content sent to the mailing list should only contain the sections Abstract, Motivation and Scope, Usage and Impact, and Backwards compatibility. This is to ensure we fully understand the "why" and "what" before the "how". Unfortunately, that template and procedure haven't been exercised much yet: only in NEP 38 [7] and partially in NEP 41 [8].
If we have long-time maintainers of SciPy (Ilhan and myself), scikit-image (Juan) and CuPy (Leo, on the PR review) all saying they don't understand the goals, relevance, target audience, or how they're supposed to use a new feature, that indicates that the people doing the writing and having the discussion are doing something wrong at a very fundamental level.
I'm more than happy to edit the NEP and try to clarify all the concerns. However, it gets pretty difficult to do so when I as an author don't understand where the difficulty is. Ilhan, Juan and Ralf now pointed out things that are missing/unclear, but no comment was made in that regard when I sent the NEP, my point being: I couldn't fix what I didn't know was a problem to others.
At this point I'm pretty disappointed in and tired of how we write and discuss NEPs on technical topics like dispatching, dtypes and the like. People literally refuse to write down concrete motivations, goals and non-goals, code that's problematic now and will be better/working post-NEP and usage examples before launching into extensive discussion of the gory details of the internals. I'm not sure what to do about it.
Honestly, I don't really understand this. From my perspective, there are two ways to deal with such things:
1. Templates are to be taken mainly as _guidelines_ rather than _hardlines_, and the current text of NEP-35 definitely falls in the first category;
2. Templates are _hardlines_ and to be guided/enforced by maintainers at some point (maybe before merging the PR?).
If 2 is the desired case for NumPy, which sounds a lot like what is wanted from NEP-35 and other NEPs generally, maintainers should let the authors know as early as possible that something isn't following the template's hardlines and it should be corrected. I don't mean any of this to remove myself of any responsibility, but would like to express my frustration that a 10 month-old NEP is only now getting so much pushback for being unclear after its implementation is nearing completion.
I want to make an exception for merging the current NEP, for which the plan is to merge it as experimental to try in downstream PRs and get more experience. That does mean that master will be in an unreleasable state by the way, which is unusual and it'd be nice to get Chuck's explicit OK for that.
I don't quite understand this either, why would that leave master in an unreleasable state?
Well, a few points have not been discussed to the end yet. The name is one that has not gotten much attention yet, maybe because nobody had many concerns about it, or maybe because it was just lower on the priority list. To be clear: I am fully prepared to pull this out of master before release, or probably rather disable it in release versions. An alternative could be an environment variable (an env variable will not stop actual adoption, but we may be fine with that). And unless NEP 35 is accepted, that probably has to be the default. Fortunately, there is still some time until the next release. - Sebastian
Best, Peter
On Thu, Aug 13, 2020 at 2:21 PM Ralf Gommers <ralf.gommers@gmail.com> wrote:
Thanks for raising these concerns Ilhan and Juan, and for answering Peter. Let me give my perspective as well.
To start with, this is not specifically about Peter's NEP and PR. NEP 35 simply follows the pattern set by previous PRs, and given its tight scope is less difficult to understand than other NEPs on such technical topics. Peter has done a lot of things right, and is close to the finish line.
On Thu, Aug 13, 2020 at 12:02 PM Peter Andreas Entschev < peter@entschev.com> wrote:
I think arriving at an agreement would be much faster if there is an executive summary of who this is intended for and what the regular usage is. Because with no offense, all I see is "dispatch", "_array_function_" and a lot of technical details of which I am absolutely ignorant.
This is what I intended to do in the Usage Guidance [2] section. Could you elaborate on what more information you'd want to see there? Or is it just a matter of reorganizing the NEP a bit to try and summarize such things right at the top?
We adapted the NEP template [6] several times last year to try and improve this. We also specified in there that NEP content sent to the mailing list should only contain the sections: Abstract, Motivation and Scope, Usage and Impact, and Backwards compatibility. This is to ensure we fully understand the "why" and "what" before the "how". Unfortunately that template and procedure haven't been exercised much yet, only in NEP 38 [7] and partially in NEP 41 [8].
If we have long-time maintainers of SciPy (Ilhan and myself), scikit-image (Juan) and CuPy (Leo, on the PR review) all saying they don't understand the goals, relevance, target audience, or how they're supposed to use a new feature, that indicates that the people doing the writing and having the discussion are doing something wrong at a very fundamental level.
At this point I'm pretty disappointed in and tired of how we write and discuss NEPs on technical topics like dispatching, dtypes and the like. People literally refuse to write down concrete motivations, goals and non-goals, code that's problematic now and will be better/working post-NEP and usage examples before launching into extensive discussion of the gory details of the internals. I'm not sure what to do about it. Completely separate API and behavior proposals from implementation proposals? Make separate "API" and "internals" teams with the likes of Juan, Ilhan and Leo on the API team which then needs to approve every API change in new NEPs? Offer to co-write NEPs if someone is willing but doesn't understand how to go about it? Keep the current structure/process but veto further approvals until NEP authors get it right?
I want to make an exception for merging the current NEP, for which the plan is to merge it as experimental to try in downstream PRs and get more experience. That does mean that master will be in an unreleasable state by the way, which is unusual and it'd be nice to get Chuck's explicit OK for that. But after that, I think we need a change here. I would like to hear what everyone thinks is the shape that change should take - any of my above suggestions, or something else?
Finally as a minor point, I know we are mostly (ex-)academics but this necessity of formal language on NEPs is self-imposed (probably PEPs are to blame) and not quite helping. It can be a bit more descriptive in my external opinion.
TBH, I don't really know how to solve that point, so if you have any specific suggestions, that's certainly welcome. I understand the frustration for a reader trying to understand all the details, with many being only described in NEP-18 [3], but we also strive to avoid rewriting things that are written elsewhere, which would also overburden those who are aware of what's being discussed.
I also share Ilhan’s concern (and I mentioned this in a previous NEP discussion) that NEPs are getting pretty inaccessible. In a sense these are difficult topics and readers should be expected to have *some* familiarity with the topics being discussed, but perhaps more effort should be put into the context/motivation/background of a NEP before accepting it. One way to ensure this might be to require a final proofreading step by someone who has not been involved at all in the discussions, like peer review does for papers.
Some variant of this proposal would be my preference.
Cheers, Ralf
[1] https://github.com/numpy/numpy/issues/14441#issuecomment-529969572
[2] https://numpy.org/neps/nep-0035-array-creation-dispatch-with-array-function.html#usage-guidance
[3] https://numpy.org/neps/nep-0018-array-function-protocol.html
[4] https://numpy.org/neps/nep-0000.html#nep-workflow
[5] https://mail.python.org/pipermail/numpy-discussion/2019-October/080176.html
[6] https://github.com/numpy/numpy/blob/master/doc/neps/nep-template.rst
[7] https://github.com/numpy/numpy/blob/master/doc/neps/nep-0038-SIMD-optimizations.rst
[8] https://github.com/numpy/numpy/blob/master/doc/neps/nep-0041-improved-dtype-support.rst
On Thu, Aug 13, 2020 at 3:44 AM Juan Nunez-Iglesias < jni@fastmail.com> wrote:
I’ve generally been on the “let the NumPy devs worry about it” side of things, but I do agree with Ilhan that `like=` is confusing and `typeof=` would be a much more appropriate name for that parameter.
I do think library writers are NumPy users and so I wouldn’t really make that distinction, though. Users writing their own analysis code could very well be interested in writing code using numpy functions that will transparently work when the input is a CuPy array or whatever.
I also share Ilhan’s concern (and I mentioned this in a previous NEP discussion) that NEPs are getting pretty inaccessible. In a sense these are difficult topics and readers should be expected to have *some* familiarity with the topics being discussed, but perhaps more effort should be put into the context/motivation/background of a NEP before accepting it. One way to ensure this might be to require a final proofreading step by someone who has not been involved at all in the discussions, like peer review does for papers.
Food for thought.
Juan.
On 13 Aug 2020, at 9:24 am, Ilhan Polat <ilhanpolat@gmail.com> wrote:
For what it's worth, as a potential consumer in SciPy: neither the NEP nor the PR really says anything about how the regular users of NumPy will benefit from this. If only 3rd parties are going to benefit from it, I am not sure adding a new keyword to an already confusing function is the right thing to do.
Let me clarify,
- This is already a very (I mean extremely very) easy keyword name to confuse with ones_like, zeros_like and by its nature any other interpretation. It is not signalling anything about the functionality that is being discussed. I would seriously consider reserving such obvious names for really obvious tasks. Because you would also expect the shape and ndim would be mimicked by the "like"d argument but it turns out it is acting more like "typeof=" and not "like=" at all. Because if we follow the semantics it reads as "make your argument asarray like the other thing" but it is actually doing, "make your argument an array with the other thing's type" which might not be an array after all.
- Again, if this is meant for downstream libraries (because that's what I got out of the PR discussion, cupy, dask, and JAX were the only examples I could read) then hiding it in another function and writing with capital letters "this is not meant for numpy users" would be a much more convenient way to separate the target audience and regular users. numpy.astypedarray([[some data], [...]], type_of=x) or whatever else it may be would be quite clean and to the point with no ambiguous keywords.
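The distinction drawn in the two points above (`like=` picks the array type, while `ones_like` and friends mimic shape and dtype) can be sketched as follows. This is only an illustration, assuming a NumPy version in which the `like=` keyword is available (1.20 or later); `TaggedArray` is a hypothetical stand-in for a Dask or CuPy array and is not part of NumPy or of the NEP.

```python
import numpy as np


class TaggedArray:
    """Hypothetical minimal array-like, standing in for a Dask or CuPy array."""

    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_function__(self, func, types, args, kwargs):
        # Compute with plain NumPy, then wrap the result in our own type.
        return TaggedArray(func(*args, **kwargs))


template = np.zeros((2, 3))

# ones_like mimics the shape and dtype of its argument.
mimicked = np.ones_like(template)

# like= only selects which library creates the array; the shape still
# comes from the data ([3]), not from the reference object.
from_ndarray = np.asarray([3], like=template)
from_tagged = np.asarray([3], like=TaggedArray([0.0]))

print(mimicked.shape)               # (2, 3)
print(from_ndarray.shape)           # (1,)
print(type(from_tagged).__name__)   # TaggedArray
```

In this sketch the reference object contributes nothing but its type, which is the behaviour the `typeof=` naming suggestion is pointing at.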
I think arriving at an agreement would be much faster if there is an executive summary of who this is intended for and what the regular usage is. Because with no offense, all I see is "dispatch", "_array_function_" and a lot of technical details of which I am absolutely ignorant.
Finally as a minor point, I know we are mostly (ex-)academics but this necessity of formal language on NEPs is self-imposed (probably PEPs are to blame) and not quite helping. It can be a bit more descriptive in my external opinion.
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion
On Thu, Aug 13, 2020 at 5:22 AM Ralf Gommers <ralf.gommers@gmail.com> wrote:
Thanks for raising these concerns Ilhan and Juan, and for answering Peter. Let me give my perspective as well.
To start with, this is not specifically about Peter's NEP and PR. NEP 35 simply follows the pattern set by previous PRs, and given its tight scope is less difficult to understand than other NEPs on such technical topics. Peter has done a lot of things right, and is close to the finish line.
On Thu, Aug 13, 2020 at 12:02 PM Peter Andreas Entschev < peter@entschev.com> wrote:
I think arriving at an agreement would be much faster if there is an executive summary of who this is intended for and what the regular usage is. Because with no offense, all I see is "dispatch", "_array_function_" and a lot of technical details of which I am absolutely ignorant.
This is what I intended to do in the Usage Guidance [2] section. Could you elaborate on what more information you'd want to see there? Or is it just a matter of reorganizing the NEP a bit to try and summarize such things right at the top?
We adapted the NEP template [6] several times last year to try and improve this. We also specified in there that NEP content sent to the mailing list should only contain the sections: Abstract, Motivation and Scope, Usage and Impact, and Backwards compatibility. This is to ensure we fully understand the "why" and "what" before the "how". Unfortunately that template and procedure haven't been exercised much yet, only in NEP 38 [7] and partially in NEP 41 [8].
If we have long-time maintainers of SciPy (Ilhan and myself), scikit-image (Juan) and CuPy (Leo, on the PR review) all saying they don't understand the goals, relevance, target audience, or how they're supposed to use a new feature, that indicates that the people doing the writing and having the discussion are doing something wrong at a very fundamental level.
At this point I'm pretty disappointed in and tired of how we write and discuss NEPs on technical topics like dispatching, dtypes and the like. People literally refuse to write down concrete motivations, goals and non-goals, code that's problematic now and will be better/working post-NEP and usage examples before launching into extensive discussion of the gory details of the internals. I'm not sure what to do about it. Completely separate API and behavior proposals from implementation proposals? Make separate "API" and "internals" teams with the likes of Juan, Ilhan and Leo on the API team which then needs to approve every API change in new NEPs? Offer to co-write NEPs if someone is willing but doesn't understand how to go about it? Keep the current structure/process but veto further approvals until NEP authors get it right?
I think the NEP template is great, and we should try to be more diligent about following it!

My own NEP 37 (__array_module__) is probably a good example of poor presentation due to not following the template structure. It goes pretty deep into low-level motivation and some implementation details before usage examples.

Speaking just for myself, I would have appreciated a friendly nudge to use the template. Certainly I think it would be fine to require using the template for newly submitted NEPs. I did not remember about it when I started drafting NEP 37, and it definitely would have helped. I may still try to do a revision at some point to use the template structure.
I want to make an exception for merging the current NEP, for which the plan is to merge it as experimental to try in downstream PRs and get more experience. That does mean that master will be in an unreleasable state by the way, which is unusual and it'd be nice to get Chuck's explicit OK for that. But after that, I think we need a change here. I would like to hear what everyone thinks is the shape that change should take - any of my above suggestions, or something else?
Finally as a minor point, I know we are mostly (ex-)academics but this necessity of formal language on NEPs is self-imposed (probably PEPs are to blame) and not quite helping. It can be a bit more descriptive in my external opinion.
TBH, I don't really know how to solve that point, so if you have any specific suggestions, that's certainly welcome. I understand the frustration for a reader trying to understand all the details, with many being only described in NEP-18 [3], but we also strive to avoid rewriting things that are written elsewhere, which would also overburden those who are aware of what's being discussed.
I also share Ilhan’s concern (and I mentioned this in a previous NEP discussion) that NEPs are getting pretty inaccessible. In a sense these are difficult topics and readers should be expected to have *some* familiarity with the topics being discussed, but perhaps more effort should be put into the context/motivation/background of a NEP before accepting it. One way to ensure this might be to require a final proofreading step by someone who has not been involved at all in the discussions, like peer review does for papers.
Some variant of this proposal would be my preference.
Cheers, Ralf
[1] https://github.com/numpy/numpy/issues/14441#issuecomment-529969572
[2] https://numpy.org/neps/nep-0035-array-creation-dispatch-with-array-function.html#usage-guidance
[3] https://numpy.org/neps/nep-0018-array-function-protocol.html
[4] https://numpy.org/neps/nep-0000.html#nep-workflow
[5] https://mail.python.org/pipermail/numpy-discussion/2019-October/080176.html
[6] https://github.com/numpy/numpy/blob/master/doc/neps/nep-template.rst
[7] https://github.com/numpy/numpy/blob/master/doc/neps/nep-0038-SIMD-optimizations.rst
[8] https://github.com/numpy/numpy/blob/master/doc/neps/nep-0041-improved-dtype-support.rst
On Thu, Aug 13, 2020 at 3:44 AM Juan Nunez-Iglesias <jni@fastmail.com> wrote:
I’ve generally been on the “let the NumPy devs worry about it” side of things, but I do agree with Ilhan that `like=` is confusing and `typeof=` would be a much more appropriate name for that parameter.
I do think library writers are NumPy users and so I wouldn’t really make that distinction, though. Users writing their own analysis code could very well be interested in writing code using numpy functions that will transparently work when the input is a CuPy array or whatever.
I also share Ilhan’s concern (and I mentioned this in a previous NEP discussion) that NEPs are getting pretty inaccessible. In a sense these are difficult topics and readers should be expected to have *some* familiarity with the topics being discussed, but perhaps more effort should be put into the context/motivation/background of a NEP before accepting it. One way to ensure this might be to require a final proofreading step by someone who has not been involved at all in the discussions, like peer review does for papers.
Food for thought.
Juan.
On 13 Aug 2020, at 9:24 am, Ilhan Polat <ilhanpolat@gmail.com> wrote:
For what it's worth, as a potential consumer in SciPy: neither the NEP nor the PR really says anything about how the regular users of NumPy will benefit from this. If only 3rd parties are going to benefit from it, I am not sure adding a new keyword to an already confusing function is the right thing to do.
Let me clarify,
- This is already a very (I mean extremely very) easy keyword name to confuse with ones_like, zeros_like and by its nature any other interpretation. It is not signalling anything about the functionality that is being discussed. I would seriously consider reserving such obvious names for really obvious tasks. Because you would also expect the shape and ndim would be mimicked by the "like"d argument but it turns out it is acting more like "typeof=" and not "like=" at all. Because if we follow the semantics it reads as "make your argument asarray like the other thing" but it is actually doing, "make your argument an array with the other thing's type" which might not be an array after all.
- Again, if this is meant for downstream libraries (because that's what I got out of the PR discussion, cupy, dask, and JAX were the only examples I could read) then hiding it in another function and writing with capital letters "this is not meant for numpy users" would be a much more convenient way to separate the target audience and regular users. numpy.astypedarray([[some data], [...]], type_of=x) or whatever else it may be would be quite clean and to the point with no ambiguous keywords.
I think arriving at an agreement would be much faster if there is an executive summary of who this is intended for and what the regular usage is. Because with no offense, all I see is "dispatch", "_array_function_" and a lot of technical details of which I am absolutely ignorant.
Finally as a minor point, I know we are mostly (ex-)academics but this necessity of formal language on NEPs is self-imposed (probably PEPs are to blame) and not quite helping. It can be a bit more descriptive in my external opinion.
Hello everyone again!

A few clarifications about my proposal of external peer review:

- Yes, all this work is public and announced on the mailing list. However, I don’t think there’s a single person in this discussion or even this whole ecosystem that does not have a more immediately-pressing and also virtually infinite to-do list, so it’s unreasonable to expect that generally they would do more than glance at the stuff in the mailing list. In the peer review analogy, the mailing list is like the arXiv or Biorxiv stream: yep, anyone can see the stuff on there and comment, but most people just don’t have the time or attention to grab onto that. The only reason I stopped to comment here is Sebastian’s “Imma merge, YOLO!”, which had me raising my eyebrows real high. 😂 Especially for something that would expand the NumPy API!
- So, my proposal is that there needs to be an *editor* of NEPs who takes responsibility, once they are themselves satisfied with the NEP, for seeking out external reviewers and pinging them individually and asking them if they would be ok to review.
- A good friend who does screenwriting once told me, “don’t use all your proofreaders at once”. You want to get feedback, improve things, then feedback from a *totally independent* new person who can see the document with fresh eyes.

Obviously, all of the above slows things down. But “alone we go fast, together we go far”. The point of a NEP is to document critical decisions for the long term health of the project. If the documentation is insufficient, it defeats the whole purpose. Might as well just implement stuff and skip the whole NEP process. (Side note: Stephan, I for one would definitely appreciate an update to existing NEPs if there’s obvious ways they can be improved!)

I do think that NEP templates should be strict, and I don’t think that is incompatible with plain, jargon-free text. The NEP template and guidelines should specify that, and that the motivation should be understandable by a casual NumPy user, the kind described by Ilhan, for whom bare NumPy actually meets all their needs. Maybe they’ve also used PyTorch but they’ve never really had cause to mix them or write a program that worked with both kinds of arrays. Ditto for backwards compatibility: everyone should be clear when their existing code is going to be broken. Actually NEP18 broke so much of my code, but its Backward compatibility section basically says all good!

https://numpy.org/neps/nep-0018-array-function-protocol.html#backward-compatibility

Anywho, as always, none of this is criticism of the work done. I thank you all, and am eternally grateful for all the hard work everyone is doing to keep the ecosystem from fragmenting. I’m just hoping that this discussion can improve the process going forward! And, yes, apologies to Peter, I know from repeated personal experience how frustrating it can be to have last-minute drive-by objections after months of consensus building! But I think in the end every time that happened the end result was better, and I hope the same is true here! And yes, I’ll reiterate Ralf’s point: my concerns are about the NEP process itself rather than this one.

I’ll summarise my proposal:

- strict NEP template. NEPs with missing sections will not be accepted.
- sections Abstract, Motivation, and Backwards Compatibility should be understandable at a high level by casual users with ~zero background on the topic
- enforce the above with at least two independent rounds of coordinated peer review.

Thank you, Juan.
Hi all,
This thread has IMO drifted very far from its original purpose, so I decided to start a new thread specifically for the general NEP procedure discussion; please check your mail for the "NEP Procedure Discussion" subject.
On the topic of this thread, I'll try to rewrite NEP-35 to make it more accessible and ping back here once I have a PR for that.
Is there anything else that's pressing here? If there is and I missed/forgot about it, please let me know.
Best, Peter
On Fri, Aug 14, 2020 at 5:00 AM Juan Nunez-Iglesias <jni@fastmail.com> wrote:
Hello everyone again!
A few clarifications about my proposal of external peer review:
- Yes, all this work is public and announced on the mailing list. However, I don’t think there’s a single person in this discussion or even this whole ecosystem that does not have a more immediately-pressing and also virtually infinite to-do list, so it’s unreasonable to expect that generally they would do more than glance at the stuff in the mailing list. In the peer review analogy, the mailing list is like the arXiv or Biorxiv stream — yep, anyone can see the stuff on there and comment, but most people just don’t have the time or attention to grab onto that. The only reason I stopped to comment here is Sebastian’s “Imma merge, YOLO!”, which had me raising my eyebrows real high. 😂 Especially for something that would expand the NumPy API!
- So, my proposal is that there needs to be an *editor* of NEPs who takes responsibility, once they are themselves satisfied with the NEP, for seeking out external reviewers and pinging them individually and asking them if they would be ok to review.
- A good friend who does screenwriting once told me, “don’t use all your proofreaders at once”. You want to get feedback, improve things, then feedback from a *totally independent* new person who can see the document with fresh eyes.
Obviously, all of the above slows things down. But “alone we go fast, together we go far”. The point of a NEP is to document critical decisions for the long term health of the project. If the documentation is insufficient, it defeats the whole purpose. Might as well just implement stuff and skip the whole NEP process. (Side note: Stephan, I for one would definitely appreciate an update to existing NEPs if there’s obvious ways they can be improved!)
I do think that NEP templates should be strict, and I don’t think that is incompatible with plain, jargon-free text. The NEP template and guidelines should specify that, and that the motivation should be understandable by a casual NumPy user — the kind described by Ilhan, for whom bare NumPy actually meets all their needs. Maybe they’ve also used PyTorch but they’ve never really had cause to mix them or write a program that worked with both kinds of arrays.
Ditto for backwards compatibility — everyone should be clear when their existing code is going to be broken. Actually NEP18 broke so much of my code, but its Backward compatibility section basically says all good! https://numpy.org/neps/nep-0018-array-function-protocol.html#backward-compat...
Anywho, as always, none of this is criticism of the work done. I thank you all, and am eternally grateful for all the hard work everyone is doing to keep the ecosystem from fragmenting. I’m just hoping that this discussion can improve the process going forward!
And, yes, apologies to Peter, I know from repeated personal experience how frustrating it can be to have last-minute drive-by objections after months of consensus building! But I think in the end every time that happened the end result was better — I hope the same is true here! And yes, I’ll reiterate Ralf’s point: my concerns are about the NEP process itself rather than this one. I’ll summarise my proposal:
- strict NEP template. NEPs with missing sections will not be accepted.
- sections Abstract, Motivation, and Backwards Compatibility should be understandable at a high level by casual users with ~zero background on the topic
- enforce the above with at least two independent rounds of coordinated peer review.
Thank you,
Juan.
On 14 Aug 2020, at 5:29 am, Stephan Hoyer <shoyer@gmail.com> wrote:
On Thu, Aug 13, 2020 at 5:22 AM Ralf Gommers <ralf.gommers@gmail.com> wrote:
Thanks for raising these concerns Ilhan and Juan, and for answering Peter. Let me give my perspective as well.
To start with, this is not specifically about Peter's NEP and PR. NEP 35 simply follows the pattern set by previous PRs, and given its tight scope is less difficult to understand than other NEPs on such technical topics. Peter has done a lot of things right, and is close to the finish line.
On Thu, Aug 13, 2020 at 12:02 PM Peter Andreas Entschev < peter@entschev.com> wrote:
I think, arriving to an agreement would be much faster if there is an executive summary of who this is intended for and what the regular usage is. Because with no offense, all I see is "dispatch", "_array_function_" and a lot of technical details of which I am absolutely ignorant.
This is what I intended to do in the Usage Guidance [2] section. Could you elaborate on what more information you'd want to see there? Or is it just a matter of reorganizing the NEP a bit to try and summarize such things right at the top?
We adapted the NEP template [6] several times last year to try and improve this. We also specified in there that NEP content sent to the mailing list should only contain the sections: Abstract, Motivation and Scope, Usage and Impact, and Backwards compatibility. This is to ensure we fully understand the "why" and "what" before the "how". Unfortunately that template and procedure haven't been exercised much yet, only in NEP 38 [7] and partially in NEP 41 [8].
If we have long-time maintainers of SciPy (Ilhan and myself), scikit-image (Juan) and CuPy (Leo, on the PR review) all saying they don't understand the goals, relevance, target audience, or how they're supposed to use a new feature, that indicates that the people doing the writing and having the discussion are doing something wrong at a very fundamental level.
At this point I'm pretty disappointed in and tired of how we write and discuss NEPs on technical topics like dispatching, dtypes and the like. People literally refuse to write down concrete motivations, goals and non-goals, code that's problematic now and will be better/working post-NEP and usage examples before launching into extensive discussion of the gory details of the internals. I'm not sure what to do about it. Completely separate API and behavior proposals from implementation proposals? Make separate "API" and "internals" teams with the likes of Juan, Ilhan and Leo on the API team which then needs to approve every API change in new NEPs? Offer to co-write NEPs if someone is willing but doesn't understand how to go about it? Keep the current structure/process but veto further approvals until NEP authors get it right?
I think the NEP template is great, and we should try to be more diligent about following it!
My own NEP 37 (__array_module__) is probably a good example of poor presentation due to not following the template structure. It goes pretty deep into low-level motivation and some implementation details before usage examples.
Speaking just for myself, I would have appreciated a friendly nudge to use the template. Certainly I think it would be fine to require using the template for newly submitted NEPs. I did not remember about it when I started drafting NEP 37, and it definitely would have helped. I may still try to do a revision at some point to use the template structure.
I want to make an exception for merging the current NEP, for which the plan is to merge it as experimental to try in downstream PRs and get more experience. That does mean that master will be in an unreleasable state by the way, which is unusual and it'd be nice to get Chuck's explicit OK for that. But after that, I think we need a change here. I would like to hear what everyone thinks is the shape that change should take - any of my above suggestions, or something else?
Finally as a minor point, I know we are mostly (ex-)academics but this necessity of formal language on NEPs is self-imposed (probably PEPs are to blame) and not quite helping. It can be a bit more descriptive in my external opinion.
TBH, I don't really know how to solve that point, so if you have any specific suggestions, that's certainly welcome. I understand the frustration for a reader trying to understand all the details, with many being only described in NEP-18 [3], but we also strive to avoid rewriting things that are written elsewhere, which would also overburden those who are aware of what's being discussed.
I also share Ilhan’s concern (and I mentioned this in a previous NEP discussion) that NEPs are getting pretty inaccessible. In a sense these are difficult topics and readers should be expected to have *some* familiarity with the topics being discussed, but perhaps more effort should be put into the context/motivation/background of a NEP before accepting it. One way to ensure this might be to require a final proofreading step by someone who has not been involved at all in the discussions, like peer review does for papers.
Some variant of this proposal would be my preference.
Cheers, Ralf
[1] https://github.com/numpy/numpy/issues/14441#issuecomment-529969572 [2] https://numpy.org/neps/nep-0035-array-creation-dispatch-with-array-function.... [3] https://numpy.org/neps/nep-0018-array-function-protocol.html [4] https://numpy.org/neps/nep-0000.html#nep-workflow [5] https://mail.python.org/pipermail/numpy-discussion/2019-October/080176.html
[6] https://github.com/numpy/numpy/blob/master/doc/neps/nep-template.rst [7] https://github.com/numpy/numpy/blob/master/doc/neps/nep-0038-SIMD-optimizati... [8] https://github.com/numpy/numpy/blob/master/doc/neps/nep-0041-improved-dtype-...
On Thu, Aug 13, 2020 at 3:44 AM Juan Nunez-Iglesias <jni@fastmail.com> wrote:
I’ve generally been on the “let the NumPy devs worry about it” side of things, but I do agree with Ilhan that `like=` is confusing and `typeof=` would be a much more appropriate name for that parameter.
I do think library writers are NumPy users and so I wouldn’t really make that distinction, though. Users writing their own analysis code could very well be interested in writing code using numpy functions that will transparently work when the input is a CuPy array or whatever.
I also share Ilhan’s concern (and I mentioned this in a previous NEP discussion) that NEPs are getting pretty inaccessible. In a sense these are difficult topics and readers should be expected to have *some* familiarity with the topics being discussed, but perhaps more effort should be put into the context/motivation/background of a NEP before accepting it. One way to ensure this might be to require a final proofreading step by someone who has not been involved at all in the discussions, like peer review does for papers.
Food for thought.
Juan.
On 13 Aug 2020, at 9:24 am, Ilhan Polat <ilhanpolat@gmail.com> wrote:
For what is worth, as a potential consumer in SciPy, it really doesn't say anything (both in NEP and the PR) about how the regular users of NumPy will benefit from this. If only and only 3rd parties are going to benefit from it, I am not sure adding a new keyword to an already confusing function is the right thing to do.
Let me clarify,
- This is already a very (I mean extremely very) easy keyword name to confuse with ones_like, zeros_like and by its nature any other interpretation. It is not signalling anything about the functionality that is being discussed. I would seriously consider reserving such obvious names for really obvious tasks. Because you would also expect the shape and ndim would be mimicked by the "like"d argument but it turns out it is acting more like "typeof=" and not "like=" at all. Because if we follow the semantics it reads as "make your argument asarray like the other thing" but it is actually doing, "make your argument an array with the other thing's type" which might not be an array after all.
- Again, if this is meant for downstream libraries (because that's what I got out of the PR discussion, cupy, dask, and JAX were the only examples I could read) then hiding it in another function and writing with capital letters "this is not meant for numpy users" would be a much more convenient way to separate the target audience and regular users. numpy.astypedarray([[some data], [...]], type_of=x) or whatever else it may be would be quite clean and to the point with no ambiguous keywords.
I think, arriving to an agreement would be much faster if there is an executive summary of who this is intended for and what the regular usage is. Because with no offense, all I see is "dispatch", "_array_function_" and a lot of technical details of which I am absolutely ignorant.
Finally as a minor point, I know we are mostly (ex-)academics but this necessity of formal language on NEPs is self-imposed (probably PEPs are to blame) and not quite helping. It can be a bit more descriptive in my external opinion.
On Fri, Aug 14, 2020 at 12:23 PM Peter Andreas Entschev <peter@entschev.com> wrote:
Hi all,
This thread has IMO drifted very far from its original purpose, so I decided to start a new thread specifically for the general NEP procedure discussion; please check your mail for the "NEP Procedure Discussion" subject.
Thanks Peter. For future reference: better to just edit the thread subject, but not start over completely - people want to reply to previous content. I will copy over comments I'd like to reply to to the other thread by hand now.
On the topic of this thread, I'll try to rewrite NEP-35 to make it more accessible and ping back here once I have a PR for that.
Thanks!
Cheers, Ralf
Is there anything else that's pressing here? If there is and I missed/forgot about it, please let me know.
Best, Peter
As per discussed, I've opened a PR https://github.com/numpy/numpy/pull/17093 attempting to clarify some of the writing and to follow the NEP Template. As suggested in the template, please find below the top part of NEP-35 (up to and including the Backward Compatibility section). Please feel free to comment and suggest improvements or point out what may still be unclear, personally I would prefer comments directly on the PR if possible. =========================================================== NEP 35 — Array Creation Dispatching With __array_function__ =========================================================== :Author: Peter Andreas Entschev <pentschev@nvidia.com> :Status: Draft :Type: Standards Track :Created: 2019-10-15 :Updated: 2020-08-17 :Resolution: Abstract -------- We propose the introduction of a new keyword argument ``like=`` to all array creation functions, this argument permits the creation of an array based on a non-NumPy reference array passed via that argument, resulting in an array defined by the downstream library implementing that type, which also implements the ``__array_function__`` protocol. With this we address one of that protocol's shortcomings, as described by NEP 18 [1]_. Motivation and Scope -------------------- Many are the libraries implementing the NumPy API, such as Dask for graph computing, CuPy for GPGPU computing, xarray for N-D labeled arrays, etc. All the libraries mentioned have yet another thing in common: they have also adopted the ``__array_function__`` protocol. The protocol defines a mechanism allowing a user to directly use the NumPy API as a dispatcher based on the input array type. In essence, dispatching means users are able to pass a downstream array, such as a Dask array, directly to one of NumPy's compute functions, and NumPy will be able to automatically recognize that and send the work back to Dask's implementation of that function, which will define the return value. For example: .. 
code:: python x = dask.array.arange(5) # Creates dask.array np.sum(a) # Returns dask.array Note above how we called Dask's implementation of ``sum`` via the NumPy namespace by calling ``np.sum``, and the same would apply if we had a CuPy array or any other array from a library that adopts ``__array_function__``. This allows writing code that is agnostic to the implementation library, thus users can write their code once and still be able to use different array implementations according to their needs. Unfortunately, ``__array_function__`` has limitations, one of them being array creation functions. In the example above, NumPy was able to call Dask's implementation because the input array was a Dask array. The same is not true for array creation functions, in the example the input of ``arange`` is simply the integer ``5``, not providing any information of the array type that should be the result, that's where a reference array passed by the ``like=`` argument proposed here can be of help, as it provides NumPy with the information required to create the expected type of array. The new ``like=`` keyword proposed is solely intended to identify the downstream library where to dispatch and the object is used only as reference, meaning that no modifications, copies or processing will be performed on that object. We expect that this functionality will be mostly useful to library developers, allowing them to create new arrays for internal usage based on arrays passed by the user, preventing unnecessary creation of NumPy arrays that will ultimately lead to an additional conversion into a downstream array type. Support for Python 2.7 has been dropped since NumPy 1.17, therefore we make use of the keyword-only argument standard described in PEP-3102 [2]_ to implement ``like=``, thus preventing it from being passed by position. .. 
_neps.like-kwarg.usage-and-impact: Usage and Impact ---------------- To understand the intended use for ``like=``, and before we move to more complex cases, consider the following illustrative example consisting only of NumPy and CuPy arrays: .. code:: python import numpy as np import cupy def my_pad(arr, padding): padding = np.array(padding, like=arr) return np.concatenate((padding, arr, padding)) my_pad(np.arange(5), [-1, -1]) # Returns np.ndarray my_pad(cupy.arange(5), [-1, -1]) # Returns cupy.core.core.ndarray Note in the ``my_pad`` function above how ``arr`` is used as a reference to dictate what array type padding should have, before concatenating the arrays to produce the result. On the other hand, if ``like=`` wasn't used, the NumPy case case would still work, but CuPy wouldn't allow this kind of automatic conversion, ultimately raising a ``TypeError: Only cupy arrays can be concatenated`` exception. Now we should look at how a library like Dask could benefit from ``like=``. Before we understand that, it's important to understand a bit about Dask basics and ensures correctness with ``__array_function__``. Note that Dask can compute different sorts of objects, like dataframes, bags and arrays, here we will focus strictly on arrays, which are the objects we can use ``__array_function__`` with. Dask uses a graph computing model, meaning it breaks down a large problem in many smaller problems and merge their results to reach the final result. To break the problem down into smaller ones, Dask also breaks arrays into smaller arrays, that it calls "chunks". A Dask array can thus consist of one or more chunks and they may be of different types. However, in the context of ``__array_function__``, Dask only allows chunks of the same type, for example, a Dask array can be formed of several NumPy arrays or several CuPy arrays, but not a mix of both. 
To avoid mismatched types during compute, Dask keeps an attribute ``_meta`` as
part of its array throughout computation. This attribute is used both to
predict the output type at graph creation time and to create any intermediary
arrays that are necessary within some function's computation. Going back to
our previous example, we can use ``_meta`` information to identify what kind
of array we would use for padding, as seen below:

.. code:: python

    import numpy as np
    import cupy
    import dask.array as da
    from dask.array.utils import meta_from_array

    def my_pad(arr, padding):
        padding = np.array(padding, like=meta_from_array(arr))
        return np.concatenate((padding, arr, padding))

    # Returns dask.array<concatenate, shape=(9,), dtype=int64, chunksize=(5,), chunktype=numpy.ndarray>
    my_pad(da.arange(5), [-1, -1])

    # Returns dask.array<concatenate, shape=(9,), dtype=int64, chunksize=(5,), chunktype=cupy.ndarray>
    my_pad(da.from_array(cupy.arange(5)), [-1, -1])

Note how ``chunktype`` in the return value above changes from
``numpy.ndarray`` in the first ``my_pad`` call to ``cupy.ndarray`` in the
second.

To enable proper identification of the array type we use Dask's utility
function ``meta_from_array``, which was introduced as part of the work to
support ``__array_function__``, allowing Dask to handle ``_meta``
appropriately. That function is primarily targeted at the library's internal
usage to ensure chunks are created with correct types. Without the ``like=``
argument, it would be impossible to ensure ``my_pad`` creates a padding array
with a type matching that of the input array, which would cause CuPy to raise
a ``TypeError`` exception, just as discussed above for the CuPy-only case.

Backward Compatibility
----------------------

This proposal does not raise any backward compatibility issues within NumPy,
given that it only introduces a new keyword argument to existing array
creation functions with a default ``None`` value, thus not changing current
behavior.
On Sun, Aug 16, 2020 at 1:41 PM Ralf Gommers <ralf.gommers@gmail.com> wrote:
On Fri, Aug 14, 2020 at 12:23 PM Peter Andreas Entschev <peter@entschev.com> wrote:
Hi all,
This thread has IMO drifted very far from its original purpose, so I decided to start a new thread specifically for the general NEP procedure discussion; please check your mail for the "NEP Procedure Discussion" subject.
Thanks Peter. For future reference: better to just edit the thread subject, but not start over completely - people want to reply to previous content. I will copy over comments I'd like to reply to to the other thread by hand now.
On the topic of this thread, I'll try to rewrite NEP-35 to make it more accessible and ping back here once I have a PR for that.
Thanks!
Cheers, Ralf
Is there anything else that's pressing here? If there is and I missed/forgot about it, please let me know.
Best, Peter
On Fri, Aug 14, 2020 at 5:00 AM Juan Nunez-Iglesias <jni@fastmail.com> wrote:
Hello everyone again!
A few clarifications about my proposal of external peer review:
- Yes, all this work is public and announced on the mailing list. However, I don’t think there’s a single person in this discussion or even this whole ecosystem that does not have a more immediately-pressing and also virtually infinite to-do list, so it’s unreasonable to expect that generally they would do more than glance at the stuff in the mailing list. In the peer review analogy, the mailing list is like the arXiv or Biorxiv stream — yep, anyone can see the stuff on there and comment, but most people just don’t have the time or attention to grab onto that. The only reason I stopped to comment here is Sebastian’s “Imma merge, YOLO!”, which had me raising my eyebrows real high. Especially for something that would expand the NumPy API!
- So, my proposal is that there needs to be an *editor* of NEPs who takes responsibility, once they are themselves satisfied with the NEP, for seeking out external reviewers and pinging them individually and asking them if they would be ok to review.
- A good friend who does screenwriting once told me, “don’t use all your proofreaders at once”. You want to get feedback, improve things, then feedback from a *totally independent* new person who can see the document with fresh eyes.
Obviously, all of the above slows things down. But “alone we go fast, together we go far”. The point of a NEP is to document critical decisions for the long term health of the project. If the documentation is insufficient, it defeats the whole purpose. Might as well just implement stuff and skip the whole NEP process. (Side note: Stephan, I for one would definitely appreciate an update to existing NEPs if there’s obvious ways they can be improved!)
I do think that NEP templates should be strict, and I don’t think that is incompatible with plain, jargon-free text. The NEP template and guidelines should specify that, and that the motivation should be understandable by a casual NumPy user — the kind described by Ilhan, for whom bare NumPy actually meets all their needs. Maybe they’ve also used PyTorch but they’ve never really had cause to mix them or write a program that worked with both kinds of arrays.
Ditto for backwards compatibility — everyone should be clear when their existing code is going to be broken. Actually NEP18 broke so much of my code, but its Backward compatibility section basically says all good! https://numpy.org/neps/nep-0018-array-function-protocol.html#backward-compat...
Anywho, as always, none of this is criticism of the work done. I thank you all, and am eternally grateful for all the hard work everyone is doing to keep the ecosystem from fragmenting. I’m just hoping that this discussion can improve the process going forward!
And, yes, apologies to Peter, I know from repeated personal experience how frustrating it can be to have last-minute drive-by objections after months of consensus building! But I think in the end every time that happened the end result was better — I hope the same is true here! And yes, I’ll reiterate Ralf’s point: my concerns are about the NEP process itself rather than this one. I’ll summarise my proposal:
- strict NEP template. NEPs with missing sections will not be accepted.
- sections Abstract, Motivation, and Backwards Compatibility should be understandable at a high level by casual users with ~zero background on the topic
- enforce the above with at least two independent rounds of coordinated peer review.
Thank you,
Juan.
On 14 Aug 2020, at 5:29 am, Stephan Hoyer <shoyer@gmail.com> wrote:
On Thu, Aug 13, 2020 at 5:22 AM Ralf Gommers <ralf.gommers@gmail.com> wrote:
Thanks for raising these concerns Ilhan and Juan, and for answering Peter. Let me give my perspective as well.
To start with, this is not specifically about Peter's NEP and PR. NEP 35 simply follows the pattern set by previous PRs, and given its tight scope is less difficult to understand than other NEPs on such technical topics. Peter has done a lot of things right, and is close to the finish line.
On Thu, Aug 13, 2020 at 12:02 PM Peter Andreas Entschev <peter@entschev.com> wrote:
I think arriving at an agreement would be much faster if there were an executive summary of who this is intended for and what the regular usage is. Because with no offense, all I see is "dispatch", "_array_function_" and a lot of technical details of which I am absolutely ignorant.
This is what I intended to do in the Usage Guidance [2] section. Could you elaborate on what more information you'd want to see there? Or is it just a matter of reorganizing the NEP a bit to try and summarize such things right at the top?
We adapted the NEP template [6] several times last year to try and improve this. We also specified in there that NEP content sent to the mailing list should only contain the sections: Abstract, Motivation and Scope, Usage and Impact, and Backwards compatibility. This is to ensure we fully understand the "why" and "what" before the "how". Unfortunately that template and procedure haven't been exercised much yet, only in NEP 38 [7] and partially in NEP 41 [8].
If we have long-time maintainers of SciPy (Ilhan and myself), scikit-image (Juan) and CuPy (Leo, on the PR review) all saying they don't understand the goals, relevance, target audience, or how they're supposed to use a new feature, that indicates that the people doing the writing and having the discussion are doing something wrong at a very fundamental level.
At this point I'm pretty disappointed in and tired of how we write and discuss NEPs on technical topics like dispatching, dtypes and the like. People literally refuse to write down concrete motivations, goals and non-goals, code that's problematic now and will be better/working post-NEP and usage examples before launching into extensive discussion of the gory details of the internals. I'm not sure what to do about it. Completely separate API and behavior proposals from implementation proposals? Make separate "API" and "internals" teams with the likes of Juan, Ilhan and Leo on the API team which then needs to approve every API change in new NEPs? Offer to co-write NEPs if someone is willing but doesn't understand how to go about it? Keep the current structure/process but veto further approvals until NEP authors get it right?
I think the NEP template is great, and we should try to be more diligent about following it!
My own NEP 37 (__array_module__) is probably a good example of poor presentation due to not following the template structure. It goes pretty deep into low-level motivation and some implementation details before usage examples.
Speaking just for myself, I would have appreciated a friendly nudge to use the template. Certainly I think it would be fine to require using the template for newly submitted NEPs. I did not remember about it when I started drafting NEP 37, and it definitely would have helped. I may still try to do a revision at some point to use the template structure.
I want to make an exception for merging the current NEP, for which the plan is to merge it as experimental to try in downstream PRs and get more experience. That does mean that master will be in an unreleasable state by the way, which is unusual and it'd be nice to get Chuck's explicit OK for that. But after that, I think we need a change here. I would like to hear what everyone thinks is the shape that change should take - any of my above suggestions, or something else?
Finally as a minor point, I know we are mostly (ex-)academics but this necessity of formal language on NEPs is self-imposed (probably PEPs are to blame) and not quite helping. It can be a bit more descriptive in my external opinion.
TBH, I don't really know how to solve that point, so if you have any specific suggestions, that's certainly welcome. I understand the frustration for a reader trying to understand all the details, with many being only described in NEP-18 [3], but we also strive to avoid rewriting things that are written elsewhere, which would also overburden those who are aware of what's being discussed.
I also share Ilhan’s concern (and I mentioned this in a previous NEP discussion) that NEPs are getting pretty inaccessible. In a sense these are difficult topics and readers should be expected to have *some* familiarity with the topics being discussed, but perhaps more effort should be put into the context/motivation/background of a NEP before accepting it. One way to ensure this might be to require a final proofreading step by someone who has not been involved at all in the discussions, like peer review does for papers.
Some variant of this proposal would be my preference.
Cheers, Ralf
[1] https://github.com/numpy/numpy/issues/14441#issuecomment-529969572
[2] https://numpy.org/neps/nep-0035-array-creation-dispatch-with-array-function....
[3] https://numpy.org/neps/nep-0018-array-function-protocol.html
[4] https://numpy.org/neps/nep-0000.html#nep-workflow
[5] https://mail.python.org/pipermail/numpy-discussion/2019-October/080176.html
[6] https://github.com/numpy/numpy/blob/master/doc/neps/nep-template.rst
[7] https://github.com/numpy/numpy/blob/master/doc/neps/nep-0038-SIMD-optimizati...
[8] https://github.com/numpy/numpy/blob/master/doc/neps/nep-0041-improved-dtype-...
On Thu, Aug 13, 2020 at 3:44 AM Juan Nunez-Iglesias <jni@fastmail.com> wrote:
I’ve generally been on the “let the NumPy devs worry about it” side of things, but I do agree with Ilhan that `like=` is confusing and `typeof=` would be a much more appropriate name for that parameter.
I do think library writers are NumPy users and so I wouldn’t really make that distinction, though. Users writing their own analysis code could very well be interested in writing code using numpy functions that will transparently work when the input is a CuPy array or whatever.
I also share Ilhan’s concern (and I mentioned this in a previous NEP discussion) that NEPs are getting pretty inaccessible. In a sense these are difficult topics and readers should be expected to have *some* familiarity with the topics being discussed, but perhaps more effort should be put into the context/motivation/background of a NEP before accepting it. One way to ensure this might be to require a final proofreading step by someone who has not been involved at all in the discussions, like peer review does for papers.
Food for thought.
Juan.
On 13 Aug 2020, at 9:24 am, Ilhan Polat <ilhanpolat@gmail.com> wrote:
For what it's worth, as a potential consumer in SciPy, it really doesn't say anything (both in NEP and the PR) about how the regular users of NumPy will benefit from this. If only 3rd parties are going to benefit from it, I am not sure adding a new keyword to an already confusing function is the right thing to do.
Let me clarify,
- This is already a very (I mean extremely very) easy keyword name to confuse with ones_like, zeros_like and by its nature any other interpretation. It is not signalling anything about the functionality that is being discussed. I would seriously consider reserving such obvious names for really obvious tasks. Because you would also expect the shape and ndim would be mimicked by the "like"d argument but it turns out it is acting more like "typeof=" and not "like=" at all. Because if we follow the semantics it reads as "make your argument asarray like the other thing" but it is actually doing, "make your argument an array with the other thing's type" which might not be an array after all.
- Again, if this is meant for downstream libraries (because that's what I got out of the PR discussion, cupy, dask, and JAX were the only examples I could read) then hiding it in another function and writing with capital letters "this is not meant for numpy users" would be a much more convenient way to separate the target audience and regular users. numpy.astypedarray([[some data], [...]], type_of=x) or whatever else it may be would be quite clean and to the point with no ambiguous keywords.
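The distinction drawn in the first point above can be seen in a short comparison. This is only a sketch, assuming ``like=`` behaves as described in NEP 35 and that the installed NumPy includes it; ``ref`` is just illustrative data.

```python
import numpy as np

ref = np.zeros((2, 3), dtype=np.float32)

mimic = np.ones_like(ref)      # copies shape and dtype from ref
typed = np.ones(4, like=ref)   # only ref's array type is used, for dispatch

print(mimic.shape, mimic.dtype)  # (2, 3) float32
print(typed.shape, typed.dtype)  # (4,) float64
```

``ones_like`` mimics the reference array's shape and dtype, while ``like=`` leaves both to the other arguments and uses the reference solely to pick the array implementation.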
I think arriving at an agreement would be much faster if there were an executive summary of who this is intended for and what the regular usage is. Because with no offense, all I see is "dispatch", "_array_function_" and a lot of technical details of which I am absolutely ignorant.
Finally as a minor point, I know we are mostly (ex-)academics but this necessity of formal language on NEPs is self-imposed (probably PEPs are to blame) and not quite helping. It can be a bit more descriptive in my external opinion.
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion
participants (7)
- Hameer Abbasi
- Ilhan Polat
- Juan Nunez-Iglesias
- Peter Andreas Entschev
- Ralf Gommers
- Sebastian Berg
- Stephan Hoyer