[Numpy-discussion] Experimental `like=` attribute for array creation functions
Peter Andreas Entschev
peter at entschev.com
Thu Aug 13 06:56:26 EDT 2020
> I am not sure adding a new keyword to an already confusing function is the right thing to do.
Could you clarify what is the confusing function in question?
> This is already a very (I mean extremely very) easy keyword name to confuse with ones_like, zeros_like and by its nature any other interpretation.
To be fair, the usage is the same. Therefore
empty_like(downstream_array, ...) and empty(downstream_array.shape,
..., like=downstream_array) should have the exact same behavior, which
is arguably redundant now.
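As a rough sketch of that comparison (assuming NumPy >= 1.20, and
using a plain NumPy array as a stand-in for a real downstream array
such as a Dask or CuPy one), note that "like=" only selects the array
type, while shape and dtype still have to be passed explicitly:

```python
import numpy as np

# Stand-in for a downstream array (assumption: any object implementing
# __array_function__, e.g. a Dask or CuPy array, could be used instead).
downstream_array = np.zeros((2, 3), dtype=np.float32)

# empty_like mimics the shape and dtype of its argument.
a = np.empty_like(downstream_array)

# empty(..., like=) only borrows the array *type*; shape and dtype
# are still given explicitly.
b = np.empty(downstream_array.shape, dtype=downstream_array.dtype,
             like=downstream_array)

# A different shape is fine: like= does not constrain it, and the
# dtype falls back to NumPy's default.
c = np.empty((4,), like=downstream_array)
```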
> It is not signalling anything about the functionality that is being discussed. I would seriously consider reserving such obvious names for really obvious tasks. Because you would also expect the shape and ndim would be mimicked by the "like"d argument but it turns out it is acting more like "typeof=" and not "like=" at all.
I understand this can be confusing. Naming was one of the hardest
discussions, as there's no clear, unambiguous name to use for this
keyword; "like=" was simply the name that came closest to consensus
during discussions. While I think "typeof=" is perhaps a better name
than "like=", it could easily be confused with "dtype=", and that
would possibly just shift the confusion.
> Again, if this is meant for downstream libraries (because that's what I got out of the PR discussion, cupy, dask, and JAX were the only examples I could read) then hiding it in another function and writing with capital letters "this is not meant for numpy users" would be a much more convenient way to separate the target audience and regular users.
The problem with this approach is that the __array_function__ protocol
relies on downstream libraries implementing functions with the same
signature (for example, Dask and CuPy both implement an "array"
function that matches NumPy). The purpose of __array_function__ and
NEP-35 is to introduce only minimal changes to both NumPy's API and
downstream libraries. Of course, adding new functions for such cases
would work, but IMO it would defeat the purpose of __array_function__
in general, as it would require a considerable amount of work in
downstream libraries. We discussed this previously and decided that
an argument is better than many new functions [1].
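To illustrate why matching signatures matter, here is a minimal
sketch of the dispatch (assuming NumPy >= 1.20). DuckArray,
duck_asarray and HANDLED are hypothetical names made up for this
example, not part of any real library:

```python
import numpy as np

HANDLED = {}  # maps NumPy functions to DuckArray implementations


class DuckArray:
    """Hypothetical downstream array type wrapping a NumPy array."""

    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_function__(self, func, types, args, kwargs):
        # Returning NotImplemented makes NumPy raise a TypeError for
        # any function this type does not implement.
        if func not in HANDLED:
            return NotImplemented
        return HANDLED[func](*args, **kwargs)


def duck_asarray(a, dtype=None, order=None):
    # Mirrors the np.asarray arguments used here. There is no like=
    # parameter: NumPy strips it before dispatching.
    return DuckArray(np.asarray(a, dtype=dtype, order=order))


HANDLED[np.asarray] = duck_asarray

ref = DuckArray([0])
# Dispatches to duck_asarray because ref implements __array_function__.
out = np.asarray([1, 2, 3], like=ref)
```

Because the downstream function already exists with a compatible
signature, nothing new has to be written on the library side for
"like=" to work.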
> I think, arriving to an agreement would be much faster if there is an executive summary of who this is intended for and what the regular usage is. Because with no offense, all I see is "dispatch", "_array_function_" and a lot of technical details of which I am absolutely ignorant.
This is what I intended to do in the Usage Guidance [2] section. Could
you elaborate on what more information you'd want to see there? Or is
it just a matter of reorganizing the NEP a bit to try and summarize
such things right at the top?
> Finally as a minor point, I know we are mostly (ex-)academics but this necessity of formal language on NEPs is self-imposed (probably PEPs are to blame) and not quite helping. It can be a bit more descriptive in my external opinion.
TBH, I don't really know how to solve that point, so if you have any
specific suggestions, those are certainly welcome. I understand the
frustration for a reader trying to understand all the details, with
many being only described in NEP-18 [3], but we also strive to avoid
rewriting things that are written elsewhere, which would also
overburden those who are aware of what's being discussed.
> I’ve generally been on the “let the NumPy devs worry about it” side of things, but I do agree with Ilhan that `like=` is confusing and `typeof=` would be a much more appropriate name for that parameter.
To be clear, I have no strong opinion on renaming it; I'm fine either
way, but I think it's unrealistic to expect to find a single name
that is short, unambiguous and properly descriptive. If the
preference now shifts towards the "typeof=" name, we can change it,
but "like=" was really named after "empty_like" and similar
functions.
> I do think library writers are NumPy users and so I wouldn’t really make that distinction, though. Users writing their own analysis code could very well be interested in writing code using numpy functions that will transparently work when the input is a CuPy array or whatever.
I'm guessing this comes down to a somewhat loose definition of
"library". To some extent, if you really need "like=", it means that
you're writing your own functions around the NumPy API (and that IMO
is a library, even if you call it something else), rather than just
writing your application on top of the existing NumPy API. I'm also
happy to rephrase that in the NEP if people feel it should be done.
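For instance, the kind of function I have in mind looks roughly like
the sketch below (assuming NumPy >= 1.20; scale_rows is a
hypothetical helper, and a NumPy array stands in for the CuPy or Dask
array that would be the real use case):

```python
import numpy as np


def scale_rows(x, weights):
    """Hypothetical helper: multiply each row of x by a weight."""
    # Coerce weights to whatever array type x is, instead of always
    # producing a NumPy array. x must implement __array_function__.
    w = np.asarray(weights, like=x)
    return x * w[:, None]


x = np.ones((2, 3))  # stand-in for a downstream array
result = scale_rows(x, [2.0, 3.0])
```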
> I also share Ilhan’s concern (and I mentioned this in a previous NEP discussion) that NEPs are getting pretty inaccessible. In a sense these are difficult topics and readers should be expected to have *some* familiarity with the topics being discussed, but perhaps more effort should be put into the context/motivation/background of a NEP before accepting it. One way to ensure this might be to require a final proofreading step by someone who has not been involved at all in the discussions, like peer review does for papers.
This is a good point. We do always notify people over the mailing
list of new NEPs as per NEP-0 [4], which was done for NEP-35 [5]
(originally NEP-33, but renamed due to other open NEPs at that time).
Unfortunately, not many concerns were raised about that back then.
Best,
Peter
[1] https://github.com/numpy/numpy/issues/14441#issuecomment-529969572
[2] https://numpy.org/neps/nep-0035-array-creation-dispatch-with-array-function.html#usage-guidance
[3] https://numpy.org/neps/nep-0018-array-function-protocol.html
[4] https://numpy.org/neps/nep-0000.html#nep-workflow
[5] https://mail.python.org/pipermail/numpy-discussion/2019-October/080176.html
On Thu, Aug 13, 2020 at 3:44 AM Juan Nunez-Iglesias <jni at fastmail.com> wrote:
>
> I’ve generally been on the “let the NumPy devs worry about it” side of things, but I do agree with Ilhan that `like=` is confusing and `typeof=` would be a much more appropriate name for that parameter.
>
> I do think library writers are NumPy users and so I wouldn’t really make that distinction, though. Users writing their own analysis code could very well be interested in writing code using numpy functions that will transparently work when the input is a CuPy array or whatever.
>
> I also share Ilhan’s concern (and I mentioned this in a previous NEP discussion) that NEPs are getting pretty inaccessible. In a sense these are difficult topics and readers should be expected to have *some* familiarity with the topics being discussed, but perhaps more effort should be put into the context/motivation/background of a NEP before accepting it. One way to ensure this might be to require a final proofreading step by someone who has not been involved at all in the discussions, like peer review does for papers.
>
> Food for thought.
>
> Juan.
>
> On 13 Aug 2020, at 9:24 am, Ilhan Polat <ilhanpolat at gmail.com> wrote:
>
> For what it's worth, as a potential consumer in SciPy, it really doesn't say anything (both in NEP and the PR) about how the regular users of NumPy will benefit from this. If only 3rd parties are going to benefit from it, I am not sure adding a new keyword to an already confusing function is the right thing to do.
>
> Let me clarify,
>
> - This is already a very (I mean extremely very) easy keyword name to confuse with ones_like, zeros_like and by its nature any other interpretation. It is not signalling anything about the functionality that is being discussed. I would seriously consider reserving such obvious names for really obvious tasks. Because you would also expect the shape and ndim would be mimicked by the "like"d argument but it turns out it is acting more like "typeof=" and not "like=" at all. Because if we follow the semantics it reads as "make your argument asarray like the other thing" but it is actually doing, "make your argument an array with the other thing's type" which might not be an array after all.
>
> - Again, if this is meant for downstream libraries (because that's what I got out of the PR discussion, cupy, dask, and JAX were the only examples I could read) then hiding it in another function and writing with capital letters "this is not meant for numpy users" would be a much more convenient way to separate the target audience and regular users. numpy.astypedarray([[some data], [...]], type_of=x) or whatever else it may be would be quite clean and to the point with no ambiguous keywords.
>
> I think, arriving to an agreement would be much faster if there is an executive summary of who this is intended for and what the regular usage is. Because with no offense, all I see is "dispatch", "_array_function_" and a lot of technical details of which I am absolutely ignorant.
>
> Finally as a minor point, I know we are mostly (ex-)academics but this necessity of formal language on NEPs is self-imposed (probably PEPs are to blame) and not quite helping. It can be a bit more descriptive in my external opinion.
>
> best,
> ilhan
>
> On Tue, Aug 11, 2020 at 12:18 AM Ralf Gommers <ralf.gommers at gmail.com> wrote:
>>
>> On Mon, Aug 10, 2020 at 8:37 PM Sebastian Berg <sebastian at sipsolutions.net> wrote:
>>>
>>> On Mon, 2020-08-10 at 17:35 +0200, Hameer Abbasi wrote:
>>> > Hi,
>>> >
>>> > We should have a higher-bandwidth meeting/communication for all
>>> > stakeholders, and particularly some library authors, to see what
>>> > would be good for them.
>>
>>
>> I'm not sure that helps. At this point there's little progress since the last meeting. I think the plan is unchanged: we need implementations of all the options on offer, and then try them out in PRs for scikit-learn, SciPy and perhaps another package whose maintainers are interested, to test like=, __array_module__ in realistic situations.
>>
>>
>>> >
>>> > We should definitely have language in the NEP that says it won’t be
>>> > in a release unless the NEP is accepted.
>>>
>>> In that case, I think the important part is to have language right now
>>> in the implementation, although that can refer to the NEP itself of
>>> course.
>>> You can't expect everyone who may be tempted to use it to actually read
>>> the NEP draft, at least not without pointing it out.
>>
>>
>> Agreed, I think the decision is on this list not in the NEP, and to make sure we won't forget we need an issue opened with the 1.20 milestone.
>>
>> Cheers,
>> Ralf
>>
>>>
>>> I will say that I think it is not very high risk, because I think,
>>> annoying or not, the argument could be deprecated again with a
>>> short transition phase. Admittedly, that argument only works if we
>>> have a replacement solution.
>>>
>>> Cheers,
>>>
>>> Sebastian
>>>
>>>
>>> >
>>> > Best regards,
>>> > Hameer Abbasi
>>> >
>>> >
>>> > > On Monday, Aug 10, 2020 at 5:31 PM, Sebastian Berg
>>> > > <sebastian at sipsolutions.net> wrote:
>>> > > Hi all,
>>> > >
>>> > > as a heads up: Peter Entschev has a PR open to add `like=` to
>>> > > most array creation functions. My current plan is to merge it
>>> > > soon as a preliminary API and bring it up again before the
>>> > > actual release (in a few months). This allows overriding for
>>> > > array-likes, e.g. it will allow:
>>> > >
>>> > >
>>> > > arr = np.asarray([3], like=dask_array)
>>> > > type(arr) is dask.array.Array
>>> > >
>>> > > This was proposed in NEP 35:
>>> > >
>>> > > https://numpy.org/neps/nep-0035-array-creation-dispatch-with-array-function.html
>>> > >
>>> > > Although that has not been accepted as of now, the PR is:
>>> > >
>>> > > https://github.com/numpy/numpy/pull/16935
>>> > >
>>> > >
>>> > > This was discussed in a smaller group, and is an attempt to see
>>> > > how we can make the array-function protocol viable to allow
>>> > > packages such as sklearn to work with non-NumPy arrays.
>>> > >
>>> > > As of now, this would be experimental and we can revisit it
>>> > > before the actual NumPy release. We should probably discuss
>>> > > accepting NEP 35 more. At this time, I hope that we can put in
>>> > > the functionality to facilitate this discussion, rather than the
>>> > > other way around.
>>> > >
>>> > > If anyone feels nervous about this step, I would be happy to
>>> > > document that we will not include it in the next release unless
>>> > > the NEP is accepted first, or at least hide it behind an
>>> > > environment variable.
>>> > >
>>> > > Cheers,
>>> > >
>>> > > Sebastian
>>> > >
>>> > > _______________________________________________
>>> > > NumPy-Discussion mailing list
>>> > > NumPy-Discussion at python.org
>>> > > https://mail.python.org/mailman/listinfo/numpy-discussion