copy="never" discussion and no deprecation cycle?
Hi all,

(sorry for the length, details/discussion below)

On the triage call, there seemed a preference to just try to skip the deprecation and introduce `copy="never"`, `copy="if_needed"`, and `copy="always"` (i.e. string options for the `copy` keyword argument).

Strictly speaking, this is against the typical policy (one year of warning/errors). But nobody could think of a reasonable chance that anyone actually uses it. (For me just "policy" will be enough of an argument to just take it slow.)

BUT: If nobody has *any* concerns at all, I think we may just end up introducing the change right away.

The PR is: https://github.com/numpy/numpy/pull/19173

## The Feature

There is the idea to add `copy=never` (or similar). This would modify the existing `copy` argument to make it a 3-way decision:

* `copy=always` or `copy=True` to force a copy
* `copy=if_needed` or `copy=False` to prefer no-copy behavior
* `copy=never` to error when no-copy behavior is not possible (this ensures that a view is returned)

This would affect the functions:

* `np.array(object, copy=...)`
* `arr.astype(new_dtype, copy=...)`
* `np.reshape(arr, new_shape, copy=...)`, and the method `arr.reshape()`
* `np.meshgrid` and possibly

Here `reshape` currently does not have the option and would benefit from allowing `arr.reshape(-1, copy=never)`, which would guarantee a view.

## The Options

We have three options that are currently being discussed:

1. We introduce a new `np.CopyMode` or `np.<something>.Copy` Enum with values `np.CopyMode.NEVER`, `np.CopyMode.IF_NEEDED`, and `np.CopyMode.ALWAYS`
   * Plus: No compatibility concerns
   * Downside(?): This would be a first in NumPy, and is untypical API due to that.

2. We introduce `copy="never"`, `copy="if_needed"` and `copy="always"` as strings (all other strings will be a `TypeError`):
   * Problem: `copy="never"` currently means `copy=True` (the opposite). This means new code has to take care when it may run on older NumPy versions, and in theory it could make old code return the wrong thing.
   * Plus: Strings are typical for options in NumPy currently.

3. Same as 2, but we take it very slow: make strings an error right now and only introduce the new options after two releases as per the typical deprecation policy.

## Discussion

We discussed it briefly today in the triage call and we were leaning towards strings.

I was honestly expecting to converge to option 3 to avoid compatibility issues (mainly surprises with `copy="never"` on older versions). But considering how weird it is to currently pass `copy="never"`, the question was whether we should not change it with a release note.

The probability of someone currently passing exactly one of those three (and no other) strings seems exceedingly small.

Personally, I don't have much of an opinion. But if *nobody* voices any concern about just changing the meaning of the string inputs, I think the current default may be to just do it.

Cheers,

Sebastian
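To make the proposed three-way decision concrete, here is a minimal pure-Python sketch that models it without NumPy. The names `resolve_copy` and `view_possible` are hypothetical and appear neither in NumPy nor in the PR; the choice of `ValueError` for the "never" case is also an assumption, since the proposal only says it should "error".

```python
def resolve_copy(copy, view_possible):
    """Return "copy" or "view" for a requested mode (hypothetical helper).

    `view_possible` stands in for whatever check the real function would
    perform to find out whether the result can share memory with the input.
    """
    if copy is True or copy == "always":
        return "copy"
    if copy is False or copy == "if_needed":
        # Prefer a view, fall back to a copy when a view is impossible.
        return "view" if view_possible else "copy"
    if copy == "never":
        if not view_possible:
            # Assumed exception type; the thread does not specify one.
            raise ValueError("a copy would be required but copy='never' was requested")
        return "view"
    # Option 2 in the thread: every other value is rejected.
    raise TypeError(f"invalid copy mode: {copy!r}")
```

Under this model, `resolve_copy("if_needed", view_possible=False)` quietly falls back to a copy, while `resolve_copy("never", view_possible=False)` raises instead.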
On Wed, Jun 16, 2021 at 1:01 PM Sebastian Berg <sebastian@sipsolutions.net> wrote:
> 2. We introduce `copy="never"`, `copy="if_needed"` and `copy="always"` as strings (all other strings will be a `TypeError`):
>    * Problem: `copy="never"` currently means `copy=True` (the opposite). Which means new code has to take care when it may run on older NumPy versions. And in theory could make old code return the wrong thing.

To me, this seems like a big problem.

People try to use newer NumPy features on old versions of NumPy all the time. This works out OK if they get error messages, but we shouldn't add new features that silently do something else on old versions -- especially for recent old versions.

In particular, both copy='if_needed' and copy='never' would mean copy='always' on old versions of NumPy. This seems bad -- basically the exact opposite of what the user explicitly requested. These sorts of bugs can be quite challenging to track down.

So in my opinion (1) and (3) are the only real options.
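The failure mode described here can be illustrated without NumPy. The sketch below assumes, as stated earlier in the thread, that older versions effectively reduce `copy` to its truth value; `legacy_copy_flag` is a hypothetical stand-in for that behaviour, not a NumPy function.

```python
def legacy_copy_flag(copy):
    """Model of the old behaviour: `copy` is only ever used as a boolean."""
    return bool(copy)


# Every non-empty string is truthy, so all three proposed spellings
# collapse to "force a copy" under the old interpretation.
for mode in ("always", "if_needed", "never"):
    decision = "copy" if legacy_copy_flag(mode) else "no forced copy"
    print(mode, "->", decision)
```

No error is raised for any of the three strings, which is why the mismatch would go unnoticed on an older installation.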
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion
I agree with Stephan, but even 3 seems dangerous to me. Any code that wraps a numpy function and accepts a `copy` parameter (especially `__array_function__`) is likely to contain `if copy` somewhere, which would result in entirely (but likely silently) the wrong behavior for `copy="never"`.

An important reason for the original `np.never_copy` suggestion the first time this was discussed is that it can overload `__bool__` to raise or return `False` and warn, which would make silent bad behavior visible one way or another.

I think a short NEP might be in order here, just so we can make sure we've addressed everything that came up the previous time this was discussed.

Eric.

On Wed, Jun 16, 2021, 23:00 Stephan Hoyer <shoyer@gmail.com> wrote:

> To me, this seems like a big problem.
>
> People try to use newer NumPy features on old versions of NumPy all the time. This works out OK if they get error messages, but we shouldn't add new features that silently do something else on old versions -- especially for recent old versions.
>
> In particular, both copy='if_needed' and copy='never' would mean copy='always' on old versions of NumPy. This seems bad -- basically the exact opposite of what the user explicitly requested. These sort of bugs can be quite challenging to track down.
>
> So in my opinion (1) and (3) are the only real options.
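Eric's point about wrappers that contain `if copy` can be sketched in plain Python. Everything below is hypothetical: `wrapper` stands in for third-party code that forwards a `copy` argument, and `_RaisingNever` is an illustrative sentinel in the spirit of the `np.never_copy` idea, not an existing NumPy object.

```python
class _RaisingNever:
    """Hypothetical sentinel whose truth value is deliberately undefined."""

    def __bool__(self):
        raise ValueError("copy mode 'never' cannot be used as a boolean")


def wrapper(data, copy=True):
    """A typical pre-existing wrapper that only understands booleans."""
    if copy:               # the pattern Eric is worried about
        return list(data)  # stands in for "make a copy"
    return data            # stands in for "return the input unchanged"


original = [1, 2, 3]

# A plain string is truthy, so the wrapper silently copies.
silently_copied = wrapper(original, copy="never")

# The sentinel turns the same mistake into an immediate error.
try:
    wrapper(original, copy=_RaisingNever())
except ValueError as exc:
    error_message = str(exc)
```

With the string, the caller gets a copy although "never" was requested; with the sentinel, the unaware wrapper fails at the `if copy` line instead.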
On 16/6/21 11:00 pm, Sebastian Berg wrote:
> Hi all,
>
> (sorry for the length, details/discussion below)
>
> On the triage call, there seemed a preference to just try to skip the deprecation and introduce `copy="never"`, `copy="if_needed"`, and `copy="always"` (i.e. string options for the `copy` keyword argument).

Why this may be controversial: today someone could be calling `np.array(..., copy="never")`, which would call `bool("never")`, which would evaluate to `True`, and would end up doing the exact opposite of never. So their code is wrong, and they do not know it but are used to the error. If we change this, it would silently fix their code to do what they intended.

Is that a correct reading of the problem?

If so, I am in favor of the proposal to use string options in addition to boolean options.

Matti
On Thu, Jun 17, 2021 at 1:29 PM Matti Picus <matti.picus@gmail.com> wrote:
> Why this may be controversial: today someone could be calling `np.array(..., copy="never")`, which would call `bool("never")`, which would evaluate to 1, and would end up doing the exact opposite of never. So their code is wrong, and they do not know it but are used to the error. If we change this, it would silently fix their code to do what they intended.
>
> Is that a correct reading of the problem?
>
> If so, I am in favor of the proposal to use string options in addition to boolean options.

No, we aren't really concerned about users who write `np.array(..., copy='never')` today. This currently means `np.array(..., copy=True)`, which is slightly unfortunate but not really a big deal.

The big concern is users who will write `np.array(..., copy='never')` in the future, when it becomes supported by NumPy, but their code gets run on an older version of NumPy, in which case it silently works in a different way.

This happens all the time. Even if we make copy='never' an error *today*, users will be encountering existing versions of NumPy for years into the future, so we won't be able to change the behavior of copy='never' for a very long time. Our deprecation policy says we would need to wait at least one year for this, but frankly I'm not sure that's enough for the possibility of silent bugs. 3-4 years might be more realistic.

Eric's concerns about existing uses of "if copy" inside NEP 18 overloads are another good point, though there may be relatively few users of this feature today given that np.array() is only recently overridable (via "like").

Overall, I think using an enum is the happiest situation. It's a slightly awkward API, to be sure, but not very awkward in the scheme of things, and it's better than needing to wait a long time for a final resolution.
On Thu, Jun 17, 2021, at 16:23, Stephan Hoyer wrote:
> This happens all the time. Even if we make copy='never' an error *today*, users will be encountering existing versions of NumPy for years into the future, so we won't be able to change the behavior of copy='never' for a very long time. Our deprecation policy says we would need to wait at least one year for this, but frankly I'm not sure that's enough for the possibility of silent bugs. 3-4 years might be more realistic.

If we go the enum route, we may just as well deprecate string arguments at the same time so that we have the flexibility to introduce them again in the future.

Stéfan
On Fri, Jun 18, 2021 at 8:52 AM Stefan van der Walt <stefanv@berkeley.edu> wrote:
> If we go the enum route, we may just as well deprecate string arguments at the same time so that we have the flexibility to introduce them again in the future.
>
> Stéfan

What if we made `copy=False` do what it says? I always thought the ambiguous behavior was just asking for trouble. For a couple of releases we could raise a FutureWarning when a copy was made in spite of `copy=False`, which might expose some bugs in existing code.

Chuck
On Sun, Jun 20, 2021, at 18:53, Charles R Harris wrote:
> On Fri, Jun 18, 2021 at 8:52 AM Stefan van der Walt <stefanv@berkeley.edu> wrote:
>
>> If we go the enum route, we may just as well deprecate string arguments at the same time so that we have the flexibility to introduce them again in the future.

That makes sense to me, but I think this would not preclude the enum from being introduced right now. If we make this change, the enum will become the only mechanism by which to get the behavior we currently have (copy-if-needed).

Stéfan
Hi,

I have recently joined the mailing list and have gone through the previous discussions on this thread. I would like to share my analysis (advantages and disadvantages) of three possible alternatives (Enum, String, boolean) to support the proposed feature.

*Enum*

Advantages

1. Compatibility - Enums (currently, `np.CopyMode`) can be added to support the never copy feature without breaking any code which uses NumPy. Current values for `copy` arguments are `True` and `False`, which can be easily mapped to two members of the above enum, and the code will keep working as it used to. Considering the large user base of NumPy, I think this is the most significant point to be considered.
2. Clarity and Consistency - Enums inherently provide consistency, i.e., all the values of the copy argument will be of the same type and hence, one wouldn't have to worry much about using some special values just for the sake of prohibiting a deep copy. Also, Enums make the intention clear (`np.CopyMode.ALWAYS`, etc. already reflect the expected behaviour). Booleans like True and False are a bit cryptic in nature. In fact, the current behaviour of False is also a bit confusing. Enums can help in doing away with this issue without breaking anything which uses previous NumPy versions.
3. Code will break loudly - If anyone tries to use `np.CopyMode` on a previous version then the code will break loudly (AttributeError) rather than doing unpredictable things silently (fixing these is much more painful, especially in large code bases, than updating the version).

Disadvantages

1. Polluting Namespace - Enums do pollute the global namespace. Maybe it's an unavoidable thing which comes with the usage of Enums.
2. Inconsistent with APIs where strings are used - Many NumPy APIs use strings for supporting various options for an argument. For example, `np.linalg.qr` accepts strings for different modes. I think this would be the first time (if it happens) for an Enum to be used in such a scenario.

*Strings*

Advantages

1. Consistent with other NumPy APIs - As I said above, strings will keep things consistent across NumPy.
2. Clarity and Consistency - Strings too provide clarity of intention regarding the behaviour of the code. If we support strings for all the cases of the copy argument then it would be consistent as well.
3. No pollution of namespace.

I am not sure, but supporting strings and booleans at least in new NumPy versions should be possible, though doing that would not be as easy as Enums.

Disadvantages

1. Silent and Unpredictable behaviour on previous NumPy versions - Since strings can be interpreted as Booleans internally, if anyone passes any non-empty string, it will map to `True` and hence the code will always do a deep copy, irrespective of the argument. So, there would be cases when this goes unnoticed by the user, the unwanted consequences of which I think shouldn't be ignored while making a choice for this feature.

*Boolean (True/False/None)*

Advantages

1. Easy to extend - As of now True and False are already supported. None can additionally be used to support never copy.

Disadvantages

1. Silent behaviour in case of None and False - If someone passes None to some previous NumPy version then it may behave as False. Hence no error would be raised, but yeah, the copy will be made only if needed.
2. Cryptic - The intention is not clearly reflected in these three values (in fact False is a bit relaxed in nature, i.e., instead of never doing a copy it does so only if needed, which should have been the case with None).

*Summary*

To the best of my understanding, I think Booleans are not a good option when compared to Strings and Enums. Now, the choice is whether we are okay with unpredictable behaviour of user code in case of strings to reject Enums, or we are okay with pollution of namespace to easily support the previous API without breaking anything for future versions.

Please let me know if I missed any important points. Thanks.
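The two enum advantages listed above (booleans map onto enum members, and older versions fail loudly) can be sketched as follows. This is only an illustration: `CopyMode` mirrors the `np.CopyMode` name used in the thread but is defined locally, `normalize_copy` is a hypothetical helper, and `old_library` is a stand-in namespace for an older release that lacks the enum.

```python
import enum
import types


class CopyMode(enum.Enum):
    """Local stand-in for the proposed `np.CopyMode` (values are arbitrary)."""
    ALWAYS = "always"
    IF_NEEDED = "if_needed"
    NEVER = "never"


def normalize_copy(copy):
    """Map the legacy booleans onto enum members; reject everything else."""
    if isinstance(copy, CopyMode):
        return copy
    if copy is True:
        return CopyMode.ALWAYS
    if copy is False:
        return CopyMode.IF_NEEDED
    raise TypeError(f"copy must be a bool or CopyMode, got {copy!r}")


# Stand-in for an older library version that has no CopyMode attribute.
old_library = types.SimpleNamespace(array=lambda obj, copy=True: obj)

try:
    old_library.CopyMode.NEVER
except AttributeError:
    failed_loudly = True   # the call site breaks before any data is touched
else:
    failed_loudly = False
```

Existing calls with `True` or `False` keep working through `normalize_copy`, while code written against the enum cannot run silently on a version that predates it.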
On Mon, Jun 21, 2021 at 8:33 AM Stefan van der Walt <stefanv@berkeley.edu> wrote:
> That makes sense to me, but I think this would not preclude the enum from being introduced right now. If we make this change, the enum will become the only mechanism by which to get the behavior we currently have (copy-if-needed).
>
> Stéfan
On Sun, Jun 20, 2021, at 20:46, Gagandeep Singh wrote:
> I have recently joined the mailing list and have gone through the previous discussions on this thread. I would like to share my analysis (advantages and disadvantages) of three possible alternatives (Enum, String, boolean) to support the proposed feature.

Thanks for this thorough analysis, Gagandeep!

I'll throw one more heretical idea out there:

`np.copy.IF_NEEDED`, `np.copy.ALWAYS`, `np.copy.NEVER`.

This has the advantages of the enum, doesn't pollute the global namespace, and has an intuitive name.

`np.array(x, copy=np.copy.ALWAYS)`

It would be slightly more awkward to type, but is doable. A rough Python version sketch would be:

    import enum
    import numpy as np

    class CopyFlag(enum.Enum):
        IF_NEEDED = 0
        ALWAYS = 1
        NEVER = 2

    class NpCopy:
        IF_NEEDED: CopyFlag = CopyFlag.IF_NEEDED
        ALWAYS: CopyFlag = CopyFlag.ALWAYS
        NEVER: CopyFlag = CopyFlag.NEVER

        def __call__(self, x):
            return ...  # whatever copy returns

    np.copy = NpCopy()

Stéfan
Stefan, that sketch is more complicated than it needs to be - `np.copy` is a python function, so you can just attach the attributes directly! (although maybe there are implications for static typing)

```
class CopyFlag(enum.Enum):
    IF_NEEDED = 0
    ALWAYS = 1
    NEVER = 2

np.copy.IF_NEEDED = CopyFlag.IF_NEEDED
np.copy.ALWAYS = CopyFlag.ALWAYS
np.copy.NEVER = CopyFlag.NEVER
```

It would also work nicely for the `True/False/other` version that was proposed in the much older PR as `np.never_copy`:

```
class _CopyNever:
    def __bool__(self):
        raise ValueError

np.copy.NEVER = _CopyNever()
```

All of these versions (and using the enum directly) seem fine to me. If we go down the enum route, we probably want to add "new-style" versions of `np.CLIP` and friends that are true enums / live within a more obvious namespace.

Eric

On Mon, 21 Jun 2021 at 17:24, Stefan van der Walt <stefanv@berkeley.edu> wrote:

> Thanks for this thorough analysis, Gagandeep!
>
> I'll throw one more heretical idea out there:
>
> `np.copy.IF_NEEDED`, `np.copy.ALWAYS`, `np.copy.NEVER`.
>
> This has the advantages of the enum, doesn't pollute the global namespace, and has an intuitive name.
>
> `np.array(x, copy=np.copy.ALWAYS)`
>
> It would be slightly more awkward to type, but is doable. A rough Python version sketch would be:
>
>     class CopyFlag(enum.Enum):
>         IF_NEEDED = 0
>         ALWAYS = 1
>         NEVER = 2
>
>     class NpCopy:
>         IF_NEEDED: CopyFlag = CopyFlag.IF_NEEDED
>         ALWAYS: CopyFlag = CopyFlag.ALWAYS
>         NEVER: CopyFlag = CopyFlag.NEVER
>
>         def __call__(self, x):
>             return ...whatever copy returns...
>
>     np.copy = NpCopy()
>
> Stéfan
On 21 June 2021 18:56, Eric Wieser <wieser.eric+numpy@gmail.com> wrote:

> Stefan, that sketch is more complicated than it needs to be - `np.copy` is a python function, so you can just attach the attributes directly! (although maybe there are implications for static typing)

For the type annotations we can simply use something akin to Stéfan's proposed `NpCopy` class; probably in combination with `Protocol`. It's a bit more work compared to annotating a normal python function, but it's quite easy nevertheless.

Regards, Bas
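Bas's suggestion of annotating the function with a `Protocol` could look roughly like the sketch below. It is an assumption about how such an annotation might be written, not NumPy's actual stubs: `CopyFunction` and `_copy` are hypothetical names, and a list copy stands in for the real array copy so the example has no third-party dependencies.

```python
import enum
from typing import Any, Protocol


class CopyFlag(enum.Enum):
    IF_NEEDED = 0
    ALWAYS = 1
    NEVER = 2


class CopyFunction(Protocol):
    """Hypothetical protocol: a callable that also carries the three flags."""

    IF_NEEDED: CopyFlag
    ALWAYS: CopyFlag
    NEVER: CopyFlag

    def __call__(self, x: Any) -> Any: ...


def _copy(x):
    # Stand-in for the real copy function.
    return list(x)


# Attach the flags directly to the function object, as Eric suggested.
_copy.IF_NEEDED = CopyFlag.IF_NEEDED
_copy.ALWAYS = CopyFlag.ALWAYS
_copy.NEVER = CopyFlag.NEVER

# The annotation tells a type checker which attributes to expect. A checker
# may reject the plain assignment because function objects do not declare
# these attributes, hence the explicit ignore.
copy: CopyFunction = _copy  # type: ignore[assignment]
```

At runtime this behaves exactly like Eric's attribute-attachment version; the protocol only adds information for static analysis.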
Perhaps it is also worth going back to explore our original motivation for making this change.

One reason was that Sebastian didn't like people doing `x.shape = ...`. Users do that, presumably, to trigger an error if a copy needs to be made. However, we can catch that already:

    x = np.reshape(y, ...)
    if np.may_share_memory(x, y):
        ...

We can fix Sebastian's issue by introducing a `copy` keyword to `reshape`, which currently has none:

    x = np.reshape(y, copy='never')

For consistency, it would be nice to have `np.array` support `copy=never`, but if there is no urgency we can take the long route towards an API that uses strings (consistent with the rest of NumPy).

The argument against string names *right now* is that, if users write code with `copy='if-needed'`, it will not work correctly with old NumPy versions, since old versions will evaluate `if-needed` to True. The assessment was that this happens frequently, but we should consider how frequently, and how big of an issue it is.

So, I guess ultimately I am wondering if the change to `np.array` is needed right now, or whether we can get away without it for a while.

Stéfan

On Tue, Jun 22, 2021, at 15:21, bas van beek wrote:

> For the type annotations we can simply use something akin to Stéfan's proposed `NpCopy` class; probably in combination with `Protocol`. It's a bit more work compared to annotating a normal python function, but it's quite easy nevertheless.
>
> Regards, Bas
Personally I was a fan of the Enum approach. People dislike it because it is not “Pythonic”, but imho that is an accident of history because Enums only appeared (iirc) in Python 3.4. In fact, they are the right data structure for this particular problem, so for my money we should *make it* Pythonic by starting to use it everywhere where we have a finite list of choices. Juan.
On 24 Jun 2021, at 4:09 am, Stefan van der Walt <stefanv@berkeley.edu> wrote:
> Perhaps it is also worth going back to explore our original motivation for making this change.
>
> One reason was that Sebastian didn't like people doing `x.shape = ...`. Users do that, presumably, to trigger an error if a copy needs to be made. However, we can catch that already:
>
>     x = np.reshape(y, ...)
>     if np.may_share_memory(x, y):
>         ...
>
> We can fix Sebastian's issue by introducing a `copy` keyword to `reshape`, which currently has none:
>
>     x = np.reshape(y, copy='never')
>
> For consistency, it would be nice to have `np.array` support `copy=never`, but if there is no urgency we can take the long route towards an API that uses strings (consistent with the rest of NumPy).
>
> The argument against string names *right now* is that, if users write code with `copy='if-needed'`, it will not work correctly with old NumPy code, since old versions will evaluate `if-needed` to True. The assessment was that this happens frequently, but we should consider how frequently, and how big of an issue it is.
>
> So, I guess ultimately I am wondering if the change to `np.array` is needed right now, or whether we can get away without it for a while.
>
> Stéfan
On Tue, Jun 22, 2021, at 15:21, bas van beek wrote:
Stefan, that sketch is more complicated than it needs to be - `np.copy` is a python function, so you can just attach the attributes directly! (although maybe there are implications for static typing)
For the type annotations we can simply use something akin to Stéfans proposed `NpCopy` class; probably in combination with `Protocol`. It's a bit more work compared to annotating a normal python function, but it's quite easy nevertheless.
Regards, Bas
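One way the `Protocol` idea might look is sketched below. All names here (`CopyFunction`, `_copy`) are hypothetical and the stand-in function is not NumPy's `np.copy`; the point is only that a structural type can describe "a callable that also carries flag attributes".

```python
import enum
from typing import Any, Protocol, cast


class CopyFlag(enum.Enum):
    IF_NEEDED = 0
    ALWAYS = 1
    NEVER = 2


class CopyFunction(Protocol):
    """Structural type: callable like a copy function, plus flag attributes."""

    IF_NEEDED: CopyFlag
    ALWAYS: CopyFlag
    NEVER: CopyFlag

    def __call__(self, a: Any) -> Any: ...


def _copy(a: Any) -> Any:
    # Stand-in for the real copy function; it simply builds a new list.
    return list(a)


# Attach the members at runtime, then tell the type checker about them.
_copy.IF_NEEDED = CopyFlag.IF_NEEDED  # type: ignore[attr-defined]
_copy.ALWAYS = CopyFlag.ALWAYS  # type: ignore[attr-defined]
_copy.NEVER = CopyFlag.NEVER  # type: ignore[attr-defined]

copy = cast(CopyFunction, _copy)
```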
*From:* NumPy-Discussion <numpy-discussion-bounces+bas.vanbeek=hotmail.com@python.org> on behalf of Eric Wieser <wieser.eric+numpy@gmail.com> *Sent:* 21 June 2021 18:56 *To:* Discussion of Numerical Python <numpy-discussion@python.org> *Subject:* Re: [Numpy-discussion] copy="never" discussion and no deprecation cycle?
Stefan, that sketch is more complicated than it needs to be - `np.copy` is a python function, so you can just attach the attributes directly! (although maybe there are implications for static typing)

```
import enum
import numpy as np

class CopyFlag(enum.Enum):
    IF_NEEDED = 0
    ALWAYS = 1
    NEVER = 2

np.copy.IF_NEEDED = CopyFlag.IF_NEEDED
np.copy.ALWAYS = CopyFlag.ALWAYS
np.copy.NEVER = CopyFlag.NEVER
```

It would also work nicely for the `True/False/other` version that was proposed in the much older PR as `np.never_copy`:

```
class _CopyNever:
    def __bool__(self):
        raise ValueError

np.copy.NEVER = _CopyNever()
```
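The reason the `__bool__` trick is attractive is that code which only tests truthiness fails loudly instead of guessing. A rough illustration, where `legacy_asarray` is a hypothetical stand-in for such older code and not a NumPy function:

```python
class _CopyNever:
    def __bool__(self):
        raise ValueError("copy=NEVER is not understood by this code path")


NEVER = _CopyNever()


def legacy_asarray(values, copy=True):
    # Hypothetical stand-in for older code that only tests truthiness.
    if copy:
        return list(values)
    return values


data = [1, 2, 3]

# A string is truthy, so the legacy path copies without complaint.
silently_copied = legacy_asarray(data, copy="never") is not data

# The sentinel refuses to be coerced to a bool, so the same path raises.
try:
    legacy_asarray(data, copy=NEVER)
except ValueError as exc:
    outcome = f"rejected: {exc}"
else:
    outcome = "silently accepted"
```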
All of these versions (and using the enum directly) seem fine to me. If we go down the enum route, we probably want to add "new-style" versions of `np.CLIP` and friends that are true enums / live within a more obvious namespace.
Eric
On Mon, 21 Jun 2021 at 17:24, Stefan van der Walt <stefanv@berkeley.edu> wrote:
On Sun, Jun 20, 2021, at 20:46, Gagandeep Singh wrote:
I have recently joined the mailing list and have gone through the previous discussions on this thread. I would like to share my analysis (advantages and disadvantages) of three possible alternatives (Enum, String, boolean) to support the proposed feature.
Thanks for this thorough analysis, Gagandeep!
I'll throw one more heretical idea out there:
`np.copy.IF_NEEDED`, `np.copy.ALWAYS`, `np.copy.NEVER`.
This has the advantages of the enum, doesn't pollute the global namespace, and has an intuitive name.
`np.array(x, copy=np.copy.ALWAYS)`
It would be slightly more awkward to type, but is doable. A rough Python version sketch would be:
    class CopyFlag(enum.Enum):
        IF_NEEDED = 0
        ALWAYS = 1
        NEVER = 2

    class NpCopy:
        IF_NEEDED : CopyFlag = CopyFlag.IF_NEEDED
        ALWAYS : CopyFlag = CopyFlag.ALWAYS
        NEVER : CopyFlag = CopyFlag.NEVER

        def __call__(self, x):
            return ...  # whatever copy returns

    np.copy = NpCopy()
Stéfan _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion
![](https://secure.gravatar.com/avatar/d9ac9213ada4a807322f99081296784b.jpg?s=120&d=mm&r=g)
On Wed, Jun 23, 2021, at 18:01, Juan Nunez-Iglesias wrote:
Personally I was a fan of the Enum approach. People dislike it because it is not “Pythonic”, but imho that is an accident of history because Enums only appeared (iirc) in Python 3.4. In fact, they are the right data structure for this particular problem, so for my money we should *make it* Pythonic by starting to use it everywhere where we have a finite list of choices.
The enum definitely feels like the right abstraction. But the resulting API is clunky because of naming and top-level scarcity. Hence the suggestion to tag it onto np.copy, but there is an argument to be made for consistency by placing all enums under np.flags or similar. Still, np.flags.copy.IF_NEEDED gets long. Stéfan
![](https://secure.gravatar.com/avatar/697900d3a29858ea20cc109a2aee0af6.jpg?s=120&d=mm&r=g)
Why not both? The definition of the enum might live in a proper namespace location, but I see no reason why `np.copy.IF_NEEDED = np.flags.CopyFlgs.IF_NEEDED` can't be done (I mean, adding the enum members as attributes to the `np.copy()` function). Seems perfectly reasonable to me, and reads pretty nicely, too. It isn't like we are dropping support for the booleans, so those are still around for easy typing. Ben Root
![](https://secure.gravatar.com/avatar/2e1cb59230b4e566e6378acc92f65bcb.jpg?s=120&d=mm&r=g)
To me, adding enums as attributes of the `np.copy` function seems like a pretty good idea. This trick might resolve the only relatively important issue with Enums. Then, the benefits of Enum might outweigh the disadvantage of the uncommon usage of Enums in NumPy APIs. As an end user, I would like Enums rather than strings, as the former would provide a fixed number of choices (hence, easy debugging) as compared to the latter, where there are infinitely many possible strings and the code may work silently: imagine passing `if_neded` instead of `if_needed` and it working perfectly fine (silently). This has happened to me while using another library.
![](https://secure.gravatar.com/avatar/5f88830d19f9c83e2ddfd913496c5025.jpg?s=120&d=mm&r=g)
On Thu, Jun 24, 2021 at 6:12 AM Gagandeep Singh <gsingh@quansight.com> wrote:
To me, adding enums as attributes of the `np.copy` function seems like a pretty good idea. This trick might resolve the only relatively important issue with Enums. Then, the benefits of Enum might outweigh the disadvantage of the uncommon usage of Enums in NumPy APIs. As an end user, I would like Enums rather than strings, as the former would provide a fixed number of choices (hence, easy debugging) as compared to the latter, where there are infinitely many possible strings and the code may work silently: imagine passing `if_neded` instead of `if_needed` and it working perfectly fine (silently). This has happened to me while using another library.
On Thu, Jun 24, 2021 at 8:05 AM Benjamin Root <ben.v.root@gmail.com> wrote:
Why not both? The definition of the enum might live in a proper namespace location, but I see no reason why `np.copy.IF_NEEDED = np.flags.CopyFlgs.IF_NEEDED` can't be done (I mean, adding the enum members as attributes to the `np.copy()` function). Seems perfectly reasonable to me, and reads pretty nicely, too. It isn't like we are dropping support for the booleans, so those are still around for easy typing.
Ben Root
On Wed, Jun 23, 2021 at 10:26 PM Stefan van der Walt < stefanv@berkeley.edu> wrote:
On Wed, Jun 23, 2021, at 18:01, Juan Nunez-Iglesias wrote:
Personally I was a fan of the Enum approach. People dislike it because it is not “Pythonic”, but imho that is an accident of history because Enums only appeared (iirc) in Python 3.4. In fact, they are the right data structure for this particular problem, so for my money we should *make it* Pythonic by starting to use it everywhere where we have a finite list of choices.
The enum definitely feels like the right abstraction. But the resulting API is clunky because of naming and top-level scarcity.
I agree with this. Enums are nice _in theory_, but once you start using them you quickly figure out they're clunky, plus the all-caps looks bad (I'd consider ignoring that style recommendation). For API design they don't make all that much sense compared to "here's a list of strings we accept, and everything else raises an informative error". The only reasons I can think of to use them are:

1. Cases like never-copy, when there's a reason to have an object we can add a method to (`__bool__` here)
2. There's a long list of options and we want to give users a way to explore or iterate over those, so a public object is useful.

so cases where we'd otherwise use a class (instance) instead of documenting the string options. I can't think of many examples like this, padding modes for `scipy.ndimage.convolve` is the only one that comes to mind.

In general I don't expect we'd need (m)any more. Hence I'd suggest adding a new namespace like `np.flags` is not a good idea. Right now all we need is a single object, if we end up going the enum route.

For this one, I'd say it kinda looks like we do need one, so then let's just add one and be done with it, rather than inventing odd patterns like tacking enum members onto an existing function.

On the point about misspelled strings silently working: any well-designed function accepting strings should do input validation though, to raise an error in case of mis-spelling.

Cheers, Ralf
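For illustration, the kind of validation described here ("a list of strings we accept, and everything else raises an informative error") might be sketched as follows. `validate_copy_option` is a hypothetical helper, and the choice of `ValueError` for unknown strings is an assumption; the thread's opening post mentions `TypeError` instead.

```python
import difflib

_COPY_OPTIONS = ("always", "if_needed", "never")


def validate_copy_option(copy):
    """Return `copy` if it is an accepted string, else raise (hypothetical)."""
    if not isinstance(copy, str):
        raise TypeError(f"copy must be a string, got {type(copy).__name__}")
    if copy in _COPY_OPTIONS:
        return copy
    # Offer the closest accepted spelling, if there is a plausible one.
    hint = difflib.get_close_matches(copy, _COPY_OPTIONS, n=1)
    suggestion = f" Did you mean {hint[0]!r}?" if hint else ""
    raise ValueError(f"copy={copy!r} is not one of {_COPY_OPTIONS}.{suggestion}")


try:
    validate_copy_option("if_neded")
except ValueError as exc:
    message = str(exc)
```

With a check like this, the `if_neded` typo mentioned earlier in the thread fails immediately instead of running silently.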
![](https://secure.gravatar.com/avatar/d9ac9213ada4a807322f99081296784b.jpg?s=120&d=mm&r=g)
On Thu, Jun 24, 2021, at 01:03, Ralf Gommers wrote:
For this one, I'd say it kinda looks like we do need one, so then let's just add one and be done with it, rather than inventing odd patterns like tacking enum members onto an existing function.
There are two arguments on the table that resonate with me:

1. Chuck argues that the current `copy=False` behavior (which, in fact, means copy-if-needed) is nonsensical and should be fixed.
2. Ralf argues that strings are ultimately the interface we'd like to see.

To achieve (1), we would need a deprecation cycle. During that deprecation cycle, we would need to provide a way to continue providing 'copy-if-needed' behavior. This can be achieved either with an enum or by accepting strings.

Stephan argues that accepting strings will be harmful to new code running on old versions of NumPy. I would still like to get a sense of how often this happens, or if that is a hit we are willing to take. If we decide that the concern is a significant one, then we would have to go the enum route, at least for a while. However, I see no compelling reason to have that enum live in the top-level namespace though: it is for relatively advanced use, and it will be temporary.

If we take the enum route, how do we get to (2)? We add a type check for a few releases and raise an error on string arguments (or, alternatively, handle 'always'/'never'/'if_needed' without advertising that functionality). Then, once we switch to string arguments, users will get an error (for old NumPy) or it will work as expected (for new NumPy).

I didn't think so originally, but I suppose we are in NEP territory now.

Stéfan
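The transitional type check described above could look roughly like this. It is a sketch only: `CopyMode` mirrors the enum name used in the PR discussion, while `resolve_copy` and the error wording are hypothetical.

```python
import enum


class CopyMode(enum.Enum):
    ALWAYS = "always"
    IF_NEEDED = "if_needed"
    NEVER = "never"


def resolve_copy(copy):
    """Map the accepted `copy` spellings onto a CopyMode (hypothetical)."""
    if isinstance(copy, CopyMode):
        return copy
    if isinstance(copy, str):
        # Transitional releases reject strings outright, so that code written
        # for a later string-based API fails loudly instead of being read as
        # a truthy value.
        raise TypeError(
            f"copy={copy!r}: string values are not accepted; use CopyMode"
        )
    if isinstance(copy, bool):
        return CopyMode.ALWAYS if copy else CopyMode.IF_NEEDED
    raise TypeError(f"unsupported value for copy: {copy!r}")


try:
    resolve_copy("never")
except TypeError as exc:
    rejection = str(exc)
```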
![](https://secure.gravatar.com/avatar/5f88830d19f9c83e2ddfd913496c5025.jpg?s=120&d=mm&r=g)
Let's see if we can finalize this. On Thu, Jun 24, 2021 at 9:23 PM Stefan van der Walt <stefanv@berkeley.edu> wrote:
On Thu, Jun 24, 2021, at 01:03, Ralf Gommers wrote:
For this one, I'd say it kinda looks like we do need one, so then let's just add one and be done with it, rather than inventing odd patterns like tacking enum members onto an existing function.
There are two arguments on the table that resonate with me:
1. Chuck argues that the current `copy=False` behavior (which, in fact, means copy-if-needed) is nonsensical and should be fixed.
2. Ralf argues that strings are ultimately the interface we'd like to see.
To achieve (1), we would need a deprecation cycle. During that deprecation cycle, we would need to provide a way to continue providing 'copy-if-needed' behavior. This can be achieved either with an enum or by accepting strings.
Stephan argues that accepting strings will be harmful to new code running on old versions of NumPy. I would still like to get a sense of how often this happens, or if that is a hit we are willing to take. If we decide that the concern is a significant one, then we would have to go the enum route, at least for a while. However, I see no compelling reason to have that enum live in the top-level namespace though: it is for relatively advanced use, and it will be temporary.
If we take the enum route, how do we get to (2)? We add a type check for a few releases and raise an error on string arguments (or, alternatively, handle 'always'/'never'/'if_needed' without advertising that functionality). Then, once we switch to string arguments, users will get an error (for old NumPy) or it will work as expected (for new NumPy).
What Stephan said in his last email seems right, just switch to strings at some point (probably after 3 years or so), and stop recommending the enum.
I didn't think so originally, but I suppose we are in NEP territory now.
I don't think so. We basically arrived at the solution, and there's a PR that is mostly done too. This really isn't that complicated that we should require a NEP. Cheers, Ralf
![](https://secure.gravatar.com/avatar/d9ac9213ada4a807322f99081296784b.jpg?s=120&d=mm&r=g)
On Sun, Jul 4, 2021, at 13:00, Ralf Gommers wrote:
I don't think so. We basically arrived at the solution, and there's a PR that is mostly done too. This really isn't that complicated that we should require a NEP.
Personally, I don't like np.CopyMode in the main namespace. If we can agree to stash it somewhere else, and tentatively aim to move to strings at point X in time for consistency with the rest of the API, I have no issue with going ahead. Stéfan
![](https://secure.gravatar.com/avatar/5f88830d19f9c83e2ddfd913496c5025.jpg?s=120&d=mm&r=g)
On Mon, Jul 5, 2021 at 3:53 AM Stefan van der Walt <stefanv@berkeley.edu> wrote:
On Sun, Jul 4, 2021, at 13:00, Ralf Gommers wrote:
I don't think so. We basically arrived at the solution, and there's a PR that is mostly done too. This really isn't that complicated that we should require a NEP.
Personally, I don't like np.CopyMode in the main namespace. If we can agree to stash it somewhere else, and tentatively aim to move to strings at point X in time for consistency with the rest of the API, I have no issue with going ahead.
I share your dislike, but I don't really see a better place where it doesn't make it even harder to spell, but I did just think of an alternative that may actually be quite reasonable: keep it private.

The reason why Gagandeep started working on this is so we can have the never-copy behavior in the `numpy.array_api` namespace. For the `asarray` function there, the `copy` keyword is still boolean, with description:

Whether or not to make a copy of the input. If `True`, always copies. If `False`, never copies for input which supports DLPack or the buffer protocol, and raises `ValueError` in case that would be necessary. If `None`, reuses existing memory buffer if possible, copies otherwise. Default: `None`.

In the end I think that's better than strings, and way better than enums - we just can't have that in the main namespace, because we can't change what `False` does.

Cheers, Ralf
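To make the three-way `True`/`False`/`None` semantics concrete, here is a minimal sketch restricted to ndarray inputs and dtype conversion. `asarray_tristate` is a hypothetical helper, not the `numpy.array_api` implementation, and it ignores DLPack and buffer-protocol inputs entirely.

```python
import numpy as np


def asarray_tristate(arr, dtype=None, copy=None):
    """Tri-state copy handling for an ndarray input (hypothetical sketch)."""
    target = arr.dtype if dtype is None else np.dtype(dtype)
    if copy is True:
        # Always copy, even when the input could be reused as-is.
        return arr.astype(target, copy=True)
    if target == arr.dtype:
        # No conversion needed: reuse the existing buffer.
        return arr
    if copy is False:
        raise ValueError("a copy is required to convert the dtype, but copy=False")
    # copy=None: copy only because the conversion makes it unavoidable.
    return arr.astype(target)


a = np.arange(3)
reused = asarray_tristate(a)
forced = asarray_tristate(a, copy=True)
converted = asarray_tristate(a, dtype=np.float64)

try:
    asarray_tristate(a, dtype=np.float64, copy=False)
except ValueError as exc:
    refusal = str(exc)
```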
![](https://secure.gravatar.com/avatar/d9ac9213ada4a807322f99081296784b.jpg?s=120&d=mm&r=g)
On Mon, Jul 5, 2021, at 00:42, Ralf Gommers wrote:
I share your dislike, but I don't really see a better place where it doesn't make it even harder to spell, but I did just think of an alternative that may actually be quite reasonable: keep it private.
That would be fine. We haven't had this feature requested for many years, so as long as it is available in some shape or form it should satisfy the advanced users who need it. It also doesn't force us into a decision we cannot reverse (adding to the top-level API).
The reason why Gagandeep started working on this is so we can have the never-copy behavior in the `numpy.array_api` namespace. For the `asarray` function there, the `copy` keyword is still boolean, with description:
Whether or not to make a copy of the input. If `True`, always copies. If `False`, never copies for input which supports DLPack or the buffer protocol, and raises `ValueError` in case that would be necessary. If `None`, reuses existing memory buffer if possible, copies otherwise. Default: `None`.
In the end I think that's better than strings, and way better than enums - we just can't have that in the main namespace, because we can't change what `False` does.
I agree <https://github.com/numpy/numpy/pull/19173#issuecomment-858226896> that this is a good API (although not everybody else does). <https://github.com/numpy/numpy/pull/19173#issuecomment-860314626> W.r.t. NumPy's API: it could be okay to change the behavior of copy=False to make it more strict (no copies ever), because then at least errors will be raised and we can provide a message with instructions on how to fix it. Stéfan
![](https://secure.gravatar.com/avatar/b4f6d4f8b501cb05fd054944a166a121.jpg?s=120&d=mm&r=g)
On Mon, 2021-07-05 at 11:17 -0700, Stefan van der Walt wrote:
On Mon, Jul 5, 2021, at 00:42, Ralf Gommers wrote:
I share your dislike, but I don't really see a better place where it doesn't make it even harder to spell, but I did just think of an alternative that may actually be quite reasonable: keep it private.
That would be fine. We haven't had this feature requested for many years, so as long as it is available in some shape or form it should satisfy the advanced users who need it. It also doesn't force us into a decision we cannot reverse (adding to the top-level API).
I am happy with (semi?)-private. Although, I would prefer a long-term goal we can work towards.
The reason why Gagandeep started working on this is so we can have the never-copy behavior in the `numpy.array_api` namespace. For the `asarray` function there, the `copy` keyword is still boolean, with description:
Whether or not to make a copy of the input. If `True`, always copies. If `False`, never copies for input which supports DLPack or the buffer protocol, and raises `ValueError` in case that would be necessary. If `None`, reuses existing memory buffer if possible, copies otherwise. Default: `None`.
In the end I think that's better than strings, and way better than enums - we just can't have that in the main namespace, because we can't change what `False` does.
If we can converge on this as an ideal API, should we really keep `copy=False` around without a warning? And if we tag on a warning, maybe we may as well migrate NumPy itself (excruciatingly slowly if necessary)?

We seem to find some principle to dislike every single idea (I am probably forgetting a few):

* Enums:
  * Namespace bloat
  * (Maybe clunky spelling)
* Strings:
  * not strictly backward compatible (if accidentally used on old versions or potentially `__array_function__`.)
  * Slow transition necessary
  * (Possibly not a good mix with `True/False` in general)
* Transition `copy={True, False, None}`:
  * "Terrible API for a 3-way option"
  * some users have to update their code (libraries more than end-users, and libraries are easier to update).

and I am honestly not sure that any of those is worrying.

My preference would be to decide on the ideal API, and then move towards it. And if we don't think `CopyMode` is the right solution then it should be added only "semi-public": Probably with an underscore and documented to go away again, but allowing a pattern of:

    if np.lib.NumpyVersion(np.__version__) > "1.22.0":
        if hasattr(np, "_CopyMode"):
            never_copy = np._CopyMode.NEVER
        else:
            never_copy = "never"
    else:
        pass  # oops

For libraries that need to work around transition difficulties.

About a NEP: I am not sure we need one, although I am not opposed. It may make sense... Especially if whatever we converge on violates some written or unwritten "policy".

However, I am wary to bring up a possible NEP if there is no clarity of where things are going. IMO, a NEP should be a concrete proposal, and that means that whoever writes it must have confidence in a proposal.

If we transitioned from the brain-storming stage to a "formal decision making" one, then maybe a NEP is what we need. But, I currently don't know what the concrete proposal would be.

Cheers, Sebastian
![](https://secure.gravatar.com/avatar/93a76a800ef6c5919baa8ba91120ee98.jpg?s=120&d=mm&r=g)
On Mon, Jul 5, 2021 at 11:19 AM Stefan van der Walt <stefanv@berkeley.edu> wrote:
The reason why Gagandeep started working on this is so we can have the never-copy behavior in the `numpy.array_api` namespace. For the `asarray` function there, the `copy` keyword is still boolean, with description:
Whether or not to make a copy of the input. If True, always copies. If False, never copies for input which supports DLPack or the buffer protocol, and raises ValueError in case that would be necessary. If None, reuses existing memory buffer if possible, copies otherwise. Default: None.
In the end I think that's better than strings, and way better than enums - we just can't have that in the main namespace, because we can't change what `False` does.
I agree <https://github.com/numpy/numpy/pull/19173#issuecomment-858226896> that this is a good API (although not everybody else does). <https://github.com/numpy/numpy/pull/19173#issuecomment-860314626>
W.r.t. NumPy's API: it could be okay to change the behavior of copy=False to make it more strict (no copies ever), because then at least errors will be raised and we can provide a message with instructions on how to fix it.
Resurfacing this discussion, since Sebastian asked me to comment. After some reflection, I think my favorite solution now is True/False/None, including a deprecation cycle to migrate existing users of copy=False to use copy=None. This is the simplest adaptation of the existing argument, and in many cases where users are writing copy=False they may actually not be intending the current "maybe copy" behavior. Strings would be appropriate if we were starting from scratch, but breaking backwards compatibility is very problematic. I do like enums, but I recognize that they are not currently used in NumPy/SciPy, so they feel a little out of place, and expanding NumPy's namespace to add more enums also has a cost. I don't think the meme I linked to is entirely appropriate, because these aren't just three arbitrary modes -- two of the cases here really are "yes" or "no" copy, and the other is "maybe", which is a pretty common meaning for "None" as a default value (and users will rarely be writing copy=None themselves).
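A deprecation cycle of the kind suggested here could start with a shim along these lines. This is a sketch under assumptions: `normalize_copy` and the warning text are hypothetical, and the use of `DeprecationWarning` (rather than, say, `FutureWarning`) is a guess, not something the thread decides.

```python
import warnings


def normalize_copy(copy):
    """Translate True/False/None during the transition (hypothetical)."""
    if copy is None:
        return "if_needed"
    if copy is True:
        return "always"
    if copy is False:
        # Keep today's behavior for now, but tell the caller how to keep it
        # once copy=False starts to mean "never copy".
        warnings.warn(
            "copy=False currently means 'copy if needed'; pass copy=None to "
            "keep that behavior",
            DeprecationWarning,
            stacklevel=2,
        )
        return "if_needed"
    raise TypeError(f"copy must be True, False or None, got {copy!r}")


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    legacy = normalize_copy(False)
```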
![](https://secure.gravatar.com/avatar/93a76a800ef6c5919baa8ba91120ee98.jpg?s=120&d=mm&r=g)
On Thu, Jun 24, 2021 at 1:03 AM Ralf Gommers <ralf.gommers@gmail.com> wrote:
I agree with this. Enums are nice _in theory_, but once you start using them you quickly figure out they're clunky, plus the all-caps looks bad (I'd consider ignoring that style recommendation). For API design they don't make all that much sense compared to "here's a list of strings we accept, and everything else raises an informative error". The only reasons I can think of to use them are:
1. Cases like never-copy, when there's a reason to have an object we can add a method to (`__bool__` here)
2. There's a long list of options and we want to give users a way to explore or iterate over those, so a public object is useful.

so cases where we'd otherwise use a class (instance) instead of documenting the string options. I can't think of many examples like this, padding modes for `scipy.ndimage.convolve` is the only one that comes to mind.
I think Enums are a very clean abstraction for capturing a discrete set of options in a type-safe way, both at runtime and with static checks. You also don't have to keep lists of strings in sync, which makes them a little easier to document. That said, I agree that in most cases the overall benefits are rather marginal. I don't think it's worth a mass migration of existing NumPy functions, which use strings for categorical options. In this particular case, I think there is a clear advantage to using an enum, to avoid inadvertent bugs with old versions of NumPy.
In general I don't expect we'd need (m)any more. Hence I'd suggest adding a new namespace like `np.flags` is not a good idea. Right now all we need is a single object, if we end up going the enum route.
For this one, I'd say it kinda looks like we do need one, so then let's just add one and be done with it, rather than inventing odd patterns like tacking enum members onto an existing function.
I agree with both of these. If we're only going to add a couple of enums, it's not worth worrying about a couple of extra objects polluting NumPy's namespace. I would just add np.CopyMode, rather than inventing a new design pattern. At some point in the future, we might either: (1) switch the interface to use strings, in which case we would stop recommending/documenting CopyMode (like plenty of other top level objects in the NumPy namespace) (2) add many more enums, in which case we can consider assigning enums as function attributes or putting them in a namespace. But so far the only other enum I've heard suggested is np.ClipMode. Adding two enums to the NumPy namespace would hardly make a difference at this point, given how many objects are already there.
![](https://secure.gravatar.com/avatar/6a2a454191fa75d4114ed05836a0b924.jpg?s=120&d=mm&r=g)
Dear all, On 25/06/2021 02:12, Stephan Hoyer wrote: ...
I think Enums are a very clean abstraction for capturing a discrete set of options in a type-safe way, both at runtime and with static checks. You also don't have to keep lists of strings in sync, which makes them a little easier to document.
That said, I agree that in most cases the overall benefits are rather marginal. I don't think it's worth a mass migration of existing NumPy functions, which use strings for categorical options.
In this particular case, I think there is a clear advantage to using an enum, to avoid inadvertent bugs with old versions of NumPy.
...
At some point in the future, we might either: (1) switch the interface to use strings, in which case we would stop recommending/documenting CopyMode (like plenty of other top level objects in the NumPy namespace) (2) add many more enums, in which case we can consider assigning enums as function attributes or putting them in a namespace. But so far the only other enum I've heard suggested is np.ClipMode. Adding two enums to the NumPy namespace would hardly make a difference at this point, given how many objects are already there.
I'm just an interested observer, but it seems that going the enum route is a clear "practicality beats purity" decision for this case. I really don't see the need to eventually move [back] to strings. Also, perhaps I missed a discussion of it in the thread, but aren't enums also better for typechecking? I actually prefer enums overall for a number of reasons, but I agree that they are not worth a "mass migration".
![](https://secure.gravatar.com/avatar/697900d3a29858ea20cc109a2aee0af6.jpg?s=120&d=mm&r=g)
One reason was that Sebastian didn't like people doing `x.shape = ...`. Users do that, presumably, to trigger an error if a copy needs to be made.
Users do that because it is 1) easier than every other option, and 2) I am pretty sure we were encouraged to do it this way for the past 10 years. The whole "it won't copy" business (to me at least) was an added bonus. Most of the time, I didn't want to copy anyway, so, sure! `x.shape = ...` has been around for a long time, and you are going to have a hard time convincing people to drop using such an easy-to-use property setter in favor of an approach that adds more typing and takes a bit more to read. There's also lots and lots of online tutorials, books, and stackoverflow snippets that have this usage pattern. I think the horse has long since left the barn, the chickens came to roost, and the cows came home...
We can fix Sebastian's issue by introducing a `copy` keyword to `reshape`, which currently has none:
This isn't a terrible idea to pursue, regardless of what I said above! Explicit is better than implicit, and giving programmers the opportunity to be explicit about what sort of copy semantics they intend in more places would improve the library going forward. I also like to highlight what Chuck said a few posts ago about the fact that `copy=False` does not really mean what people might think it means, and taking steps to address that might also be good for the library. Ben Root
participants (12)
- Andrew Jaffe
- bas van beek
- Benjamin Root
- Charles R Harris
- Eric Wieser
- Gagandeep Singh
- Juan Nunez-Iglesias
- Matti Picus
- Ralf Gommers
- Sebastian Berg
- Stefan van der Walt
- Stephan Hoyer