Hi,

I have recently joined the mailing list and have gone through the previous discussions on this thread. I would like to share my analysis (advantages and disadvantages) of three possible alternatives (Enum, String, boolean) to support the proposed feature.

Enum

Advantages

1. Compatibility - An enum (currently `np.CopyMode`) can be added to support the never-copy feature without breaking any code that uses NumPy. The current values of the `copy` argument are `True` and `False`, which map easily onto two members of the enum, so existing code will keep working as before. Considering NumPy's large user base, I think this is the most significant point to consider.
2. Clarity and Consistency - Enums inherently provide consistency, i.e., all values of the `copy` argument will be of the same type, so one doesn't have to worry about special values introduced just to prohibit a deep copy. Enums also make the intention clear (`np.CopyMode.ALWAYS`, etc. already reflect the expected behaviour), whereas `True` and `False` are a bit cryptic. In fact, the current behaviour of `False` is itself confusing: it copies only if needed rather than never copying. Enums can resolve this without breaking code written for previous NumPy versions.
3. Code will break loudly - If anyone tries to use `np.CopyMode` on a previous version, the code will fail loudly with an AttributeError rather than silently doing something unpredictable. Silent misbehaviour is much more painful to fix than updating the version, especially in large code bases.
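
The compatibility and fail-loudly points above can be illustrated with a minimal sketch. Everything here is hypothetical: the `CopyMode` class is a local stand-in rather than NumPy's implementation, the member names `IF_NEEDED` and `NEVER` and the helper `normalize_copy` are my assumptions, and `old_np` only mimics a module that predates the enum.

```python
import enum
import types


class CopyMode(enum.Enum):
    """Hypothetical stand-in for the proposed np.CopyMode."""
    ALWAYS = "always"
    IF_NEEDED = "if_needed"  # assumed name for today's copy=False
    NEVER = "never"          # assumed name for the new never-copy mode


def normalize_copy(copy):
    """Map legacy booleans onto enum members; reject anything else."""
    if isinstance(copy, CopyMode):
        return copy
    if copy is True:
        return CopyMode.ALWAYS
    if copy is False:
        return CopyMode.IF_NEEDED
    raise TypeError(f"copy must be a bool or CopyMode, got {copy!r}")


# Existing calls with True/False keep their meaning.
assert normalize_copy(True) is CopyMode.ALWAYS
assert normalize_copy(False) is CopyMode.IF_NEEDED

# A module without the attribute fails loudly instead of guessing.
old_np = types.SimpleNamespace()  # stand-in for an older NumPy
try:
    old_np.CopyMode
except AttributeError:
    failed_loudly = True
else:
    failed_loudly = False
```

In this sketch a string such as 'never' raises a TypeError instead of being coerced, which is the loud behaviour argued for above.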

Disadvantages

1. Polluting Namespace - Enums add names to NumPy's top-level namespace. This may be unavoidable when using Enums.
2. Inconsistent with APIs where strings are used - Many NumPy APIs use strings to select among the options for an argument. For example, `np.linalg.qr` accepts strings for its different modes. I think this would be the first time (if it happens) that an Enum is used in such a scenario.

Strings

Advantages

1. Consistent with other NumPy APIs - As I said above, strings will keep things consistent across NumPy.
2. Clarity and Consistency - Strings also make the intended behaviour of the code clear. If we support strings for all cases of the `copy` argument, the API would be consistent as well.
3. No pollution of namespace.

I am not sure, but it should be possible to support both strings and booleans, at least in new NumPy versions, though doing so would not be as easy as with Enums.

Disadvantages

1. Silent and Unpredictable behaviour on previous NumPy versions - Since strings can be interpreted as Booleans internally, any non-empty string passed to a previous version maps to `True`, and hence the code will always do a deep copy, irrespective of the string's value. For example, `copy='never'` would silently copy. Such cases can go unnoticed by the user, and I think the unwanted consequences shouldn't be ignored while making a choice for this feature.
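
The silent failure described above comes down to truthiness. A minimal sketch, assuming the older code path simply coerces `copy` to a boolean; `legacy_copy_flag` is a hypothetical stand-in, not a NumPy function:

```python
def legacy_copy_flag(copy=True):
    """Hypothetical stand-in for an older code path that coerces copy to bool."""
    return bool(copy)


# Every non-empty string is truthy, so the requested mode is lost
# and each of these would be treated like copy=True.
results = {mode: legacy_copy_flag(mode) for mode in ("always", "if_needed", "never")}
```

Under this assumption no exception is raised for any of the three strings, so the user gets no hint that the request was ignored.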

Boolean (True/False/None)

Advantages

1. Easy to extend - As of now True and False are already supported. None can additionally be used to support never copy.

Disadvantages

1. Silent behaviour in case of None and False - If someone passes None to a previous NumPy version, it may behave as False. No error would be raised, and a copy would still be made if needed, even though the user asked for never copy.
2. Cryptic - The intention is not clearly reflected in these three values. In fact, False is somewhat relaxed: instead of never copying, it copies only if needed, which is the behaviour one would rather expect from None.
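
A minimal sketch of the first point, assuming that older versions only look at the truthiness of `copy` while a newer version would test for `None` by identity. Both helper functions and the returned labels are hypothetical.

```python
def old_interpretation(copy):
    """Assumed older behaviour: only truthiness matters."""
    return "always" if bool(copy) else "if_needed"


def new_interpretation(copy):
    """Assumed newer behaviour: None is given its own meaning."""
    if copy is None:
        return "never"
    return "always" if copy else "if_needed"


# The same call means different things depending on the installed version.
assert old_interpretation(None) == "if_needed"
assert new_interpretation(None) == "never"
```

Neither version raises an error here, so the difference would only show up as an unexpected copy at runtime.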

Summary

To the best of my understanding, Booleans are not a good option compared to Strings and Enums. The choice, then, is between two trade-offs: either we reject Enums and accept unpredictable behaviour of user code on previous versions when strings are used, or we accept namespace pollution in order to support the existing API easily without breaking anything in future versions.

Please let me know if I missed any important points. Thanks.


On Mon, Jun 21, 2021 at 8:33 AM Stefan van der Walt <stefanv@berkeley.edu> wrote:
On Sun, Jun 20, 2021, at 18:53, Charles R Harris wrote:

On Fri, Jun 18, 2021 at 8:52 AM Stefan van der Walt <stefanv@berkeley.edu> wrote:

On Thu, Jun 17, 2021, at 16:23, Stephan Hoyer wrote: 
This happens all the time. Even if we make copy='never' an error *today*, users will be encountering existing versions of NumPy for years into the future, so we won't be able to change the behavior of copy='never' for a very long time. Our deprecation policy says we would need to wait at least one year for this, but frankly I'm not sure that's enough for the possibility of silent bugs. 3-4 years might be more realistic.

If we go the enum route, we may just as well deprecate string arguments at the same time so that we have the flexibility to introduce them again in the future.

That makes sense to me, but I think this would not preclude the enum from being introduced right now. If we make this change, the enum will become the only mechanism by which to get the behavior we currently have (copy-if-needed). 

Stéfan
