[Numpy-discussion] copy="never" discussion and no deprecation cycle?

Gagandeep Singh gsingh at quansight.com
Sun Jun 20 23:46:35 EDT 2021


Hi,

I have recently joined the mailing list and have gone through the previous
discussions on this thread. I would like to share my analysis (advantages
and disadvantages) of three possible alternatives (Enum, string, boolean)
for supporting the proposed never-copy feature.

*Enum*

Advantages

1. Compatibility - Enums (currently, `np.CopyMode`) can be added to support
the never-copy feature without breaking any code that uses NumPy. The current
values for the `copy` argument, `True` and `False`, can easily be mapped to
two members of the enum, so existing code will keep working as it did
before. Considering the large user base of NumPy, I think this is the most
significant point to consider.
2. Clarity and Consistency - Enums inherently provide consistency, i.e., all
values of the `copy` argument will be of the same type, so one wouldn't
have to resort to special values just to prohibit a copy. Enums also make
the intention clear (`np.CopyMode.ALWAYS`, etc. already describe the
expected behaviour), whereas booleans like `True` and `False` are somewhat
cryptic. In fact, the current behaviour of `False` (copy only if needed) is
already a bit confusing. Enums can resolve this without breaking code
written against earlier NumPy versions.
3. Code will break loudly - If anyone tries to use `np.CopyMode` with an
older NumPy version, the code will fail loudly (`AttributeError`) rather
than silently doing unpredictable things (which are much more painful to
track down, especially in large code bases, than updating the version).
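A minimal sketch of the compatibility argument, assuming a standalone `CopyMode` enum with `ALWAYS`, `IF_NEEDED` and `NEVER` members (modelled on the proposed `np.CopyMode`, not NumPy's actual implementation) and a hypothetical `normalize_copy` helper that maps the existing booleans onto it:

```python
import enum


class CopyMode(enum.Enum):
    """Hypothetical stand-in for the proposed np.CopyMode."""
    ALWAYS = enum.auto()     # same meaning as copy=True today
    IF_NEEDED = enum.auto()  # same meaning as copy=False today
    NEVER = enum.auto()      # new: refuse to copy


def normalize_copy(copy):
    """Map legacy booleans and enum members onto a CopyMode member."""
    if isinstance(copy, CopyMode):
        return copy
    if isinstance(copy, bool):
        # Existing code passing True/False keeps its old behaviour.
        return CopyMode.ALWAYS if copy else CopyMode.IF_NEEDED
    raise TypeError(f"copy must be a bool or CopyMode, got {copy!r}")


print(normalize_copy(False))  # CopyMode.IF_NEEDED
```

Existing calls with `True`/`False` map onto the two old behaviours unchanged, and only new code ever reaches `NEVER`. On a NumPy version without the enum, the attribute lookup `np.CopyMode` itself fails with `AttributeError` before any array is touched, which is the "loud" failure described above.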

Disadvantages

1. Polluting Namespace - Enums add a new public name to the NumPy
namespace. This is probably unavoidable when using Enums.
2. Inconsistent with APIs where strings are used - Many NumPy APIs use
strings to select among options for an argument. For example,
`np.linalg.qr` accepts strings for its different modes. I think this would
be the first time (if it happens) that an Enum is used in such a scenario.

*Strings*

Advantages

1. Consistent with other NumPy APIs - As I said above, strings will keep
things consistent across NumPy.
2. Clarity and Consistency - Strings also make the intended behaviour of the
code clear. If we support strings for all cases of the `copy` argument,
it would be consistent as well.
3. No pollution of namespace.

I am not sure, but supporting both strings and booleans, at least in new
NumPy versions, should be possible, though it would not be as easy as with
Enums.
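A sketch of how a newer version might accept both spellings, assuming the hypothetical string values `"always"`, `"if_needed"` and `"never"` (these are illustrative, not an agreed API). The extra care is in the ordering: the argument must be type-checked before any truth test, otherwise every string collapses to `True`.

```python
_COPY_STRINGS = frozenset({"always", "if_needed", "never"})


def normalize_copy(copy):
    """Hypothetical parser accepting both booleans and strings for `copy`."""
    # Check bool first: True/False keep their existing meanings.
    if isinstance(copy, bool):
        return "always" if copy else "if_needed"
    # Strings are validated explicitly, never via truthiness.
    if isinstance(copy, str):
        if copy not in _COPY_STRINGS:
            raise ValueError(f"unknown copy mode {copy!r}")
        return copy
    raise TypeError(f"copy must be a bool or str, got {copy!r}")


print(normalize_copy("never"))  # never
```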

Disadvantages

1. Silent and unpredictable behaviour on previous NumPy versions - Since
older versions interpret the `copy` argument as a boolean, any non-empty
string will map to `True`, so the code will always make a copy, whatever
the string says. There will therefore be cases where this goes unnoticed by
the user, and I think these unwanted consequences shouldn't be ignored when
choosing a design for this feature.
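To make the failure mode concrete, here is a sketch that models an older version as taking only the truth value of `copy`. This parsing model is an assumption for illustration, and `legacy_should_copy` is a hypothetical name:

```python
def legacy_should_copy(copy, needs_copy):
    """Model of an older NumPy where only the truthiness of `copy` matters."""
    if bool(copy):
        return True       # any non-empty string lands here, even "never"
    return needs_copy     # falsy values mean "copy only if needed"


# The user asks for "never", but the modelled old version copies anyway,
# without raising anything.
print(legacy_should_copy("never", needs_copy=False))  # True
```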

*Boolean (True/False/None)*

Advantages

1. Easy to extend - As of now, `True` and `False` are already supported.
`None` could additionally be used to mean never copy.

Disadvantages

1. Silent behaviour in case of None and False - If someone passes `None` to
an older NumPy version, it will likely be treated as `False` (since `None`
is falsy). No error would be raised; a copy would simply be made whenever
needed, silently ignoring the request never to copy.
2. Cryptic - The intention is not clearly reflected in these three values.
In fact, `False` is already relaxed: rather than never copying, it copies
only when needed, which is arguably what `None` should have meant.
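The boolean route can be sketched the same way, with `None` as the hypothetical never-copy value. Both functions below are illustrative models (the legacy one assumes a truthiness-only parser), showing how the same call raises on a new version but silently copies on an old one:

```python
def new_should_copy(copy, needs_copy):
    """Hypothetical new semantics: None means 'never copy'."""
    if copy is None:
        if needs_copy:
            raise ValueError("a copy is required but copy=None forbids it")
        return False
    if copy:
        return True        # copy=True: always copy
    return needs_copy      # copy=False: copy only if needed


def legacy_should_copy(copy, needs_copy):
    """Model of an older version that only looks at truthiness."""
    return True if copy else needs_copy


# None is falsy, so the legacy model treats it exactly like False.
print(legacy_should_copy(None, needs_copy=True))  # True
```

Note how `False` and `None` differ only in whether a needed copy is refused, which is the crypticness described above.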

*Summary*

To the best of my understanding, booleans are not as good an option as
strings or Enums. The choice, then, is whether we accept silent,
unpredictable behaviour of user code on older NumPy versions (strings) in
order to avoid Enums, or accept some namespace pollution (Enums) in
exchange for supporting the existing API unchanged in future versions.

Please let me know if I missed any important points. Thanks.


On Mon, Jun 21, 2021 at 8:33 AM Stefan van der Walt <stefanv at berkeley.edu>
wrote:

> On Sun, Jun 20, 2021, at 18:53, Charles R Harris wrote:
>
>
> On Fri, Jun 18, 2021 at 8:52 AM Stefan van der Walt <stefanv at berkeley.edu>
> wrote:
>
>
> On Thu, Jun 17, 2021, at 16:23, Stephan Hoyer wrote:
>
> This happens all the time. Even if we make copy='never' an error *today*,
> users will be encountering existing versions of NumPy for years into the
> future, so we won't be able to change the behavior of copy='never' for a
> very long time. Our deprecation policy says we would need to wait at least
> one year for this, but frankly I'm not sure that's enough for
> the possibility of silent bugs. 3-4 years might be more realistic.
>
>
> If we go the enum route, we may just as well deprecate string arguments at
> the same time so that we have the flexibility to introduce them again in
> the future.
>
>
> That makes sense to me, but I think this would not preclude the enum from
> being introduced right now. If we make this change, the enum will become
> the only mechanism by which to get the behavior we currently have
> (copy-if-needed).
>
> Stéfan
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>