backwards compatibility and deprecation policy NEP
Hi all,

Here is a first draft of a NEP on backwards compatibility and deprecation policy. I think this mostly formalizes what we've done for the last couple of years; however, I'm sure opinions and wish lists will differ here.

Pull request: https://github.com/numpy/numpy/pull/11596
Rendered version: https://github.com/rgommers/numpy/blob/nep-backcompat/doc/neps/nep-0023-back...

Full text below (ducks).

Cheers,
Ralf

=======================================================
NEP 23 - Backwards compatibility and deprecation policy
=======================================================

:Author: Ralf Gommers <ralf.gommers@gmail.com>
:Status: Draft
:Type: Process
:Created: 2018-07-14
:Resolution: <url> (required for Accepted | Rejected | Withdrawn)

Abstract
--------

In this NEP we describe NumPy's approach to backwards compatibility, its deprecation and removal policy, and the trade-offs and decision processes for individual cases where breaking backwards compatibility is considered.

Detailed description
--------------------

NumPy has a very large user base. Those users rely on NumPy being stable and on the code they write that uses NumPy functionality to keep working. NumPy is also actively maintained and improved -- and sometimes improvements require, or are made much easier by, breaking backwards compatibility. Finally, there are trade-offs between stability for existing users and avoiding errors or providing a better experience for new users. These competing needs often give rise to heated debates and to delays in accepting or rejecting contributions. This NEP tries to address that by providing a policy as well as examples and rationales for when it is or isn't a good idea to break backwards compatibility.

General principles:

- Aim not to break users' code unnecessarily.
- Aim never to change code in ways that can result in users silently getting incorrect results from their previously working code.
- Backwards incompatible changes can be made, provided the benefits outweigh the costs.
- When assessing the costs, keep in mind that most users do not read the mailing list, do not look at deprecation warnings, and sometimes wait more than one or two years before upgrading from their old version. And NumPy has many hundreds of thousands or even a couple of million users, so "no one will do or use this" is very likely incorrect.
- Benefits include improved functionality, usability and performance (in order of importance), as well as lower maintenance cost and improved future extensibility.
- Bug fixes are exempt from the backwards compatibility policy. However, in case of serious impact on users (e.g. a downstream library no longer builds), even bug fixes may have to be delayed for one or more releases.
- The Python API and the C API will be treated in the same way.

Examples
^^^^^^^^

We now discuss a number of concrete examples to illustrate typical issues and trade-offs.

**Changing the behavior of a function**

``np.histogram`` is probably the most infamous example. First, a new keyword ``new=False`` was introduced; its default was then switched to ``None`` one release later, and finally the keyword was removed again. Also, ``histogram`` has a ``normed`` keyword whose behavior could be considered either suboptimal or broken (depending on one's opinion on the statistics). A new keyword ``density`` was introduced to replace it; ``normed`` started giving ``DeprecationWarning`` only in v1.15.0.
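The keyword handling introduced for ``normed`` in v1.15.0 could look roughly like this (an illustrative sketch, not NumPy's actual implementation; the warning text is made up):

```python
import warnings

def histogram(a, bins=10, range=None, normed=None, weights=None, density=None):
    # Illustrative stub only -- not NumPy's real code. `range` shadows the
    # builtin because that is what the actual signature does.
    if normed is not None:
        # Warn, but keep the old keyword working until its removal release.
        warnings.warn(
            "The normed argument is deprecated; use density instead.",
            DeprecationWarning, stacklevel=2)
        if density is None:
            density = normed
    # ... the actual binning computation would go here ...
    return density
```

Calling this with ``normed=True`` keeps working but emits a ``DeprecationWarning`` that points at the replacement keyword, while calls using ``density`` are silent.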
Evolution of ``histogram``::

    def histogram(a, bins=10, range=None, normed=False):  # v1.0.0

    def histogram(a, bins=10, range=None, normed=False, weights=None, new=False):  # v1.1.0

    def histogram(a, bins=10, range=None, normed=False, weights=None, new=None):  # v1.2.0

    def histogram(a, bins=10, range=None, normed=False, weights=None):  # v1.5.0

    def histogram(a, bins=10, range=None, normed=False, weights=None, density=None):  # v1.6.0

    def histogram(a, bins=10, range=None, normed=None, weights=None, density=None):  # v1.15.0
    # v1.15.0 was the first release where `normed` started emitting
    # DeprecationWarnings

The ``new`` keyword was planned from the start to be temporary; such a plan forces users to change their code more than once. Such keywords (there have been other instances proposed, e.g. ``legacy_index`` in `NEP 21 <http://www.numpy.org/neps/nep-0021-advanced-indexing.html>`_) are not desired. The right thing to have done here would probably have been to deprecate ``histogram`` and introduce a new function ``hist`` in its place.

**Returning a view rather than a copy**

The ``ndarray.diag`` method used to return a copy. A view would be better for both performance and design consistency. This change was warned about (``FutureWarning``) in v1.8.0, and in v1.9.0 ``diag`` was changed to return a *read-only* view. The planned change to a writeable view in v1.10.0 was postponed due to backwards compatibility concerns, and is still an open issue (gh-7661).

What should have happened instead: nothing. This change resulted in a lot of discussion and wasted effort, did not achieve its final goal, and was not that important in the first place. Finishing the change to a *writeable* view in the future is not desired, because it would result in users silently getting different results if they upgraded multiple versions at once or simply missed the warnings.

**Disallowing indexing with floats**

Indexing an array with floats asks for something ambiguous, and can be a sign of a bug in user code.
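A plain-Python illustration of the ambiguity (lists refuse float indices for the same reason NumPy eventually did; the variable names here are made up):

```python
data = list(range(10))
i = 10 / 2  # true division: i is the float 5.0, which may hide a bug

# Should data[i] mean data[5]?  Python's list refuses to guess:
try:
    data[i]
except TypeError as exc:
    print(exc)  # list indices must be integers or slices, not float
```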
After some discussion, it was deemed a good idea to deprecate indexing with floats. This was first tried for the v1.8.0 release; however, in pre-release testing it became clear that this would break many libraries that depend on NumPy. Therefore it was reverted before release, to give those libraries time to fix their code first. It was finally introduced in v1.11.0 and turned into a hard error in v1.12.0.

This change was disruptive, however it did catch real bugs in e.g. SciPy and scikit-learn. Overall the change was worth the cost, and introducing it in master first to allow testing, then removing it again before a release, is a useful strategy.

Similar recent deprecations also look like good examples of cleanups/improvements:

- removing deprecated boolean indexing (gh-8312)
- deprecating truth testing on empty arrays (gh-9718)
- deprecating ``np.sum(generator)`` (gh-10670; one issue with this one is that its warning message is wrong - this should error in the future).

**Removing the financial functions**

The financial functions (e.g. ``np.pmt``) are badly named, are present in the main NumPy namespace, and don't really fit well within NumPy's scope. They were added in 2008 after `a discussion <https://mail.python.org/pipermail/numpy-discussion/2008-April/032353.html>`_ on the mailing list where opinion was divided (but a majority in favor). At the moment these functions don't cause a lot of overhead, however there are multiple issues and PRs each year for them, which costs maintainer time. And they clutter up the ``numpy`` namespace. There was discussion in 2013 on removing them again (gh-2880).

This case is borderline, but given that they're clearly out of scope, deprecation and removal from at least the main ``numpy`` namespace can be proposed. Alternatively, document clearly that new features for financial functions are unwanted, to keep the maintenance costs to a minimum.
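A deprecation of the kind proposed here could be sketched as a decorator that bakes in the requirements from the policy in this NEP: name the release that deprecated the function, the release that will remove it, and an alternative. Both the helper and the version numbers shown are hypothetical, not an existing NumPy API:

```python
import functools
import warnings

def deprecated(since, removal, alternative):
    """Hypothetical helper: wrap a function so every call warns, naming the
    deprecating release, the removal release, and an alternative."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            warnings.warn(
                f"{func.__name__} is deprecated since {since} and will be "
                f"removed in {removal}; use {alternative} instead.",
                DeprecationWarning, stacklevel=2)
            return func(*args, **kwargs)
        return wrapper
    return decorator

@deprecated(since="1.16", removal="1.18", alternative="a separate package")
def pmt(rate, nper, pv):
    # Stub standing in for the real financial function (fv=0, payments at
    # period end): the annuity payment for a present value pv.
    return -pv * rate / (1 - (1 + rate) ** -nper)
```

The function keeps working during the deprecation period, and the warning message carries everything the policy asks for, so users know both the timeline and where to go instead.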
**Examples of features not added because of backwards compatibility**

TODO: do we have good examples here? Possibly subclassing related?

Removing complete submodules
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This year there have been suggestions to consider removing some or all of ``numpy.distutils``, ``numpy.f2py``, ``numpy.linalg``, and ``numpy.random``. The motivation was that all of these cost maintenance effort, and that they slow down work on the core of NumPy (ndarrays, dtypes and ufuncs).

The impact on downstream libraries and users would be very large, and maintenance of these modules would still have to happen. Therefore this is simply not a good idea; removing these submodules should not happen even for a new major version of NumPy.

Subclassing of ndarray
^^^^^^^^^^^^^^^^^^^^^^

Subclassing of ``ndarray`` is a pain point. ``ndarray`` was not (or at least not well) designed to be subclassed. Despite that, a lot of subclasses have been created even within the NumPy code base itself, and some of those (e.g. ``MaskedArray``, ``astropy.units.Quantity``) are quite popular. The main problems with subclasses are:

- They make it hard to change ``ndarray`` in ways that would otherwise be backwards compatible.
- Some of them change the behavior of ndarray methods, making it difficult to write code that accepts array duck-types.

Subclassing ``ndarray`` has been officially discouraged for a long time. Of the most important subclasses, ``np.matrix`` will be deprecated (see gh-10142) and ``MaskedArray`` will be kept in NumPy (`NEP 17 <http://www.numpy.org/neps/nep-0017-split-out-maskedarray.html>`_). ``MaskedArray`` will ideally be rewritten in a way such that it uses only public NumPy APIs. For subclasses outside of NumPy, more work is needed to provide alternatives (e.g. mixins, see gh-9016 and gh-10446) or better support for custom dtypes (see gh-2899). Until that is done, subclasses need to be taken into account when making changes to the NumPy code base.
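A pure-Python analogue of the second problem (the classes here are invented for illustration; ``np.matrix`` redefining ``*`` is the real-world counterpart): code written against the base-class interface can silently give a different answer when a subclass changes what an operator means.

```python
class Elementwise(list):
    """Hypothetical list subclass that redefines '+' to mean elementwise
    addition instead of concatenation (cf. np.matrix redefining '*')."""
    def __add__(self, other):
        return Elementwise(a + b for a, b in zip(self, other))

def append_sentinel(seq):
    # Generic code written for the list interface, where '+' concatenates.
    return seq + type(seq)([0])

print(append_sentinel([1, 2]))               # [1, 2, 0]
print(append_sentinel(Elementwise([1, 2])))  # [1] -- silently wrong result
```

The subclass does not raise an error; it silently produces a different result, which is exactly the kind of outcome the general principles above rule out.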
A future change in NumPy to not support subclassing will certainly need a major version increase.

Policy
------

1. Code changes that have the potential to silently change the results of a user's code must never be made (except in the case of clear bugs).
2. Code changes that break users' code (i.e. the user will see a clear exception) can be made, *provided the benefit is worth the cost* and suitable deprecation warnings have been raised first.
3. Deprecation warnings are in all cases warnings that functionality will be removed. If there is no intent to remove functionality, then deprecation in documentation only or other types of warnings shall be used.
4. Deprecations for stylistic reasons (e.g. consistency between functions) are strongly discouraged.

Deprecations:

- shall include the version numbers of both when the functionality was deprecated and when it will be removed (either two releases after the warning is introduced, or in the next major version).
- shall include information on alternatives to the deprecated functionality, or a reason for the deprecation if no clear alternative is available.
- shall use ``VisibleDeprecationWarning`` rather than ``DeprecationWarning`` for cases of relevance to end users (as opposed to cases only relevant to libraries building on top of NumPy).
- shall be listed in the release notes of the release where the deprecation happened.

Removal of deprecated functionality:

- shall be done after 2 releases (assuming a 6-monthly release cycle; if that changes, there shall be at least 1 year between deprecation and removal), unless the impact of the removal is such that a major version number increase is warranted.
- shall be listed in the release notes of the release where the removal happened.

Versioning:

- removal of deprecated code can be done in any minor (but not bugfix) release.
- for heavily used functionality (e.g. removal of ``np.matrix``, of a whole submodule, or significant changes to behavior for subclasses) the major version number shall be increased.

In concrete cases where this policy needs to be applied, decisions are made according to the `NumPy governance model <https://docs.scipy.org/doc/numpy/dev/governance/index.html>`_.

Functionality with more strict policies:

- ``numpy.random`` has its own backwards compatibility policy, see `NEP 19 <http://www.numpy.org/neps/nep-0019-rng-policy.html>`_.
- The file format of ``.npy`` and ``.npz`` files must not be changed in a backwards incompatible way.

Alternatives
------------

**Being more aggressive with deprecations.**

The goal of being more aggressive is to allow NumPy to move forward faster. This would avoid others inventing their own solutions (often in multiple places), and would benefit users without a legacy code base. We reject this alternative because of the place NumPy has in the scientific Python ecosystem - being fairly conservative is required in order to not increase the extra maintenance for downstream libraries and end users to an unacceptable level.

**Semantic versioning.**

This would change the versioning scheme for code removals; those could then only be done when the major version number is increased. Rationale for rejection: semantic versioning is relatively common in software engineering, however it is not at all common in the Python world. Also, it would mean that NumPy's version number simply starts to increase faster, which would be more confusing than helpful. gh-10156 contains more discussion on this alternative.

Discussion
----------

TODO

This section may just be a bullet list including links to any discussions regarding the NEP:

- This includes links to mailing list threads or relevant GitHub issues.

References and Footnotes
------------------------

.. [1] TODO

Copyright
---------

This document has been placed in the public domain. [1]_
Hello,

Very well written article! It takes a lot of important things into account. I think a number of things should be mentioned, if only in the alternatives:

- One major version number change, with lots of “major version change” deprecations grouped into it, along with an LTS release.
- The possibility of another major version change (possibly the same one) where we re-write all portions that were agreed upon (via NEPs) to be re-written, with a longer LTS release (3 years? 5?).
  - I’m thinking this one could be similar to the Python 2 -> Python 3 transition. Note that this is different from having constant breakages; this would be a mostly one-time effort and one-time breakage.
  - We break the ABI, but not most of the C API.
  - We port at least bug fixes, and possibly oft-requested functionality, to the old version for a long time.
  - But we fix all of the little things that are agreed upon by the community to be “missing” or “wrong” in the current release. It may be a while before this is adopted, but it’ll be really beneficial in the long run.
- We ping the dev-discussions of most major downstream users (SciPy, all the scikits, Matplotlib, etc.) for their “pain points” and ask whether they think this is a good idea. This way, the users included aren’t just those on the NumPy mailing list.
- We enforce good practices in our code. For example, we explicitly disallow subclassing from ndarray, we get rid of scalars, we fix the type system.

This may sound radical (I myself think so), but consider: if we get rid of a large amount of technical debt at the onset and gain a reputation for a clean code base (rather than one that’s decades old), then we could onboard a lot more active developers, and existing developers could also get a lot more work done.

I may be getting ahead of myself on this, but feel free to leave your thoughts and opinions.

Best regards,
Hameer Abbasi
Sent from Astro <https://www.helloastro.com> for Mac

On 22.
Jul 2018 at 01:48, Ralf Gommers <ralf.gommers@gmail.com> wrote:

> [full text of the NEP draft quoted above, trimmed]

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion
On Sat, Jul 21, 2018 at 5:46 PM, Hameer Abbasi <einstein.edison@gmail.com> wrote:
> - We break the ABI, but not most of the C API.
Good catch, I didn't mention ABI at all. My opinion: breaking ABI will still require a major version change, but the bar for it is now lower. Basically what Travis was arguing for years ago, only today his argument is actually true, due to conda and binary wheels on the three major platforms.
I think it sounds nice in theory, but given the history of large design changes/decisions I don't believe we are able to get things right in a first big rewrite. For example, "fix the type system": we all would like something better, but in the 5+ years that we've talked about it, no one has even put a complete design on paper. And for the ones we did do, like __numpy_ufunc__, we definitely needed a few iterations. That points to gradual evolution being a better model.

Cheers,
Ralf
Agreed that changes had better be gradual, and that we do not have the manpower to do otherwise. (I was slightly shocked to see that my 94 commits in the last two years make me the fourth most prolific contributor in that period... and that is from the couple of hours a week I use while procrastinating on things related to my astronomy day job!) -- Marten
the idea of disallowing subclasses. But I'll add to that reply a more general sentiment, that I think one of the problems has been to think that as one develops code, one thinks one knows in advance what users may want to do with it, what input makes sense, etc. But at least I have found that I am often wrong, that I'm not imaginative enough to know what people may want to do. So, my sense is that the best one can do is to make as few assumptions as possible, so avoid coercing, etc. And if the code gets to a position where it needs to guess what is meant, it should just fail. -- Marten
On Sat, Jul 21, 2018 at 5:46 PM, Hameer Abbasi <einstein.edison@gmail.com> wrote:
I agree that this approach should probably be discussed in the NEP, specifically in the "rejected alternatives" section. It keeps coming up, and the reasons why it doesn't work for numpy are not obvious, so well-meaning people will keep bringing it up. It'd be helpful to have a single authoritative place to link to explaining why we don't do things that way. The beginning of the NEP should maybe also state up front that we follow a rolling-deprecations model where different breaking changes happen simultaneously on their own timelines. It's so obvious to me that I didn't notice it was missing, but this is a helpful reminder that it's not obvious to everyone :-). -n -- Nathaniel J. Smith -- https://vorpus.org
Hi Ralf,

Overall, this looks good. But I think the subclassing section is somewhat misleading in suggesting `ndarray` is not well designed to be subclassed. At least, in neither my work on Quantity nor that on MaskedArray have I found the design of `ndarray` itself to be a problem. Instead, the problem was the functions, as most were not written with subclassing or duck typing in mind, but rather with the assumption that all input should be an array, and that somehow it is useful to pass anything users pass in through `asarray`. With then layers on top to avoid this in specific circumstances... But perhaps this is what you meant?

(I would agree, though, that some ndarray subclasses have been designed poorly - especially matrix, which then led to a problematic duck array in sparse - and that this has resulted in substantial hassle. Also, subclassing the subclasses is much more problematic than subclassing ndarray - MaskedArray being a particularly annoying example!)

The subclassing section also notes that subclassing has been discouraged for a long time. Is that so? Over time, I've certainly had comments from Nathaniel and some others in discussions of PRs that go in that direction, which perhaps reflected some internal consensus I wasn't aware of, but the documentation does not seem to discourage it (check, e.g., the subclassing section [1]). I also think that it may be good to keep in mind that until `__array_ufunc__`, there wasn't much of a choice - support for duck arrays was even more half-hearted (hopefully to become much better with `__array_function__`).

Overall, it seems to me that these days in the python eco-system subclassing is simply expected to work. Even within numpy there are other examples (e.g., ufuncs, dtypes) for which there has been quite a bit of discussion about the benefits subclasses would bring.

All the best,

Marten

[1] https://docs.scipy.org/doc/numpy/user/basics.subclassing.html
![](https://secure.gravatar.com/avatar/5f88830d19f9c83e2ddfd913496c5025.jpg?s=120&d=mm&r=g)
Hi Marten, Thanks for the thoughtful reply. On Sat, Jul 21, 2018 at 6:39 PM, Marten van Kerkwijk < m.h.vankerkwijk@gmail.com> wrote:
You're completely right, I think. We have had problems with subclasses for a long time, but that is mainly due to np.matrix being badly behaved, which led to code everywhere using asarray, which then led to lots of issues with other subclasses. This basically meant subclasses were problematic, and hence most numpy devs would prefer not to see more subclasses.
I think yes, there is some vague but not-written-down mostly-consensus, due to the asarray dynamic described above.
True. I think long term duck arrays are the way to go, because asarray is not going to disappear. But for now we just have to do the best we can dealing with subclasses. The subclassing doc [1] really needs an update on what the practical issues are.
I'm now thinking what to do with the subclassing section in the NEP. Best to completely remove? I was triggered to include it by some things Stephan said last week about subclasses being a blocker to adding new features. So if we keep the section, it may be helpful for you and Stephan to help shape that. Cheers, Ralf
![](https://secure.gravatar.com/avatar/851ff10fbb1363b7d6111ac60194cc1c.jpg?s=120&d=mm&r=g)
Hi Ralf,
Perhaps this history is in fact useful to mention? To learn from mistakes, it must be possible to know about them!
Before suggesting further specific text, might it make sense for the NEP to note that since subclassing will not go away, there is value in having at least one non-trivial, well-designed subclass in numpy? I think eventually MaskedArray might become that: it would be an internal check that subclasses can work with all numpy functions (there is no reason for duplication of functions in `np.ma`!). It also is an example of a container-type subclass that adds extra information to an ndarray (since that information is itself array-like, it is not necessarily a super-logical subclass, but it is there... and can thus serve as an example). A second subclass which we have not discussed, but which I think is used quite a bit (from my statistics of one...), is `np.memmap`. Useful if only for showing that a relatively quick hack can give you something quite helpful. All the best, Marten
![](https://secure.gravatar.com/avatar/93a76a800ef6c5919baa8ba91120ee98.jpg?s=120&d=mm&r=g)
On Sat, Jul 21, 2018 at 6:40 PM Marten van Kerkwijk < m.h.vankerkwijk@gmail.com> wrote:
I can't speak for Ralf, but yes, this is part of what I had in mind. I don't think you can separate "core" objects/methods from functions that act on them. Either the entire system is designed to handle subclassing through some well-defined interface or it is not. If you don't design a system for subclassing but allow it anyways (and it's impossible to prohibit programmatically in Python), then you can easily end up with very fragile systems that are difficult to modify or extend. As Ralf noted in the NEP, "Some of them change the behavior of ndarray methods, making it difficult to write code that accepts array duck-types." These changes end up having implications for apparently unrelated functions (e.g., np.median needing to call np.mean internally to handle units properly). I don't think anyone really wants that sort of behavior or lock-in in NumPy itself, but of course that is the price we pay for not having well-defined interfaces :). Hopefully NEP-18 will change that, and eventually we will be able to remove hacks from NumPy that we added only because there weren't any better alternatives available. For the NEP itself, I would not mention "A future change in NumPy to not support subclassing," because it's not as if subclassing is suddenly not going to work as of a certain NumPy release. Certain types of subclasses (e.g., those that only add extra methods and/or metadata and do not modify any existing functionality) have never been a problem and will be fine to support indefinitely. Rather, we might state that "At some point in the future, the NumPy development team may no longer be interested in maintaining workarounds for specific subclasses, because other interfaces for extending NumPy are believed to be more maintainable/preferred." Overall, it seems to me that these days in the python eco-system
subclassing is simply expected to work.
I don't think this is true. You can use subclassing on builtin types like dict, but just because you can do it doesn't mean it's a good idea. If you change built-in methods to work in different ways other things will break in unexpected ways (or simply not change, also in unexpected ways). Probably the only really safe way to subclass a dictionary is to define the __missing__() method and not change any other aspects of the public interface directly.
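The point about `__missing__` can be made concrete with a toy example (hypothetical class, not from any library): `__missing__` is the one hook dict was designed to expose, since it is only called by `dict.__getitem__` for absent keys, so no other dict behavior has to be overridden.

```python
class CountingDict(dict):
    """Toy subclass using the one 'safe' extension point: dict.__getitem__
    calls __missing__ for absent keys, so no existing method changes."""

    def __missing__(self, key):
        # Give missing keys a default of 0, like collections.defaultdict(int).
        self[key] = 0
        return 0

counts = CountingDict()
counts["spam"] += 1                     # __missing__ supplies 0, then += stores 1
print(counts["spam"], counts["eggs"])   # -> 1 0
```

Anything beyond this, such as changing `__setitem__` or `update`, quickly runs into the problem Stephan describes: other built-in methods do not route through the overridden ones.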
![](https://secure.gravatar.com/avatar/93a76a800ef6c5919baa8ba91120ee98.jpg?s=120&d=mm&r=g)
On Mon, Jul 23, 2018 at 1:45 PM Marten van Kerkwijk < m.h.vankerkwijk@gmail.com> wrote:
My hope would be that NumPy gets out of the business of officially providing interfaces like subclassing that are this hard to maintain. In general, we try to hold ourselves to a higher standard of stable code, and this sets up unfortunate conflicts between the needs of different NumPy users. It is just that one should not remove functionality without providing the
better alternative!
Totally agreed!
![](https://secure.gravatar.com/avatar/1198e2d145718c841565712312e04227.jpg?s=120&d=mm&r=g)
On 23. Jul 2018 at 19:46, Stephan Hoyer <shoyer@gmail.com> wrote: On Sat, Jul 21, 2018 at 6:40 PM Marten van Kerkwijk < m.h.vankerkwijk@gmail.com> wrote:
If you don't design a system for subclassing but allow it anyways (and it's impossible to prohibit programmatically in Python

This isn't really true. Metaprogramming to the rescue, I guess: https://stackoverflow.com/questions/16564198/pythons-equivalent-of-nets-seal... Best regards, Hameer Abbasi
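The "sealed class" trick behind the linked Stack Overflow answer boils down to a metaclass that vetoes subclass creation. A minimal sketch (class names here are made up for illustration):

```python
class Sealed(type):
    """Metaclass sketch of the 'sealed class' pattern: any attempt to
    subclass a class whose metaclass is Sealed raises TypeError."""

    def __new__(mcs, name, bases, namespace):
        for base in bases:
            if isinstance(base, Sealed):
                raise TypeError(f"type '{base.__name__}' is sealed")
        return super().__new__(mcs, name, bases, namespace)

class Config(metaclass=Sealed):   # hypothetical sealed class
    pass

try:
    class SubConfig(Config):      # subclassing is refused at class creation
        pass
    subclassing_blocked = False
except TypeError:
    subclassing_blocked = True

print(subclassing_blocked)  # -> True
```

Whether prohibiting subclassing this way would ever be appropriate for `ndarray` is a separate question; the sketch only shows that it is technically possible.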
![](https://secure.gravatar.com/avatar/851ff10fbb1363b7d6111ac60194cc1c.jpg?s=120&d=mm&r=g)
Hi Ralf, Maybe as a concrete example of something that has been discussed, for which your proposed text makes (I think) clear what should or should not be done: many of us hate that `np.array` (like, sadly, many other numpy parts) auto-converts anything not obviously array-like to dtype=object, and it has been suggested we should no longer do this by default [1]. Given your NEP, I think you would disagree with that path, as it would quite obviously break users' code (we also get regular issues about object arrays, which show that they are used a lot in the wild). So, instead I guess one might go with a route where one could explicitly say that `dtype=object` is not wanted (say, `dtype='everything-but-object'`)? All the best, Marten [1] https://github.com/numpy/numpy/issues/5353
![](https://secure.gravatar.com/avatar/97c543aca1ac7bbcfb5279d0300c8330.jpg?s=120&d=mm&r=g)
On Sat, Jul 21, 2018 at 4:48 PM, Ralf Gommers <ralf.gommers@gmail.com> wrote:
Oh *awesome*, thanks for putting this together. I think this is a great start, but I'd structure it a bit differently. So let me just make a few high-level comments first and see what you think.

Regarding the "general principles" and then "policy": to me these feel more like a brainstorming list that hasn't been fully distilled down into principles yet. I would try to structure it to start with the overarching principles (changes need to benefit users more than they harm them; numpy is widely used, so breaking changes should by default be assumed to be fairly harmful; decisions should be based on data and actual effects on users rather than e.g. appealing to the docs or abstract aesthetic principles; silently getting a wrong answer is much worse than a loud error), then talk about some of the ways this plays out (if people are currently silently getting the wrong answer -- which is the definition of a bug, but also shows up in the index-by-float case -- then that's really bad; some of our tools for collecting data about how bad a breakage is include testing prominent downstreams ourselves, adding warnings or making .0 releases and seeing how people react, etc.), and then examples.

Speaking of examples: I hate to say this, because in general I think using examples is a great idea. But... I think you should delete most of these examples. The problem is scope creep: the goal for this NEP (IMO) should be to lay out the principles we use to think about these issues in general, but right now it comes across as trying to lay down a final resolution on lots of specific issues (including several where there are ongoing conversations). It ends up like trying to squish multiple NEPs into one, which makes it hard to discuss, and also distracts from the core purpose.

My suggestion: keep just two examples, histogram and indexing-with-floats.
These are safely done and dusted, totally uncontroversial (AFAIK), and the first is a good illustration of how one can try to be careful and do the right thing but still get it all wrong, while the second is a good example of (a) how we gathered data and decided that an actually pretty disruptive change was nonetheless worth it, and (b) how we had to manage it through multiple false starts.

Regarding the actual policy: one alteration to current practice jumped out at me. This policy categorically rules out all changes that could cause currently working code to silently start doing something wrong, regardless of the specific circumstances. That's not how we actually do things right now. Instead, our policy in recent years has been that such changes are permitted in theory, but (a) the starting presumption is that this is super harmful to users, so we need a *very* good reason to do it, and (b) if we do go ahead with it, then during the deprecation period we use a highly visible FutureWarning (instead of the invisible-by-default DeprecationWarning). Personally I think the current policy strikes a better balance. You can see some examples of where we've used this by running 'git log -S FUTUREWARNING -S FutureWarning' -- it's things like a bad default for the rcond argument in lstsq, an obscure and error-prone corner case in indexing (0addc016ba), strange semantics for NaT (https://mail.scipy.org/pipermail/numpy-discussion/2015-October/073968.html), ... We could quibble about individual cases, but I think that taking these on a case-by-case basis is better than ruling them out categorically. And in any case, that is what we do now, so if you want to change this, it's something we should discuss and probably write down some rationale and such :-).

Regarding the major version number thing: ugh, do we really want to talk about this more? I'd probably leave it out of the NEP entirely. If it stays in, I think it needs a clearer description of what counts as a "major" change.
There are some examples of things that do "sound" major, but... the rest of our policy is all about measuring disruption based on effects on users, and by that metric, the index-by-float removal was pretty major. My guess is that by the time we finally remove np.matrix, the actual disruption will be less than it was for removing index-by-float. (As it should be, since keeping index-by-float around was actively causing bugs in even well-maintained downstreams, in a way that np.matrix doesn't.) -n -- Nathaniel J. Smith -- https://vorpus.org
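The visibility difference between the two warning categories that this policy relies on can be demonstrated in a few lines. The sketch below sets the relevant filters explicitly (mirroring CPython's defaults, where `DeprecationWarning` raised from library code is hidden while `FutureWarning` is always shown), so the outcome does not depend on interpreter flags:

```python
import warnings

# Set the filters explicitly so the demonstration is deterministic:
# DeprecationWarning is silenced (CPython's default for library code),
# FutureWarning is always displayed.
with warnings.catch_warnings(record=True) as shown:
    warnings.simplefilter("always", FutureWarning)
    warnings.simplefilter("ignore", DeprecationWarning)
    warnings.warn("old keyword is deprecated", DeprecationWarning)
    warnings.warn("behavior of f() will change", FutureWarning)

messages = [str(w.message) for w in shown]
print(messages)  # -> ['behavior of f() will change']
```

Only the `FutureWarning` survives the filters, which is why it is the tool of choice when a change will silently alter results and users *must* notice.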
![](https://secure.gravatar.com/avatar/5f88830d19f9c83e2ddfd913496c5025.jpg?s=120&d=mm&r=g)
On Sat, Jul 21, 2018 at 7:15 PM, Nathaniel Smith <njs@pobox.com> wrote:
Thanks, I'll try and rework the general principles, you have some excellent points in here.
I'm not sure this is the best thing to do. I can remove a couple, but aiming to be "totally uncontroversial" is almost impossible given the topic of the NEP. The diag view example is important I think; it's the second most discussed backwards compatibility issue next to histogram. I'm happy to remove the statement on what should happen with it going forward, though. Then, I think it's not unreasonable to draw a couple of hard lines. For example, removing complete submodules like linalg or random has ended up on some draft brainstorm roadmap list because someone (no idea who) put it there after a single meeting. Clearly the cost-benefit of that is such that there's no point even discussing it further, so I'd rather draw that line here than every time someone opens an issue. Very recent example: https://github.com/numpy/numpy/issues/11457 (remove auto-import of numpy.testing).
You're right here. Thanks for the examples. I'll update this according to your suggestion, and propose to use one of the examples (rcond probably) to illustrate.
I think it has value to keep it, and that it's not really possible to come up with a very clear description of "major". In particular, I'd like every deprecation message to say "this deprecated feature will be removed by release X.Y.0". At the moment we don't do that, so if users see a message they don't know if a removal will happen next year, in the far future (2.0), or never. The major version thing is quite useful to signal our intent. Doesn't mean we need to exhaustively discuss when to do a 2.0 though, I agree that that's not a very useful discussion right now. Happy to remove this though if people don't like it. Other opinions? Cheers, Ralf
![](https://secure.gravatar.com/avatar/93a76a800ef6c5919baa8ba91120ee98.jpg?s=120&d=mm&r=g)
On Sun, Jul 22, 2018 at 12:28 PM Ralf Gommers <ralf.gommers@gmail.com> wrote:
I think a more realistic policy would be to say, "This feature was deprecated by release X.Y and may be removed as early as release X.Z." In general we have been conservative in terms of actually finalizing deprecations in NumPy, which I think is warranted given the irregularity of our release cycle. It's hard to know exactly which release is going to come out a year or 18 months from when a deprecation starts.
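A small illustration of the suggested wording (`warn_deprecated` and its parameters are hypothetical names for this sketch, not a NumPy API):

```python
import warnings

def warn_deprecated(name, deprecated_in, earliest_removal):
    """Hypothetical helper emitting a deprecation message that states
    when deprecation started and the earliest release removal may happen."""
    msg = (f"`{name}` was deprecated in release {deprecated_in} and may be "
           f"removed as early as release {earliest_removal}.")
    warnings.warn(msg, DeprecationWarning, stacklevel=2)
    return msg

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    text = warn_deprecated("normed", "1.15.0", "1.17.0")

print(text)
```

Phrasing it as "may be removed as early as" leaves the project free to slip the schedule without breaking a promise, which matches the conservative practice described above.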
Happy to remove this though if people don't like it. Other opinions?
I would also lean towards removing mention of any major version changes for NumPy.
![](https://secure.gravatar.com/avatar/93a76a800ef6c5919baa8ba91120ee98.jpg?s=120&d=mm&r=g)
On Sun, Jul 22, 2018 at 12:28 PM Ralf Gommers <ralf.gommers@gmail.com> wrote:
I'm happy to give the broader context here. This came up in the NumPy sprint in Berkeley back in May of this year. The existence of all of these submodules in NumPy is mostly a historical artifact, due to the previously poor state of Python packaging. Our thinking was that perhaps this could be revisited in this age of conda and manylinux wheels. This isn't to say that it would actually be a good idea to remove any of these submodules today. Separate modules bring both benefits and downsides.

Benefits:
- It can be easier to maintain projects separately rather than inside NumPy, e.g., bug fixes do not need to be tied to NumPy releases.
- Separate modules could reduce the maintenance burden for NumPy itself, because energy gets focused on core features.
- For projects for which a rewrite would be warranted (e.g., numpy.ma and scipy.sparse), it is *much* easier to innovate outside of NumPy/SciPy.
- Packaging. As mentioned above, this is no longer as beneficial as it once was.

Downsides:
- It's harder to find separate packages than NumPy modules.
- If the maintainers and maintenance processes are very similar, then separate projects can add unnecessary overhead.
- Changing from bundled to separate packages imposes a significant cost upon their users (e.g., due to changed import paths).

Coming back to the NEP: The impact on downstream libraries and users would be very large, and
maintenance of these modules would still have to happen. Therefore this is simply
not a good idea; removing these submodules should not happen even for a new major version of NumPy.
I'm afraid I disagree pretty strongly here. There should absolutely be a high bar for removing submodules, but we should not rule out the possibility entirely. It is certainly true that modules need to be maintained for them to remain usable, but I particularly object to the idea that this should be forced upon NumPy maintainers. Open source projects need to be maintained by their users, and if their users cannot devote energy to maintain them, then the open source project deserves to die. This is just as true for NumPy submodules as for external packages. NumPy itself only has an obligation to maintain submodules if they are actively needed by the NumPy project and valued by active NumPy contributors. Otherwise, they should be maintained by users who care about them -- whether that means inside or outside NumPy. It serves nobody well to insist on NumPy developers maintaining projects that they don't use or care about.

I would suggest the following criteria for considering removing a NumPy submodule:
1. It cannot be relied upon by other portions of NumPy.
2. Either (a) the submodule imposes a significant maintenance burden upon the rest of NumPy that is not balanced by the level of dedicated contributions, or (b) much better alternatives exist outside of NumPy.

Preferably all three conditions (1, 2a, and 2b) should be satisfied.
![](https://secure.gravatar.com/avatar/5f88830d19f9c83e2ddfd913496c5025.jpg?s=120&d=mm&r=g)
On Mon, Jul 23, 2018 at 11:46 AM, Stephan Hoyer <shoyer@gmail.com> wrote:
That's true. Our thinking was that perhaps this could be revisited in this age of conda
That's certainly not a given though. Those things still need to be maintained, and splitting up packages increases overhead for e.g. doing releases. It's quite unclear if splitting would increase the developer pool. - For projects for which a rewrite would be warranted (e.g., numpy.ma and
scipy.sparse), it is *much* easier to innovate outside of NumPy/SciPy.
Agreed. That can happen and is already happening though (e.g. https://github.com/pydata/sparse). It doesn't have much to do with removing existing submodules. - Packaging. As mentioned above, this is no longer as beneficial as it once
was.
True, no longer as beneficial - that's not really a benefit though, packaging just works fine either way.
My thinking here is: given that we're not even willing to remove MaskedArray (NEP 17), for which the benefits of removing are a lot higher and the user base smaller, we are certainly not going to be removing random or linalg or distutils in the foreseeable future. So we may as well say that. Otherwise we have the discussions regularly (we actually just did have one for numpy.testing in gh-11457), which is just a waste of energy.
Nothing is "forced on you" as a NumPy maintainer - we are all individuals who do things voluntarily (okay, almost all - we have some funding now) and can choose to not spend any time on certain parts of NumPy. MaskedArray languished for quite a while before Marten and Eric spent a lot of time in improving it and closing lots of issues related to it. That can happen. Open source projects need to be maintained by their users, and if their
This is a very developer-centric view. We have lots of users and also lots of no-longer-active contributors. The needs and interests of those groups of people, and the work they previously put into NumPy, matter. Otherwise, they should be maintained by users who care about them --
whether that means inside or outside NumPy. It serves nobody well to insist on NumPy developers maintaining projects that they don't use or care about.
To quote Nathaniel: "the rest of our policy is all about measuring disruption based on effects on users". That's absent from your criteria. Why I would like to keep this point in:
- the discussion does come up, see the draft brainstorm roadmap list and gh-11457.
- the outcome of such discussions is in practice 100% clear.
- I would like to avoid having drawn-out discussions each time (this eats up a lot of energy for me), and I *really* would like to avoid saying "I don't have time to discuss, but this is just not going to happen" or "consider it vetoed".
- Hence: just write it down, so we can refer to it.
Cheers, Ralf
![](https://secure.gravatar.com/avatar/93a76a800ef6c5919baa8ba91120ee98.jpg?s=120&d=mm&r=g)
On Tue, Jul 24, 2018 at 5:38 PM Ralf Gommers <ralf.gommers@gmail.com> wrote:
Yes, I suppose it is :). I tend to view NumPy's developers (interpreted somewhat broadly, including those who contribute to the project in other ways) as the ultimate representatives of NumPy's user base.
Yes, "Can be achieved with minimum disruption for users" would be appropriate to add as another top level criteria. Why I would like to keep this point in is:
I would rather we just say that the bar for deprecating or removing *any* functionality in NumPy is extremely high. np.matrix is probably the best example in recent times:
- np.matrix is officially discouraged (which we prefer even to deprecation);
- we *anticipate* deprecating it as soon as there's a viable alternative to scipy.sparse;
- even then, we will be very cautious about ever removing it, with the understanding that it is widely used.

As for updating this section of the NEP:
- We could certainly note that to date NumPy has not removed any complete submodules (is this true?), and that for these modules in particular, the cost-benefit ratio does not favor removal at this time.
- Documenting the criteria we've come up with here, even though they haven't been satisfied yet, might be helpful to demonstrate the high bar that is required.
- I don't like rejecting the possibility of removing submodules entirely ("simply not a good idea"). It may become a good idea in the future, if some of the underlying facts change.

I would also suggest highlighting two other strategies that NumPy uses in favor of deprecation/removal:
- Official discouragement. Discouraging or deemphasizing in our docs is the preferred strategy for older APIs that still have well-defined behavior but that are arguably less consistent with the rest of NumPy. Examples: isin vs in1d, stack/block vs hstack/dstack/vstack.
- Benign neglect. This is our preferred strategy compared to removing submodules. Merely being in NumPy does not automatically guarantee that a module is well maintained, nor does it imply that a submodule is the best tool for the job. That's OK, as long as the incremental maintenance burden on the rest of NumPy is not too high.
![](https://secure.gravatar.com/avatar/96dd777e397ab128fedab46af97a3a4a.jpg?s=120&d=mm&r=g)
On Fri, Jul 27, 2018 at 12:02 PM, Stephan Hoyer <shoyer@gmail.com> wrote:
Not quite true. We removed the Numarray and Numeric compatibility modules. That broke Konrad Hinsen's package.
It might help to make a cheat sheet listing discouraged functions together with their suggested replacements. Chuck
![](https://secure.gravatar.com/avatar/97c543aca1ac7bbcfb5279d0300c8330.jpg?s=120&d=mm&r=g)
On Sun, Jul 22, 2018 at 12:28 PM, Ralf Gommers <ralf.gommers@gmail.com> wrote:
Of course the NEP itself will have some things to discuss – but I think the discussion will be more productive if we can stay focused on the core part of the NEP, which is the general principles we use to evaluate each specific situation as it comes up. Look at how much of the discussion so far has gotten derailed onto topics like subclassing, submodules, etc.
It's the most discussed issue because it was the test case where we developed all these policies in the first place :-). I'm not sure it's particularly interesting aside from that, and that specific history ("let's come up with a transition plan for this feature that no-one actually cares about, b/c no-one cares about it so it's a good thing to use as a test case") is unlikely to be repeated.
I can see an argument for splitting random and linalg into their own modules, which numpy depends on and imports so that existing code doesn't break. E.g. this might let people install an old version of random if they needed to reproduce some old results, or help us merge numpy and scipy's linalg modules into a single package. I agree though that making 'np.linalg' start raising AttributeError is a total non-starter.
The problem is that "2.0" means a lot of different things to different people, not just "some future date to be determined", so using it that way will confuse people. Also, it's hard to predict when a deprecation will actually happen... it's very common that we adjust the schedule as we go (e.g. when we try to remove it and then discover it breaks everyone so we have to put it back for a while). I feel like it would be better to do this based on time -- like say "this will be removed <today + 1 year>" or something, and then it might take longer but not shorter? Re: version numbers, I actually think numpy should consider switching to calver [1]. We'd be giving up on being able to do a "2.0", but that's kind of a good thing -- if a change is too big to handle through our normal deprecation cycle, then it's probably too big to handle period. And "numpy 2018.3" gives you more information than our current scheme -- for example you could see at a glance that numpy 2012.1 is super out-of-date, and we could tell people that numpy 2019.1 will drop python 2 support. ...But that's a whole other discussion, and we shouldn't get derailed onto it here in this NEP's thread :-). [1] https://calver.org/ -n -- Nathaniel J. Smith -- https://vorpus.org
![](https://secure.gravatar.com/avatar/5f88830d19f9c83e2ddfd913496c5025.jpg?s=120&d=mm&r=g)
On Tue, Jul 24, 2018 at 8:07 PM, Nathaniel Smith <njs@pobox.com> wrote:
The subclassing discussion was actually illuminating and useful. Maybe it does deserve its own write-up somewhere though. Happy to remove that too. Would then like to put it somewhere else - in the docs, another NEP, ...? The submodules one I'd really like to keep.
Pretty sure that's not true; we had policies long before that, plus it was not advertised as a test case for backwards compat (it's just an improvement that someone wanted to implement). But well, I don't care enough about this particular one to argue about it - I'll remove it. I'm not sure it's
Me too, that could happen. But that's unrelated to backwards compatibility. E.g. this might let people install an old version of
It is, hence why I say above that I'd like to keep that example.
This does make sense to me. -- like say
"this will be removed <today + 1 year>" or something, and then it might take longer but not shorter?
You can't practically do "today"; it should be <version number of next release when PR is merged + at least N years>. But yes, that is useful: the point is to give a clear indication, and it's then easy for the user to figure out the earliest date that the removal could happen. Given that this is clear and avoids the version number discussion, I'm happy to go with that and remove the major/minor version text. Cheers, Ralf
![](https://secure.gravatar.com/avatar/1198e2d145718c841565712312e04227.jpg?s=120&d=mm&r=g)
Hello, Very well written article! It takes a lot of important things into account. I think a number of things should be mentioned, if only in the alternatives:

- One major version number change, with lots of “major version change” deprecations grouped into it, along with an LTS release.
- The possibility of another major version change (possibly the same one) where we re-write all portions that were agreed upon (via NEPs) to be re-written, with a longer LTS release (3 years? 5?).
  - I'm thinking this one could be similar to the Python 2 -> Python 3 transition. Note that this is different from having constant breakages; this would be a mostly one-time effort and one-time breakage.
  - We break the ABI, but not most of the C API.
  - We port at least bug fixes, and possibly oft-requested functionality, to the old version for a long time.
  - But we fix all of the little things that are agreed upon by the community to be “missing” or “wrong” in the current release. It may be a while before this is adopted, but it'll be really beneficial in the long run.
- We ping the dev-discussions of most major downstream users (SciPy, all the scikits, Matplotlib, etc.) for their “pain points” and also to ask if they think this is a good idea. This way, the users included aren't just those on the NumPy mailing list.
- We enforce good practices in our code. For example, we explicitly disallow subclassing from ndarray, we get rid of scalars, we fix the type system.

This may sound radical (I myself think so), but consider that if we get rid of a large amount of technical debt at the onset, and have a reputation for a clean code-base (rather than one that's decades old), then we could onboard a lot more active developers, and existing developers could also get a lot more work done. I may be getting ahead of myself on this, but feel free to leave your thoughts and opinions. Best regards, Hameer Abbasi On 22.
Jul 2018 at 01:48, Ralf Gommers <ralf.gommers@gmail.com> wrote:

General principles:

- Aim not to break users' code unnecessarily.
- Aim never to change code in ways that can result in users silently getting incorrect results from their previously working code.
- Backwards incompatible changes can be made, provided the benefits outweigh the costs.
- When assessing the costs, keep in mind that most users do not read the mailing list, do not look at deprecation warnings, and sometimes wait more than one or two years before upgrading from their old version. And NumPy has many hundreds of thousands or even a couple of million users, so "no one will do or use this" is very likely incorrect.
- Benefits include improved functionality, usability and performance (in order of importance), as well as lower maintenance cost and improved future extensibility.
- Bug fixes are exempt from the backwards compatibility policy. However, in case of serious impact on users (e.g. a downstream library no longer builds), even bug fixes may have to be delayed for one or more releases.
- The Python API and the C API will be treated in the same way.

Examples
^^^^^^^^

We now discuss a number of concrete examples to illustrate typical issues and trade-offs.

**Changing the behavior of a function**

``np.histogram`` is probably the most infamous example. First, a new keyword ``new=False`` was introduced; this was then switched over to None one release later, and finally it was removed again. Also, it has a ``normed`` keyword that had behavior that could be considered either suboptimal or broken (depending on one's opinion on the statistics). A new keyword ``density`` was introduced to replace it; ``normed`` started giving a ``DeprecationWarning`` only in v1.15.0.
Evolution of ``histogram``::

    def histogram(a, bins=10, range=None, normed=False):  # v1.0.0

    def histogram(a, bins=10, range=None, normed=False, weights=None, new=False):  # v1.1.0

    def histogram(a, bins=10, range=None, normed=False, weights=None, new=None):  # v1.2.0

    def histogram(a, bins=10, range=None, normed=False, weights=None):  # v1.5.0

    def histogram(a, bins=10, range=None, normed=False, weights=None, density=None):  # v1.6.0

    def histogram(a, bins=10, range=None, normed=None, weights=None, density=None):  # v1.15.0
    # v1.15.0 was the first release where `normed` started emitting
    # DeprecationWarnings

The ``new`` keyword was planned from the start to be temporary; such a plan forces users to change their code more than once. Such keywords (there have been other instances proposed, e.g. ``legacy_index`` in `NEP 21 <http://www.numpy.org/neps/nep-0021-advanced-indexing.html>`_) are not desired. The right thing to have done here would probably have been to deprecate ``histogram`` and introduce a new function ``hist`` in its place.

**Returning a view rather than a copy**

The ``ndarray.diag`` method used to return a copy. A view would be better for both performance and design consistency. This change was warned about (``FutureWarning``) in v1.8.0, and in v1.9.0 ``diag`` was changed to return a *read-only* view. The planned change to a writeable view in v1.10.0 was postponed due to backwards compatibility concerns, and is still an open issue (gh-7661). What should have happened instead: nothing. This change resulted in a lot of discussions and wasted effort, did not achieve its final goal, and was not that important in the first place. Finishing the change to a *writeable* view in the future is not desired, because it will result in users silently getting different results if they upgraded multiple versions or simply missed the warnings.

**Disallowing indexing with floats**

Indexing an array with floats is asking for something ambiguous, and can be a sign of a bug in user code.
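The mechanism eventually used for ``normed`` - a sentinel default plus a ``DeprecationWarning`` emitted only when the old keyword is passed explicitly - can be sketched roughly as follows (names and logic are illustrative, not NumPy's actual implementation):

```python
import warnings

_NoValue = object()  # sentinel: distinguishes "not passed" from any real value

def histogram_sketch(a, normed=_NoValue, density=None):
    """Illustrative sketch of deprecating `normed` in favour of `density`."""
    if normed is not _NoValue:
        warnings.warn(
            "The `normed` keyword is deprecated; use `density` instead.",
            DeprecationWarning, stacklevel=2)
        if density is None:
            density = normed
    return density  # stand-in for the real histogram computation
```

The sentinel matters: with a plain ``normed=False`` default there is no way to tell a user who explicitly passed ``False`` apart from one who passed nothing, so the warning would either fire for everyone or for no one.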
After some discussion, it was deemed a good idea to deprecate indexing with floats. This was first tried for the v1.8.0 release; however, in pre-release testing it became clear that this would break many libraries that depend on NumPy. Therefore it was reverted before release, to give those libraries time to fix their code first. It was finally introduced for v1.11.0 and turned into a hard error for v1.12.0. This change was disruptive, however it did catch real bugs in e.g. SciPy and scikit-learn. Overall the change was worth the cost, and introducing it in master first to allow testing, then removing it again before a release, is a useful strategy.

Similar recent deprecations also look like good examples of cleanups/improvements:

- removing deprecated boolean indexing (gh-8312)
- deprecating truth testing on empty arrays (gh-9718)
- deprecating ``np.sum(generator)`` (gh-10670; one issue with this one is that its warning message is wrong - this should error in the future).

**Removing the financial functions**

The financial functions (e.g. ``np.pmt``) are badly named, are present in the main NumPy namespace, and don't really fit well within NumPy's scope. They were added in 2008 after `a discussion <https://mail.python.org/pipermail/numpy-discussion/2008-April/032353.html>`_ on the mailing list where opinion was divided (but a majority in favor). At the moment these functions don't cause a lot of overhead, however there are multiple issues and PRs a year for them, which costs maintainer time to deal with. And they clutter up the ``numpy`` namespace. Discussion on removing them happened again in 2013 (gh-2880). This case is borderline, but given that they're clearly out of scope, deprecation and removal from at least the main ``numpy`` namespace can be proposed. Alternatively, document clearly that new features for financial functions are unwanted, to keep the maintenance costs to a minimum.
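Returning to the float-indexing example above: its end state is easy to demonstrate (behaviour as of NumPy 1.12 and later; v1.11 only warned):

```python
import numpy as np

a = np.arange(10)
assert a[3] == 3  # integer indexing is fine

# Indexing with a float was deprecated in v1.11.0 and is a hard
# error from v1.12.0 onwards.
try:
    a[3.0]
    raised = False
except (IndexError, TypeError):  # IndexError in current releases
    raised = True
assert raised
```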
**Examples of features not added because of backwards compatibility**

TODO: do we have good examples here? Possibly subclassing related?

Removing complete submodules
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This year there have been suggestions to consider removing some or all of ``numpy.distutils``, ``numpy.f2py``, ``numpy.linalg``, and ``numpy.random``. The motivation was that all these cost maintenance effort, and that they slow down work on the core of NumPy (ndarrays, dtypes and ufuncs). The impact on downstream libraries and users would be very large, and maintenance of these modules would still have to happen. Therefore this is simply not a good idea; removing these submodules should not happen even for a new major version of NumPy.

Subclassing of ndarray
^^^^^^^^^^^^^^^^^^^^^^

Subclassing of ``ndarray`` is a pain point. ``ndarray`` was not (or at least not well) designed to be subclassed. Despite that, a lot of subclasses have been created even within the NumPy code base itself, and some of those (e.g. ``MaskedArray``, ``astropy.units.Quantity``) are quite popular. The main problems with subclasses are:

- They make it hard to change ``ndarray`` in ways that would otherwise be backwards compatible.
- Some of them change the behavior of ndarray methods, making it difficult to write code that accepts array duck-types.

Subclassing ``ndarray`` has been officially discouraged for a long time. Of the most important subclasses, ``np.matrix`` will be deprecated (see gh-10142) and ``MaskedArray`` will be kept in NumPy (`NEP 17 <http://www.numpy.org/neps/nep-0017-split-out-maskedarray.html>`_). ``MaskedArray`` will ideally be rewritten in a way such that it uses only public NumPy APIs. For subclasses outside of NumPy, more work is needed to provide alternatives (e.g. mixins, see gh-9016 and gh-10446) or better support for custom dtypes (see gh-2899). Until that is done, subclasses need to be taken into account when making changes to the NumPy code base.
A future change in NumPy to not support subclassing will certainly need a major version increase.

Policy
------

1. Code changes that have the potential to silently change the results of a user's code must never be made (except in the case of clear bugs).
2. Code changes that break users' code (i.e. the user will see a clear exception) can be made, *provided the benefit is worth the cost* and suitable deprecation warnings have been raised first.
3. Deprecation warnings are in all cases warnings that functionality will be removed. If there is no intent to remove functionality, then deprecation in documentation only or other types of warnings shall be used.
4. Deprecations for stylistic reasons (e.g. consistency between functions) are strongly discouraged.

Deprecations:

- shall include the version numbers of both when the functionality was deprecated and when it will be removed (either two releases after the warning is introduced, or in the next major version).
- shall include information on alternatives to the deprecated functionality, or a reason for the deprecation if no clear alternative is available.
- shall use ``VisibleDeprecationWarning`` rather than ``DeprecationWarning`` for cases of relevance to end users (as opposed to cases only relevant to libraries building on top of NumPy).
- shall be listed in the release notes of the release where the deprecation happened.

Removal of deprecated functionality:

- shall be done after 2 releases (assuming a 6-monthly release cycle; if that changes, there shall be at least 1 year between deprecation and removal), unless the impact of the removal is such that a major version number increase is warranted.
- shall be listed in the release notes of the release where the removal happened.

Versioning:

- removal of deprecated code can be done in any minor (but not bugfix) release.
- for heavily used functionality (e.g.
removal of ``np.matrix``, of a whole submodule, or significant changes to behavior for subclasses) the major version number shall be increased.

In concrete cases where this policy needs to be applied, decisions are made according to the `NumPy governance model <https://docs.scipy.org/doc/numpy/dev/governance/index.html>`_.

Functionality with more strict policies:

- ``numpy.random`` has its own backwards compatibility policy, see `NEP 19 <http://www.numpy.org/neps/nep-0019-rng-policy.html>`_.
- The file format for ``.npy`` and ``.npz`` files must not be changed in a backwards incompatible way.

Alternatives
------------

**Being more aggressive with deprecations.**

The goal of being more aggressive is to allow NumPy to move forward faster. This would avoid others inventing their own solutions (often in multiple places), as well as be a benefit to users without a legacy code base. We reject this alternative because of the place NumPy has in the scientific Python ecosystem - being fairly conservative is required in order not to increase the maintenance burden for downstream libraries and end users to an unacceptable level.

**Semantic versioning.**

This would change the versioning scheme for code removals; those could then only be done when the major version number is increased. Rationale for rejection: semantic versioning is relatively common in software engineering, however it is not at all common in the Python world. Also, it would mean that NumPy's version number simply starts to increase faster, which would be more confusing than helpful. gh-10156 contains more discussion on this alternative.

Discussion
----------

TODO

This section may just be a bullet list including links to any discussions regarding the NEP:

- This includes links to mailing list threads or relevant GitHub issues.

References and Footnotes
------------------------

.. [1] TODO

Copyright
---------

This document has been placed in the public domain.
[1]_ _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@python.org https://mail.python.org/mailman/listinfo/numpy-discussion
On Sat, Jul 21, 2018 at 5:46 PM, Hameer Abbasi <einstein.edison@gmail.com> wrote:
- We break the ABI, but not most of the C API.
Good catch, I didn't mention ABI at all. My opinion: breaking ABI will
still require a major version change, but the bar for it is now lower. Basically what Travis was arguing for years ago, only today his argument is actually true due to conda and binary wheels on the 3 major platforms.
I think it sounds nice in theory, but given the history on large design changes/decisions I don't believe we are able to get things right in a first big rewrite. For example "fix the type system" - we all would like something better, but in the 5+ years that we've talked about it, no one has even put a complete design on paper. And for ones we did do, like __numpy_ufunc__, we definitely needed a few iterations. That points to gradual evolution being a better model. Cheers, Ralf
Agreed that changes better be gradual, and that we do not have the manpower to do otherwise (I was slightly shocked to see that my 94 commits in the last two years make me the fourth most prolific contributor in that period... And that is from the couple of hours a week I use while procrastinating on things related to my astronomy day job!) -- Marten
the idea of disallowing subclasses. But I'll add to that reply a more general sentiment, that I think one of the problems has been to think that as one develops code, one thinks one knows in advance what users may want to do with it, what input makes sense, etc. But at least I have found that I am often wrong, that I'm not imaginative enough to know what people may want to do. So, my sense is that the best one can do is to make as few assumptions as possible, so avoid coercing, etc. And if the code gets to a position where it needs to guess what is meant, it should just fail. -- Marten
On Sat, Jul 21, 2018 at 5:46 PM, Hameer Abbasi <einstein.edison@gmail.com> wrote:
I agree that this approach should probably be discussed in the NEP, specifically in the "rejected alternatives" section. It keeps coming up, and the reasons why it doesn't work for numpy are not obvious, so well-meaning people will keep bringing it up. It'd be helpful to have a single authoritative place to link to explaining why we don't do things that way. The beginning of the NEP should maybe also state up front that we follow a rolling-deprecations model where different breaking changes happen simultaneously on their own timelines. It's so obvious to me that I didn't notice it was missing, but this is a helpful reminder that it's not obvious to everyone :-). -n -- Nathaniel J. Smith -- https://vorpus.org
Hi Ralf, Overall, this looks good. But I think the subclassing section is somewhat misleading in suggesting `ndarray` is not well designed to be subclassed. At least, in neither my work on Quantity nor that on MaskedArray have I found that the design of `ndarray` itself was a problem. Instead, it was the functions that were, as most were not written with subclassing or duck typing in mind, but rather with the assumption that all input should be an array, and that somehow it is useful to pass anything users pass in through `asarray`. With then layers on top to avoid this in specific circumstances... But perhaps this is what you meant? (I would agree, though, that some ndarray subclasses have been designed poorly - especially matrix, which then led to a problematic duck array in sparse - and that this has resulted in substantial hassle. Also, subclassing the subclasses is much more problematic than subclassing ndarray - MaskedArray being a particularly annoying example!) The subclassing section also notes that subclassing has been discouraged for a long time. Is that so? Over time, I've certainly had comments from Nathaniel and some others in discussions of PRs that go in that direction, which perhaps reflected some internal consensus I wasn't aware of, but the documentation does not seem to discourage it (check, e.g., the subclassing section [1]). I also think that it may be good to keep in mind that until `__array_ufunc__`, there wasn't much of a choice - support for duck arrays was even more half-hearted (hopefully to become much better with `__array_function__`). Overall, it seems to me that these days in the python eco-system subclassing is simply expected to work. Even within numpy there are other examples (e.g., ufuncs, dtypes) for which there has been quite a bit of discussion about the benefits subclasses would bring. All the best, Marten [1] https://docs.scipy.org/doc/numpy/user/basics.subclassing.html
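Marten's point about `asarray` is easy to demonstrate with a toy subclass (`Tagged` is made up for this sketch; imagine it carrying units or a mask):

```python
import numpy as np

class Tagged(np.ndarray):
    """Toy ndarray subclass carrying no extra behaviour of its own."""
    pass

t = np.arange(3).view(Tagged)
assert type(t) is Tagged

# np.asarray silently strips the subclass...
assert type(np.asarray(t)) is np.ndarray
# ...while np.asanyarray preserves it.
assert type(np.asanyarray(t)) is Tagged
```

Any function that routes its input through `asarray` therefore discards whatever the subclass added, which is why subclass authors end up fighting the function layer rather than `ndarray` itself.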
Hi Marten, Thanks for the thoughtful reply. On Sat, Jul 21, 2018 at 6:39 PM, Marten van Kerkwijk < m.h.vankerkwijk@gmail.com> wrote:
You're completely right I think. We have had problems with subclasses for a long time, but that is mainly due to np.matrix being badly behaved, which then led to code everywhere using asarray, which then led to lots of issues with other subclasses. This basically meant subclasses were problematic, and hence most numpy devs would like to not see more subclasses.
I think yes there is some vague but not written down mostly-consensus, due to the dynamic with asarray above.
True. I think long term duck arrays are the way to go, because asarray is not going to disappear. But for now we just have to do the best we can dealing with subclasses. The subclassing doc [1] really needs an update on what the practical issues are.
I'm now thinking what to do with the subclassing section in the NEP. Best to completely remove? I was triggered to include it by some things Stephan said last week about subclasses being a blocker to adding new features. So if we keep the section, it may be helpful for you and Stephan to help shape that. Cheers, Ralf
Hi Ralf,
Perhaps this history is in fact useful to mention? To learn from mistakes, it must be possible to know about them!
Before suggesting further specific text, might it make sense for the NEP to note that since subclassing will not go away, there is value in having at least one non-trivial, well-designed subclass in numpy? I think eventually MaskedArray might become that: it would be an internal check that subclasses can work with all numpy functions (there is no reason for duplication of functions in `np.ma`!). It also is an example of a container-type subclass that adds extra information to an ndarray (since that information is itself array-like, it is not necessarily a super-logical subclass, but it is there... and can thus serve as an example). A second subclass which we have not discussed, but which I think is used quite a bit (from my statistics of one...), is `np.memmap`. Useful if only for showing that a relatively quick hack can give you something quite helpful. All the best, Marten
On Sat, Jul 21, 2018 at 6:40 PM Marten van Kerkwijk < m.h.vankerkwijk@gmail.com> wrote:
I can't speak for Ralf, but yes, this is part of what I had in mind. I don't think you can separate "core" objects/methods from functions that act on them. Either the entire system is designed to handle subclassing through some well-defined interface or it is not. If you don't design a system for subclassing but allow it anyways (and it's impossible to prohibit programmatically in Python), then you can easily end up with very fragile systems that are difficult to modify or extend. As Ralf noted in the NEP, "Some of them change the behavior of ndarray methods, making it difficult to write code that accepts array duck-types." These changes end up having implications for apparently unrelated functions (e.g., np.median needing to call np.mean internally to handle units properly). I don't think anyone really wants that sort of behavior or lock-in in NumPy itself, but of course that is the price we pay for not having well-defined interfaces :). Hopefully NEP-18 will change that, and eventually we will be able to remove hacks from NumPy that we added only because there weren't any better alternatives available. For the NEP itself, I would not mention "A future change in NumPy to not support subclassing," because it's not as if subclassing is suddenly not going to work as of a certain NumPy release. Certain types of subclasses (e.g., those that only add extra methods and/or metadata and do not modify any existing functionality) have never been a problem and will be fine to support indefinitely. Rather, we might state that "At some point in the future, the NumPy development team may no longer be interested in maintaining workarounds for specific subclasses, because other interfaces for extending NumPy are believed to be more maintainable/preferred." Overall, it seems to me that these days in the python eco-system
subclassing is simply expected to work.
I don't think this is true. You can use subclassing on builtin types like dict, but just because you can do it doesn't mean it's a good idea. If you change built-in methods to work in different ways other things will break in unexpected ways (or simply not change, also in unexpected ways). Probably the only really safe way to subclass a dictionary is to define the __missing__() method and not change any other aspects of the public interface directly.
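Stephan's dict example can be made concrete. In CPython the built-in methods do not route through each other, so overriding one changes less than you might expect, while the documented `__missing__` hook behaves predictably only for `__getitem__`:

```python
class UpperDict(dict):
    """Naive subclass that tries to uppercase all keys on assignment."""
    def __setitem__(self, key, value):
        super().__setitem__(key.upper(), value)

d = UpperDict()
d["a"] = 1     # goes through our override
d.update(b=2)  # dict.update() bypasses __setitem__ in CPython
assert "A" in d  # the override ran here...
assert "b" in d  # ...but not here: the key was stored as-is

class ZeroDefault(dict):
    """The one sanctioned extension point: __missing__."""
    def __missing__(self, key):
        return 0

z = ZeroDefault()
assert z["x"] == 0         # __getitem__ consults __missing__
assert z.get("x") is None  # but .get() never does
```

The same inconsistency is exactly what ndarray subclass authors run into: the "safe" subset of overridable behaviour is narrow and mostly undocumented.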
On Mon, Jul 23, 2018 at 1:45 PM Marten van Kerkwijk < m.h.vankerkwijk@gmail.com> wrote:
My hope would be that NumPy gets out of the business of officially providing interfaces like subclassing that are this hard to maintain. In general, we try to hold ourselves to a higher standard of stable code, and this sets up unfortunate conflicts between the needs of different NumPy users. It is just that one should not remove functionality without providing the
better alternative!
Totally agreed!
On 23. Jul 2018 at 19:46, Stephan Hoyer <shoyer@gmail.com> wrote: On Sat, Jul 21, 2018 at 6:40 PM Marten van Kerkwijk < m.h.vankerkwijk@gmail.com> wrote:
I can't speak for Ralf, but yes, this is part of what I had in mind. I don't think you can separate "core" objects/methods from functions that act on them. Either the entire system is designed to handle subclassing through some well-defined interface or it is not. If you don't design a system for subclassing but allow it anyways (and it's impossible to prohibit programmatically in Python

This isn't really true. Metaprogramming to the rescue, I guess. https://stackoverflow.com/questions/16564198/pythons-equivalent-of-nets-seal...

Best regards, Hameer Abbasi
Hi Ralf, Maybe as a concrete example of something that has been discussed, for which your proposed text makes (I think) clear what should or should not be done: Many of us hate that `np.array` (like, sadly, many other numpy parts) auto-converts anything not obviously array-like to dtype=object, and it has been suggested we should no longer do this by default [1]. Given your NEP, I think you would disagree with that path, as it would quite obviously break users' code (we also get regular issues about object arrays, which show that they are used a lot in the wild). So, instead I guess one might go with a route where one could explicitly say that `dtype=object` is not wanted (say, `dtype="everything-but-object"`)? All the best, Marten [1] https://github.com/numpy/numpy/issues/5353
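For reference, the auto-conversion Marten describes looks like this (behaviour current as of this writing; note that since NumPy 1.24 ragged nested sequences additionally raise an error unless `dtype=object` is requested explicitly):

```python
import numpy as np

# Input that cannot be interpreted numerically silently falls back
# to a zero-dimensional array of dtype=object.
a = np.array({"a": 1})  # a dict is not array-like
assert a.dtype == object
assert a.shape == ()
```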
On Sat, Jul 21, 2018 at 4:48 PM, Ralf Gommers <ralf.gommers@gmail.com> wrote:
Oh *awesome*, thanks for putting this together. I think this is a great start, but I'd structure it a bit differently. So let me just make a few high-level comments first and see what you think. Regarding the "general principles" and then "policy": to me these feel like more a brainstorming list, that hasn't been fully distilled down into principles yet. I would try to structure it to start with the overarching principles (changes need to benefit users more than they harm them, numpy is widely used so breaking changes should by default be assumed to be fairly harmful, decisions should be based on data and actual effects on users rather than e.g. appealing to the docs or abstract aesthetic principles, silently getting wrong answer is much worse than a loud error), then talk about some of the ways this plays out (if people are currently silently getting the wrong answer -- which is the definition of a bug, but also shows up in the index-by-float case -- then that's really bad; some of our tools for collecting data about how bad a breakage is include testing prominent downstreams ourselves, adding warnings or making .0 releases and seeing how people react, etc.), and then examples. Speaking of examples: I hate to say this because in general I think using examples is a great idea. But... I think you should delete most of these examples. The problem is scope creep: the goal for this NEP (IMO) should be to lay out the principles we use to think about these issues in general, but right now it comes across as trying to lay down a final resolution on lots of specific issues (including several where there are ongoing conversations). It ends up like trying to squish multiple NEPs into one, which makes it hard to discuss, and also distracts from the core purpose. My suggestion: keep just two examples, histogram and indexing-with-floats. 
These are safely done and dusted, totally uncontroversial (AFAIK), and the first is a good illustration of how one can try to be careful and do the right thing but still get it all wrong, while the second is a good example of (a) how we gathered data and decided that an actually pretty disruptive change was nonetheless worth it, and (b) how we had to manage it through multiple false starts. Regarding the actual policy: One alteration to current practice jumped out at me. This policy categorically rules out all changes that could cause currently working code to silently start doing something wrong, regardless of the specific circumstances. That's not how we actually do things right now. Instead, our policy in recent years has been that such changes are permitted in theory, but (a) the starting presumption is that this is super harmful to users so we need a *very* good reason to do it, and (b) if we do go ahead with it, then during the deprecation period we use a highly-visible FutureWarning (instead of the invisible-by-default DeprecationWarning). Personally I think the current policy strikes a better balance. You can see some examples of where we've used this by running 'git log -S FUTUREWARNING -S FutureWarning' -- it's things like a bad default for the rcond argument in lstsq, an obscure and error-prone corner case in indexing (0addc016ba), strange semantics for NaT (https://mail.scipy.org/pipermail/numpy-discussion/2015-October/073968.html), ... we could quibble about individual cases, but I think that taking these on a case-by-case basis is better than ruling them out categorically. And in any case, that is what we do now, so if you want to change this, it's something we should discuss and probably write down some rationale and such :-). Regarding the major version number thing: ugh do we really want to talk about this more. I'd probably leave it out of the NEP entirely. If it stays in, I think it needs a clearer description of what counts as a "major" change. 
There are some examples of things that do "sound" major, but... the rest of our policy is all about measuring disruption based on effects on users, and by that metric, the index-by-float removal was pretty major. My guess is that by the time we finally remove np.matrix, the actual disruption will be less than it was for removing index-by-float. (As it should be, since keeping index-by-float around was actively causing bugs in even well-maintained downstreams, in a way that np.matrix doesn't.) -n -- Nathaniel J. Smith -- https://vorpus.org
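The rcond transition Nathaniel mentions follows the pattern below: a FutureWarning, unlike DeprecationWarning, is shown to end users by default, which is why it is used for silent behaviour changes rather than removals. This is a hedged sketch, not NumPy's actual code:

```python
import warnings

def lstsq_sketch(a, b, rcond="warn"):
    """Sketch of a default-value transition guarded by a FutureWarning.

    The string sentinel "warn" marks callers who did not choose a value;
    they get the old default plus a highly visible warning.
    """
    if rcond == "warn":
        warnings.warn(
            "`rcond` default will change in a future release; pass "
            "rcond=None to opt in to the new behaviour, or rcond=-1 "
            "to keep the old one and silence this warning.",
            FutureWarning, stacklevel=2)
        rcond = -1  # the old default
    return rcond  # stand-in for the actual least-squares computation
```

Callers who pass `rcond` explicitly never see the warning, so the noise is confined to the code that would actually be affected by the change.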
On Sat, Jul 21, 2018 at 7:15 PM, Nathaniel Smith <njs@pobox.com> wrote:
Thanks, I'll try and rework the general principles, you have some excellent points in here.
I'm not sure this is the best thing to do. I can remove a couple, but aiming to be "totally uncontroversial" is almost impossible given the topic of the NEP. The diag view example is important I think; it's the second most discussed backwards compatibility issue next to histogram. I'm happy to remove the statement on what should happen with it going forward though. Then, I think it's not unreasonable to draw a couple of hard lines. For example, removing complete submodules like linalg or random has ended up on some draft brainstorm roadmap list because someone (no idea who) put it there after a single meeting. Clearly the cost-benefit of that is such that there's no point even discussing it more, so I'd rather draw that line here than every time someone opens an issue. Very recent example: https://github.com/numpy/numpy/issues/11457 (remove auto-import of numpy.testing).
You're right here. Thanks for the examples. I'll update this according to your suggestion, and propose to use one of the examples (rcond probably) to illustrate.
I think it has value to keep it, and that it's not really possible to come up with a very clear description of "major". In particular, I'd like every deprecation message to say "this deprecated feature will be removed by release X.Y.0". At the moment we don't do that, so if users see a message they don't know if a removal will happen next year, in the far future (2.0), or never. The major version thing is quite useful to signal our intent. Doesn't mean we need to exhaustively discuss when to do a 2.0 though, I agree that that's not a very useful discussion right now. Happy to remove this though if people don't like it. Other opinions? Cheers, Ralf
On Sun, Jul 22, 2018 at 12:28 PM Ralf Gommers <ralf.gommers@gmail.com> wrote:
I think a more realistic policy would be to say, "This feature was deprecated by release X.Y and may be removed as early as release X.Z." In general we have been conservative in terms of actually finalizing deprecations in NumPy, which I think is warranted given the irregularity of our release cycle. It's hard to know exactly which release is going to come out a year or 18 months from when a deprecation starts.
> Happy to remove this though if people don't like it. Other opinions?
I would also lean towards removing mention of any major version changes for NumPy.
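Stephan's suggested wording maps naturally onto how a deprecation warning could be phrased in code. A minimal sketch (the function names and release numbers are invented for illustration; NumPy's actual deprecation machinery differs):

```python
import warnings

def legacy_helper(x):
    """Hypothetical deprecated function, shown only to illustrate the message style."""
    warnings.warn(
        "`legacy_helper` was deprecated in release X.Y and may be removed "
        "as early as release X.Z; use `new_helper` instead.",
        DeprecationWarning,
        stacklevel=2,  # point the warning at the caller, not at this wrapper
    )
    return x
```

Phrasing it as "may be removed as early as" keeps the commitment one-sided: the removal can slip later without making the warning message wrong.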
On Sun, Jul 22, 2018 at 12:28 PM Ralf Gommers <ralf.gommers@gmail.com> wrote:
I'm happy to give the broader context here. This came up in the NumPy sprint in Berkeley back in May of this year. The existence of all of these submodules in NumPy is mostly a historical artifact, due to the previously poor state of Python packaging. Our thinking was that perhaps this could be revisited in this age of conda and manylinux wheels.

This isn't to say that it would actually be a good idea to remove any of these submodules today. Separate modules bring both benefits and downsides.

Benefits:
- It can be easier to maintain projects separately rather than inside NumPy, e.g., bug fixes do not need to be tied to NumPy releases.
- Separate modules could reduce the maintenance burden for NumPy itself, because energy gets focused on core features.
- For projects for which a rewrite would be warranted (e.g., numpy.ma and scipy.sparse), it is *much* easier to innovate outside of NumPy/SciPy.
- Packaging. As mentioned above, this is no longer as beneficial as it once was.

Downsides:
- It's harder to find separate packages than NumPy modules.
- If the maintainers and maintenance processes are very similar, then separate projects can add unnecessary overhead.
- Changing from bundled to separate packages imposes a significant cost upon their users (e.g., due to changed import paths).

Coming back to the NEP:

> The impact on downstream libraries and users would be very large, and maintenance of these modules would still have to happen. Therefore this is simply not a good idea; removing these submodules should not happen even for a new major version of NumPy.
I'm afraid I disagree pretty strongly here. There should absolutely be a high bar for removing submodules, but we should not rule out the possibility entirely.

It is certainly true that modules need to be maintained to remain usable, but I particularly object to the idea that this should be forced upon NumPy maintainers. Open source projects need to be maintained by their users, and if their users cannot devote energy to maintaining them then the open source project deserves to die. This is just as true for NumPy submodules as for external packages. NumPy itself only has an obligation to maintain submodules if they are actively needed by the NumPy project and valued by active NumPy contributors. Otherwise, they should be maintained by users who care about them -- whether that means inside or outside NumPy. It serves nobody well to insist on NumPy developers maintaining projects that they don't use or care about.

I would suggest the following criteria for considering removing a NumPy submodule:
1. It cannot be relied upon by other portions of NumPy.
2. Either (a) the submodule imposes a significant maintenance burden upon the rest of NumPy that is not balanced by the level of dedicated contributions, or (b) much better alternatives exist outside of NumPy.

Preferably all of 1, 2(a) and 2(b) should be satisfied.
On Mon, Jul 23, 2018 at 11:46 AM, Stephan Hoyer <shoyer@gmail.com> wrote:
That's true.

> Our thinking was that perhaps this could be revisited in this age of conda and manylinux wheels.

That's certainly not a given though. Those things still need to be maintained, and splitting up packages increases overhead for e.g. doing releases. It's quite unclear if splitting would increase the developer pool.

> - For projects for which a rewrite would be warranted (e.g., numpy.ma and scipy.sparse), it is *much* easier to innovate outside of NumPy/SciPy.

Agreed. That can happen and is already happening though (e.g. https://github.com/pydata/sparse). It doesn't have much to do with removing existing submodules.

> - Packaging. As mentioned above, this is no longer as beneficial as it once was.

True, no longer as beneficial - that's not really a benefit though; packaging just works fine either way.
My thinking here is: given that we're not even willing to remove MaskedArray (NEP 17), for which the benefits of removal are a lot higher and the user base smaller, we are certainly not going to be removing random or linalg or distutils in the foreseeable future. So we may as well say that. Otherwise we'll have these discussions regularly (we actually just had one for numpy.testing in gh-11457), which is just a waste of energy.
Nothing is "forced on you" as a NumPy maintainer - we are all individuals who do things voluntarily (okay, almost all - we have some funding now) and can choose to not spend any time on certain parts of NumPy. MaskedArray languished for quite a while before Marten and Eric spent a lot of time improving it and closing lots of issues related to it. That can happen.

> Open source projects need to be maintained by their users, and if their users cannot devote energy to maintain them then the open source project deserves to die.

This is a very developer-centric view. We have lots of users and also lots of no-longer-active contributors. The needs, interests and previous work put into NumPy of those groups of people matter.

> Otherwise, they should be maintained by users who care about them -- whether that means inside or outside NumPy. It serves nobody well to insist on NumPy developers maintaining projects that they don't use or care about.

To quote Nathaniel: "the rest of our policy is all about measuring disruption based on effects on users". That's absent from your criteria.

Why I would like to keep this point in:
- the discussion does come up, see the draft brainstorm roadmap list and gh-11457.
- the outcome of such discussions is in practice 100% clear.
- I would like to avoid having drawn out discussions each time (this eats up a lot of energy for me), and I *really* would like to avoid saying "I don't have time to discuss, but this is just not going to happen" or "consider it vetoed".
- Hence: just write it down, so we can refer to it.

Cheers,
Ralf
On Tue, Jul 24, 2018 at 5:38 PM Ralf Gommers <ralf.gommers@gmail.com> wrote:
Yes, I suppose it is :). I tend to view NumPy's developers (interpreted somewhat broadly, including those who contribute to the project in other ways) as the ultimate representatives of NumPy's user base.
Yes, "Can be achieved with minimum disruption for users" would be appropriate to add as another top-level criterion.

> Why I would like to keep this point in is:
I would rather we just say that the bar for deprecating or removing *any* functionality in NumPy is extremely high. np.matrix is probably the best example in recent times:
- np.matrix is officially discouraged (which we prefer even to deprecation);
- we *anticipate* deprecating it as soon as there's a viable alternative to scipy.sparse;
- even then, we will be very cautious about ever removing it, with the understanding that it is widely used.

As for updating this section of the NEP:
- We could certainly note that to date NumPy has not removed any complete submodules (is this true?), and that for these modules in particular, the cost-benefit ratio does not favor removal at this time.
- Documenting the criteria we've come up with here, even though they haven't been satisfied yet, might be helpful to demonstrate the high bar that is required.
- I don't like rejecting the possibility of removing submodules entirely ("simply not a good idea"). It may become a good idea in the future, if some of the underlying facts change.

I would also suggest highlighting two other strategies that NumPy uses in preference to deprecation/removal:
- Official discouragement. Discouraging or deemphasizing in our docs is the preferred strategy for older APIs that still have well-defined behavior but that are arguably less consistent with the rest of NumPy. Examples: isin vs in1d, stack/block vs hstack/dstack/vstack.
- Benign neglect. This is our preferred strategy to removing submodules. Merely being in NumPy does not automatically guarantee that a module is well maintained, nor does it imply that a submodule is the best tool for the job. That's OK, as long as the incremental maintenance burden on the rest of NumPy is not too high.
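The "official discouragement" pairs mentioned above can be seen side by side; both spellings still work, the docs just steer new code toward the newer names:

```python
import numpy as np

a = np.array([1, 2, 3, 4])

# Membership test: in1d is the older spelling, isin the preferred one.
old = np.in1d(a, [2, 4])
new = np.isin(a, [2, 4])
assert (old == new).all()  # both: array([False,  True, False,  True])

# Joining arrays: stack generalizes the older hstack/vstack/dstack trio.
rows = [np.zeros(3), np.ones(3)]
assert (np.stack(rows) == np.vstack(rows)).all()  # both give shape (2, 3)
```

Neither pair of functions behaves differently here; the newer names are simply clearer and more consistent with the rest of the API.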
On Fri, Jul 27, 2018 at 12:02 PM, Stephan Hoyer <shoyer@gmail.com> wrote:
Not quite true. We removed the Numarray and Numeric compatibility modules. That broke Konrad Hinsen's package.
It might help to make a cheat sheet listing discouraged functions together with their suggested replacements. Chuck
On Sun, Jul 22, 2018 at 12:28 PM, Ralf Gommers <ralf.gommers@gmail.com> wrote:
Of course the NEP itself will have some things to discuss – but I think the discussion will be more productive if we can stay focused on the core part of the NEP, which is the general principles we use to evaluate each specific situation as it comes up. Look at how much of the discussion so far has gotten derailed onto topics like subclassing, submodules, etc.
It's the most discussed issue because it was the test case where we developed all these policies in the first place :-). I'm not sure it's particularly interesting aside from that, and that specific history ("let's come up with a transition plan for this feature that no-one actually cares about, b/c no-one cares about it so it's a good thing to use as a test case") is unlikely to be repeated.
I can see an argument for splitting random and linalg into their own modules, which numpy depends on and imports so that existing code doesn't break. E.g. this might let people install an old version of random if they needed to reproduce some old results, or help us merge numpy and scipy's linalg modules into a single package. I agree though that making 'np.linalg' start raising AttributeError is a total non-starter.
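The "split out but still importable" idea could be implemented with a thin re-export shim. Here is a sketch of the mechanism using a PEP 562 module-level `__getattr__`, with the stdlib `math` module standing in for a hypothetical standalone `numpy_linalg` package (that package name is invented for illustration):

```python
import importlib
import sys
import types

# Build a shim module that forwards attribute access to the "real"
# implementation package, so existing imports keep working after a split.
shim = types.ModuleType("linalg_shim")

def _forward(name):
    impl = importlib.import_module("math")  # stand-in for a split-out package
    return getattr(impl, name)

shim.__getattr__ = _forward  # PEP 562: module-level __getattr__
sys.modules["linalg_shim"] = shim

import linalg_shim
print(linalg_shim.sqrt(9.0))  # forwarded to math.sqrt -> 3.0
```

In a real split, `numpy/linalg/__init__.py` would simply import from the standalone package; the forwarding approach above additionally gives a single hook where a pointer to the new package could be emitted.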
The problem is that "2.0" means a lot of different things to different people, not just "some future date to be determined", so using it that way will confuse people. Also, it's hard to predict when a deprecation will actually happen... it's very common that we adjust the schedule as we go (e.g. when we try to remove something and then discover it breaks everyone, so we have to put it back for a while). I feel like it would be better to do this based on time -- like say "this will be removed <today + 1 year>" or something, and then it might take longer but not shorter?

Re: version numbers, I actually think numpy should consider switching to calver [1]. We'd be giving up on being able to do a "2.0", but that's kind of a good thing -- if a change is too big to handle through our normal deprecation cycle, then it's probably too big to handle period. And "numpy 2018.3" gives you more information than our current scheme -- for example you could see at a glance that numpy 2012.1 is super out-of-date, and we could tell people that numpy 2019.1 will drop python 2 support. ...But that's a whole other discussion, and we shouldn't get derailed onto it here in this NEP's thread :-).

[1] https://calver.org/

-n

--
Nathaniel J. Smith -- https://vorpus.org
On Tue, Jul 24, 2018 at 8:07 PM, Nathaniel Smith <njs@pobox.com> wrote:
The subclassing discussion was actually illuminating and useful. Maybe it does deserve its own write-up somewhere though. Happy to remove that too. Would then like to put it somewhere else - in the docs, another NEP, ...? The submodules one I'd really like to keep.
Pretty sure that's not true; we had policies long before that, plus it was not advertised as a test case for backwards compat (it's just an improvement that someone wanted to implement). But well, I don't care enough about this particular one to argue about it - I'll remove it.

> I'm not sure it's particularly interesting aside from that [...] I can see an argument for splitting random and linalg into their own modules, which numpy depends on and imports so that existing code doesn't break.

Me too, that could happen. But that's unrelated to backwards compatibility.

> E.g. this might let people install an old version of random if they needed to reproduce some old results, or help us merge numpy and scipy's linalg modules into a single package. I agree though that making 'np.linalg' start raising AttributeError is a total non-starter.

It is, hence why I say above that I'd like to keep that example.

> I feel like it would be better to do this based on time

This does make sense to me.

> -- like say "this will be removed <today + 1 year>" or something, and then it might take longer but not shorter?

You can't practically do "today"; it should be <version number of next release when the PR is merged + at least N years>. But yes, that is useful: the point is to give a clear indication, and it's then easy for the user to figure out the earliest date the removal could happen. Given that this is clear and avoids the version number discussion, I'm happy to go with that and remove the major/minor version text.

Cheers,
Ralf
participants (6)

- Charles R Harris
- Hameer Abbasi
- Marten van Kerkwijk
- Nathaniel Smith
- Ralf Gommers
- Stephan Hoyer