updated backwards compatibility and deprecation policy NEP
Hi all,

Here is a long overdue update of the draft NEP about backwards compatibility and deprecation policy: https://github.com/numpy/numpy/pull/18097

- This is NEP 23: https://numpy.org/neps/nep-0023-backwards-compatibility.html
- Link to the previous mailing list discussion: https://mail.python.org/pipermail/numpy-discussion/2018-July/078432.html

It would be nice to get this NEP to Accepted status. The main changes are:

- Removed all examples that people objected to.
- Removed all content regarding versioning.
- Restructured sections, and added "Strategies related to deprecations" (using suggestions by @njsmith and @shoyer).
- Added concrete examples of deprecations, and a more thorough description of how to go about adding warnings, including Sphinx directives, use of ``stacklevel``, etc.

As always, feedback here or on the PR is very welcome!

Cheers,
Ralf

Abstract
--------

In this NEP we describe NumPy's approach to backwards compatibility, its deprecation and removal policy, and the trade-offs and decision processes for individual cases where breaking backwards compatibility is considered.

Motivation and Scope
--------------------

NumPy has a very large user base. Those users rely on NumPy being stable, and on the code they write against NumPy continuing to work. NumPy is also actively maintained and improved -- and sometimes improvements require, or are made much easier by, breaking backwards compatibility. Finally, there are trade-offs between stability for existing users and avoiding errors or providing a better experience for new users. These competing needs often give rise to long debates and to delays in accepting or rejecting contributions. This NEP tries to address that by providing a policy, as well as examples and rationales for when it is or isn't a good idea to break backwards compatibility.

In scope for this NEP are:

- Principles of NumPy's approach to backwards compatibility.
- How to deprecate functionality, and when to remove already deprecated functionality.
- Decision making process for deprecations and removals.

Out of scope are:

- Making concrete decisions about deprecations of particular functionality.
- NumPy's versioning scheme.

General principles
------------------

When considering proposed changes that are backwards incompatible, the main principles the NumPy developers use when making a decision are:

1. Changes need to benefit users more than they harm them.
2. NumPy is widely used, so breaking changes should by default be assumed to be fairly harmful.
3. Decisions should be based on data and actual effects on users and downstream packages, rather than on, e.g., appeals to the docs or stylistic preferences.
4. Silently getting a wrong answer is much worse than getting a loud error.

When assessing the costs of proposed changes, keep in mind that most users do not read the mailing list, do not see deprecation warnings, and sometimes wait more than one or two years before upgrading from their old version. And NumPy has millions of users, so "no one will do or use this" is very likely incorrect.

Benefits include improved functionality, usability and performance, as well as lower maintenance cost and improved future extensibility.

Fixes for clear bugs are exempt from this backwards compatibility policy. However, in case of serious impact on users (e.g. a downstream library no longer builds, or would start giving incorrect results), even bug fixes may have to be delayed for one or more releases.
Strategies related to deprecations
----------------------------------

Getting hard data on the impact of a deprecation is often difficult. Strategies that can be used to assess such impact include:

- Use a code search engine ([1]_) or static ([2]_) or dynamic ([3]_) code analysis tools to determine where and how the functionality is used.
- Test prominent downstream libraries against a development build of NumPy containing the proposed change to get real-world data on its impact.
- Make a change on master and revert it, if needed, before a release. We do encourage other packages to test against NumPy's master branch, so this often turns up issues quickly.

If the impact is unclear or significant, it is often good to consider alternatives to deprecation. For example: discouraging use in the documentation only, or moving the documentation for the functionality to a less prominent place or even removing it completely. Commenting on related open issues that they are low priority, or labeling them as "wontfix", also signals this to users and reduces the maintenance effort that needs to be spent.

Implementing deprecations and removals
--------------------------------------

Deprecation warnings are necessary in all cases where functionality will eventually be removed. If there is no intent to remove functionality, then it should not be deprecated either; a "please don't use this" note in the documentation or another type of warning should be used instead.

Deprecations:

- shall include the version number of the release in which the functionality was deprecated.
- shall include information on alternatives to the deprecated functionality, or a reason for the deprecation if no clear alternative is available.
- shall use ``VisibleDeprecationWarning`` rather than ``DeprecationWarning`` for cases of relevance to end users. For cases only relevant to downstream libraries, a regular ``DeprecationWarning`` is fine. *Rationale: regular deprecation warnings are invisible by default; library authors should be aware of how deprecations work and test for them, but we can't expect this from all users.*
- shall be listed in the release notes of the release where the deprecation is first present.
- shall set a ``stacklevel``, so the warning appears to come from the correct place.
- shall be mentioned in the documentation for the functionality. A ``.. deprecated::`` directive can be used for this (a sketch follows at the end of this section).

Examples of good deprecation warnings:

.. code-block:: python

    warnings.warn('np.asscalar(a) is deprecated since NumPy 1.16.0, use '
                  'a.item() instead', DeprecationWarning, stacklevel=3)

    warnings.warn("Importing from numpy.testing.utils is deprecated "
                  "since 1.15.0, import from numpy.testing instead.",
                  DeprecationWarning, stacklevel=2)

    # A change in NumPy 1.14.0 for Python 3 loadtxt/genfromtxt, slightly
    # tweaked in this NEP (original didn't have version number).
    warnings.warn(
        "Reading unicode strings without specifying the encoding "
        "argument is deprecated since NumPy 1.14.0. Set the encoding, "
        "use None for the system default.",
        np.VisibleDeprecationWarning, stacklevel=2)

Removal of deprecated functionality:

- shall be done after at least 2 releases (assuming the current 6-monthly release cycle; if that changes, there shall be at least 1 year between deprecation and removal).
- shall be listed in the release notes of the release where the removal happened.
- can be done in any minor (but not bugfix) release.
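To make the directive bullet concrete, here is a minimal sketch (not taken from the NEP itself) of how the runtime warning and the Sphinx ``.. deprecated::`` directive can live together in one function. ``np.asscalar`` really was deprecated in NumPy 1.16.0 with the warning text shown above; the function body and directive wording here are illustrative:

.. code-block:: python

    import warnings

    def asscalar(a):
        """Convert an array of size 1 to its scalar equivalent.

        .. deprecated:: 1.16.0
            `asscalar` is deprecated, use ``a.item()`` instead.
        """
        # The directive above marks the rendered docs; the warning below
        # reaches users at runtime. stacklevel=2 points the warning at the
        # caller of asscalar() rather than at this line (the NEP's example
        # uses stacklevel=3 because the real call goes through an extra
        # wrapper layer).
        warnings.warn('np.asscalar(a) is deprecated since NumPy 1.16.0, use '
                      'a.item() instead', DeprecationWarning, stacklevel=2)
        return a.item()

With this, both ``help(asscalar)`` and the rendered documentation show the deprecation, and the warning fires at call time.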
For backwards incompatible changes that aren't "deprecate and remove" but for which code will start behaving differently, a ``FutureWarning`` should be used. Release notes, mentioning the version number, and using ``stacklevel`` should be handled in the same way as for deprecation warnings. A ``.. versionchanged::`` directive can be used in the documentation to indicate when the behavior changed:

.. code-block:: python

    def argsort(self, axis=np._NoValue, ...):
        """
        Parameters
        ----------
        axis : int, optional
            Axis along which to sort. If None, the default, the flattened
            array is used.

            .. versionchanged:: 1.13.0
                Previously, the default was documented to be -1, but that
                was in error. At some future date, the default will change
                to -1, as originally intended. Until then, the axis should
                be given explicitly when ``arr.ndim > 1``, to avoid a
                FutureWarning.
        """
        ...
        warnings.warn(
            "In the future the default for argsort will be axis=-1, not the "
            "current None, to match its documentation and np.argsort. "
            "Explicitly pass -1 or None to silence this warning.",
            MaskedArrayFutureWarning, stacklevel=3)

Decision making
~~~~~~~~~~~~~~~

In concrete cases where this policy needs to be applied, decisions are made according to the `NumPy governance model <https://docs.scipy.org/doc/numpy/dev/governance/index.html>`_.

All deprecations must be proposed on the mailing list, in order to give everyone with an interest in NumPy development the opportunity to comment. Removal of deprecated functionality does not need discussion on the mailing list.

Functionality with more strict deprecation policies
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- ``numpy.random`` has its own backwards compatibility policy, see `NEP 19 <http://www.numpy.org/neps/nep-0019-rng-policy.html>`_.
- The file format for ``.npy`` and ``.npz`` files must not be changed in a backwards incompatible way.

Example cases
-------------

We now discuss a few concrete examples from NumPy's history to illustrate typical issues and trade-offs.

**Changing the behavior of a function**

``np.histogram`` is probably the most infamous example. First, a new keyword ``new=False`` was introduced; this was then switched over to ``None`` one release later, and finally it was removed again. Also, it has a ``normed`` keyword whose behavior could be considered either suboptimal or broken (depending on one's opinion on the statistics). A new keyword ``density`` was introduced to replace it; ``normed`` started giving ``DeprecationWarning`` only in v1.15.0. Evolution of ``histogram``::

    def histogram(a, bins=10, range=None, normed=False):  # v1.0.0

    def histogram(a, bins=10, range=None, normed=False, weights=None, new=False):  # v1.1.0

    def histogram(a, bins=10, range=None, normed=False, weights=None, new=None):  # v1.2.0

    def histogram(a, bins=10, range=None, normed=False, weights=None):  # v1.5.0

    def histogram(a, bins=10, range=None, normed=False, weights=None, density=None):  # v1.6.0

    def histogram(a, bins=10, range=None, normed=None, weights=None, density=None):  # v1.15.0
    # v1.15.0 was the first release where `normed` started emitting
    # DeprecationWarnings

The ``new`` keyword was planned from the start to be temporary. Such a plan forces users to change their code more than once, which is almost never the right thing to do. Instead, a better approach here would have been to deprecate ``histogram`` and introduce a new function ``hist`` in its place (sketched just below).
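As a sketch of that "deprecate the old name, add a new function" pattern (hypothetical code: NumPy never actually added a ``hist`` function), the old function keeps behaving exactly as before throughout the deprecation period while pointing users at the replacement:

.. code-block:: python

    import warnings

    def hist(a, bins=10, range=None, weights=None, density=None):
        # New function, with the desired `density` semantics from day one.
        ...

    def histogram(a, bins=10, range=None, normed=False, weights=None):
        # Old function: behavior stays frozen for the whole deprecation
        # period, so existing code keeps producing the same results.
        warnings.warn("np.histogram is deprecated, use np.hist instead",
                      DeprecationWarning, stacklevel=2)
        ...

Users then migrate once, from ``histogram`` to ``hist``, on their own schedule, instead of chasing the temporary ``new`` keyword through several releases.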
**Disallowing indexing with floats**

Indexing an array with floats asks for something ambiguous, and can be a sign of a bug in user code. After some discussion, it was deemed a good idea to deprecate indexing with floats. This was first tried for the v1.8.0 release; however, in pre-release testing it became clear that this would break many libraries that depend on NumPy. Therefore it was reverted before release, to give those libraries time to fix their code first. It was finally introduced for v1.11.0 and turned into a hard error for v1.12.0.

This change was disruptive; however, it did catch real bugs in, e.g., SciPy and scikit-learn. Overall the change was worth the cost, and introducing it in master first to allow testing, then removing it again before a release, is a useful strategy.

Similar deprecations that also look like good examples of cleanups/improvements:

- removing deprecated boolean indexing (in 2016, see `gh-8312 <https://github.com/numpy/numpy/pull/8312>`__)
- deprecating truth testing on empty arrays (in 2017, see `gh-9718 <https://github.com/numpy/numpy/pull/9718>`__)

**Removing the financial functions**

The financial functions (e.g. ``np.pmt``) had short, non-descriptive names, were present in the main NumPy namespace, and didn't really fit well within NumPy's scope. They were added in 2008 after `a discussion <https://mail.python.org/pipermail/numpy-discussion/2008-April/032353.html>`_ on the mailing list where opinion was divided (but a majority in favor). The financial functions didn't cause a lot of overhead; however, there were still multiple issues and PRs a year for them, which cost maintainer time to deal with. And they cluttered up the ``numpy`` namespace. Discussion on removing them happened in 2013 (gh-2880, rejected) and then again in 2019 (:ref:`NEP32`, accepted without significant complaints).

Given that they were clearly outside of NumPy's scope, moving them to a separate ``numpy-financial`` package and removing them from NumPy after a deprecation period made sense.

Alternatives
------------

**Being more aggressive with deprecations.** The goal of being more aggressive is to allow NumPy to move forward faster. This would avoid others inventing their own solutions (often in multiple places), and would benefit users without a legacy code base. We reject this alternative because of the place NumPy has in the scientific Python ecosystem: being fairly conservative is required in order to not increase the extra maintenance for downstream libraries and end users to an unacceptable level.

Discussion
----------

- `Mailing list discussion on the first version of this NEP in 2018 <https://mail.python.org/pipermail/numpy-discussion/2018-July/078432.html>`__

References and Footnotes
------------------------

- `Issue requesting semantic versioning <https://github.com/numpy/numpy/issues/10156>`__

.. [1] https://searchcode.com/
.. [2] https://github.com/Quansight-Labs/python-api-inspect
.. [3] https://github.com/data-apis/python-record-api
Hi Ralf,

This reads really nice. Thanks to everyone who contributed. Before nitpicking here and there, and sticking my head out for others: is this a finished discussion where only stylistic feedback is expected? Also, is feedback preferred here or in the PR? GitHub is really not designed for extended discussions, and here, if two subjects are discussed simultaneously, it just becomes difficult to follow (maybe it's a bias due to my dislike of mailing lists).

One of the less-mentioned points is what the tipping point is for the benefits outweighing the compatibility-breakage sin, and how to get a feeling for it. Because for a typical user, every break is just a break. Nobody will squint their eyes to see the reasoning behind it downstream. Thus this is more of a declaration of "yes, as maintainers we are ready to face the consequences, but it had to be done because such and such". I am not asking to initiate a power discussion à la "who has the mod hammer", but rather what constitutes a valid business case for a breakage proposal. A few generic lines about that would go a long way. Because we are in the same situation with scipy.linalg: what to do is crystal clear, but how to do it without breaking anything is herding cats, hence I am genuinely curious how to go about this.

Best, ilhan

On Wed, Dec 30, 2020 at 3:07 PM Ralf Gommers <ralf.gommers@gmail.com> wrote:
On Wed, Dec 30, 2020 at 4:05 PM Ilhan Polat <ilhanpolat@gmail.com> wrote:
It's not. I removed everything that was controversial last time around and only added things that are basically the way we already do things, so I don't really expect major issues. But it is a big rewrite, so some discussion is certainly expected.

> Also is it preferred here or in the PR? GitHub is really not designed for
Agreed. The idea is we post NEPs to the mailing list, and major issues get discussed here. If there are smaller comments, like stylistic things or small errors, commenting on the PR for those is much easier.
That's very hard to describe, since it relies so much on previous experience and qualitative judgements. That's the main reason why I had more examples before, but they just led to more discussion about those examples - so that didn't quite have the intended effect.

> I am not asking to initiate a power discussion à la "who has the mod hammer"
If anyone has a good proposal, that'd be great. But I find it hard to come up with those few lines right now. Cheers, Ralf
On Wed, 2020-12-30 at 16:27 +0100, Ralf Gommers wrote:
Thanks Ralf, I will look at it more carefully only next year probably.
<snip>
One thing I thought could be useful here is to use quality management/assurance techniques that are typical (at least in Europe, to my knowledge) for pretty much every product development effort (I do _not_ mean software-specific QA, which has a different ISO standard that I doubt helps). I only took a short course and have used this very little; I am sure there are many here with industry experience for whom QA is everyday work.

One concept from there is to create a risk/danger and probability assessment, which can be ad-hoc for your product. An example, just to make something up:

Developing a chair, possibility: leg breaks.
Likelihood (based on current design): [moderate] someone puts something too heavy on it.
Danger: serious risk of large injury [high].

You then say (rows are danger, columns are likelihood):

    danger \ likelihood   low      moderate   high
    low                   OK       OK         not OK
    moderate              OK       not OK     not OK
    high                  not OK   not OK     not OK

(Low danger could for example be a splinter: it's OK if it happens sometimes, but you don't want it to happen to many customers.)

Now in the above case, you land in a "not OK" cell, so you will try to mitigate (reinforce the chair, print a maximum weight on it; maybe you also have to argue it away as an unavoidable risk).

For us, this would translate to the number of users affected and how badly they are (or can be) affected, probably. And since I don't like a "this change can never happen", the lower part of the triangle would probably just be "requires a NEP" (in my opinion; I realize that some things are probably truly impossible, but in that case a NEP won't fly either).

This is just an idea that I think could very much be helpful. The table needs to be filled with rough examples, but it would be completely fine to not even fill it completely IMO. There are also tricky things like a two-release policy (which could be part of a "mitigation", lowering the likelihood or danger, but I am not certain it fits well). (I think the example tables usually had 4 columns/rows, but I don't remember.)

This felt very ad-hoc to me when I first learned about it, and of course it is not always clear if something is low or moderate risk. But I do like that it gives *some* formalization. Note that IIRC the ISO standard does not even attempt to say what categories a specific product development should use. (I think this is all ISO 9000, but I did not double check; and just to note, ISO norms are fairly expensive unless you live in India.)

Cheers, Sebastian
On Wed, 2020-12-30 at 11:43 -0600, Sebastian Berg wrote:
I am not sure anyone finds this interesting or whether it fits the NEP specifically [1], but I truly think it can be useful (although maybe it doesn't need to be formalized). So I fleshed it out: https://hackmd.io/WuS1rCzrTYOTgzUfRJUOnw (also pasted below).

My reasoning for suggesting it is that a process/formalism (no matter how ridiculous it may seem at first) for assessing the impact of a backwards incompatible change can be helpful by: conceptualizing, clearly separating backward incompatible impact assessment from benefits assessment, making it easier to follow a decision/thought process, and allowing some nuance [2]. I actually believe that it can help with difficult decisions, even if only applied occasionally, and that it is not a burden because it provides fairly simple steps.

Will it be useful often? Maybe not. But every time there is a proposal and we pause and hesitate because it is unclear whether it is worth the backcompat impact, I think this can provide a way to discuss it and come to a decision as objectively as possible. (And no, I do not think that any of the categories or mitigation strategies are an exact science.)

Cheers, Sebastian

[1] This is additional to the proposed promises, such as two releases of deprecation warnings and discussing most/all deprecations on the mailing list, which are unrelated. It is rather to provide a formalism where currently only the examples give points of reference.

[2] There is a reason that also the Python version is short and intentionally fuzzy: https://www.python.org/dev/peps/pep-0387/ and https://discuss.python.org/t/pep-387-backwards-compatibilty-policy/4421 There are just a few definite rules that can be formalized, so a framework for diligent assessment seems the best we can do (if we want to).

Assessing impact

Here "impact" means how unmodified code may be negatively affected by a change, ignoring any deprecation period. To get an idea about how much impact a change has, try to list all potential impacts. This will often be just a single item (user of function x has to replace it with y), but it could be multiple different ones. After listing all potential impacts, rank them on the following two scales (do not yet think about how to make the transition easier):

1. Severity (How bad is the impact for an affected user?)

   - Minor: A performance regression or a change in (undocumented) warning/error category falls here. This type of change would normally not require a deprecation cycle or special consideration.
   - Typical: Code must be updated to avoid an error; the update is simple to do in a way that works both on existing and future NumPy versions.
   - Severe: Code will error or crash, and there is no simple workaround or fix.
   - Critical: Code returns incorrect results. A change requiring massive effort may fall here. A hard crash (e.g. segfault) in itself is typically not critical.

2. Likelihood (How many users does the change affect?)

   - Rare: Change has very few impacted users (or even no known users after a code search). The normal assumption is that there is always someone affected, but a rarely used keyword argument of an already rarely used function falls here.
   - Limited: Change is in a rarely used function or function argument. Another possibility is that it affects only a small group of very advanced users.
   - Common: Change affects a bigger audience or multiple large downstream libraries.
   - Ubiquitous: Change affects a large fraction of NumPy users.

The categories will not always be perfectly clear. That is OK.
Rather than establishing precise guidelines, the purpose is a structured process that can be reviewed. When the impact is exceptionally difficult to assess, it is often feasible to try a change on the development branch while signalling willingness to revert it. Downstream libraries test against it (and the release candidate), which gives a chance to correct an originally optimistic assessment.

After assessing each impact, it will fall somewhere on the following table:

    Severity \ Likelihood   Rare    Limited   Common   Ubiquitous
    Minor                   ok      ok        ok       ?
    Typical                 ok      ?         no?      no
    Severe                  no?     no        no       no
    Critical                no?     no        no       no

Note that all changes should normally follow the two-release deprecation warning policy (except "minor" ones). The "no" fields mean a change is clearly unacceptable, although a NEP can always overrule it. This table only assesses the "impact". It does not assess how the impact compares to the benefits of the proposed change; that comparison must be favourable no matter how small the impact is. However, by assessing the impact, it will be easier to weigh it against the benefit. (Note that the table is not symmetric. An impact with "critical" severity is unlikely to be considered even when no known users are impacted.)

Mitigation and arguing of benefits

Any change falling outside the "ok" fields requires careful consideration. When an impact is larger, you can try to mitigate it and "move" on the table. Some possible ways to do this are:

* An avoidable warning for at least two releases (the policy for any change that modifies behaviour) reduces a change one category (usually from "typical" to "minor" severity).
* The severity category may be reduced by creating an easy workaround (i.e. to move it from "severe" to "typical").
* Sometimes a change may break working code but also fix existing bugs; this can offset the severity. In extreme cases, this may warrant classifying a change as a bug fix.
* For particularly noisy changes (i.e. the ubiquitous category), consider fixing downstream packages, or delaying the warning (or using a PendingDeprecationWarning). Simply prolonging the deprecation period is also an option. This reduces how many users struggle with the change and smooths the transition.
* Exceptionally clear documentation and communication can be used to make the impact more acceptable. This may not be enough to move a "category" by itself, but it helps.

After mitigation, the benefits can be assessed:

* Any benefit of the change can be argued to "offset" the impact. If this is necessary, a broad community discussion on the mailing list is required. It should be clear that this does not actually "mitigate" the impact but rather argues that the benefit outweighs it.

These are not a fixed set of rules, but provide a framework to assess and then try to mitigate the impact of a proposed change to an acceptable level. Arguing that a benefit can overcome multiple "impact" categories will require exceptionally large benefits, and most likely a NEP. For example, a change with an initial impact classification of "severe" and "ubiquitous" is unlikely to even be considered unless the severity can be reduced.

Many deprecations will fall somewhere below or equal to a "typical and limited" impact (i.e. removal of an uncommon function argument). They receive a deprecation warning to make the impact acceptable, with a brief discussion that the change itself is worthwhile (i.e. the API is much cleaner afterwards). Any more disruptive change requires broad community discussion.
This needs at least a discussion on the NumPy mailing list, and it is likely that the person proposing it will be asked to write a NEP.

Summary and reasoning for this process

The aim of this process and table is to provide a loose formalism with the goals of:

* Diligence: Following this process ensures a detailed assessment of a change's impact without being distracted by the benefits. This is achieved by following well defined steps:

  1. Listing each potential impact (usually one).
  2. Assessing the severity.
  3. Assessing the likelihood.
  4. Discussing what steps are/can be taken to lower the impact, ignoring any benefits.
  5. If the impact is not low at this point, this should prompt considering and listing alternatives.
  6. Arguing that the benefits outweigh the remaining impact. (This is a distinct step: the original impact assessment stands as it was.)

* Transparency: Using this process for difficult decisions makes it easier for the reviewer and community to follow how a decision was made and criticize it.
* Nuance: When it is clear that an impact is larger than typical, this will prompt more care and thought. In some cases it may also clarify that a change is lower impact than expected at first sight.
* Experience: Using a similar formalism for many changes makes it easier to learn from past decisions by providing an approach to compare and conceptualize them.

We aim to follow these steps in the future for difficult decisions. In general, any reviewer and community member may ask for this process to be followed for a proposed change. If the change is difficult, it will be worth the effort; if it is very low impact, it will be quick to clarify why.

NOTE: At this time the process is new and is expected to require clarification.

Examples

It should be stressed again that the categories will rarely be clear-cut; the examples below are intentionally categorized with some uncertainty. Even unclear categories can help in forming a clearer idea of a change.

Histogram

The "histogram" example doesn't really add much with respect to this process. But noting the duplicate effort/impact would probably move it into a more severe category than most deprecations. That makes it a more difficult decision and indicates that careful thought should be spent on alternatives.

Integer indexing requirement

* Severity: Typical to Severe (although fairly easy, users often had to make many changes)
* Likelihood: Ubiquitous

How ubiquitous it really was probably only became clear after the (rc?) release. The change would now probably go through a NEP, as it initially falls into the lower right part of the table. To get into the "acceptable" part of the table, we note that:

1. Real bugs were caught in the process (argued to reduce the severity).
2. The deprecation was delayed and longer than normal (argued to mitigate the number of affected users by giving much more time).

Even with these considerations, it still has a larger impact and clearly requires careful thought and community discussion about the benefits.

Removing financial functions

* Severity: Severe (on the high end)
* Likelihood: Limited (maybe common)

While not used by a large user base (limited), the removal is disruptive (severe). The change ultimately required a NEP, since it is not easy to weigh the maintenance advantage of removing the functions against the impact on their users. The NEP reduced the severity by providing a workaround: a pip-installable package as a drop-in replacement.
For heavy users of these functions this will still be more severe than most deprecations, but it lowered the impact assessment enough for the benefit of removal to be judged to outweigh the impact.
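Purely as an illustration, the table above could be encoded in a few lines of Python (this is not an official tool, and the names are made up):

.. code-block:: python

    SEVERITIES = ["minor", "typical", "severe", "critical"]
    LIKELIHOODS = ["rare", "limited", "common", "ubiquitous"]

    # Rows: severity, columns: likelihood. "?" and "no?" mark borderline
    # cells that call for mitigation or a mailing list discussion/NEP.
    IMPACT_TABLE = [
        ["ok",  "ok", "ok",  "?"],   # minor
        ["ok",  "?",  "no?", "no"],  # typical
        ["no?", "no", "no",  "no"],  # severe
        ["no?", "no", "no",  "no"],  # critical
    ]

    def assess(severity, likelihood):
        """Pre-mitigation acceptability of a proposed change."""
        row = SEVERITIES.index(severity)
        col = LIKELIHOODS.index(likelihood)
        return IMPACT_TABLE[row][col]

    # The financial functions removal, before mitigation:
    print(assess("severe", "limited"))  # -> "no", hence the NEP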
On Sat, Jan 2, 2021 at 3:55 AM Sebastian Berg <sebastian@sipsolutions.net> wrote:
Thanks for thinking about this Sebastian. I used to use such a risk management approach fairly regularly, and it can be useful. In general it's something you do for a larger design change or new product, rather than for an individual change. It helps get an overview of the main risks, and prompts thinking about risks you may have missed.
I'd be happy to try it. It does feel a bit too much to put all that content into the NEP though. Maybe we can just add something brief: "assess the severity and likelihood of your proposed change, and include that assessment when proposing a deprecation. See <here> for more details". And then we can link to a wiki page or separate doc page that we can easily update without it being a NEP revision. Cheers, Ralf
On Sat, 2021-01-02 at 18:06 +0100, Ralf Gommers wrote:
Yeah, I guess it's for new products mostly, to compile many risks and make it easier to compare them. And yes, we do not have "many" risks, unless you would compile this for the complete changelog between one or more versions.
Yes, it adds a lot of content, and I don't want to force it on anyone or into the NEP; in that sense it is more brainstorming than a very concrete proposal. And I am also fine with just dropping it, whatever others think is useful. I fleshed it out a bit because I actually think it ends up representing fairly well how I currently try to approach this, and I think it may be useful when a proposal gets stuck because it is unclear whether it is worth the pain/risk. Cheers, Sebastian
Hi all, The update PR was merged after a lot more review on GitHub. I propose we change the status of this NEP to Accepted. We'll merge a PR to do so unless there are objections within the next five days. Cheers, Ralf On Wed, Dec 30, 2020 at 3:05 PM Ralf Gommers <ralf.gommers@gmail.com> wrote:
On Tue, Jan 26, 2021, at 00:25, Ralf Gommers wrote:
> The update PR was merged after a lot more review on GitHub. I propose we change the status of this NEP to Accepted. We'll merge a PR to do so unless there are objections within the next five days.
Thanks for the heads-up, Ralf. I am happy to have the NEP accepted. Stéfan
participants (5)

- Ilhan Polat
- Ralf Gommers
- Sebastian Berg
- Stefan van der Walt
- Stephan Hoyer