[Numpy-discussion] updated backwards compatibility and deprecation policy NEP
ralf.gommers at gmail.com
Sat Jan 2 12:06:16 EST 2021
On Sat, Jan 2, 2021 at 3:55 AM Sebastian Berg <sebastian at sipsolutions.net>
> On Wed, 2020-12-30 at 11:43 -0600, Sebastian Berg wrote:
> On Wed, 2020-12-30 at 16:27 +0100, Ralf Gommers wrote:
> That's very hard to describe, since it relies so much on previous
> experience and qualitative judgements. That's the main reason why I had
> more examples before, but they just led to more discussion about
> examples - so that didn't quite have the intended effect.
> I only took a short course and used this very little. I am sure there
> are many here with industry experience where the use of Q&A is everyday
> work.
Thanks for thinking about this Sebastian.
I used to use such a risk management approach fairly regularly, and it can
be useful. In general it's something you do for a larger design change or
new product, rather than for an individual change. It helps get an overview
of the main risks, and prompts thinking about risks you may have missed.
> One concept from there is to create a risk/danger and probability
> assessment, which can be ad-hoc for your product. An example just to
> make something up:
> I am not sure anyone finds this interesting or if it fits the NEP
> specifically, but I truly think it can be useful (although maybe it
> doesn't need to be formalized). So I fleshed it out:
> https://hackmd.io/WuS1rCzrTYOTgzUfRJUOnw (also pasted it below)
I'd be happy to try it. It does feel a bit too much to put all that content
into the NEP though. Maybe we can just add a brief "assess the severity
and likelihood of your proposed change, and include that
assessment when proposing a deprecation. See <here> for more details". And
then we can link to a wiki page or separate doc page, that we can then
easily update without it being a NEP revision.
> My reasoning for suggesting it is that a process/formalism (no matter how
> ridiculous it may seem at first) for how to assess the impact of a backward
> incompatible change can be helpful by: conceptualizing, clearly separating
> backward incompatible impact assessment from benefits assessment, making it
> easier to follow a decision/thought process, and allowing some nuance.
> I actually believe that it can help with difficult decisions, even if only
> applied occasionally, and that it is not a burden because it provides
> fairly clear steps. Will it be useful often? Maybe not. But every time there is a
> proposal and we pause and hesitate because it is unclear whether it is
> worth the backcompat impact, I think this can provide a way to discuss it
> and come to a decision as objectively as possible. (And no, I do not think
> that any of the categories or mitigation strategies are an exact science.)
> This is in addition to the proposed promises, such as two releases of
> deprecations and discussing most/all deprecations on the mailing list,
> which are unrelated. It is rather to provide a formalism where currently
> only the examples give points of reference.
> There is a reason the Python version is also short and
> intentionally fuzzy: https://www.python.org/dev/peps/pep-0387/ and
> https://discuss.python.org/t/pep-387-backwards-compatibilty-policy/4421 There
> are only a few definite rules that can be formalized, so a framework for
> diligent assessment seems the best we can do (if we want to).
> Assessing impact
> Here “impact” means how unmodified code may be negatively affected by a
> change ignoring any deprecation period.
> To get an idea about how much impact a change has, try to list all
> potential impacts. This will often be just a single item (user of function
> x has to replace it with y), but it could be multiple different ones.
> *After* listing all potential impacts rank them on the following two
> scales (do not yet think about how to make the transition easier):
> 1. *Severity* (How bad is the impact for an affected user?)
>    - Minor: A performance regression or change in (undocumented)
>      warning/error category will fall here. This type of change would
>      normally not require a deprecation cycle or special consideration.
>    - Typical: Code must be updated to avoid an error, the update is
>      simple to do in a way that works both on existing and future NumPy
>      versions.
>    - Severe: Code will error or crash, and there is no simple work
>      around or fix.
>    - Critical: Code returns incorrect results. A change requiring
>      massive effort may fall here. A hard crash (e.g. segfault) in itself
>      is typically *not* critical.
> 2. *Likelihood* (How many users does the change affect?)
>    - Rare: Change has very few impacted users (or even no known users
>      after a code search). The normal assumption is that there is always
>      someone affected, but a rarely used keyword argument of an already
>      rarely used function will fall here.
>    - Limited: Change is in a rarely used function or function
>      argument. Another possibility is that it affects only a small group
>      of very advanced users.
>    - Common: Change affects a bigger audience or multiple large
>      downstream libraries.
>    - Ubiquitous: Change affects a large fraction of NumPy users.
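The two scales above could be written down as ordered categories, so that a list of impacts can be sorted and compared. A minimal Python sketch, not part of NumPy or the NEP: the names `Severity`, `Likelihood`, `Impact` and `worst_first` are made up for illustration, and the integer values only encode the ordering of the categories.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    """How bad is the impact for an affected user? (hypothetical encoding)"""
    MINOR = 1
    TYPICAL = 2
    SEVERE = 3
    CRITICAL = 4


class Likelihood(IntEnum):
    """How many users does the change affect? (hypothetical encoding)"""
    RARE = 1
    LIMITED = 2
    COMMON = 3
    UBIQUITOUS = 4


@dataclass(frozen=True)
class Impact:
    description: str
    severity: Severity
    likelihood: Likelihood


def worst_first(impacts):
    """Sort impacts so the most severe (then most widespread) come first."""
    return sorted(impacts, key=lambda i: (i.severity, i.likelihood), reverse=True)


# List every potential impact first, then rank each one on both scales.
impacts = [
    Impact("undocumented warning category changes", Severity.MINOR, Likelihood.RARE),
    Impact("user of function x has to replace it with y", Severity.TYPICAL, Likelihood.LIMITED),
]
for impact in worst_first(impacts):
    print(f"{impact.severity.name:8} {impact.likelihood.name:10} {impact.description}")
```

The sketch deliberately stops at ranking. Whether a given combination is acceptable remains a judgement call.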
> The categories will not always be perfectly clear. That is OK. Rather than
> establishing precise guidelines, the purpose is a structured *process* that
> can be reviewed. When the impact is exceptionally difficult to assess, it
> is often feasible to try a change on the development branch while
> signalling willingness to revert it. Downstream libraries test against it
> (and the release candidate) which gives a chance to correct an originally
> optimistic assessment.
> After assessing each impact, it will fall somewhere on the following table:
> *Minor* ok ok ok?
> *Typical* ok? no?
> *Severe* no? no
> *Critical* no? no no no
> Note that all changes should normally follow the two release deprecation
> warning policy (except “minor” ones). The “no” fields mean a change is
> clearly unacceptable, although a NEP can always overrule it. This table
> only assesses the “impact”. It does not assess how the impact compares to
> the benefits of the proposed change. This must be favourable no matter how
> small the impact is. However, by assessing the impact, it will be easier to
> weigh it against the benefit. (Note that the table is not symmetric. An
> impact with “critical” severity is unlikely to be considered even when no
> known users are impacted.)
> and arguing of benefits
> Any change falling outside the “ok” fields requires careful consideration.
> When an impact is larger, you can try to mitigate it and “move” on the
> table. Some possible reasons for this are:
> - An avoidable warning for at least two releases (the policy for any
> change that modifies behaviour) reduces a change by one category (usually from
> “typical” to “minor” severity).
> - The severity category may be reduced by creating an easy work around
> (e.g. to move it from “severe” to “typical”).
> - Sometimes a change may break working code, but also fix *existing* bugs,
> this can offset the severity. In extreme cases, this may warrant
> classifying a change as a bug-fix.
> - For particularly noisy changes (i.e. ubiquitous category),
> consider fixing downstream packages or delaying the warning (or using a
> PendingDeprecationWarning). Simply prolonging the deprecation
> period is also an option. This reduces how many users struggle with the
> change and smooths the transition.
> - Exceptionally clear documentation and communication could be used to
> ensure that the impact is more acceptable. This may not be enough to move a
> “category” by itself, but it still helps.
> After mitigation, the benefits can be assessed:
> - Any benefit of the change can be argued to “offset” the impact. If
> this is necessary, a broad community discussion on the mailing list is
> required. It should be clear that this does not actually “mitigate” the
> impact but rather argues that the benefit outweighs it.
> These are not a fixed set of rules, but provide a framework to assess and
> then try to mitigate the impact of a proposed change to an acceptable
> level. Arguing that a benefit can overcome multiple “impact” categories
> will require exceptionally large benefits, and most likely a NEP. For
> example a change with an initial impact classification of “severe” and
> “ubiquitous” is unlikely to even be considered unless the severity can be
> Many deprecations will fall somewhere below or equal to a “typical and
> limited” impact (e.g. removal of an uncommon function argument). They
> receive a deprecation warning to make the impact acceptable, with a brief
> discussion that the change itself is worthwhile (e.g. the API is much
> cleaner afterwards). Any more disruptive change requires broad community
> discussion. This needs at least a discussion on the NumPy mailing list and
> it is likely that the person proposing it will be asked to write a NEP.
> and reasoning for this process
> The aim of this process and table is to provide a loose formalism with the
> goal of:
> - *Diligence:* Following this process ensures detailed assessment of
> a change's impact without being distracted by the benefits. This is achieved by
> following well defined steps:
> 1. Listing each potential impact (usually one).
> 2. Assessing the severity.
> 3. Assessing the likelihood.
> 4. Discussing what steps are/can be taken to lower the impact *ignoring
> any benefits*.
> 5. If the impact is not low at this point, this should prompt
> considering and listing of alternatives.
> 6. Argue that the benefits outweigh the remaining impact. (This is
> a distinct step: the original impact assessment stands as it was.)
> - *Transparency:* Using this process for difficult decisions makes it
> easier for the reviewer and community to follow how a decision was made and
> criticize it.
> - *Nuance:* When it is clear that an impact is larger than typical,
> this will prompt more care and thought. In some cases it may also clarify
> that a change is lower impact than expected on first sight.
> - *Experience:* Using a similar formalism for many changes makes it
> easier to learn from past decisions by providing an approach to compare and
> conceptualize them.
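To make the six steps concrete, an assessment could be kept as a small record. This is only a sketch: the `Assessment` class and its fields are hypothetical, and treating each mitigation as lowering the severity by a fixed number of categories is a simplifying assumption, not a rule from the proposal. The example values are the ones given for the financial functions in this thread.

```python
from dataclasses import dataclass, field

# Ordered from least to most severe, as in the severity scale.
SEVERITIES = ["minor", "typical", "severe", "critical"]


@dataclass
class Assessment:
    impact: str        # step 1: one listed impact
    severity: str      # step 2
    likelihood: str    # step 3
    # step 4: (description, number of severity categories it removes)
    mitigations: list = field(default_factory=list)
    # step 5: alternatives to consider if the impact is still not low
    alternatives: list = field(default_factory=list)
    # step 6: benefits are argued separately from the impact
    benefits: str = ""

    def mitigated_severity(self):
        """Severity after mitigation; the original rating is left untouched."""
        steps = sum(n for _, n in self.mitigations)
        index = max(0, SEVERITIES.index(self.severity) - steps)
        return SEVERITIES[index]


financial = Assessment(
    impact="financial functions are removed",
    severity="severe",
    likelihood="limited",
    mitigations=[("pip installable drop-in replacement", 1)],
    benefits="lower maintenance burden",
)
print(financial.severity, "->", financial.mitigated_severity())
# prints: severe -> typical
```

Keeping `severity` unchanged and computing the mitigated value separately mirrors the point in step 6 that the original impact assessment stands as it was.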
> We aim to follow these steps in the future for difficult decisions. In
> general, any reviewer and community member may ask for this process to be
> followed for a proposed change. If the change is difficult, it will be
> worth the effort. If it is very low impact, it will be quick to clarify why.
> NOTE: At this time the process is new and is expected to require
> It should be stressed again that the categories will rarely be clear and
> are intentionally categorized with some uncertainty below. Even unclear
> categories can help in forming a clearer idea of a change.
> The “histogram” example doesn’t really add much with respect to this
> process. But noting the duplicate effort/impact would probably move it
> into a more severe category than most deprecations. That makes it a more
> difficult decision and indicates that careful thought should be spent on
> indexing requirement
> - Severity: Typical–Severe (although fairly easy, users often had to
> do many changes)
> - Likelihood: Ubiquitous
> How ubiquitous it really was probably only became clear after the (rc?)
> release. The change would now probably go through a NEP as it initially
> falls into the lower right part of the table. To get into the
> “acceptable” part of the table we note that:
> 1. Real bugs were caught in the process (argued to reduce severity)
> 2. The deprecation was delayed and longer than normal (argued to
> mitigate the number of affected users by giving much more time)
> Even with these considerations, it still has a larger impact and clearly
> requires careful thought and community discussion about the benefits.
> financial functions
> - Severity: Severe (on the high end)
> - Likelihood: Limited (maybe common)
> While not used by a large user base (limited), the removal is disruptive
> (severe). The change ultimately required a NEP, since it is not easy to
> weigh the maintenance advantage of removing the functions against the
> impact to their users.
> The NEP included the reduction of the severity by providing a work-around:
> A pip installable package as a drop-in replacement (reducing the severity).
> For heavy users of these functions this will still be more severe than most
> deprecations, but it lowered the impact assessment enough to consider the
> benefit of removal to outweigh the impact.