[Numpy-discussion] updated backwards compatibility and deprecation policy NEP

Sat Jan 2 12:06:16 EST 2021

On Sat, Jan 2, 2021 at 3:55 AM Sebastian Berg <sebastian at sipsolutions.net>
wrote:

> On Wed, 2020-12-30 at 11:43 -0600, Sebastian Berg wrote:
>
> On Wed, 2020-12-30 at 16:27 +0100, Ralf Gommers wrote:
>
> <snip>
>
>
> That's very hard to describe, since it relies so much on previous
> experience and qualitative judgements. That's the main reason why I had
> more examples before, but they just led to more discussion about those
> examples - so that didn't quite have the intended effect.
>
> <snip>
>
> I only took a short course and used this very little. I am sure there
> are many here with industry experience where the use of QA (quality
> assurance) is everyday work.
>
>
Thanks for thinking about this Sebastian.

I used to use such a risk management approach fairly regularly, and it can
be useful. In general it's something you do for a larger design change or
new product, rather than for an individual change. It helps get an overview
of the main risks, and prompts thinking about risks you may have missed.

>
> One concept from there is to create a risk/danger and probability
> assessment, which can be ad-hoc for your product.  An example just to
> make something up:
>
>
>
> I am not sure anyone finds this interesting or if it fits the NEP
> specifically [1], but I truly think it can be useful (although maybe it
> doesn't need to be formalized). So I fleshed it out:
> https://hackmd.io/WuS1rCzrTYOTgzUfRJUOnw (also pasted it below)
>

I'd be happy to try it. It does feel a bit too much to put all that content
into the NEP though. Maybe we can just add a brief "assess the severity
and likelihood of your proposed change, and include that assessment
when proposing a deprecation. See <here> for more details". Then we can
link to a wiki page or separate doc page that we can easily update
without it being a NEP revision.

Cheers,
Ralf

> My reasoning for suggesting it is that a process/formalism (no matter how
> ridiculous it may seem at first) for assessing the impact of a backward
> incompatible change can be helpful by: conceptualizing, clearly separating
> the backward incompatibility impact assessment from the benefits
> assessment, making the decision/thought process easier to follow, and
> allowing some nuance [2].
>
> I actually believe that it can help with difficult decisions, even if only
> applied occasionally, and that it is not a burden because it provides
> fairly well-defined steps. Will it be useful often? Maybe not. But every
> time there is a proposal and we pause and hesitate because it is unclear
> whether it is worth the backcompat impact, I think this can provide a way
> to discuss it and come to a decision as objectively as possible. (And no,
> I do not think that any of the categories or mitigation strategies are an
> exact science.)
>
> Cheers,
>
> Sebastian
>
>
> [1] This is in addition to the proposed promises, such as two releases of
> deprecation warnings and discussing most/all deprecations on the mailing
> list, which are unrelated. It is rather meant to provide a formalism where
> currently only the examples give points of reference.
> [2] There is a reason that the Python version is also short and
> intentionally fuzzy (see https://www.python.org/dev/peps/pep-0387/ and
> https://discuss.python.org/t/pep-387-backwards-compatibilty-policy/4421 ).
> There are only a few definite rules that can be formalized, so a framework
> for diligent assessment seems the best we can do (if we want to).
>
>
>
>
>
> Assessing impact
> Here “impact” means how unmodified code may be negatively affected by a
> change, ignoring any deprecation period.
>
> To get an idea of how much impact a change has, try to list all
> potential impacts. This will often be just a single item (users of
> function x have to replace it with y), but it could be multiple different
> ones. *After* listing all potential impacts, rank them on the following
> two scales (do not yet think about how to make the transition easier):
>
>    1. *Severity* (How bad is the impact for an affected user?)
>       - Minor: A performance regression or a change in (undocumented)
>         warning/error category will fall here. This type of change would
>         normally not require a deprecation cycle or special consideration.
>       - Typical: Code must be updated to avoid an error; the update is
>         simple to do in a way that works both on existing and future NumPy
>         versions.
>       - Severe: Code will error or crash, and there is no simple
>         workaround or fix.
>       - Critical: Code returns incorrect results. A change requiring
>         massive effort may fall here. A hard crash (e.g. segfault) in
>         itself is typically *not* critical.
>    2. *Likelihood* (How many users does the change affect?)
>       - Rare: Change has very few impacted users (or even no known users
>         after a code search). The normal assumption is that there is
>         always someone affected, but a rarely used keyword argument of an
>         already rarely used function will fall here.
>       - Limited: Change is in a rarely used function or function
>         argument. Another possibility is that it affects only a small
>         group of very advanced users.
>       - Common: Change affects a bigger audience or multiple large
>         downstream libraries.
>       - Ubiquitous: Change affects a large fraction of NumPy users.
>
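As an illustration only, the two scales can be written down as ordered enumerations, so that impacts are listed first and ranked afterwards. This is a minimal Python sketch. `Severity`, `Likelihood` and `Impact` are hypothetical names that are not part of NumPy or any existing tool, and the ranking given to the example impact is made up.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    """How bad the impact is for an affected user (ordered low to high)."""
    MINOR = 0
    TYPICAL = 1
    SEVERE = 2
    CRITICAL = 3


class Likelihood(IntEnum):
    """How many users the change affects (ordered low to high)."""
    RARE = 0
    LIMITED = 1
    COMMON = 2
    UBIQUITOUS = 3


@dataclass(frozen=True)
class Impact:
    """One potential way unmodified code may be negatively affected."""
    description: str
    severity: Severity
    likelihood: Likelihood


# First list every potential impact (often just one), and only then rank
# each one on both scales. The ranking here is purely illustrative.
impacts = [
    Impact("users of function x must replace it with y",
           Severity.TYPICAL, Likelihood.LIMITED),
]

for impact in impacts:
    print(impact.description, impact.severity.name, impact.likelihood.name)
```

Using `IntEnum` keeps the categories comparable (`Severity.MINOR < Severity.SEVERE`), which is all the later steps need.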
> The categories will not always be perfectly clear. That is OK. Rather than
> establishing precise guidelines, the purpose is a structured *process*
> that can be reviewed. When the impact is exceptionally difficult to
> assess, it is often feasible to try a change on the development branch
> while signalling willingness to revert it. Downstream libraries test
> against it (and the release candidate), which gives a chance to correct an
> originally optimistic assessment.
>
> After assessing each impact, it will fall somewhere on the following table:
> Severity\Likelihood   Rare   Limited   Common   Ubiquitous
> *Minor* ok ok ok?
> *Typical* ok? no?
> *Severe* no? no
> *Critical* no? no no no
> Note that all changes should normally follow the two-release deprecation
> warning policy (except “minor” ones). The “no” fields mean a change is
> clearly unacceptable, although a NEP can always overrule this. This table
> only assesses the “impact”. It does not assess how the impact compares to
> the benefits of the proposed change. That comparison must be favourable
> no matter how small the impact is. However, by assessing the impact, it
> will be easier to weigh it against the benefit. (Note that the table is
> not symmetric. An impact with “critical” severity is unlikely to be
> considered even when no known users are impacted.)
>
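A minimal sketch of what the two-release deprecation warning looks like from the Python side, using only the standard `warnings` module. The function `rarely_used` and its `old_flag` argument are hypothetical; NumPy's own internal deprecation helpers may look different.

```python
import warnings

# Sentinel that distinguishes "argument not passed" from any real value.
_NOT_PASSED = object()


def rarely_used(x, old_flag=_NOT_PASSED):
    """Hypothetical function whose keyword argument `old_flag` is deprecated."""
    if old_flag is not _NOT_PASSED:
        # stacklevel=2 attributes the warning to the caller's line, so users
        # can find and update the affected code during the deprecation period.
        warnings.warn(
            "`old_flag` is deprecated and will be removed in a future "
            "release; simply omit it.",
            DeprecationWarning,
            stacklevel=2,
        )
    return x * 2


# Calls that omit old_flag run silently; passing it triggers the warning.
print(rarely_used(3))  # → 6
```

Code that simply stops passing `old_flag` works silently on both old and new versions, which is what makes the warning avoidable.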
> Mitigation and arguing of benefits
> Any change falling outside the “ok” fields requires careful consideration.
> When an impact is larger, you can try to mitigate it and “move” on the
> table. Some possible ways to do this are:
>
>    - An avoidable warning for at least two releases (the policy for any
>    change that modifies behaviour) reduces a change by one category
>    (usually from “typical” to “minor” severity).
>    - The severity category may be reduced by creating an easy workaround
>    (e.g. to move it from “severe” to “typical”).
>    - Sometimes a change may break working code, but also fix *existing*
>    bugs; this can offset the severity. In extreme cases, this may warrant
>    classifying a change as a bug fix.
>    - For particularly noisy changes (i.e. the ubiquitous category),
>    consider fixing downstream packages or delaying the warning (or using a
>    PendingDeprecationWarning). Simply prolonging the deprecation period is
>    also an option. This reduces how many users struggle with the change
>    and smooths the transition.
>    - Exceptionally clear documentation and communication could be used to
>    make the impact more acceptable. This may not be enough to move a
>    “category” by itself, but it still helps.
>
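The idea of "moving" on the table can be sketched as lowering the severity by one step per mitigation. This sketch assumes that an easy workaround and a two-release warning each move the impact exactly one category down. The list above only describes typical effects, and documentation alone is deliberately not counted. All names are hypothetical.

```python
from enum import IntEnum


class Severity(IntEnum):
    MINOR = 0
    TYPICAL = 1
    SEVERE = 2
    CRITICAL = 3


def lower(severity: Severity, steps: int = 1) -> Severity:
    """Move an impact down the severity scale, never below MINOR."""
    return Severity(max(0, int(severity) - steps))


def mitigated_severity(severity: Severity, *, easy_workaround: bool,
                       two_release_warning: bool) -> Severity:
    """Hypothetical rule: each mitigation moves the impact one category down."""
    if easy_workaround:        # e.g. "severe" -> "typical"
        severity = lower(severity)
    if two_release_warning:    # e.g. "typical" -> "minor"
        severity = lower(severity)
    return severity


# A severe change that gains an easy workaround:
print(mitigated_severity(Severity.SEVERE, easy_workaround=True,
                         two_release_warning=False).name)  # → TYPICAL
```

In practice the judgement is less mechanical than this: a mitigation may move an impact less than a full category.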
> After mitigation, the benefits can be assessed:
>
>    - Any benefit of the change can be argued to “offset” the impact. If
>    this is necessary, a broad community discussion on the mailing list is
>    required. It should be clear that this does not actually “mitigate” the
>    impact but rather argues that the benefit outweighs it.
>
> These are not a fixed set of rules, but provide a framework to assess and
> then try to mitigate the impact of a proposed change to an acceptable
> level. Arguing that a benefit can overcome multiple “impact” categories
> will require exceptionally large benefits, and most likely a NEP. For
> example, a change with an initial impact classification of “severe” and
> “ubiquitous” is unlikely to even be considered unless the severity can be
> reduced.
> Many deprecations will fall somewhere below or equal to a “typical and
> limited” impact (e.g. removal of an uncommon function argument). They
> receive a deprecation warning to make the impact acceptable, along with a
> brief argument that the change itself is worthwhile (e.g. the API is much
> cleaner afterwards). Any more disruptive change requires broad community
> discussion. This needs at least a discussion on the NumPy mailing list,
> and it is likely that the person proposing it will be asked to write a
> NEP.
>
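A rough sketch of the triage described in the paragraph above, given an impact's severity and likelihood. The enum and function names are hypothetical, and the returned strings only paraphrase the text. This is not a rule that NumPy applies mechanically.

```python
from enum import IntEnum


class Severity(IntEnum):
    MINOR = 0
    TYPICAL = 1
    SEVERE = 2
    CRITICAL = 3


class Likelihood(IntEnum):
    RARE = 0
    LIMITED = 1
    COMMON = 2
    UBIQUITOUS = 3


def required_process(severity: Severity, likelihood: Likelihood) -> str:
    """Rough triage of how much discussion an impact calls for."""
    if severity <= Severity.TYPICAL and likelihood <= Likelihood.LIMITED:
        # Deprecation warning plus a brief argument that the change is worthwhile.
        return "deprecation warning + brief discussion"
    # Anything more disruptive: broad discussion on the mailing list, likely a NEP.
    return "mailing list discussion (NEP likely)"


print(required_process(Severity.TYPICAL, Likelihood.LIMITED))
# → deprecation warning + brief discussion
```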
> Summary and reasoning for this process
> The aim of this process and table is to provide a loose formalism with the
> goal of:
>
>    - *Diligence:* Following this process ensures a detailed assessment of
>    a change's impact without being distracted by its benefits. This is
>    achieved by following well-defined steps:
>       1. Listing each potential impact (usually one).
>       2. Assessing the severity.
>       3. Assessing the likelihood.
>       4. Discussing what steps are/can be taken to lower the impact,
>       *ignoring any benefits*.
>       5. If the impact is not low at this point, considering and listing
>       alternatives.
>       6. Arguing that the benefits outweigh the remaining impact. (This is
>       a distinct step: the original impact assessment stands as it was.)
>    - *Transparency:* Using this process for difficult decisions makes it
>    easier for the reviewer and community to follow how a decision was made
>    and to criticize it.
>    - *Nuance:* When it is clear that an impact is larger than typical,
>    this will prompt more care and thought. In some cases it may also
>    clarify that a change has lower impact than expected at first sight.
>    - *Experience:* Using a similar formalism for many changes makes it
>    easier to learn from past decisions by providing an approach to compare and
>    conceptualize them.
>
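One way to keep the diligence steps honest is to record them separately, with the impact assessment frozen before any benefits are written down, so that step 6 cannot quietly change it. This is a minimal sketch with hypothetical class names; severities are kept as plain strings for brevity.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class Assessment:
    """Steps 1-3, written before benefits are discussed; immutable afterwards."""
    # Each entry: (description, severity, likelihood).
    impacts: Tuple[Tuple[str, str, str], ...]


@dataclass
class Proposal:
    assessment: Assessment                                  # steps 1-3
    mitigations: List[str] = field(default_factory=list)    # step 4
    alternatives: List[str] = field(default_factory=list)   # step 5
    benefits: str = ""                                      # step 6

    def open_steps(self, impact_still_high: bool) -> List[str]:
        """Return the steps that still need to be written up, in order."""
        todo = []
        if not self.assessment.impacts:
            todo.append("list impacts")
        if not self.mitigations:
            todo.append("discuss mitigation")
        # Step 5 is only prompted when the impact is not low after mitigation.
        if impact_still_high and not self.alternatives:
            todo.append("consider alternatives")
        if not self.benefits:
            todo.append("argue benefits")
        return todo


proposal = Proposal(Assessment((("x replaced by y", "typical", "limited"),)))
print(proposal.open_steps(impact_still_high=False))
# → ['discuss mitigation', 'argue benefits']
```

Because `Assessment` is a frozen dataclass, assigning to `proposal.assessment.impacts` raises an error. This mirrors the rule that the original assessment stands as it was.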
> We aim to follow these steps in the future for difficult decisions. In
> general, any reviewer and community member may ask for this process to be
> followed for a proposed change. If the change is difficult, it will be
> worth the effort; if it is very low impact, it will be quick to clarify
> why.
> NOTE: At this time the process is new and is expected to require
> clarification.
> Examples
> It should be stressed again that the categories will rarely be clear and
> are intentionally categorized with some uncertainty below. Even unclear
> categories can help in forming a clearer idea of a change.
> Histogram
> The “histogram” example doesn’t really add much with respect to this
> process. But noting the duplicate effort/impact would probably move it
> into a more severe category than most deprecations. That makes it a more
> difficult decision and indicates that careful thought should be spent on
> alternatives.
> Integer indexing requirement
>
>    - Severity: Typical–Severe (although each fix was fairly easy, users
>    often had to make many changes)
>    - Likelihood: Ubiquitous
>
> How ubiquitous it really was probably only became clear after the (rc?)
> release. The change would now probably go through a NEP, as it initially
> falls into the lower right part of the table. To get into the
> “acceptable” part of the table we note that:
>
>    1. Real bugs were caught in the process (argued to reduce severity).
>    2. The deprecation was delayed and longer than normal (argued to
>    mitigate the number of affected users by giving much more time).
>
> Even with these considerations, it still has a larger impact and clearly
> requires careful thought and community discussion about the benefits.
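Assuming this example refers to NumPy's deprecation of non-integer (e.g. float) indices, the user-facing side looks roughly like the sketch below. It assumes a current NumPy release, where indexing with a float raises `IndexError`. The helper `float_index_raises` is hypothetical.

```python
import numpy as np


def float_index_raises(arr: np.ndarray) -> bool:
    """Return True if indexing `arr` with the float 1.0 raises IndexError."""
    try:
        arr[1.0]
    except IndexError:
        return True
    return False


a = np.arange(5)
print(float_index_raises(a))  # True on current NumPy releases

# The simple fix, which works on old and new NumPy versions alike:
i = 1.0
print(a[int(i)])  # → 1
```

Each individual fix is a one-line conversion, which is why the severity is rated only "typical–severe" despite the very large number of affected call sites.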
> Removing financial functions
>
>    - Severity: Severe (on the high end)
>    - Likelihood: Limited (maybe common)
>
> While not used by a large user base (limited), the removal is disruptive
> (severe). The change ultimately required a NEP, since it is not easy to
> weigh the maintenance advantage of removing the functions against the
> impact on their users.
> The NEP reduced the severity by providing a workaround: a pip-installable
> package as a drop-in replacement.
> For heavy users of these functions this will still be more severe than
> most deprecations, but it lowered the impact assessment enough to consider
> the benefit of removal to outweigh the impact.
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>