[Numpy-discussion] updated backwards compatibility and deprecation policy NEP

Sebastian Berg sebastian at sipsolutions.net
Sat Jan 2 14:10:54 EST 2021


On Sat, 2021-01-02 at 18:06 +0100, Ralf Gommers wrote:
> On Sat, Jan 2, 2021 at 3:55 AM Sebastian Berg <
> sebastian at sipsolutions.net>
> wrote:
> 
> > On Wed, 2020-12-30 at 11:43 -0600, Sebastian Berg wrote:
> > 
> > On Wed, 2020-12-30 at 16:27 +0100, Ralf Gommers wrote:
> > 
> > <snip>
> > 
> > 
> > That's very hard to describe, since it relies so much on previous
> > experience and qualitative judgements. That's the main reason why I
> > had
> > more examples before, but they just led to more discussion about
> > those
> > examples - so that didn't quite have the intended effect.
> > 
> > <snip>
> > 
> > I only took a short course and used this very little. I am sure
> > there are many here with industry experience where the use of QA is
> > everyday work.
> > 
> > 
> Thanks for thinking about this Sebastian.
> 
> I used to use such a risk management approach fairly regularly, and
> it can
> be useful. In general it's something you do for a larger design
> change or
> new product, rather than for an individual change. It helps get an
> overview
> of the main risks, and prompts thinking about risks you may have
> missed.
> 

Yeah, I guess it's for new products mostly, to compile many risks and
make it easier to compare them.  And yes, we do not have "many" risks,
unless you would compile this for the complete changelog between one or
more versions.

> 
> > 
> > One concept from there is to create a risk/danger and probability
> > assessment, which can be ad-hoc for your product.  An example just
> > to
> > make something up:
> > 
> > 
> > 
> > I am not sure anyone finds this interesting or if it fits into the
> > NEP specifically [1], but I truly think it can be useful (although
> > maybe it doesn't need to be formalized). So I fleshed it out:
> > https://hackmd.io/WuS1rCzrTYOTgzUfRJUOnw (also pasted it below)
> > 
> 
> I'd be happy to try it. It does feel a bit too much to put all that
> content
> into the NEP though. Maybe we can just add a brief "assess the
> severity and likelihood of your proposed change, and include that
> assessment when proposing a deprecation. See <here> for more
> details". And
> then we can link to a wiki page or separate doc page, that we can
> then
> easily update without it being a NEP revision.

Yes, it adds a lot of content and I don't want to force it on anyone or
into the NEP; in that sense it is more brainstorming than a very
concrete proposal.  And I am also fine with just dropping it, whatever
others think is useful.  I fleshed it out a bit because I actually
think it ends up representing fairly well how I currently try to
approach this, and I think it may be useful when a proposal gets stuck
because it is unclear whether it is worth the pain/risk.

Cheers,

Sebastian

> 
> Cheers,
> Ralf
> 
> 
> > My reasoning for suggesting it is that a process/formalism (no
> > matter how ridiculous it may seem at first) for how to assess the
> > impact of a backwards-incompatible change can be helpful by:
> > conceptualizing, clearly separating backwards-incompatible impact
> > assessment from benefits assessment, making it easier to follow a
> > decision/thought process, and allowing some nuance [2].
> > 
> > I actually believe that it can help with difficult decisions, even
> > if only applied occasionally, and that it is not a burden because it
> > provides fairly simple steps. Will it be useful often? Maybe not.
> > But every time
> > there is a
> > proposal and we pause and hesitate because it is unclear whether it
> > is
> > worth the backcompat impact, I think this can provide a way to
> > discuss it
> > and come to a decision as objectively as possible. (And no, I do
> > not think
> > that any of the categories or mitigation strategies are an exact
> > science.)
> > 
> > Cheers,
> > 
> > Sebastian
> > 
> > 
> > [1] This is in addition to the proposed promises, such as two
> > releases of deprecation warnings and discussing most/all
> > deprecations on the mailing list, which are unrelated. It is rather
> > meant to provide a formalism where currently only the examples give
> > points of reference.
> > [2] There is a reason that the Python version, too, is short and
> > intentionally fuzzy: https://www.python.org/dev/peps/pep-0387/ and
> > https://discuss.python.org/t/pep-387-backwards-compatibilty-policy/4421
> > There are just a few definite rules that can be formalized, so a
> > framework for diligent assessment seems the best we can do (if we
> > want to).
> > 
> > 
> > 
> > 
> > 
> > Assessing impact
> > Here “impact” means how unmodified code may be negatively affected
> > by a change, ignoring any deprecation period.
> > 
> > To get an idea about how much impact a change has, try to list all
> > potential impacts. This will often be just a single item (user of
> > function
> >  x has to replace it with y), but it could be multiple different
> > ones.
> > *After* listing all potential impacts, rank them on the following
> > two scales (do not yet think about how to make the transition
> > easier):
> > 
> >    1. *Severity* (How bad is the impact for an affected user?)
> >    - Minor: A performance regression or change in (undocumented)
> >       warning/error category will fall here. This type of change
> > would normally
> >       not require a deprecation cycle or special consideration.
> >       - Typical: Code must be updated to avoid an error, the update
> > is
> >       simple to do in a way that works both on existing and future
> > NumPy versions.
> >       - Severe: Code will error or crash, and there is no simple
> >       workaround or fix.
> >       - Critical: Code returns incorrect results. A change
> > requiring
> >       massive effort may fall here. A hard crash (e.g. segfault) in
> > itself is
> >       typically *not* critical.
> >    2. *Likelihood* (How many users does the change affect?)
> >    - Rare: Change has very few impacted users (or even no known
> > users
> >       after a code search). The normal assumption is that there is
> > always someone
> >       affected, but a rarely used keyword argument of an already
> > rarely used
> >       function will fall here.
> >       - Limited: Change is in a rarely used function or function
> >       argument. Another possibility is that it affects only a small
> > group of very
> >       advanced users.
> >       - Common: Change affects a bigger audience or multiple large
> >       downstream libraries.
> >       - Ubiquitous: Change affects a large fraction of NumPy users.
> > 
> > The categories will not always be perfectly clear. That is OK.
> > Rather than establishing precise guidelines, the purpose is a
> > structured *process* that can be reviewed. When the impact is
> > exceptionally difficult to assess, it is often feasible to try a
> > change on the development branch while signalling willingness to
> > revert it. Downstream libraries test against it (and the release
> > candidate), which gives a chance to correct an originally
> > optimistic assessment.
> > 
> > After assessing each impact, it will fall somewhere on the
> > following table:
> > Severity \ Likelihood | Rare | Limited | Common | Ubiquitous
> > ----------------------|------|---------|--------|-----------
> > *Minor*               | ok   | ok      | ok     | ?
> > *Typical*             | ok   | ?       | ?      | no?
> > *Severe*              | ?    | ?       | no?    | no
> > *Critical*            | no?  | no      | no     | no
> > Note that all changes should normally follow the two-release
> > deprecation warning policy (except “minor” ones). The “no” fields
> > mean a change is clearly unacceptable, although a NEP can always
> > overrule it. This table only assesses the “impact”. It does not
> > assess how the impact compares to the benefits of the proposed
> > change; that comparison must be favourable no matter how small the
> > impact is. However, by assessing the impact, it will be easier to
> > weigh it against the benefit. (Note that the table is not
> > symmetric: an impact with “critical” severity is unlikely to be
> > considered even when no known users are impacted.)
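The table amounts to a simple lookup. A sketch of that reading, with the unclear middle cells of the table marked “?” (the category names come from the text; the encoding as strings is my own):

```python
# Sketch of the impact table as a lookup.  "ok" = acceptable,
# "no" = clearly unacceptable, "?"/"no?" = needs discussion.
LIKELIHOOD = ["rare", "limited", "common", "ubiquitous"]

IMPACT_TABLE = {
    "minor":    ["ok",  "ok", "ok",  "?"],
    "typical":  ["ok",  "?",  "?",   "no?"],
    "severe":   ["?",   "?",  "no?", "no"],
    "critical": ["no?", "no", "no",  "no"],
}

def assess(severity, likelihood):
    """Look up the verdict for one (severity, likelihood) pair."""
    return IMPACT_TABLE[severity][LIKELIHOOD.index(likelihood)]

print(assess("typical", "rare"))         # -> ok
print(assess("critical", "ubiquitous"))  # -> no
```

Mitigation, discussed next, corresponds to moving an entry toward the upper-left of this table before the benefit is weighed.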
> > 
> > < 
> > https://hackmd.io/WuS1rCzrTYOTgzUfRJUOnw#Mitigation-and-arguing-of-benefits
> > >Mitigation
> > and arguing of benefits
> > Any change falling outside the “ok” fields requires careful
> > consideration. When an impact is larger, you can try to mitigate it
> > and “move” on the table. Some possible ways to do this are:
> > 
> >    - An avoidable warning for at least two releases (the policy for
> >    any change that modifies behaviour) reduces a change by one
> >    category (usually from “typical” to “minor” severity).
> >    - The severity category may be reduced by creating an easy
> >    workaround (i.e. to move it from “severe” to “typical”).
> >    - Sometimes a change may break working code but also fix
> >    *existing* bugs; this can offset the severity. In extreme cases,
> >    this may warrant classifying a change as a bug fix.
> >    - For particularly noisy changes (i.e. the ubiquitous category),
> >    consider fixing downstream packages, delaying the warning (or
> >    using a PendingDeprecationWarning). Simply prolonging the
> >    deprecation period is also an option. This reduces how many
> >    users struggle with the change and smooths the transition.
> >    - Exceptionally clear documentation and communication can be
> >    used to make the impact more acceptable. This may not be enough
> >    to move a “category” by itself, but it helps.
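The warning-based mitigations above can be sketched with the standard library’s `warnings` module. The function names here are hypothetical; the pattern is the usual two-release `DeprecationWarning`, with `PendingDeprecationWarning` as the gentler option for noisy changes:

```python
import warnings

def new_sum(values):
    return sum(values)

# Hypothetical deprecated alias.  For the usual two-release cycle it
# emits a DeprecationWarning; for a particularly noisy change one
# might start with a PendingDeprecationWarning instead.
def old_sum(values, *, pending=False):
    category = PendingDeprecationWarning if pending else DeprecationWarning
    warnings.warn("old_sum is deprecated, use new_sum instead",
                  category, stacklevel=2)
    return new_sum(values)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = old_sum([1, 2, 3])

print(result)                       # -> 6
print(caught[0].category.__name__)  # -> DeprecationWarning
```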
> > 
> > After mitigation, the benefits can be assessed:
> > 
> >    - Any benefit of the change can be argued to “offset” the
> > impact. If
> >    this is necessary, a broad community discussion on the mailing
> > list is
> >    required. It should be clear that this does not actually
> > “mitigate” the
> >    impact but rather argues that the benefit outweighs it.
> > 
> > These are not a fixed set of rules, but provide a framework to
> > assess and
> > then try to mitigate the impact of a proposed change to an
> > acceptable
> > level. Arguing that a benefit can overcome multiple “impact”
> > categories
> > will require exceptionally large benefits, and most likely a NEP.
> > For
> > example a change with an initial impact classification of “severe”
> > and
> > “ubiquitous” is unlikely to even be considered unless the severity
> > can be
> > reduced.
> > Many deprecations will fall somewhere at or below a “typical and
> > limited” impact (e.g. removal of an uncommon function argument).
> > They receive a deprecation warning to make the impact acceptable,
> > together with a brief discussion establishing that the change
> > itself is worthwhile (e.g. the API is much cleaner afterwards). Any
> > more disruptive change requires broad community discussion. This
> > needs at least a discussion on the NumPy mailing list, and it is
> > likely that the person proposing it will be asked to write a NEP.
> > 
> > Summary and reasoning for this process
> > (https://hackmd.io/WuS1rCzrTYOTgzUfRJUOnw#Summary-and-reasoning-for-this-processess)
> > The aim of this process and table is to provide a loose formalism
> > with the
> > goal of:
> > 
> >    - *Diligence:* Following this process ensures a detailed
> >    assessment of a change’s impact without being distracted by the
> >    benefits. This is achieved by following well-defined steps:
> >       1. Listing each potential impact (usually one).
> >       2. Assessing the severity.
> >       3. Assessing the likelihood.
> >       4. Discussing what steps are/can be taken to lower the impact
> > *ignoring
> >       any benefits*.
> >       5. If the impact is not low at this point, this should prompt
> >       considering and listing of alternatives.
> >       6. Argue that the benefits outweigh the remaining impact.
> > (This is
> >       a distinct step: the original impact assessment stands as it
> > was.)
> >    - *Transparency:* Using this process for difficult decisions
> > makes it
> >    easier for the reviewer and community to follow how a decision
> > was made and
> >    criticize it.
> >    - *Nuance:* When it is clear that an impact is larger than
> >    typical, this will prompt more care and thought. In some cases
> >    it may also clarify that a change is lower impact than it
> >    appears at first sight.
> >    - *Experience:* Using a similar formalism for many changes makes
> > it
> >    easier to learn from past decisions by providing an approach to
> > compare and
> >    conceptualize them.
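For the diligence and transparency goals above, the outcome of the six steps can be kept as a small structured note alongside a proposal. This is only an illustrative sketch (the class and field names are hypothetical, not part of any NumPy tooling):

```python
from dataclasses import dataclass, field

# Hypothetical sketch: recording the steps above as a structured note,
# so reviewers can follow (and criticize) how a decision was reached.
@dataclass
class ImpactAssessment:
    change: str                   # step 1: the potential impact
    severity: str                 # step 2: minor/typical/severe/critical
    likelihood: str               # step 3: rare/limited/common/ubiquitous
    mitigations: list = field(default_factory=list)   # step 4
    alternatives: list = field(default_factory=list)  # step 5
    benefits: list = field(default_factory=list)      # step 6

note = ImpactAssessment(
    change="remove a rarely used keyword argument (hypothetical)",
    severity="typical",
    likelihood="limited",
    mitigations=["two-release DeprecationWarning"],
    benefits=["simpler API"],
)
print(note.severity, note.likelihood)  # -> typical limited
```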
> > 
> > We aim to follow these steps in the future for difficult decisions.
> > In general, any reviewer and community member may ask for this
> > process to be followed for a proposed change. If the change is
> > difficult, it will be worth the effort; if it is very low impact,
> > it will be quick to clarify why.
> > NOTE: At this time the process is new and is expected to require
> > clarification.
> > Examples (https://hackmd.io/WuS1rCzrTYOTgzUfRJUOnw#Examples)
> > It should be stressed again that the categories will rarely be
> > clear-cut, and the examples below are intentionally categorized
> > with some uncertainty. Even unclear categories can help in forming
> > a clearer idea of a change.
> > Histogram (https://hackmd.io/WuS1rCzrTYOTgzUfRJUOnw#Histogram)
> > The “histogram” example doesn’t really add much with respect to
> > this process. But noting the duplicated effort/impact would
> > probably move it into a more severe category than most
> > deprecations. That makes it a more difficult decision and indicates
> > that careful thought should be spent on alternatives.
> > Integer indexing requirement
> > (https://hackmd.io/WuS1rCzrTYOTgzUfRJUOnw#Integer-indexing-requirement)
> > 
> >    - Severity: Typical–Severe (although each fix is fairly easy,
> >    users often had to make many changes)
> >    - Likelihood: Ubiquitous
> > 
> > How ubiquitous it really was probably only became clear after the
> > (rc?) release. The change would now probably go through a NEP, as
> > it initially falls into the lower right part of the table. To get
> > into the “acceptable” part of the table we note that:
> > 
> >    1. Real bugs were caught in the process (argued to reduce
> >    severity)
> >    2. The deprecation was delayed and longer than normal (argued to
> >    mitigate the number of affected users by giving much more time)
> > 
> > Even with these considerations, it still has a larger impact and
> > clearly
> > requires careful thought and community discussion about the
> > benefits.
> > Removing financial functions
> > (https://hackmd.io/WuS1rCzrTYOTgzUfRJUOnw#Removing-financial-functions)
> > 
> >    - Severity: Severe (on the high end)
> >    - Likelihood: Limited (maybe common)
> > 
> > While not used by a large user base (limited), the removal is
> > disruptive (severe). The change ultimately required a NEP, since it
> > is not easy to weigh the maintenance advantage of removing the
> > functions against the impact on their users.
> > The NEP included a reduction of the severity by providing a
> > workaround: a pip-installable package as a drop-in replacement. For
> > heavy users of these functions this will still be more severe than
> > most deprecations, but it lowered the impact assessment enough to
> > consider the benefit of removal to outweigh the impact.
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion at python.org
> > https://mail.python.org/mailman/listinfo/numpy-discussion
> > 
