[Numpy-discussion] backwards compatibility and deprecation policy NEP

Ralf Gommers ralf.gommers at gmail.com
Sat Jul 21 21:05:28 EDT 2018


On Sat, Jul 21, 2018 at 5:46 PM, Hameer Abbasi <einstein.edison at gmail.com>
wrote:

> Hello,
>
> Very well written article! It takes a lot of important things into
> account. I think a number of things should be mentioned, if only in the
> alternatives:
>
>    - One major version number change, with lots of “major version change”
>    deprecations grouped into it, along with an LTS release.
>
Good point, will add under alternatives. Note that we've tried that
before, or at least planned to. It doesn't work well in practice; we also
don't really have the manpower to make all the changes we'd want in a
single release.

>
>    - The possibility of another major version change (possibly the same
>    one) where we re-write all portions that were agreed upon (via NEPs) to be
>    re-written, with a longer LTS release (3 years? 5?).
>       - I’m thinking this one could be similar to the Python 2 -> Python
>       3 transition. Note that this is different from having constant breakages,
>       this will be a mostly one-time effort and one-time breakage.
>
The Python 2 to 3 analogy is a good reason for not doing this :)

>
>    - We break the ABI, but not most of the C API.
>
Good catch, I didn't mention ABI at all. My opinion: breaking ABI will
still require a major version change, but the bar for it is now lower.
This is basically what Travis was arguing for years ago, only today his
argument is actually true due to conda and binary wheels on the 3 major
platforms.



>    - We port at least bug fixes and possibly oft-requested functionality
>       to the old version for a long time.
>       - But we fix all of the little things that are agreed upon by the
>       community to be “missing” or “wrong” in the current release. It may be a
>       while before this is adopted but it’ll be really beneficial in the long run.
>       - We ping the dev-discussions of most major downstream users
>       (SciPy, all the scikits, Matplotlib, etc.) for their “pain points” and also
>       ask whether they think this is a good idea. This way, the users included
>       aren’t just those on the NumPy mailing list.
>       - We enforce good practices in our code. For example, we will
>       explicitly disallow subclassing from ndarray, we get rid of scalars, we fix
>       the type system.
>
> This may sound radical (I myself think so), but consider that if we get
> rid of a large amount of technical debt at the outset and have a reputation
> for a clean code-base (rather than one that’s decades old), then we could
> onboard a lot more active developers and existing developers could also get
> a lot more work done. I may be getting ahead of myself on this, but feel free
> to leave your thoughts and opinions.
>

I think it sounds nice in theory, but given the history on large design
changes/decisions I don't believe we are able to get things right on a
first big rewrite. For example "fix the type system" - we all would like
something better, but in the 5+ years that we've talked about it, no one
has even put a complete design on paper. And for the ones we did do, like
__numpy_ufunc__, we definitely needed a few iterations. That points to
gradual evolution being a better model.

Cheers.
Ralf


> Best regards,
> Hameer Abbasi
>
> On 22. Jul 2018 at 01:48, Ralf Gommers <ralf.gommers at gmail.com> wrote:
>
>
> Hi all,
>
> Here is a first draft of a NEP on backwards compatibility and deprecation
> policy. I think this mostly formalizes what we've done for the last couple
> of years; however, I'm sure opinions and wish lists will differ here.
>
> Pull request: https://github.com/numpy/numpy/pull/11596
>
> Rendered version: https://github.com/rgommers/numpy/blob/nep-backcompat/doc/neps/nep-0023-backwards-compatibility.rst
>
> Full text below (ducks).
>
> Cheers,
> Ralf
>
>
> =======================================================
> NEP 23 - Backwards compatibility and deprecation policy
> =======================================================
>
> :Author: Ralf Gommers <ralf.gommers at gmail.com>
> :Status: Draft
> :Type: Process
> :Created: 2018-07-14
> :Resolution: <url> (required for Accepted | Rejected | Withdrawn)
>
> Abstract
> --------
>
> In this NEP we describe NumPy's approach to backwards compatibility,
> its deprecation and removal policy, and the trade-offs and decision
> processes for individual cases where breaking backwards compatibility
> is considered.
>
>
> Detailed description
> --------------------
>
> NumPy has a very large user base.  Those users rely on NumPy being stable
> and the code they write that uses NumPy functionality to keep working.
> NumPy is also actively maintained and improved -- and sometimes improvements
> require, or are made much easier by, breaking backwards compatibility.
> Finally, there are trade-offs in stability for existing users vs. avoiding
> errors or having a better user experience for new users.  These competing
> needs often give rise to heated debates and delays in accepting or rejecting
> contributions.  This NEP tries to address that by providing a policy as well
> as examples and rationales for when it is or isn't a good idea to break
> backwards compatibility.
>
> General principles:
>
> - Aim not to break users' code unnecessarily.
> - Aim never to change code in ways that can result in users silently getting
>   incorrect results from their previously working code.
> - Backwards incompatible changes can be made, provided the benefits outweigh
>   the costs.
> - When assessing the costs, keep in mind that most users do not read the
>   mailing list, do not look at deprecation warnings, and sometimes wait more
>   than one or two years before upgrading from their old version.  Also, NumPy
>   has many hundreds of thousands or even a couple of million users, so "no one
>   will do or use this" is very likely incorrect.
> - Benefits include improved functionality, usability and performance (in
>   order of importance), as well as lower maintenance cost and improved future
>   extensibility.
> - Bug fixes are exempt from the backwards compatibility policy.  However, in
>   case of serious impact on users (e.g. a downstream library doesn't build
>   anymore), even bug fixes may have to be delayed for one or more releases.
> - The Python API and the C API will be treated in the same way.
>
>
> Examples
> ^^^^^^^^
>
> We now discuss a number of concrete examples to illustrate typical issues
> and trade-offs.
>
> **Changing the behavior of a function**
>
> ``np.histogram`` is probably the most infamous example.
> First, a new keyword ``new=False`` was introduced; this was then switched
> over to ``None`` one release later, and finally it was removed again.
> Also, it has a ``normed`` keyword that had behavior that could be considered
> either suboptimal or broken (depending on one's opinion on the statistics).
> A new keyword ``density`` was introduced to replace it; ``normed`` started
> giving ``DeprecationWarning`` only in v1.15.0.  Evolution of ``histogram``::
>
>     def histogram(a, bins=10, range=None, normed=False):  # v1.0.0
>
>     def histogram(a, bins=10, range=None, normed=False, weights=None, new=False):  # v1.1.0
>
>     def histogram(a, bins=10, range=None, normed=False, weights=None, new=None):  # v1.2.0
>
>     def histogram(a, bins=10, range=None, normed=False, weights=None):  # v1.5.0
>
>     def histogram(a, bins=10, range=None, normed=False, weights=None, density=None):  # v1.6.0
>
>     def histogram(a, bins=10, range=None, normed=None, weights=None, density=None):  # v1.15.0
>         # v1.15.0 was the first release where `normed` started emitting
>         # DeprecationWarnings
>
> The ``new`` keyword was planned from the start to be temporary; such a plan
> forces users to change their code more than once.  Such keywords (there have
> been other instances proposed, e.g. ``legacy_index`` in
> `NEP 21 <http://www.numpy.org/neps/nep-0021-advanced-indexing.html>`_) are not
> desired.  The right thing to have done here would probably have been to
> deprecate ``histogram`` and introduce a new function ``hist`` in its place.
>
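For the ``normed`` to ``density`` replacement described above, a keyword deprecation could look roughly like the sketch below. It is not NumPy's implementation: ``histogram_sketch`` and the ``_NOT_GIVEN`` sentinel are hypothetical names, and the binning is a simplified pure-Python stand-in. It only shows how an old keyword can keep working while it warns.

```python
import warnings

_NOT_GIVEN = object()  # hypothetical sentinel: distinguishes "not passed" from any real value


def histogram_sketch(values, bins=10, normed=_NOT_GIVEN, density=None):
    """Hypothetical stand-in for a function that is deprecating a keyword."""
    if normed is not _NOT_GIVEN:
        # Warn only when the caller actually passes the old keyword.
        warnings.warn(
            "the 'normed' keyword is deprecated; use 'density' instead",
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller's line
        )
        if density is None:
            density = normed
    density = bool(density)

    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid a zero width when all values are equal
    counts = [0] * bins
    for v in values:
        # Clamp so that the maximum value lands in the last bin.
        counts[min(int((v - lo) / width), bins - 1)] += 1
    if density:
        total = len(values)
        return [c / (total * width) for c in counts]
    return counts
```

Callers that never pass ``normed`` see no warning, and callers that do still get a result, so their code keeps working during the deprecation period.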
> **Returning a view rather than a copy**
>
> The ``ndarray.diag`` method used to return a copy.  A view would be better for
> both performance and design consistency.  This change was warned about
> (``FutureWarning``) in v1.8.0, and in v1.9.0 ``diag`` was changed to return
> a *read-only* view.  The planned change to a writeable view in v1.10.0 was
> postponed due to backwards compatibility concerns, and is still an open issue
> (gh-7661).
>
> What should have happened instead: nothing.  This change resulted in a lot of
> discussions and wasted effort, did not achieve its final goal, and was not that
> important in the first place.  Finishing the change to a *writeable* view in
> the future is not desired, because it will result in users silently getting
> different results if they upgrade across multiple versions or simply miss the
> warnings.
>
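The hazard of switching from a copy to a writeable view can be shown without NumPy. The sketch below is only an analogy using the standard library: slicing a ``bytearray`` copies, while slicing a ``memoryview`` shares memory. ``write_first_byte`` is a hypothetical piece of user code written when the result was still an independent copy.

```python
def write_first_byte(buf, value):
    """User code that assumes `buf` is its own private copy."""
    buf[0] = value


data = bytearray(b"abcd")

# Copy semantics: the write lands in a temporary, the original is untouched.
write_first_byte(data[0:2], ord("X"))
after_copy = bytes(data)

# Writeable-view semantics: the same user code now silently edits the original.
write_first_byte(memoryview(data)[0:2], ord("X"))
after_view = bytes(data)

# Read-only-view semantics: the same user code fails loudly instead.
try:
    write_first_byte(memoryview(data).toreadonly()[0:2], ord("Y"))
    readonly_outcome = "written"
except TypeError:
    readonly_outcome = "error"
```

The read-only case raises a clear exception, while the writeable case changes results without any signal. That silent change is the outcome the paragraph above argues against.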
> **Disallowing indexing with floats**
>
> Indexing an array with floats is asking for something ambiguous, and can be a
> sign of a bug in user code.  After some discussion, it was deemed a good idea
> to deprecate indexing with floats.  This was first tried for the v1.8.0
> release; however, in pre-release testing it became clear that this would break
> many libraries that depend on NumPy.  Therefore it was reverted before release,
> to give those libraries time to fix their code first.  It was finally
> introduced for v1.11.0 and turned into a hard error for v1.12.0.
>
> This change was disruptive; however, it did catch real bugs in e.g. SciPy and
> scikit-learn.  Overall the change was worth the cost, and introducing it in
> master first to allow testing, then removing it again before a release, is a
> useful strategy.
>
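One way a downstream project can find such problems early is to run its tests with deprecation warnings escalated to errors. The sketch below assumes a hypothetical helper ``take`` that is in its warning phase and a hypothetical ``downstream_function`` with a latent float-index bug. Only the ``warnings`` and ``operator`` calls are real standard-library APIs.

```python
import operator
import warnings


def take(seq, index):
    """Hypothetical indexing helper during its deprecation phase."""
    if isinstance(index, float):
        warnings.warn(
            "indexing with a float is deprecated and will become an error",
            DeprecationWarning,
            stacklevel=2,
        )
        index = int(index)  # old behavior, kept until the removal release
    return seq[operator.index(index)]


def downstream_function(seq):
    # Latent bug: true division produces a float index.
    return take(seq, len(seq) / 2)


def run_strictly(func, *args):
    """Call `func` with DeprecationWarning turned into an exception."""
    with warnings.catch_warnings():
        warnings.simplefilter("error", DeprecationWarning)
        return func(*args)
```

Under the default filters ``downstream_function`` keeps returning a value, but ``run_strictly(downstream_function, data)`` raises, which is how a test suite can surface the bug before the hard error ships.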
> Similar recent deprecations also look like good examples of
> cleanups/improvements:
>
> - removing deprecated boolean indexing (gh-8312)
> - deprecating truth testing on empty arrays (gh-9718)
> - deprecating ``np.sum(generator)`` (gh-10670, one issue with this one is that
>   its warning message is wrong - this should error in the future).
>
> **Removing the financial functions**
>
> The financial functions (e.g. ``np.pmt``) are badly named, are present in the
> main NumPy namespace, and don't really fit well with NumPy's scope.
> They were added in 2008 after
> `a discussion <https://mail.python.org/pipermail/numpy-discussion/2008-April/032353.html>`_
> on the mailing list where opinion was divided (but a majority in favor).
> At the moment these functions don't cause a lot of overhead; however, there are
> multiple issues and PRs a year for them which cost maintainer time to deal
> with.  And they clutter up the ``numpy`` namespace.  Removing them again was
> discussed in 2013 (gh-2880).
>
> This case is borderline, but given that they're clearly out of scope,
> deprecation and removal out of at least the main ``numpy`` namespace can be
> proposed.  Alternatively, document clearly that new features for financial
> functions are unwanted, to keep the maintenance costs to a minimum.
>
> **Examples of features not added because of backwards compatibility**
>
> TODO: do we have good examples here? Possibly subclassing related?
>
>
> Removing complete submodules
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> This year there have been suggestions to consider removing some or all of
> ``numpy.distutils``, ``numpy.f2py``, ``numpy.linalg``, and ``numpy.random``.
> The motivation was that all these cost maintenance effort, and that they slow
> down work on the core of NumPy (ndarrays, dtypes and ufuncs).
>
> The impact on downstream libraries and users would be very large, and
> maintenance of these modules would still have to happen.  Therefore this is
> simply not a good idea; removing these submodules should not happen even for
> a new major version of NumPy.
>
>
> Subclassing of ndarray
> ^^^^^^^^^^^^^^^^^^^^^^
>
> Subclassing of ``ndarray`` is a pain point.  ``ndarray`` was not (or at least
> not well) designed to be subclassed.  Despite that, a lot of subclasses have
> been created even within the NumPy code base itself, and some of those (e.g.
> ``MaskedArray``, ``astropy.units.Quantity``) are quite popular.  The main
> problems with subclasses are:
>
> - They make it hard to change ``ndarray`` in ways that would otherwise be
>   backwards compatible.
> - Some of them change the behavior of ndarray methods, making it difficult to
>   write code that accepts array duck-types.
>
> Subclassing ``ndarray`` has been officially discouraged for a long time.  Of
> the most important subclasses, ``np.matrix`` will be deprecated (see gh-10142)
> and ``MaskedArray`` will be kept in NumPy (`NEP 17
> <http://www.numpy.org/neps/nep-0017-split-out-maskedarray.html>`_).
> ``MaskedArray`` will ideally be rewritten in a way such that it uses only
> public NumPy APIs.  For subclasses outside of NumPy, more work is needed to
> provide alternatives (e.g. mixins, see gh-9016 and gh-10446) or better support
> for custom dtypes (see gh-2899).  Until that is done, subclasses need to be
> taken into account when making changes to the NumPy code base.  A future change
> in NumPy to not support subclassing will certainly need a major version
> increase.
>
>
> Policy
> ------
>
> 1. Code changes that have the potential to silently change the results of a
>    user's code must never be made (except in the case of clear bugs).
> 2. Code changes that break users' code (i.e. the user will see a clear
>    exception) can be made, *provided the benefit is worth the cost* and
>    suitable deprecation warnings have been raised first.
> 3. Deprecation warnings are in all cases warnings that functionality will be
>    removed.  If there is no intent to remove functionality, then deprecation
>    in documentation only or other types of warnings shall be used.
> 4. Deprecations for stylistic reasons (e.g. consistency between functions) are
>    strongly discouraged.
>
> Deprecations:
>
> - shall include the version numbers of both when the functionality was
>   deprecated and when it will be removed (either two releases after the
>   warning is introduced, or in the next major version).
> - shall include information on alternatives to the deprecated functionality,
>   or a reason for the deprecation if no clear alternative is available.
> - shall use ``VisibleDeprecationWarning`` rather than ``DeprecationWarning``
>   for cases of relevance to end users (as opposed to cases only relevant to
>   libraries building on top of NumPy).
> - shall be listed in the release notes of the release where the deprecation
>   happened.
>
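The deprecation requirements listed above could be met with a message helper along these lines. This is a sketch under assumptions: ``deprecation_message``, ``old_api``, ``new_api`` and the version numbers are hypothetical, and ``VisibleDeprecationSketch`` merely stands in for NumPy's ``VisibleDeprecationWarning``. The real class likewise derives from ``UserWarning``, which Python shows by default, whereas ``DeprecationWarning`` is hidden by default outside ``__main__``.

```python
import warnings


class VisibleDeprecationSketch(UserWarning):
    """Hypothetical stand-in for a deprecation category that end users see."""


def deprecation_message(name, since, removal, alternative=None, reason=None):
    """Build a message carrying both version numbers and an alternative or reason."""
    msg = (
        f"`{name}` is deprecated since version {since} "
        f"and will be removed in version {removal}."
    )
    if alternative:
        msg += f" Use `{alternative}` instead."
    elif reason:
        msg += f" Reason: {reason}."
    return msg


def new_api():
    return 42


def old_api():
    # Version numbers here are purely illustrative.
    warnings.warn(
        deprecation_message("old_api", "1.15", "1.17", alternative="new_api"),
        VisibleDeprecationSketch,
        stacklevel=2,
    )
    return new_api()
```

Keeping the message format in one helper makes it harder to forget the removal version or the suggested alternative.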
> Removal of deprecated functionality:
>
> - shall be done after 2 releases (assuming a 6-monthly release cycle; if that
>   changes, there shall be at least 1 year between deprecation and removal),
>   unless the impact of the removal is such that a major version number
>   increase is warranted.
> - shall be listed in the release notes of the release where the removal
>   happened.
>
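As a worked example of the timing rule above: something deprecated in a release numbered 1.15 could be removed in 1.17 at the earliest. The helpers below are hypothetical and only encode that arithmetic, with the "at least 1 year" fallback approximated as 365 days.

```python
from datetime import date, timedelta


def earliest_removal_version(deprecated_in, releases_to_wait=2):
    """Return the first minor release in which removal would be allowed."""
    major, minor = (int(part) for part in deprecated_in.split(".")[:2])
    return f"{major}.{minor + releases_to_wait}"


def gap_is_long_enough(deprecated_on, removed_on, minimum=timedelta(days=365)):
    """Check the one-year fallback rule, approximated as 365 days."""
    return removed_on - deprecated_on >= minimum
```

Cases that warrant a major version bump fall outside this arithmetic and would need a separate decision.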
> Versioning:
>
> - removal of deprecated code can be done in any minor (but not bugfix) release.
> - for heavily used functionality (e.g. removal of ``np.matrix``, of a whole
>   submodule, or significant changes to behavior for subclasses) the major
>   version number shall be increased.
>
> In concrete cases where this policy needs to be applied, decisions are made
> according to the `NumPy governance model
> <https://docs.scipy.org/doc/numpy/dev/governance/index.html>`_.
>
> Functionality with more strict policies:
>
> - ``numpy.random`` has its own backwards compatibility policy,
>   see `NEP 19 <http://www.numpy.org/neps/nep-0019-rng-policy.html>`_.
> - The file format for ``.npy`` and ``.npz`` files must not be changed in a
>   backwards incompatible way.
>
>
> Alternatives
> ------------
>
> **Being more aggressive with deprecations.**
>
> The goal of being more aggressive is to allow NumPy to move forward faster.
> This would avoid others inventing their own solutions (often in multiple
> places), as well as be a benefit to users without a legacy code base.  We
> reject this alternative because of the place NumPy has in the scientific Python
> ecosystem - being fairly conservative is required in order to not increase the
> extra maintenance for downstream libraries and end users to an unacceptable
> level.
>
> **Semantic versioning.**
>
> This would change the versioning scheme for code removals; those could then
> only be done when the major version number is increased.  Rationale for
> rejection: semantic versioning is relatively common in software engineering;
> however, it is not at all common in the Python world.  Also, it would mean that
> NumPy's version number simply starts to increase faster, which would be more
> confusing than helpful.  gh-10156 contains more discussion on this alternative.
>
>
> Discussion
> ----------
>
> TODO
>
> This section may just be a bullet list including links to any discussions
> regarding the NEP:
>
> - This includes links to mailing list threads or relevant GitHub issues.
>
>
> References and Footnotes
> ------------------------
>
> .. [1] TODO
>
>
> Copyright
> ---------
>
> This document has been placed in the public domain. [1]_
>
>
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>