On Wed, Jan 11, 2023 at 1:59 PM Sebastian Berg <sebastian@sipsolutions.net> wrote:
Hi all,

as brought up many times, I would like to aim for a NumPy 2.0.  The current hope would be to release within the year and start adding small breaking changes soon, but hidden behind feature flags.  Similar to what is already the case for NEP 50 with `export NPY_PROMOTION_STATE=weak`.

Below is a draft version of a NEP; I have also created the corresponding project board on GitHub.
Clearly, specific changes in particular will need more discussion, but there are some clearer bigger ones as well as small changes that are breaking but should be easy to adapt to.

Thanks to Inessa and Ralf who helped draft and revise this!

Thanks for drafting this proposal and leading this effort Sebastian!

It seems like no one wants to be the first to reply here, so I'll try to get us started :) My opinion has always been that NumPy 2.0 should be a "major" thing, reserved either for a needed ABI break or for other compelling features or needs. It looks to me like we have now reached that point. In particular, Sebastian, as the main developer of the new dtype and ufunc internals features, seems to have reached the point where the need for backwards compatibility in the C API is imposing too much of a burden. Making that work easier is enough of a reason for me to be +1 on a NumPy 2.0. After so many years, saying that it's fine to have a breaking release to clean things up is very likely a good thing long term.

With that need established, other important improvements that are already in the pipeline and best done in a 2.0 release, like enabling NEP 50 and Python API improvements, make the overall picture a compelling one.

I also like the proposed logistics: any major change needs to land on a roadmap for 2.0, and for that it needs to have two champions who commit to getting it done. Not breaking our regular 6-monthly release schedule looks like a good plan. Having a feature flag for the 1.25.0 release (June) and then making breaking changes the default in the July-December period seems very reasonable.

Cheers,
Ralf

 


Road to NumPy 2.0

Note: This is a living document. We are prepared to modify it through continued dialogue with the community. Its acceptance indicates consensus on the process and timelines.

Abstract

The NumPy 2.0 release is an opportunity to make some complex changes for which a normal deprecation wouldn’t be viable, as the user impact may be larger than is normally considered acceptable for a minor release. Yet, NumPy 2.0 is not meant to be a large breaking release. Most users should not need to worry about the changes it introduces.
This document contains essential information about the work on the NumPy 2.0 release.

Motivation and impact

The NumPy 2.0 release is required for fixing old bugs and modernizing NumPy’s code base. It is not planned to be a “break the world” release. This means:
  • It must be possible to compile downstream packages to be compatible with both new and old NumPy versions. However, the C-API is expected to be broken. The path to achieve this compatibility will be defined as a high priority project.
  • The majority of users should not require code updates or such updates should be very easy to do. Expert users are likely to notice changes though.
  • We accept that some NumPy users may not be able to adopt NumPy 2.0 immediately or may have to wait until following releases for adoption.
One should keep in mind that even bug fixes can break the code of a small number of users.

Timeline

NumPy 2.0 will be scheduled for release in Jan 2024. Projects and changes should be proposed as soon as possible. We propose a NumPy team meeting around April 2023 (details to be discussed) in order to finalize high-impact projects and review all candidate projects.
Projects not proposed by this time may not be prioritized for a final 2.0 release.
Changes which can be implemented using a feature-flag are strongly encouraged as it simplifies keeping projects moving.
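As an illustration of what gating a change behind a feature flag could look like, here is a minimal sketch. The flag name EXAMPLE_NEW_BEHAVIOR, the helper names, and the behavior change itself are hypothetical and not part of NumPy; only the pattern of an opt-in environment variable (as with NPY_PROMOTION_STATE) comes from the discussion above.

```python
import os

# Hypothetical flag name, for illustration only; this is not a NumPy setting.
FLAG_NAME = "EXAMPLE_NEW_BEHAVIOR"


def new_behavior_enabled(environ=None):
    """Return True if the opt-in flag is set to a truthy value."""
    env = os.environ if environ is None else environ
    return env.get(FLAG_NAME, "0").strip().lower() in {"1", "true", "on"}


def clip_index(index, length, environ=None):
    """Made-up example of a breaking change hidden behind the flag.

    Old behavior: silently clip an out-of-range index.
    New behavior (flag enabled): raise IndexError instead.
    """
    if 0 <= index < length:
        return index
    if new_behavior_enabled(environ):
        raise IndexError(f"index {index} out of range for length {length}")
    return max(0, min(index, length - 1))
```

With this pattern the old behavior stays the default, so the change can be merged and tested early, and flipping the default later is a one-line change.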

Project selection process

To determine the scope of work for the NumPy 2.0 release, we suggest introducing three categories of projects/proposals:
  1. high: proposal requires high visibility or may be critical for the NumPy 2.0 release,
  2. normal,
  3. candidate: changes which are in an early planning stage.
High priority proposals will be listed explicitly in this NEP.
A project board will track all projects proposed for NumPy 2.0, distinguishing the category and progress.

Proposing a project for the NumPy 2.0 release

To start a project, there is one important thing: Believe that your change makes NumPy better and commit to trying to make it happen.
To have a proposal listed on the NumPy 2.0 project board, we require the following:
  • At least two champions for each proposal, one of whom must be a NumPy core developer or someone of similar standing.
  • A brief assessment of the anticipated impact on downstream and end-users. This means assessing how many users/what groups of users are affected and in what way.
  • Support from the NumPy community or Steering Council (ideally both). Positive feedback on your proposal on the NumPy mailing list is a strong indicator of community support.
If any of the above requirements are not met, proposals will be listed as “candidate”. NumPy maintainers will review “candidate” projects on a case by case basis.
We suggest including a brief header in every proposal (issue or PR):
* **Champions**:
* **Severity**: How does it affect users?
* **Affects**: Who/how many users does it affect?
Any further details or adjustments shall be added on request. Large changes may require their own NEP when requested by a maintainer.
As a suggestion, “affects” could be roughly guided by the number of users: rare, limited, common, and ubiquitous. While “severity” could be minor, typical (code update needed), severe (e.g. large change/difficult to find), critical (incorrect results or no clear path for fixing things). The two together can then be used as a basis for decision making and discussion.
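To make the suggested header and the listing requirements concrete, the following is a minimal sketch of how they could be modeled. The class and field names are hypothetical and exist only for illustration; the category values and the three requirements are taken from the text above.

```python
from dataclasses import dataclass
from enum import Enum


class Affects(Enum):
    RARE = "rare"
    LIMITED = "limited"
    COMMON = "common"
    UBIQUITOUS = "ubiquitous"


class Severity(Enum):
    MINOR = "minor"
    TYPICAL = "typical"      # code update needed
    SEVERE = "severe"        # large change or difficult to find
    CRITICAL = "critical"    # incorrect results or no clear fix


@dataclass
class Proposal:
    # Hypothetical record for one proposal; not an official schema.
    champions: list
    severity: Severity
    affects: Affects
    has_core_dev_champion: bool = False
    has_impact_assessment: bool = False
    has_community_support: bool = False

    def meets_listing_requirements(self):
        """False means the proposal would be listed as "candidate"."""
        return (
            len(self.champions) >= 2
            and self.has_core_dev_champion
            and self.has_impact_assessment
            and self.has_community_support
        )

    def header(self):
        """Render the suggested issue/PR header."""
        return "\n".join([
            f"* **Champions**: {', '.join(self.champions)}",
            f"* **Severity**: {self.severity.value}",
            f"* **Affects**: {self.affects.value}",
        ])
```

For example, `Proposal(["A", "B"], Severity.TYPICAL, Affects.LIMITED).header()` produces the three header lines, while `meets_listing_requirements()` returns False until the remaining requirements are marked as satisfied.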

Scope of work

High priority projects

The projects in this section are considered high impact from the compatibility point of view.
Unless otherwise noted, these are currently proposals; most of these changes have their own NEPs, which should be accepted.

Enable breaking the C-API

NumPy needs to define a process for breaking the C-API. This project does not define what is broken; that is done separately on a case-by-case basis.
We simply assume that sufficient changes will be made to make this worthwhile.
  • Status: Planning
  • Champion: Matti Picus (?), Sebastian Berg (?)
  • Severity: Severe (for maintainers without a plan), typical for users
  • Affects: Library maintainers, some users
  • Notes:
    • Many users may have issues if pip installing a very new NumPy version without updating other libraries. We assume that this isn’t a common scenario and will mostly result in clear errors.
    • All libraries will have to be recompiled. The transition plan will ensure that libraries adhering to best practices will have an easy transition.
Note: A full plan is still outstanding and may require its own NEP.

Adopt NEP 50

Adopting NEP 50 changes the promotion behavior of NumPy scalars by removing any value-based casting. Details for this change are discussed in NEP 50.
  • Status: Largely implemented, but open for discussions and open questions to be addressed.
  • Champion: Sebastian Berg, …
  • Severity: High in rare cases, some results can change or memory can bloat.
  • Affects: Many users, but hopefully not most as one needs to use smaller than default precision types to be affected.
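The kind of difference NEP 50 makes can be observed with a short check like the one below. This is a sketch for illustration, not part of the NEP; the function name is hypothetical, and which dtypes are reported depends on the NumPy version and on whether the new promotion is enabled.

```python
import numpy as np


def promotion_examples():
    """Return the result dtypes of a few mixed-type operations."""
    # NumPy float32 scalar + Python float:
    # float64 with value-based (legacy) promotion, float32 under NEP 50.
    scalar_plus_python = np.float32(3) + 3.0

    # uint8 array + NumPy int64 scalar:
    # uint8 with legacy promotion, int64 under NEP 50
    # (the "memory can bloat" case).
    array_plus_int64 = np.array([1], dtype=np.uint8) + np.int64(1)

    # float32 array + Python float: float32 either way.
    array_plus_python = np.array([3.0], dtype=np.float32) + 3.0

    return {
        "float32 scalar + 3.0": str(scalar_plus_python.dtype),
        "uint8 array + int64(1)": str(array_plus_int64.dtype),
        "float32 array + 3.0": str(array_plus_python.dtype),
    }


if __name__ == "__main__":
    print(np.__version__)
    for name, dtype in promotion_examples().items():
        print(f"{name}: {dtype}")
```

Running this once with and once without the new promotion enabled shows which of a project's own expressions would change.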

A thorough cleanup of the Python API

The NumPy API is quite messy, with many functions and aliases that are not recommended for use, namespaces that are private but missing underscores, inconsistencies in argument names, and more. Changes will include removing aliases and outdated functionality (including many things that have been doc-deprecated already), making namespaces private, and making function signatures more consistent.
  • Status: Needs a separate NEP, and deprecations in 1.25.0 for what can be deprecated in a sensible way.
  • Champion: Ralf Gommers, Stefan van der Walt, …
  • Severity: Medium. It is expected that a lot of projects and users will see some breakage, but also that code changes to more idiomatic usage will be straightforward and compatible with both NumPy 1.X and 2.0.
  • Affects: Many users and downstream projects

Add array API standard support to the main namespace

The main reason NEP 47 aimed for a separate numpy.array_api submodule rather than the main namespace is that casting rules differed too much. With NEP 50 (see above), that will be resolved in NumPy 2.0. Having NumPy be a superset of the array API standard will be a significant improvement for code portability to other libraries (CuPy, JAX, PyTorch, etc.) and thereby address one of the top user requests from the 2020 NumPy user survey (GPU support). See the numpy.array_api API docs for an overview of differences between it and the main namespace (the “strictness” ones are not applicable).
  • Status: separate NEP to be written.
  • Champion: Aaron Meurer, Ralf Gommers
  • Severity: Medium. Most of the impact of breaking changes is likely concentrated in a few widely used APIs (e.g., changing the semantics of the copy=False keyword to actually mean “don’t copy” rather than “copy if needed”).
  • Affects: most users and downstream projects
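If copy=False changes as described, code that relies on the current “copy if needed” meaning would need a spelling that behaves the same before and after. One option, sketched here as an assumption rather than an agreed migration path, is np.asarray, which already copies only when it has to. The helper name is hypothetical.

```python
import numpy as np


def as_array_copy_if_needed(obj, dtype=None):
    """Return an ndarray, copying only when a copy cannot be avoided."""
    return np.asarray(obj, dtype=dtype)


a = np.arange(3)

# Same dtype requested: no copy is needed, the input array is returned.
same = as_array_copy_if_needed(a)

# Different dtype requested: a conversion (and therefore a copy) is required.
converted = as_array_copy_if_needed(a, dtype=np.float64)

print(same is a)                        # no copy was made
print(np.shares_memory(a, converted))   # a new buffer was allocated
```

Code written this way does not depend on how the copy keyword is interpreted.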

Other projects

See the project board.
_______________________________________________
NumPy-Discussion mailing list -- numpy-discussion@python.org