preparing downstream code for NumPy 2.0 & API deprecations/removals
Hi all,

This email is about two topics related to the 2.0 release:

1. Advice/guidance for downstream library authors and end users
2. Strategy for development work around public API changes that will break backwards compatibility

I also just created https://github.com/numpy/numpy/issues/24300 as a tracking issue that we can post announcements on and that anyone can subscribe to. I imagine many folks will want some way to follow along and be notified of important changes around the release without subscribing to this mailing list. Some of the content of that issue overlaps with this email. If you have questions or comments about the content of that issue, let's discuss them here. Please keep the tracking issue for announcements, not for technical discussion.

**Advice for downstream package authors and end users**

1. If you rely on the NumPy C API (e.g. via direct use in C/C++, or via Cython code that uses NumPy), please add a `numpy<2.0` requirement in your package's dependency metadata.
   Rationale: the NumPy C ABI will change in 2.0, so any compiled extension modules that rely on NumPy are likely to break; they will need to be recompiled.
2. If you rely on a large API surface from NumPy's Python API, also consider adding the same `numpy<2.0` requirement to your metadata.
   Rationale: we will do a significant cleanup (see NEP 52), so unless you only use modern/recommended functions and objects, your code is likely to require at least some adjustments.
3. Consider cleaning up your code. E.g. remove `from numpy import *`, and stop importing private modules like `numpy.core`. See https://github.com/numpy/numpy/blob/main/numpy/tests/test_public_api.py#L114... for what we consider public/private. If it's not in the NumPy docs or in the list of public modules there, don't use it!
4. Plan to do a release of your own packages which depend on `numpy` shortly after the first NumPy 2.0 release candidate is released (probably in Dec 2023).
   Rationale: at that point, you can release packages that work with both 2.0 and 1.X, so your own end users will see little or no disruption (you want `pip install mypackage` to continue working on the day NumPy 2.0 is released).
5. Consider testing against NumPy nightlies in your own CI. We publish those at https://anaconda.org/scientific-python-nightly-wheels/numpy, and have documented that as a stable location at https://numpy.org/devdocs/dev/depending_on_numpy.html.
   Rationale: this will detect potential issues in your code so you can fix them well ahead of the NumPy 2.0 release.

**Strategy for public API changes for 2.0**

Based on experience over the past weeks with adding deprecations and making breaking changes, I think it'd be good to articulate a strategy for Python/C API changes. We are not yet collectively used to the change of pace that comes with the run-up to a major release. I think we want to use and balance these two principles:

1. Make the API and behavior changes that we want to see for 2.0, in a way that doesn't incur unreasonable amounts of effort or get completely blocked by the backwards compatibility constraints we'd apply for a regular minor release.
2. Mitigate the inevitable disturbances for downstream projects and end users as best as we can.

To start with (2), issuing good guidance (like in the section above and the tracking issue gh-24300) is one way to help. Another is, when you are in the process of making a change, to use code search tools and either proactively fix code that will break downstream or at least notify downstream project authors. The former can be done by sending PRs to at least the largest projects (SciPy, Pandas, scikit-learn, scikit-image, Matplotlib) when there are easy changes to make, and the latter by filing issues on the issue trackers of the affected projects.
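
For downstream authors who want to audit their own code in the same spirit, here is a minimal sketch of a local scan for the two patterns called out in the advice above: `from numpy import *` and imports of private modules such as `numpy.core`. The function name `find_risky_numpy_imports` and the `PRIVATE_PREFIXES` tuple are hypothetical names made up for this illustration; this is not a tool that NumPy ships, and the one-entry prefix list is an assumption that does not replace the list in `test_public_api.py`.

```python
import ast

# Assumption: illustrative only. The authoritative public/private split is
# the one in numpy/tests/test_public_api.py, not this tuple.
PRIVATE_PREFIXES = ("numpy.core",)


def _is_private(module: str) -> bool:
    """True if `module` is one of the private prefixes or a submodule of one."""
    return any(module == p or module.startswith(p + ".") for p in PRIVATE_PREFIXES)


def find_risky_numpy_imports(source: str) -> list[tuple[int, str]]:
    """Return sorted (line number, description) pairs for risky NumPy imports."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            if node.module == "numpy" and any(a.name == "*" for a in node.names):
                findings.append((node.lineno, "star import from numpy"))
            # Check the module itself and each `module.name` pulled from it,
            # so that `from numpy import core` is caught as well.
            names = [node.module] + [f"{node.module}.{a.name}" for a in node.names]
        else:
            continue
        # Report at most one private import per statement.
        for name in names:
            if _is_private(name):
                findings.append((node.lineno, f"private module {name}"))
                break
    return sorted(findings)
```

Calling this on the text of each `.py` file in a package (for example by iterating over `pathlib.Path("src").rglob("*.py")`) gives a quick list of lines to look at. It only sees static imports, so attribute access like `np.core.something` after a plain `import numpy as np` would need a separate check.
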
Yet another way, in case of mechanical changes like removal of aliases, would be to provide a sed script that others can run to automatically update their code as much as possible.

For (1), I think it's important to understand that 2.0 is a one-off change of pace where our regular backwards compatibility policy does not apply. If particular functions are desirable to touch but widely used, it may of course be wise to leave them in or deprecate them; this is a case-by-case decision. However, it doesn't have to be done like that. Every single object in the NumPy API is used somewhere, and removing it is going to affect some users/packages. This is inevitable, and we can't achieve our 2.0 goals if every little niche API change requires following our regular backwards compatibility strategy. All that would achieve is to make 2.2 the breaking release. Many downstream libraries have CI setups that turn deprecation warnings into hard errors, so often it doesn't matter whether we deprecate something in `main` or remove it straight away. Apply good judgement here, I'd say (how widely is something used, is the replacement trivial or does it require some thought, etc.).

Here are four recent examples where we had downstream breakage and either did extra work to revert a change or discussed doing so:

- Adding `np.min`, `np.max` and `np.round` to the `__all__` list: this already happened in 1.25.0, and we reverted it after discussion in https://github.com/numpy/numpy/pull/24234 (but left it in the `main` branch for 2.0).
- Removing `np.cast` in `main`: this broke SciPy and as a result also JAX/MNE-Python/AstroPy CI. We kept the removal in place, but discussed in https://github.com/numpy/numpy/pull/24144 whether or not to revert it.
- Moving `PyArray_MIN`/`PyArray_MAX` to a different header file: this broke SciPy's CI.
  We left it as is and fixed up the issue in SciPy in https://github.com/numpy/numpy/pull/24234
- Removal of `np.byte_bounds`: we removed it from the main namespace in https://github.com/numpy/numpy/pull/23830, and after discussion on that PR are bringing it back for now (first in its old location in https://github.com/numpy/numpy/pull/24154, and then moving it to a new namespace under `np.lib` once we sort that out).

This is going to happen a lot more, I'm sure. We have to be careful about what we do, but we also need downstream library authors to be proactive, audit their code and use the nightlies we publish. For very niche APIs like `np.cast`, we should expect not to have to justify their removal in detail or entertain reverting changes made in preparation for 2.0.

Final thought: we should get a 1.26.0b1 beta release out shortly, but will likely have a couple of weeks before 1.26.0rc1. So any deprecations we want to put into 1.26.0 can go in until the RC1 release.

Cheers,
Ralf
participants (1)
-
Ralf Gommers