NOTE: Python 3.9.3 contains an unintentional ABI incompatibility leading to crashes on 32-bit systems

The memory layout of PyThreadState was unintentionally changed in the recent 3.9.3 bugfix release. This leads to crashes on 32-bit systems when importing binary extensions compiled for Python 3.9.0 - 3.9.2. This is a regression. We will be releasing a hotfix 3.9.4 around 24 hours from now to address this issue and restore ABI compatibility with C extensions built for Python 3.9.0 - 3.9.2. Details: https://bugs.python.org/issue43710 - Ł
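Editorial illustration (not part of the original message): the announcement does not say why only 32-bit systems crash. One general mechanism, offered here as an assumption, is that inserting a field into a struct shifts every later member unless existing alignment padding happens to absorb it, and that padding depends on pointer width. The sketch below uses a made-up three-field struct (`owner`, `depth`, `callback`, plus a hypothetical new `headroom` field), not the real PyThreadState, and a simplified natural-alignment rule.

```python
def layout(fields):
    """Return {name: offset} for (name, size) fields, C-style.

    Simplifying assumption: each field is aligned to its own size,
    which is what common C ABIs do for ints and pointers.
    """
    offsets = {}
    offset = 0
    for name, size in fields:
        # Round the running offset up to the next multiple of `size`.
        offset = -(-offset // size) * size
        offsets[name] = offset
        offset += size
    return offsets


def sample_fields(ptr_size, with_new_field):
    """Hypothetical struct: a pointer, an int, optionally a new int, a pointer."""
    result = [("owner", ptr_size), ("depth", 4)]
    if with_new_field:
        result.append(("headroom", 4))  # field inserted mid-struct
    result.append(("callback", ptr_size))
    return result


if __name__ == "__main__":
    for ptr_size in (4, 8):
        old = layout(sample_fields(ptr_size, with_new_field=False))
        new = layout(sample_fields(ptr_size, with_new_field=True))
        print(f"{ptr_size * 8}-bit: callback offset "
              f"{old['callback']} -> {new['callback']}")
```

In this toy layout the trailing pointer moves from offset 8 to 12 with 4-byte pointers, but stays at 16 with 8-byte pointers because the new int lands in what used to be padding. Code compiled against the old offsets would then read the wrong memory only in the 4-byte case.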

On 4/3/2021 7:15 PM, Miro Hrončok wrote:
Unless the mistake was just introduced, the mistake would have happened. One this severe would likely have been caught within the week or two before a final. But as Łukasz noted when announcing the change, .rcs are generally ignored. (I suspect that most everyone assumes that someone else will test them. And begging people to not do that does not work well enough to justify the release.) 3.8.5 (2020 July 20) was a hotfix for 3.8.4 (2020 July 14), which did have a candidate, which did not get tested the way that 3.8.4 itself was. -- Terry Jan Reedy

On Sat, Apr 3, 2021 at 7:49 PM Terry Reedy <tjreedy@udel.edu> wrote:
For 3.9.4 I suggest a strict revert of the offending change. I created such a PR and attached it to the bpo-43710 issue. It is a holiday weekend for a large swath of the world. The recursion based crasher issue the original change was fixing can be saved for a future release and not made under time pressure. I filed https://bugs.python.org/issue43725 to track one suggested way to help automate prevention of these from landing in a release branch and slipping through the cracks in a release. (discuss that on the issue, not here) -Greg

On 4/4/21 4:44 AM, Terry Reedy wrote:
Well, that's not true. I think for at least the past 3.8 and current 3.9 cycle I always tested the release candidates, and built them for various Linux architectures. And I'm filing issues marked as 'release-blocker' when I see regressions introduced; it's up to the release managers to determine if such changes are intended or not. Looking at the failing CI tests triggered by these builds, yes I see that 32bit archs have the ABI change. So maybe it's worth to re-introduce these RC builds, or at least provide source RC tarballs which can be tested. If it's a matter of resources, maybe fall back to quarterly subminor releases again (at least for the most current X.Y releases). Matthias

Matthias Klose writes:
Looking at the failing CI tests triggered by these builds, yes I see that 32bit archs have the ABI change.
I'm not sure precisely what you mean by that, but if you mean that CI has caught the bug, then
So maybe it's worth to re-introduce these RC builds,
seems to be just makework. It makes more sense to delay the schedule somewhat, but only so that the release engineer, or one among the many eyes, catches the CI's warning. Steve

On 4/4/21 2:49 PM, Stephen J. Turnbull wrote:
No, you can't see that with CPython's CI alone. The Debian and Ubuntu build machines trigger CI tests per architecture for around 3000 packages depending on python3.9, using the just built python3.9, and without rebuilding these packages. That's where I see the 32bit failures. How would delaying the release schedule have helped with the issue that we just saw? Matthias
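Editorial illustration (not part of the original message): the idea of exercising already-built packages against a freshly built interpreter, without rebuilding them, can be approximated on a small scale. The sketch below is only a minimal analog and does not describe how the Debian or Ubuntu infrastructure actually works; the function name `smoke_test_imports` and its parameters are hypothetical. It imports each named module in a separate process so that a crash in one extension cannot take down the test driver.

```python
import subprocess
import sys


def smoke_test_imports(python, modules, timeout=60):
    """Import each module in a fresh interpreter; return {module: exit_code}.

    Exit code 0 means the import succeeded, 1 usually means a Python-level
    exception such as ImportError. On POSIX, a negative code means the
    child was killed by a signal (for example a segmentation fault).
    """
    results = {}
    for name in modules:
        proc = subprocess.run(
            [python, "-c", f"import {name}"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
        results[name] = proc.returncode
    return results


if __name__ == "__main__":
    # Replace the list with the extension modules installed on your system.
    for name, code in smoke_test_imports(sys.executable, ["json"]).items():
        status = "ok" if code == 0 else f"FAILED (exit code {code})"
        print(f"{name}: {status}")
```

Pointing `python` at a newly built interpreter while the listed modules come from binaries compiled for an earlier release is the situation in which an ABI break like this one would show up as a non-zero or negative exit code.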

Matthias Klose writes:
Thank you, that's what I wanted to know.
How would delaying the release schedule have helped with the issue that we just saw?
I mean to have a period between the announcement of the release date and the actual release. So it works the same way an rc would, except for not rolling the tarball etc. For almost zero maintainer effort, the release manager tags it "rc" and gives you a few days to run your tests on builds from git. If you (and others) don't report a problem, he tags "final" and then produces tarballs, installers, etc., to the extent those things are done at this point in the version's lifecycle. Distros could either drop the "rc" in hopes that this commit will indeed be final (and reroll their release if bugs are found), or they could use the sha1 as identifier for the specific commit instead of stuff like "rc" in versioning the package. (I'm brainstorming, this is *not* a thought-out recommendation!) See also Terry Reedy's post for a different explanation of what I believe is the same idea. Steve

On 4/4/2021 9:57 AM, Łukasz Langa wrote:
On 4 Apr 2021, at 11:34, Matthias Klose <doko@ubuntu.com> wrote:
Can you show us an example of a release-blocker issue you reported?
Searching creator: doko; priority: release blocker returns 19 issues, from 3 to 20 years ago. That is about 1/year, but none for 3.8 or 3.9, for which Łukasz is RM.
32232, 36 months ago: building extensions as builtins is broken in 3.7 (has patch, has PR; closed)
32233, 40 months ago: [3.7 Regression] build with --with-system-libmpdec is broken (has patch, has PR; closed)
31016, 43 months ago: [Regression] sphinx shows an EOF error when using python2.7 from the trunk (closed)
23968, 55 months ago: rename the platform directory from plat-$(MACHDEP) to plat-$(PLATFORM_TRIPLET) (has patch; closed)
26839, 58 months ago: Python 3.5 running on Linux kernel 3.17+ can block at startup or on importing the random module on getrandom() (has patch; closed)
24226, 71 months ago: [3.5 Regression] unable to byte-compile the attached IN.py (closed)
24162, 71 months ago: [2.7 regression] test_asynchat test failure on i586-linux-gnu (closed)
23842, 72 months ago: SystemError: ../Objects/longobject.c:998: bad argument to internal function (has patch; closed)
22523, 79 months ago: [regression] Lib/ssl.py still references _ssl.sslwrap (has patch; closed)
17192, 96 months ago: libffi-3.0.13 import (has patch; closed)
17579, 97 months ago: socket module in 2.7.4 raises error instead of gaierror in 2.7.3 (closed)
14330, 105 months ago: don't use host python, use host search paths for host compiler (has patch; closed)
4303, 149 months ago: [2.5 regression] ctypes fails to build on arm-linux-gnu (closed)
4469, 149 months ago: CVE-2008-5031 multiple integer overflows (closed)
4552, 150 months ago: Doc/tools/sphinxext not included in the 2.6.1 tarball (closed)
4519, 150 months ago: .pyc files included in 2.6 and 3.0 release tarballs (closed)
2601, 157 months ago: [regression] reading from a urllib2 file descriptor happens byte-at-a-time (closed)
984880, 203 months ago: fix regression on 2.3 branch: Lib/threading (has patch; closed)
598996, 226 months ago: Failure building the documentation (has patch; closed)
The tradeoffs are the cost of bugfix release candidates (paid most by a few people) versus the cost of the relatively rare hotfix releases versus the frequency of bugfix releases. I personally prefer a bugfix every 2 months without candidate to a bugfix every 4 months with one (these being alternatives with the same work by the release crew). A possible compromise would be to cut the release a couple of days ahead and then pause while Matthias or anyone else runs tests. But I would understand if Łukasz considered pausing to be worse than finishing the release then and there. Or announce the upcoming release a week ahead and ask committers merging PRs that could possibly break anything to do so only with fresh CI tests. (Stale tests were one recent reason a test-breaking PR got merged.) -- Terry Jan Reedy

On 4 Apr 2021, at 01:15, Miro Hrončok <mhroncok@redhat.com> wrote:
However, I need to ask: Would this also happen if there was a rc version of 3.9.3?
Good question. The RC would not help. Most importantly, 3.9.3 was itself an expedited release due to its security content. When I did use an RC phase for 3.9.2, which also contained security fixes, it met with considerable backlash and urges to release the update faster. And I ultimately did, two days after the RC was out. Informed by this experience, I would have likely skipped the RC for 3.9.3 anyway.
More generally, RCs historically provided little value. Since Python 3.4 we've provided 55 bugfix releases. Five of those included an RC2, suggesting testing caught a regression. Let's look closer:
- none of those happened for 3.8 and 3.9 releases;
- two of those are a single issue in 3.7.1rc1 and 3.6.7rc1: https://bugs.python.org/issue34927, indeed caught by a user downloading an rc1 installer from python.org;
- one was found by a third-party during "preparation for Python 3.8" and it just happened to be a regression also present in 3.7.4rc1 (https://bugs.python.org/issue24214);
- one was found by a third-party using nightly Python builds in CI (https://bugs.python.org/issue38216) and it just happened to be a regression also present in 3.5.8rc1;
- one was found by a core developer running regression tests on what coincidentally happened to be 3.6.2rc1 on Windows (https://bugs.python.org/issue30716). The bug was in the tests themselves.
So, we're looking at a single instance of a bug found thanks to an RC1 installer being out there. Python 3.0 through 3.3 had limited user penetration so looking at those isn't informative. But we can look at Python 2.7, and that one had a single rc2 in its 10 years of bugfix releases. That was 2.7.3rc2, in 2012. It was in the Windows help file, discovered by a core developer looking through it.
In the time of 3.8 and 3.9 so far, there was a single hotfix release which was due to a regression not caught by a published release candidate (https://bugs.python.org/issue41304). Given the information above, I stand by my decision (confirmed with other release managers) to skip RCs for bugfix releases. - Ł

On Sat, 2021-04-03 at 21:44 +0200, Łukasz Langa wrote:
This is precisely what I meant when I said I don't like the idea of combining security fixes with irrelevant changes. Good that I've chosen to backport the secfixes instead of pushing the new version to Gentoo stable. -- Best regards, Michał Górny

On Mon, 2021-04-05 at 11:17 -0700, Ethan Furman wrote:
I suppose the best way is to look at the security bug: https://bugs.gentoo.org/779841 I'm working on a better tool to check your system for vulnerable packages but I can only dedicate a little time every few days to work on it, so it will take some time before it's ready. -- Best regards, Michał Górny

Hi, About this very specific ABI issue, one long term solution would be to exclude the PyThreadState structure from the C API, so as not to rely on it at the ABI level. I started to add getter functions in Python 3.9: PyThreadState_GetInterpreter(), PyThreadState_GetFrame() and PyThreadState_GetID(). I'm working on updating C extensions to use these getter functions, rather than directly accessing PyThreadState members. I wrote a new pythoncapi_compat.h header file (in an external project, pythoncapi_compat) to provide getter functions to Python 2.7-3.8. Cython gives me most of the work, since it gets and sets many PyThreadState members. You can follow the progress at: https://bugs.python.org/issue39947 Victor On Sat, Apr 3, 2021 at 9:45 PM Łukasz Langa <lukasz@langa.pl> wrote:
-- Night gathers, and now my watch begins. It shall not end until my death.
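
Editorial illustration (not part of the original message): the reasoning behind getter functions can be modeled without any C. In the sketch below, two hypothetical "runtime versions" store the same logical thread state as raw bytes with different layouts. An "extension" that baked in a byte offset at build time breaks when the layout changes, while one that asks the runtime through a getter keeps working. All names (`RuntimeV1`, `RuntimeV2`, `ext_direct_access`, `ext_via_getter`) and the field layout are invented for illustration; this is not the CPython API.

```python
import struct

# "<" means little-endian with no implicit padding, so offsets are explicit.
LAYOUT_V1 = struct.Struct("<iI")   # depth, thread_id
LAYOUT_V2 = struct.Struct("<iiI")  # depth, headroom (new field), thread_id


class RuntimeV1:
    """Hypothetical runtime using the original layout."""

    def new_state(self, depth, thread_id):
        return LAYOUT_V1.pack(depth, thread_id)

    def get_id(self, state):
        return LAYOUT_V1.unpack(state)[1]


class RuntimeV2:
    """Hypothetical runtime after a field was inserted mid-struct."""

    def new_state(self, depth, thread_id):
        return LAYOUT_V2.pack(depth, 50, thread_id)

    def get_id(self, state):
        return LAYOUT_V2.unpack(state)[2]


def ext_direct_access(state):
    """'Extension' built against V1: offset 4 is baked in at build time."""
    return struct.unpack_from("<I", state, 4)[0]


def ext_via_getter(runtime, state):
    """'Extension' that lets the runtime locate the field."""
    return runtime.get_id(state)


if __name__ == "__main__":
    for runtime in (RuntimeV1(), RuntimeV2()):
        state = runtime.new_state(depth=3, thread_id=1234)
        print(type(runtime).__name__,
              "direct:", ext_direct_access(state),
              "getter:", ext_via_getter(runtime, state))
```

Against the second layout the direct read returns the new field's value instead of the thread id, which is the toy equivalent of an extension reading the wrong member; the getter-based read is unaffected because the knowledge of the layout stays inside the runtime.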

participants (9): Ethan Furman, Gregory P. Smith, Matthias Klose, Michał Górny, Miro Hrončok, Stephen J. Turnbull, Terry Reedy, Victor Stinner, Łukasz Langa