Need help debugging an ssl crash on Windows which prevents merging PRs
Hi,

In the 3.10 branch, it became really hard to merge PRs because of the following ssl crash on Windows:
https://bugs.python.org/issue44252

It fails half of the time (on average) on the "Windows x86" and "Windows x64" jobs of GitHub Actions and on the Win32 and Win64 jobs of Azure Pipelines.

I failed to reproduce it on an up-to-date Windows 10 with an up-to-date Visual Studio 2019. I cannot say if it's a race condition, a bug in Python, a bug in the tests (that sounds unlikely: the tests worked well previously, and it's a hard crash, not a Python exception), a bug in the C compiler, or a bug in Windows itself...

Since there are other random test failures, like in test_asyncio, merging a PR now requires multiple "re-run jobs" on GitHub Actions. Example of a test_asyncio test which fails frequently on Windows:
https://bugs.python.org/issue41682

If someone can reproduce the https://bugs.python.org/issue44252 crash on Windows, can you please give me SSH access to your machine so I can debug it? Here is my public SSH key:
https://github.com/vstinner.keys

Is there a way to get SSH access to a machine of a GitHub Actions CI job or of an Azure Pipelines CI job?

Victor
-- 
Night gathers, and now my watch begins. It shall not end until my death.
On Fri, May 28, 2021 at 6:40 PM Victor Stinner <vstinner@python.org> wrote:
> In the 3.10 branch, it became really hard to merge PRs because of the following ssl crash on Windows: https://bugs.python.org/issue44252
Update on this bug which blocked the Python 3.10 beta 2 release: it's now fully fixed!

It was a simple bug in the _ssl.SSLError exception. The problem was that the crash only occurred on Windows, and only if the tests were run in a very specific way. On the CIs, the crash was deterministic, but when I debugged the issue manually, I failed to reproduce it. I tried many different ways to run the tests; none worked.

Then I recalled an old hack: run "import gc; gc.set_threshold(5)" at startup. It makes GC-related crashes far more likely (the default threshold of GC generation 0 is 700). I used this hack 3 years ago to debug another GC bug which was really hard to reproduce:
https://mail.python.org/pipermail/python-dev/2018-June/153857.html
https://docs.python.org/dev/library/gc.html#gc.set_threshold

Not only is the _ssl.SSLError bug fixed, but Pablo also fixed the documentation to explain clearly that a traverse function must be implemented if Py_TPFLAGS_HAVE_GC is set:
https://github.com/python/cpython/commit/8b55bc3f93a655bc803bff79725d5fe3f12...

Moreover, for people who don't read the documentation ;-), I also made sure that it's no longer possible to create a type with Py_TPFLAGS_HAVE_GC but no traverse function:
https://github.com/python/cpython/commit/ee7637596d8de25f54261bbeabc602d31e7...

By the way, I had to fix two stdlib types (in the _testcapi and _decimal modules) which didn't respect that rule!

Victor
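To make the effect of the "gc.set_threshold(5)" hack concrete, here is a minimal sketch. It is an illustration only, not the code used to debug issue44252. The helper `count_collections` is hypothetical. It relies only on the standard `gc.get_threshold()`, `gc.set_threshold()` and `gc.callbacks` APIs to count how many collections start while a batch of container objects is allocated with a given generation-0 threshold. The sketch assumes CPython; exact counts vary between versions and builds, and it does not rely on any particular default threshold.

```python
import gc


def count_collections(threshold0: int, n_objects: int = 20_000) -> int:
    """Hypothetical helper: allocate n_objects lists with the generation-0
    threshold set to threshold0 and return how many collections started."""
    started = 0

    def on_gc(phase: str, info: dict) -> None:
        # gc.callbacks entries are called with phase "start" or "stop".
        nonlocal started
        if phase == "start":
            started += 1

    old_threshold = gc.get_threshold()
    was_enabled = gc.isenabled()
    gc.collect()  # begin from an empty young generation
    gc.enable()
    gc.callbacks.append(on_gc)
    try:
        # The hack from the email: a tiny generation-0 threshold.
        gc.set_threshold(threshold0)
        # Keep every list alive: in CPython, freeing a GC-tracked object
        # lowers the generation-0 counter again, so short-lived
        # temporaries would rarely push it past the threshold.
        alive = [[] for _ in range(n_objects)]
    finally:
        # Restore the previous GC configuration whatever happens.
        gc.callbacks.remove(on_gc)
        gc.set_threshold(*old_threshold)
        if not was_enabled:
            gc.disable()
    return started


if __name__ == "__main__":
    print("threshold 5:", count_collections(5))
    print("threshold 1_000_000:", count_collections(1_000_000))
```

With a threshold of 5, the collector runs thousands of times during the same allocations that trigger no collection at all with a huge threshold. This is why a latent GC bug, such as a GC-enabled type with a broken traverse function, surfaces much sooner. To apply the hack itself, the two statements only need to run before the code under test, for example at the top of the script that launches the tests.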
Well done Victor! This stuff is way over my head, but rest assured that humble Python programmers like me appreciate all the effort put in by guys like you into improving Python.

Rob Cliffe

On 01/06/2021 23:14, Victor Stinner wrote:
> On Fri, May 28, 2021 at 6:40 PM Victor Stinner <vstinner@python.org> wrote:
>> In the 3.10 branch, it became really hard to merge PRs because of the following ssl crash on Windows: https://bugs.python.org/issue44252
> Update on this bug which blocked the Python 3.10 beta 2 release. It's now fully fixed!
>
> It was a simple bug in the _ssl.SSLError exception. [snip]