Need help to debug an ssl crash on Windows which prevents merging PRs

Hi,

In the 3.10 branch, it has become really hard to merge PRs because of the following ssl crash on Windows:
https://bugs.python.org/issue44252

It fails about half of the time (on average) on the "Windows x86" and "Windows x64" jobs of GitHub Actions and on the Win32 and Win64 jobs of Azure Pipelines.

I failed to reproduce it on an up-to-date Windows 10 with an up-to-date Visual Studio 2019.

I cannot say if it's a race condition, a bug in Python, a bug in the tests (which sounds unlikely: they worked well previously, and it's a hard crash, not a Python exception), a bug in the C compiler, or a bug in Windows itself...

Since there are other random test failures, such as test_asyncio, merging now requires multiple "re-run jobs" on GitHub Actions. Example of a test_asyncio test which fails frequently on Windows:
https://bugs.python.org/issue41682

If someone can reproduce the https://bugs.python.org/issue44252 crash on Windows, can you please give me SSH access to your machine so I can debug it? Here is my public SSH key:
https://github.com/vstinner.keys

Is there a way to get SSH access to a machine running a GitHub Actions CI job or an Azure Pipelines CI job?

Victor
--
Night gathers, and now my watch begins. It shall not end until my death.

On Fri, May 28, 2021 at 6:40 PM Victor Stinner <vstinner@python.org> wrote:
Update on this bug, which blocked the Python 3.10 beta 2 release. It's now fully fixed! It was a simple bug in the _ssl.SSLError exception.

The problem was that the crash only occurred on Windows, and only if tests were run in a very specific way. On CIs, the crash was deterministic. When I debugged the issue manually, I failed to reproduce it. I tried many different ways to run the tests; none worked.

Then I recalled an old hack: run "import gc; gc.set_threshold(5)" at startup. It makes crashes related to the GC way more likely (the default threshold of GC generation 0 is 700). I used this hack 3 years ago to debug another GC bug which was really hard to reproduce:
https://mail.python.org/pipermail/python-dev/2018-June/153857.html
https://docs.python.org/dev/library/gc.html#gc.set_threshold

Not only is the _ssl.SSLError bug fixed, but Pablo also fixed the documentation to explain clearly that a traverse function must be implemented if Py_TPFLAGS_HAVE_GC is set:
https://github.com/python/cpython/commit/8b55bc3f93a655bc803bff79725d5fe3f12...

Moreover, for people who don't read the documentation ;-), I also made sure that it's no longer possible to create a type with Py_TPFLAGS_HAVE_GC but with no traverse function:
https://github.com/python/cpython/commit/ee7637596d8de25f54261bbeabc602d31e7...

By the way, I had to fix two stdlib types (in the _testcapi and _decimal modules) which didn't respect that!

Victor
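
For readers who want to try the gc.set_threshold() hack described above, here is a minimal sketch. The helper names (run_with_aggressive_gc, allocate_cycles) are hypothetical and not taken from CPython or its test suite; only gc.get_threshold() and gc.set_threshold() are real APIs. It lowers the generation 0 threshold while a callable runs, so that collections happen far more often, and then restores the previous thresholds.

```python
import gc


def run_with_aggressive_gc(func, threshold0=5):
    """Call func() with a very low generation-0 GC threshold.

    Hypothetical helper: a low threshold triggers collections much more
    often, which makes GC-related crashes more likely to show up.
    The previous thresholds are restored afterwards.
    """
    old = gc.get_threshold()
    # Only change generation 0; keep the other thresholds unchanged.
    gc.set_threshold(threshold0, *old[1:])
    try:
        return func()
    finally:
        gc.set_threshold(*old)


def allocate_cycles(count=1000):
    """Hypothetical workload: create many small reference cycles.

    Returns the generation-0 threshold seen while the workload runs.
    """
    for _ in range(count):
        item = []
        item.append(item)  # self-referencing list: only the GC can free it
    return gc.get_threshold()[0]
```

In practice, the workload passed to the helper would be the code suspected of crashing (for example, the function that runs a test), rather than this toy allocation loop. Putting "import gc; gc.set_threshold(5)" at startup, as in the message, has the same effect for the whole process.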

participants (2): Rob Cliffe, Victor Stinner