[New-bugs-announce] [issue42969] pthread_exit & PyThread_exit_thread are harmful
Gregory P. Smith
report at bugs.python.org
Tue Jan 19 14:35:46 EST 2021
New submission from Gregory P. Smith <greg at krypto.org>:
`PyThread_exit_thread()` calls `pthread_exit(`) and is in turn called from a variety of APIs as documented in the C-API doc update from https://bugs.python.org/issue36427.
The `pthread_exit()` call was originally introduced as a way "resolve" a crashes from daemon threads during shutdown https://bugs.python.org/issue1856. It did that. That fix to that even landed in 2.7.8 but was rolled back before 2.7.9 due to a bug in an existing application it exposed at the time (did we miss the implications of that? maybe). It remained in 3.x.
`pthread_exit()` cannot be used blindly by any application. All code in the threaded application needs to be on board with it and always prepared for any API call they make to potentially lead to thread termination. Quoting a colleague: "pthread_exit() does not result in stack unwind or local variable destruction". This means that any code up the stack from the ultimate `pthread_exit()` call that has process-wide state implications that did not go out of its way to register cleanup with `pthread_cleanup_push()` could lead to deadlocks or lost resources. Something implausible to assume that code does.
We're seeing this happen with C/C++ code. Our C++ builds with `-fno-exceptions` so uncatchable exception based stack unwinding as some pthread_exit implementations may trigger does not happen (and cannot be guaranteed anyways, see bpo-42888). Said C/C++ code is calling back into Python from a thread and thus must use `PyEval_RestoreThread()` or similar APIs before performing Python C API calls. If the interpreter is being finalized from another thread... these enter a codepath that ultimately calls `pthread_exit()` leaving corrupt state in the process. In this case that unexpected thread simply disappearing can lead to a deadlock in our process.
Fundamentally I do not believe the CPython VM should ever call `pthread_exit()` when non-CPython frames are anywhere in the C stack. This may mean we should never call `pthread_exit()` at all (unsure; but it'd be ideal).
The documentation suggests that all callers in user code of the four C-APIs with the documented `pthread_exit()` caveats need auditing and pre-call `_Py_IsFinalizing()` API checks. But... I do not believe that would fix anything even if it were done. `_Py_IsFinalizing()` called without the GIL held means that it could change by the time the `PyEval_RestoreThreads()` API calls it internally do determine if it should exit the thread. Thus the race condition window would merely be narrowed, not eliminated. Not good enough.
## CURRENT WORKAROUND (Big Hammer)
Change CPython to call abort() instead of pthread_exit() as that situation is unresolvable and the process dying is better than hanging, partially alive. That solution isn't friendly, but is better than being silent and allowing deadlock. A failing process is always better than a hung process, especially a partially hung process.
## SEMI RELATED WORK
https://bugs.python.org/issue42888 - appears to be avoiding some `PyThread_exit_thread()` calls to stop some crashes due to `libgcc_s` being loaded on demand upon thread exit.
components: Interpreter Core
title: pthread_exit & PyThread_exit_thread are harmful
versions: Python 3.10
Python tracker <report at bugs.python.org>
More information about the New-bugs-announce