Hello,
I have successfully built Python 3.8.1 on QNX, but ran into a problem when using 'make -j4'. The build invariably hangs with multiple single-threaded child processes stuck indefinitely waiting on semaphores. These semaphores will clearly never be posted to: the processes are all single-threaded and the semaphores are not shared with any other process. A backtrace shows that the offending calls come from run_at_forkers(), which is not surprising. I consider a multi-threaded fork() to be an ill-defined operation and my arch-nemesis... The problem here is that the semaphore count inherited from the parent shows the semaphore to be unavailable, even though the semaphore *object* itself is not the same as the one used by the parent (they share the same virtual address, but in different address spaces with different backing memory).
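To illustrate the generic hazard in a few lines of Python (just a sketch, with nothing QNX-specific in the code itself; the hang assumes the lock's user-mode state is copied into the child on fork, as it is here):

import os
import threading
import time

lock = threading.Lock()

def holder():
    with lock:
        time.sleep(10)   # keep the lock held across the fork

threading.Thread(target=holder).start()
time.sleep(0.5)          # make sure the lock is held by now

pid = os.fork()
if pid == 0:
    # The holder thread does not exist in the child, so nothing will
    # ever release the inherited lock: this acquire blocks forever.
    lock.acquire()
    os._exit(0)
os.waitpid(pid, 0)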
A few (noob) questions:

1. Is there a way to correlate the C backtrace from the core dump with the Python code that resulted in this hang?

2. It is well known that a multi-threaded fork() is inherently unsafe, and POSIX says that no non-async-signal-safe operations are allowed between the fork() and exec() calls. I even saw comments to this effect in the Python source code ;-) So why is it done?

3. Any reason not to use posix_spawn() instead of fork()/exec()? While some systems implement posix_spawn() with fork()/exec(), others (QNX, at least) implement it without first creating a duplicate of the parent, making it both more efficient and safer in a multi-threaded parent. (See the sketch just below for the kind of call I have in mind.)

4. thread_pthread.h seems to go to great lengths to implement locks without using native mutexes. I found one reference in the code, dating to 1994, as to why that is done. Is it still applicable? Contrary to the claim in that comment, the semantics for trying to lock an already-locked mutex and for unlocking an unowned mutex are well-defined.
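For reference, a minimal sketch of the call question 3 has in mind, using the wrappers os already provides (the compiler command line is just a placeholder):

import os

# Spawn the command directly: on systems whose posix_spawn() is not
# built on fork(), no copy of the calling process is ever created.
pid = os.posix_spawnp("cc", ["cc", "-c", "foo.c"], os.environ)
_, status = os.waitpid(pid, 0)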
Thanks, --Elad
More information: The hang happens when building extensions, using the setup.py script. The script determines that the build is parallel (build_ext.py/build_extensions) and creates a thread pool. Each thread then executes a compilation job by fork()ing a compiler process.
I don't see how it works on any system that keeps the semaphore state in user mode.
--Elad
What semaphore is the subprocess blocking on? I.e. where in the Python code (presumably of ThreadPoolExecutor) is that semaphore defined?
Given your hypothesis of the cause of the problem, can you perhaps write a much simpler program (one that is simpler to debug and simpler to reason about) that exhibits the same hang?
Does the same problem happen in earlier Python versions? On other operating systems? (I've never had a problem with `make -j` on macOS, for example; I suspect that there's something unique to threads and semaphores on QNX that contributes to the hang.)
From the core file I took, it's actually not clear to me which lock it is, as rlock_acquire() is called through a function pointer from method_vectorcall_VARARGS_KEYWORDS() (I posted the backtrace separately).
My suspicion is that it doesn't fail on macOS because macOS may keep all of the semaphore's state in the kernel, in which case the state is not necessarily inherited on fork(). QNX keeps the count in user mode, much as some state is kept in user mode for fast mutexes.
I'll see if I can come up with a simple scenario. In the meantime I am trying to switch to posix_spawn() to see if it fixes the problem (it will also be much faster on QNX, as you don't need to create a duplicate address space just to tear it down with exec()).
--Elad
A change to posix_spawnp() fixes the problem for me:
diff --git a/Lib/distutils/spawn.py b/Lib/distutils/spawn.py
index ceb94945dc..cb69de4242 100644
--- a/Lib/distutils/spawn.py
+++ b/Lib/distutils/spawn.py
@@ -90,7 +90,7 @@ def _spawn_posix(cmd, search_path=1, verbose=0, dry_run=0):
         return
     executable = cmd[0]
     exec_fn = search_path and os.execvp or os.execv
-    env = None
+    env = os.environ
     if sys.platform == 'darwin':
         global _cfg_target, _cfg_target_split
         if _cfg_target is None:
@@ -112,7 +112,7 @@ def _spawn_posix(cmd, search_path=1, verbose=0, dry_run=0):
             env = dict(os.environ,
                        MACOSX_DEPLOYMENT_TARGET=cur_target)
             exec_fn = search_path and os.execvpe or os.execve
-    pid = os.fork()
+    pid = os.posix_spawnp(executable, cmd, env)
     if pid == 0:  # in the child
         try:
             if env is None:
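Note that with this change the 'if pid == 0' branch becomes dead code: os.posix_spawnp() returns the child's pid in the calling process, and there is no longer a child copy executing the function. A cleaner version would drop the fork()-era child handling altogether; a rough sketch (the helper name is mine, not distutils'):

import os

def _spawn_via_posix_spawnp(cmd, env):
    # Launch the command directly; no duplicate of the multi-threaded
    # parent is created, so no atfork handlers run in a child.
    pid = os.posix_spawnp(cmd[0], cmd, env)
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status) and os.WEXITSTATUS(status) != 0:
        raise OSError("command %r failed with exit code %d"
                      % (cmd[0], os.WEXITSTATUS(status)))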
--Elad
Admittedly, the whole distutils spawn code should be rewritten to use subprocess.
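A minimal sketch of what such a rewrite could look like (ignoring distutils' macOS deployment-target handling, which would have to be folded into an env argument):

import subprocess

def _spawn_posix(cmd, search_path=1, verbose=0, dry_run=0):
    # subprocess already uses posix_spawn() internally where it can
    # (see https://bugs.python.org/issue35537), avoiding a bare fork().
    if dry_run:
        return
    subprocess.check_call(cmd)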
Regards
Antoine.
A simple example that reproduces the hang (please keep in mind that I have very little experience writing Python code...):
import os
from concurrent.futures import ThreadPoolExecutor

def new_process(arg):
    pid = os.fork()
    if pid == 0:
        os.execv("/bin/true", ["/bin/true"])
    else:
        pid, status = os.waitpid(pid, 0)

with ThreadPoolExecutor(max_workers=4) as executor:
    futures = [executor.submit(new_process, None) for i in range(0, 4)]
    for fut in futures:
        fut.result()
--Elad
I believe that the problem is in logging/__init__.py, which registers an atfork() handler to re-initialize its locks. However, as part of this process the handler attempts to acquire the module lock, which has not itself been reinitialized and so still reflects the parent's state at the time of the fork.
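Reduced to a sketch, the pattern looks like this (the shape of the problem, not logging's actual code):

import os
import threading

_module_lock = threading.Lock()   # stand-in for logging's module lock

def _reinit_in_child():
    # If another thread in the parent held _module_lock at fork time,
    # the child inherits a lock that reads as "held" but has no owner:
    # this acquire then blocks forever.
    with _module_lock:
        pass   # re-create the per-handler locks here

os.register_at_fork(after_in_child=_reinit_in_child)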
--Elad
Hi Elad,
Can you open an issue on https://bugs.python.org/ and post your findings there? I don't think it makes sense to continue discussing this on python-dev.
(note that opening a bug doesn't mean it will be fixed quickly, but at least it's recorded somewhere instead of being lost in the mailing-list archives)
Best regards
Antoine.
Elad created the issue:
"distutils.spawn should use subprocess (hang in parallel builds on QNX)" https://bugs.python.org/issue39763
Victor
On Wed, Feb 26, 2020 at 8:22 PM Elad Lahav e2lahav@gmail.com wrote:
Done.
Thanks, --Elad
Hi,
On Tue, 25 Feb 2020 11:48:44 -0000 e2lahav@gmail.com wrote:
Hello,

I have successfully built Python 3.8.1 on QNX, but ran into a problem when using 'make -j4'. [...]
Can you post the detailed backtrace(s) somewhere? Otherwise nobody will be able to dive deeper.
- Any reason not to use posix_spawn() instead of fork()/exec()? While some systems implement posix_spawn() with fork()/exec(), others (QNX, at least) implement it without first creating a duplicate of the parent, making it both more efficient and safer in a multi-threaded parent.
Well, subprocess can do so in some cases: https://bugs.python.org/issue35537
I'm not sure what can prevent it here, but you're welcome to investigate.
- thread_pthread.h seems to go to great lengths to implement locks without using native mutexes.
Python locks are not like mutexes. They can be locked from one thread and unlocked from another, so they're more like binary semaphores.
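For instance, this is perfectly legal for a Python lock, while the same sequence on a plain pthread mutex would be undefined:

import threading

lock = threading.Lock()
lock.acquire()           # acquired in the main thread

def releaser():
    lock.release()       # released from a different thread

t = threading.Thread(target=releaser)
t.start()
t.join()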
Regards
Antoine.
Sorry, should have posted the backtrace from the beginning. It goes deeper than this, but the important part is in the child after fork():
#0  SyncSemWait () at /builds/workspace/710-SDP/build_x86_64/lib/c/kercalls/x86_64/SyncSemWait.S:37
#1  0x0000004bfa174ac6 in PyThread_acquire_lock_timed (lock=lock@entry=0x24f0cd9430, microseconds=microseconds@entry=-1000000, intr_flag=intr_flag@entry=1) at Python/thread_pthread.h:459
#2  0x0000004bfa1c77bd in acquire_timed (lock=0x24f0cd9430, timeout=-1000000000) at ./Modules/_threadmodule.c:63
#3  0x0000004bfa1c78e7 in rlock_acquire (self=0x24f13027e0, args=<optimized out>, kwds=<optimized out>) at ./Modules/_threadmodule.c:308
#4  0x0000004bfa204631 in method_vectorcall_VARARGS_KEYWORDS (func=0x24f0d61ef0, args=0x24f131c660, nargsf=<optimized out>, kwnames=<optimized out>) at Objects/descrobject.c:332
#5  0x0000004bfa06eff6 in _PyObject_Vectorcall (kwnames=<optimized out>, nargsf=<optimized out>, args=<optimized out>, callable=<optimized out>) at ./Include/cpython/abstract.h:123
#6  call_function (kwnames=0x0, oparg=1, pp_stack=<synthetic pointer>, tstate=0x24f0d8ee00) at Python/ceval.c:4987
#7  _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3486
#8  0x0000004bfa06c95b in function_code_fastcall (co=<optimized out>, args=<optimized out>, nargs=0, globals=<optimized out>) at Objects/call.c:283
#9  0x0000004bfa06ed29 in _PyObject_Vectorcall (kwnames=<optimized out>, nargsf=<optimized out>, args=<optimized out>, callable=<optimized out>) at ./Include/cpython/abstract.h:127
#10 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x24f0d8ee00) at Python/ceval.c:4987
#11 _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3500
#12 0x0000004bfa06c95b in function_code_fastcall (co=<optimized out>, args=<optimized out>, nargs=1, globals=<optimized out>) at Objects/call.c:283
#13 0x0000004bfa06ed29 in _PyObject_Vectorcall (kwnames=<optimized out>, nargsf=<optimized out>, args=<optimized out>, callable=<optimized out>) at ./Include/cpython/abstract.h:127
#14 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x24f0d8ee00) at Python/ceval.c:4987
#15 _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3500
#16 0x0000004bfa06c95b in function_code_fastcall (co=<optimized out>, args=<optimized out>, nargs=1, globals=<optimized out>) at Objects/call.c:283
#17 0x0000004bfa06eff6 in _PyObject_Vectorcall (kwnames=<optimized out>, nargsf=<optimized out>, args=<optimized out>, callable=<optimized out>) at ./Include/cpython/abstract.h:123
#18 call_function (kwnames=0x0, oparg=1, pp_stack=<synthetic pointer>, tstate=0x24f0d8ee00) at Python/ceval.c:4987
#19 _PyEval_EvalFrameDefault (f=<optimized out>, throwflag=<optimized out>) at Python/ceval.c:3486
#20 0x0000004bfa06c95b in function_code_fastcall (co=<optimized out>, args=<optimized out>, nargs=0, globals=<optimized out>) at Objects/call.c:283
#21 0x0000004bfa08626a in _PyObject_FastCallDict (callable=0x24f152fd30, args=<optimized out>, nargsf=<optimized out>, kwargs=<optimized out>) at Objects/call.c:96
#22 0x0000004bfa189584 in run_at_forkers (lst=<optimized out>, reverse=<optimized out>) at ./Modules/posixmodule.c:435
#23 0x0000004bfa194bfb in run_at_forkers (reverse=0, lst=<optimized out>) at ./Modules/posixmodule.c:420
#24 PyOS_AfterFork_Child () at ./Modules/posixmodule.c:474
#25 0x0000004bfa194d08 in os_fork_impl (module=<optimized out>) at ./Modules/posixmodule.c:6082
#26 os_fork (module=<optimized out>, _unused_ignored=<optimized out>) at ./Modules/clinic/posixmodule.c.h:2709
--Elad