[New-bugs-announce] [issue42849] pool worker can't be terminated
Zhesi Huang
report at bugs.python.org
Wed Jan 6 23:00:06 EST 2021
New submission from Zhesi Huang <huangzhesi at gmail.com>:
I have a case where a pool worker process cannot be terminated. I tried kill -SIGTERM and kill -SIGINT, but neither terminates the worker process, so the pool's __exit__ hangs forever.
```
import multiprocessing
import os

# NonDaemonPool, log_thread, err_thread, file_list, local_to_target and
# ACTION_ON_COMPLETED are defined elsewhere in the script (not shown here).


class NonDaemonProcess(multiprocessing.Process):
    # https://stackoverflow.com/questions/6974695/python-process-pool-non-daemonic
    # make 'daemon' attribute always return False
    def _get_daemon(self):
        return False

    def _set_daemon(self, value):
        pass

    daemon = property(_get_daemon, _set_daemon)


def wrapper_copy_parallel(output_name, local_output_path, obs_path):
    log_thread('upload the content of [%s] outputs' % output_name)
    try:
        if os.path.exists(local_output_path):
            log_thread('%s has %d files to be uploaded' % (output_name, len(file_list)))
            xxxxxx  # upload logic elided in the report
            log_thread('upload the content of [%s] outputs successfully' % output_name)
            log_thread('it can be accessed at obs path [%s]' % obs_path)
        else:
            log_thread('local output path is not found, skip upload the content of [%s] outputs' % output_name)
    except Exception as upload_exception:
        err_thread('upload the content of [%s] outputs failed: %s' % (output_name, str(upload_exception)))
        return 255
    return 0


def upload_to_s3():
    """
    upload the content of local path to s3, handle action [on_completed]
    :return: 0 on success, otherwise the first non-zero worker result
    """
    outputs = []
    for local_output_path, (output_name, obs_path, action, _, _) in local_to_target.items():
        if action == ACTION_ON_COMPLETED:
            outputs.append((output_name, local_output_path, obs_path))
    if len(outputs) == 0:
        return 0
    # one worker per output; leaving the 'with' block calls pool.terminate()
    with NonDaemonPool(processes=len(outputs)) as pool:
        results = pool.starmap(wrapper_copy_parallel, outputs)
    for result in results:
        if result != 0:
            return result
    return 0
```
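The snippet above uses NonDaemonPool without showing its definition. The following is only a sketch of how such a pool might be built, following the pattern from the linked StackOverflow question. It assumes Linux with the "fork" start method (as the traceback's popen_fork.py suggests). NonDaemonContext, run_child and _noop are hypothetical names used for illustration, not taken from the report.

```python
import multiprocessing
import multiprocessing.pool

# Assumption: Linux with the "fork" start method, as in the report's traceback.
_FORK = multiprocessing.get_context("fork")


class NonDaemonProcess(_FORK.Process):
    """Process whose 'daemon' flag always reads False, so it may have children."""

    @property
    def daemon(self):
        return False

    @daemon.setter
    def daemon(self, value):
        pass  # ignore the pool's attempt to set daemon=True


class NonDaemonContext(type(_FORK)):
    """Hypothetical fork context that hands out NonDaemonProcess objects."""
    Process = NonDaemonProcess


class NonDaemonPool(multiprocessing.pool.Pool):
    """Possible reconstruction; the report does not show its own definition."""

    def __init__(self, *args, **kwargs):
        kwargs["context"] = NonDaemonContext()
        super().__init__(*args, **kwargs)


def _noop():
    pass


def run_child(_):
    """Task that starts its own child process and returns the child's exit code."""
    child = _FORK.Process(target=_noop)
    child.start()
    child.join()
    return child.exitcode
```

With a regular Pool, run_child would fail because daemonic workers are not allowed to have children. With the sketch above, each worker can start and join its own child.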
stdout/stderr log
[ma-test Service Log][INFO][2021/01/05 03:07:49,218]: registered signal handler
[ma-test Service Log][INFO][2021/01/05 03:09:40,299]: output-handler finalizing
[ma-test Service Log][INFO][2021-01-05 03:09:40,309][NonDaemonPoolWorker-1]: child process calling self.run()
[ma-test Service Log][INFO][2021-01-05 03:09:40,311][NonDaemonPoolWorker-1]: upload the content of [] outputs
[ma-test Service Log][INFO][2021-01-05 03:09:41,331][Process-1:1]: child process calling self.run()
[ma-test Service Log][INFO][2021-01-05 03:09:41,333][Process-1:2]: child process calling self.run()
[ma-test Service Log][INFO][2021-01-05 03:09:41,338][Process-1:3]: child process calling self.run()
[ma-test Service Log][INFO][2021-01-05 03:09:41,351][Process-1:1]: process shutting down
[ma-test Service Log][INFO][2021-01-05 03:09:41,351][Process-1:1]: process exiting with exitcode 0
[ma-test Service Log][INFO][2021-01-05 03:09:41,386][Process-1:2]: process shutting down
[ma-test Service Log][INFO][2021-01-05 03:09:41,386][Process-1:2]: process exiting with exitcode 0
[ma-test Service Log][INFO][2021-01-05 03:09:41,410][Process-1:3]: process shutting down
[ma-test Service Log][INFO][2021-01-05 03:09:41,410][Process-1:3]: process exiting with exitcode 0
[ma-test Service Log][INFO][2021-01-05 03:09:41,415][NonDaemonPoolWorker-1]: upload the content of [] outputs successfully
[ma-test Service Log][INFO][2021-01-05 03:09:41,415][NonDaemonPoolWorker-1]: it can be accessed at obs path [s3://ma-test-algorancher-intel/model_evaluation/6e5746ff-2839-400a-ba93-df38311415f4/dac957b0-b43b-43e2-ab19-0b45672a7ea0/]
Python stack trace of PID 18 (parent process)
>>>
Interrupting process at following point:
File "/home/ma-user/runtime-scripts-v2/init-container/outputs-handler.py", line 396, in <module>
ret_code = upload_to_obs()
File "/home/ma-user/runtime-scripts-v2/init-container/outputs-handler.py", line 287, in upload_to_obs
results = pool.starmap(wrapper_copy_parallel, outputs)
File "/home/ma-user/miniconda3/lib/python3.7/multiprocessing/pool.py", line 623, in __exit__
self.terminate()
File "/home/ma-user/miniconda3/lib/python3.7/multiprocessing/pool.py", line 548, in terminate
self._terminate()
File "/home/ma-user/miniconda3/lib/python3.7/multiprocessing/util.py", line 201, in __call__
res = self._callback(*self._args, **self._kwargs)
File "/home/ma-user/miniconda3/lib/python3.7/multiprocessing/pool.py", line 617, in _terminate_pool
p.join()
File "/home/ma-user/miniconda3/lib/python3.7/multiprocessing/process.py", line 140, in join
res = self._popen.wait(timeout)
File "/home/ma-user/miniconda3/lib/python3.7/multiprocessing/popen_fork.py", line 48, in wait
return self.poll(os.WNOHANG if timeout == 0.0 else 0)
File "/home/ma-user/miniconda3/lib/python3.7/multiprocessing/popen_fork.py", line 28, in poll
pid, sts = os.waitpid(self.pid, flag)
gdb backtrace of PID 18
#0 0x00007f15c2c5ff7b in waitpid () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x000055f54a8a5180 in os_waitpid_impl.isra.78 (options=0, pid=28) at /tmp/build/80754af9/python_1588882889832/work/Modules/posixmodule.c:7066
#2 os_waitpid () at /tmp/build/80754af9/python_1588882889832/work/Modules/clinic/posixmodule.c.h:3049
#3 0x000055f54a7f1b19 in _PyMethodDef_RawFastCallKeywords () at /tmp/build/80754af9/python_1588882889832/work/Objects/call.c:655
#4 0x000055f54a7f1db1 in _PyCFunction_FastCallKeywords (func=0x7f15c1f175f0, args=<optimized out>, nargs=<optimized out>, kwnames=<optimized out>)
at /tmp/build/80754af9/python_1588882889832/work/Objects/call.c:734
Python stack trace of PID 28 (pool worker)
Interrupting process at following point:
File "/home/ma-user/runtime-scripts-v2/init-container/outputs-handler.py", line 396, in <module>
ret_code = upload_to_obs()
File "/home/ma-user/runtime-scripts-v2/init-container/outputs-handler.py", line 286, in upload_to_obs
with NonDaemonPool(processes=len(outputs)) as pool:
File "/home/ma-user/miniconda3/lib/python3.7/multiprocessing/pool.py", line 176, in __init__
self._repopulate_pool()
File "/home/ma-user/miniconda3/lib/python3.7/multiprocessing/pool.py", line 241, in _repopulate_pool
w.start()
File "/home/ma-user/miniconda3/lib/python3.7/multiprocessing/process.py", line 112, in start
self._popen = self._Popen(self)
File "/home/ma-user/miniconda3/lib/python3.7/multiprocessing/context.py", line 223, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "/home/ma-user/miniconda3/lib/python3.7/multiprocessing/context.py", line 277, in _Popen
return Popen(process_obj)
File "/home/ma-user/miniconda3/lib/python3.7/multiprocessing/popen_fork.py", line 20, in __init__
self._launch(process_obj)
File "/home/ma-user/miniconda3/lib/python3.7/multiprocessing/popen_fork.py", line 74, in _launch
code = process_obj._bootstrap()
File "/home/ma-user/miniconda3/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
self.run()
File "/home/ma-user/miniconda3/lib/python3.7/multiprocessing/process.py", line 99, in run
self._target(*self._args, **self._kwargs)
File "/home/ma-user/miniconda3/lib/python3.7/multiprocessing/pool.py", line 110, in worker
task = get()
File "/home/ma-user/miniconda3/lib/python3.7/multiprocessing/queues.py", line 351, in get
with self._rlock:
File "/home/ma-user/miniconda3/lib/python3.7/multiprocessing/synchronize.py", line 95, in __enter__
return self._semlock.__enter__()
gdb backtrace of PID 28
#0 0x00007f15c2c5e827 in do_futex_wait.constprop () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x00007f15c2c5e8d4 in __new_sem_wait_slow.constprop.0 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2 0x00007f15c2c5e97a in sem_wait@@GLIBC_2.2.5 () from /lib/x86_64-linux-gnu/libpthread.so.0
#3 0x00007f15a98058b1 in semlock_acquire () at /usr/local/src/conda/python-3.7.7/Modules/_multiprocessing/semaphore.c:319
#4 0x000055f54a7f1c94 in _PyMethodDef_RawFastCallKeywords () at /tmp/build/80754af9/python_1588882889832/work/Objects/call.c:694
#5 0x000055f54a7f8aef in _PyMethodDescr_FastCallKeywords () at /tmp/build/80754af9/python_1588882889832/work/Objects/descrobject.c:288
Process list
ma-user@ma-test-eval-dac957b0-b43b-43e2-ab19-0b45672a7ea0-worker-0:/$ ps -ef
UID PID PPID C STIME TTY TIME CMD
ma-user 1 0 0 Jan05 ? 00:00:00 /bin/bash /home/ma-user/sidecar.sh
ma-user 16 1 0 Jan05 ? 00:00:00 tail -n 0 -F /home/ma-user/ma-test/log/-0---ma-platform-sidecar.log
ma-user 18 1 0 Jan05 ? 00:00:00 python /home/ma-user/runtime-scripts-v2/init-container/outputs-handler.py
ma-user 28 18 0 Jan05 ? 00:00:00 python /home/ma-user/runtime-scripts-v2/init-container/outputs-handler.py
ma-user 102 0 0 Jan05 pts/0 00:00:00 bash
ma-user 911 0 0 14:56 pts/1 00:00:00 bash
ma-user 931 911 0 14:57 pts/1 00:00:00 ps -ef
ma-user@ma-test-eval-dac957b0-b43b-43e2-ab19-0b45672a7ea0-worker-0:/$
Open file descriptors of PID 28
ma-user@ma-test-eval-dac957b0-b43b-43e2-ab19-0b45672a7ea0-worker-0:/$ ls -alh /proc/28/fd
total 0
dr-x------ 2 ma-user ma-group 0 Jan 6 20:23 .
dr-xr-xr-x 9 ma-user ma-group 0 Jan 5 17:27 ..
lr-x------ 1 ma-user ma-group 64 Jan 6 20:23 0 -> /dev/null
l-wx------ 1 ma-user ma-group 64 Jan 6 20:23 1 -> /home/ma-user/ma-test/log/-0---ma-platform-sidecar.log
l-wx------ 1 ma-user ma-group 64 Jan 6 20:23 2 -> /home/ma-user/ma-test/log/-0---ma-platform-sidecar.log
lr-x------ 1 ma-user ma-group 64 Jan 6 20:23 3 -> /dev/urandom
lr-x------ 1 ma-user ma-group 64 Jan 6 20:23 4 -> pipe:[952852668]
lrwx------ 1 ma-user ma-group 64 Jan 6 20:23 5 -> socket:[952857499]
lrwx------ 1 ma-user ma-group 64 Jan 6 20:23 6 -> /home/ma-user/conf/scc_logger.conf
l-wx------ 1 ma-user ma-group 64 Jan 6 20:23 7 -> pipe:[952852671]
lr-x------ 1 ma-user ma-group 64 Jan 6 20:23 8 -> /dev/null
l-wx------ 1 ma-user ma-group 64 Jan 6 20:23 9 -> pipe:[952852675]
Open file descriptors of PID 18
ma-user@ma-test-eval-dac957b0-b43b-43e2-ab19-0b45672a7ea0-worker-0:/$ ls -alh /proc/18/fd
total 0
dr-x------ 2 ma-user ma-group 0 Jan 6 20:26 .
dr-xr-xr-x 9 ma-user ma-group 0 Jan 5 03:07 ..
lr-x------ 1 ma-user ma-group 64 Jan 6 20:26 0 -> /dev/null
l-wx------ 1 ma-user ma-group 64 Jan 6 20:26 1 -> /home/ma-user/ma-test/log/-0---ma-platform-sidecar.log
l-wx------ 1 ma-user ma-group 64 Jan 6 20:26 2 -> /home/ma-user/ma-test/log/-0---ma-platform-sidecar.log
lr-x------ 1 ma-user ma-group 64 Jan 6 20:26 3 -> /dev/urandom
lr-x------ 1 ma-user ma-group 64 Jan 6 20:26 4 -> pipe:[952852668]
l-wx------ 1 ma-user ma-group 64 Jan 6 20:26 5 -> pipe:[952852668]
lr-x------ 1 ma-user ma-group 64 Jan 6 20:26 6 -> pipe:[952852671]
l-wx------ 1 ma-user ma-group 64 Jan 6 20:26 7 -> pipe:[952852671]
lr-x------ 1 ma-user ma-group 64 Jan 6 20:26 8 -> pipe:[952852675]
SIGTERM and SIGINT have no effect
ma-user@ma-test-eval-dac957b0-b43b-43e2-ab19-0b45672a7ea0-worker-0:/$ ps -ef
UID PID PPID C STIME TTY TIME CMD
ma-user 1 0 0 Jan05 ? 00:00:00 /bin/bash /home/ma-user/sidecar.sh
ma-user 16 1 0 Jan05 ? 00:00:00 tail -n 0 -F /home/ma-user/ma-test/log/-0---ma-platform-sidecar.log
ma-user 18 1 0 Jan05 ? 00:00:00 python /home/ma-user/runtime-scripts-v2/init-container/outputs-handler.py
ma-user 28 18 0 Jan05 ? 00:00:00 python /home/ma-user/runtime-scripts-v2/init-container/outputs-handler.py
ma-user 102 0 0 Jan05 pts/0 00:00:00 bash
ma-user 1166 0 0 Jan06 pts/1 00:00:00 bash
ma-user 1216 1166 0 10:51 pts/1 00:00:00 ps -ef
ma-user@ma-test-eval-dac957b0-b43b-43e2-ab19-0b45672a7ea0-worker-0:/$ kill -SIGTERM 28
ma-user@ma-test-eval-dac957b0-b43b-43e2-ab19-0b45672a7ea0-worker-0:/$
ma-user@ma-test-eval-dac957b0-b43b-43e2-ab19-0b45672a7ea0-worker-0:/$ kill -SIGINT 28
ma-user@ma-test-eval-dac957b0-b43b-43e2-ab19-0b45672a7ea0-worker-0:/$
ma-user@ma-test-eval-dac957b0-b43b-43e2-ab19-0b45672a7ea0-worker-0:/$
In the end, the pool hangs.
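The hang in p.join() comes from an unbounded wait on a child that never exits. As an illustration only, here is a minimal sketch of a bounded join that escalates from SIGTERM to SIGKILL. It assumes Linux, the "fork" start method, and Python 3.7 or later (for Process.kill()). The helper join_or_kill and the worker _stubborn_worker are hypothetical names, and the worker ignoring SIGTERM is just one way a process can survive kill -SIGTERM; the report does not confirm that this is what happens to PID 28.

```python
import multiprocessing
import signal
import time

# Assumption: Linux with the "fork" start method, as in the report's traceback.
_FORK = multiprocessing.get_context("fork")


def join_or_kill(proc, timeout=5.0):
    """Wait for proc; escalate to SIGTERM, then SIGKILL, if it stays alive."""
    proc.join(timeout)
    if proc.is_alive():
        proc.terminate()  # sends SIGTERM, which the child may catch or ignore
        proc.join(timeout)
    if proc.is_alive():
        proc.kill()  # sends SIGKILL, which cannot be caught or ignored
        proc.join()
    return proc.exitcode


def _stubborn_worker(ready):
    """Hypothetical worker that ignores SIGTERM and then blocks."""
    signal.signal(signal.SIGTERM, signal.SIG_IGN)
    ready.set()  # tell the parent that SIGTERM is now ignored
    time.sleep(60)


def demo():
    """Start a stubborn worker and return its exit code after join_or_kill()."""
    ready = _FORK.Event()
    proc = _FORK.Process(target=_stubborn_worker, args=(ready,))
    proc.start()
    ready.wait()
    return join_or_kill(proc, timeout=0.5)
```

The sketch operates on a single Process handle and does not touch Pool internals. A negative exit code from demo() means the child was ended by the signal with that number.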
----------
components: Library (Lib)
messages: 384560
nosy: huangzhesi
priority: normal
severity: normal
status: open
title: pool worker can't be terminated
type: behavior
versions: Python 3.7
_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue42849>
_______________________________________