[issue9205] Parent process hanging in multiprocessing if children terminate unexpectedly

Greg Brockman report at bugs.python.org
Fri Aug 13 22:53:15 CEST 2010


Greg Brockman <gdb at mit.deu> added the comment:

I'll take another stab at this.  In the attachment (assign-tasks.patch), I've combined a lot of the ideas presented on this issue, so thank you both for your input.  Anyway:

- The basic idea of the patch is to record the mapping of tasks to workers.  I've added a protocol between the parent process and the workers that allows this to happen without adding a race condition between recording the task and the child dying.
- If a child unexpectedly dies, the worker_handler pretends that all of the jobs currently assigned to it raised a RuntimeError.  (Multiple jobs can be assigned to a single worker if the result handler is being slow.)
- The guarantee I try to provide is that each job will be started at most once.  There is enough information to instead ensure that each job is run exactly once, but in general whether that's acceptable or useful is really only known at the application level.
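The bookkeeping described above can be sketched in plain Python. This is a minimal sketch, not the code in assign-tasks.patch. The TaskTracker class and its method names are hypothetical, and the sketch models only the parent's state, not the pipes or queues that carry messages between parent and workers. It assumes each worker reports (pid, job id) to the parent before it starts running a job, so a crash can always be attributed to a specific worker. When a worker dies, every job still assigned to it is failed with a RuntimeError instead of being resubmitted, which is what gives the "started at most once" guarantee.

```python
import collections
from typing import Any, Callable, Dict, List, Set

# Callback signature: (success, value_or_exception) -> None
Callback = Callable[[bool, Any], None]


class TaskTracker:
    """Hypothetical parent-side record of which worker holds which job."""

    def __init__(self) -> None:
        # worker pid -> ids of jobs that worker has reported starting
        self._assigned: Dict[int, Set[int]] = collections.defaultdict(set)
        # job id -> callback that delivers the outcome to the caller
        self._callbacks: Dict[int, Callback] = {}

    def submit(self, job_id: int, callback: Callback) -> None:
        self._callbacks[job_id] = callback

    def worker_started(self, pid: int, job_id: int) -> None:
        # Assumed to be reported *before* the job runs, so a later crash
        # can be attributed to this worker.
        self._assigned[pid].add(job_id)

    def job_finished(self, pid: int, job_id: int, success: bool, value: Any) -> None:
        self._assigned[pid].discard(job_id)
        self._callbacks.pop(job_id)(success, value)

    def worker_died(self, pid: int) -> List[int]:
        # Fail (rather than reschedule) every job still attributed to the
        # dead worker, so no job is ever started twice.
        failed = sorted(self._assigned.pop(pid, set()))
        for job_id in failed:
            err = RuntimeError(f"worker {pid} died while running job {job_id}")
            self._callbacks.pop(job_id)(False, err)
        return failed


if __name__ == "__main__":
    results: Dict[int, Any] = {}

    def make_callback(job_id: int) -> Callback:
        def callback(success: bool, value: Any) -> None:
            results[job_id] = (success, value)
        return callback

    tracker = TaskTracker()
    for j in range(3):
        tracker.submit(j, make_callback(j))
    # Two jobs attributed to one worker, e.g. because the result handler
    # has not yet processed the result of job 0.
    tracker.worker_started(101, 0)
    tracker.worker_started(101, 1)
    tracker.worker_started(202, 2)
    tracker.job_finished(202, 2, True, "ok")
    print(tracker.worker_died(101))  # [0, 1]
```

Resubmitting the ids returned by worker_died instead of failing them would be the alternative policy. Since a job may have partly run before the crash, whether that is acceptable is an application-level decision.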

Some notes:
- I haven't implemented this approach for the ThreadPool yet.
- The test suite runs, but it occasionally hangs when shutting down the pool in Ask's tests in multiprocessing-trunk@82502-termination-trackjobs.patch.  My experiments seem to indicate this is due to a worker dying while holding a queue lock.  So I think a next step is to handle workers that die while holding a queue lock, although this seems unlikely to happen in practice.  I have some ideas as to how this could be fixed, if we decide it's worth trying.

Anyway, please let me know what you think of this approach/sample implementation.  If we decide that this seems promising, I'd be happy to build it out further.

----------
Added file: http://bugs.python.org/file18513/assign-tasks.patch

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue9205>
_______________________________________

