[issue29795] Clarify how to share multiprocessing primitives

New submission from Max:

It seems that both I and many other people (judging from Stack Overflow questions) are confused about whether it's OK to write this:

from multiprocessing import Process, Queue

q = Queue()

def f():
    q.put([42, None, 'hello'])

def main():
    p = Process(target=f)
    p.start()
    print(q.get())  # prints "[42, None, 'hello']"
    p.join()

if __name__ == '__main__':
    main()

It's not OK: it doesn't work on Windows, presumably because when the child process is set up via pickling, the connection between the global queues in the two processes is lost. It works on Linux, because fork preserves more state than pickle does, so the connection is maintained. I thought it would be good to clarify in the docs that Queue(), Manager().*, and other similar objects should be passed as parameters, not just defined as globals.

----------
assignee: docs@python
components: Documentation
messages: 289454
nosy: docs@python, max
priority: normal
severity: normal
status: open
title: Clarify how to share multiprocessing primitives
type: behavior
versions: Python 3.6

_______________________________________
Python tracker <report@bugs.python.org>
<http://bugs.python.org/issue29795>
_______________________________________
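A minimal sketch of the portable version of this example, assuming the queue is created in the parent and passed to the child via args= (this works with both fork and spawn):

from multiprocessing import Process, Queue

def f(q):
    q.put([42, None, 'hello'])

def main():
    q = Queue()                       # created in the parent...
    p = Process(target=f, args=(q,))  # ...and passed explicitly to the child
    p.start()
    print(q.get())                    # prints "[42, None, 'hello']"
    p.join()

if __name__ == '__main__':
    main()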

Davin Potts added the comment:

On Windows, because that OS does not support fork, multiprocessing uses spawn to create new processes by default. Note that in Python 3, multiprocessing lets the user choose how new processes are created (fork, spawn, or forkserver).

When fork is used, the 'q = Queue()' in this example is executed once by the parent process before the fork takes place; the resulting child process continues execution from the same point the parent had reached when it triggered the fork, so both parent and child see the same multiprocessing.Queue.

When spawn is used, a new process is spawned and the whole of this example script is executed again from scratch by the child process, so the child (spawned) process creates a new Queue object of its own with no connection to the parent's.

Would you be up for proposing replacement text to improve the documentation? Getting the documentation just right so that everyone understands it is worth spending time on.

----------
nosy: +davin
stage:  -> needs patch
type: behavior -> enhancement
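A minimal sketch of choosing a start method explicitly via a context; the 'spawn' context here mirrors the Windows default, and the queue is passed as an argument so the example behaves the same under any start method:

import multiprocessing as mp

def worker(q):
    q.put('hello from the child')

if __name__ == '__main__':
    # 'spawn' mirrors the Windows default; 'fork' and 'forkserver' are
    # also available on Unix.
    ctx = mp.get_context('spawn')
    q = ctx.Queue()
    p = ctx.Process(target=worker, args=(q,))
    p.start()
    print(q.get())
    p.join()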

Max added the comment:

How about inserting this text somewhere:

Note that sharing and synchronization objects (such as `Queue()`, `Pipe()`, `Manager()`, `Lock()`, `Semaphore()`) should be made available to a new process by passing them as arguments to the `target` function invoked by the `run()` method. Making these objects visible through global variables will only work when the process was started using `fork` (and as such sacrifices portability for no special benefit).
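A short sketch of the pattern that proposed text describes, using a Lock and a Manager dict as illustrative examples of objects passed explicitly to the target:

from multiprocessing import Process, Lock, Manager

def worker(lock, shared):
    # Both objects arrive as arguments, so this works under fork,
    # spawn, and forkserver alike.
    with lock:
        shared['count'] = shared.get('count', 0) + 1

if __name__ == '__main__':
    lock = Lock()
    with Manager() as manager:
        shared = manager.dict()
        procs = [Process(target=worker, args=(lock, shared)) for _ in range(4)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        print(shared['count'])  # 4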

Max added the comment:

Somewhat related is this statement from the Programming Guidelines:
When using the spawn or forkserver start methods many types from multiprocessing need to be picklable so that child processes can use them. However, one should generally avoid sending shared objects to other processes using pipes or queues. Instead you should arrange the program so that a process which needs access to a shared resource created elsewhere can inherit it from an ancestor process.
Since on Windows even "inheritance" is really the same pickle + pipe machinery executed inside CPython, I assume that entire paragraph is intended for UNIX platforms only (might be worth clarifying, btw). On Linux, "inheritance" is faster and can handle more complex objects than pickle over a pipe/queue -- but that is equally true whether the inheritance happens through global variables or through arguments to the target function. So the text I proposed earlier wouldn't conflict with this one; it would just encourage programmers to use function arguments instead of global variables, because it doesn't matter on Linux but makes the code portable to Windows.
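A sketch of the "inherit, don't send" guideline in practice, assuming a Pool whose workers need a shared Lock; the lock is handed to each worker at pool creation time through the initializer rather than being sent over the pool's task queue:

import multiprocessing as mp

lock = None  # set in each worker process by the initializer

def init_worker(l):
    # The lock is passed once, at pool creation time, so each worker
    # "inherits" it instead of receiving it over the pool's task queue
    # (multiprocessing locks cannot be sent through a queue).
    global lock
    lock = l

def work(i):
    with lock:
        print('task', i, 'done by', mp.current_process().name)

if __name__ == '__main__':
    l = mp.Lock()
    with mp.Pool(processes=2, initializer=init_worker, initargs=(l,)) as pool:
        pool.map(work, range(4))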

Max added the comment:

Actually, never mind, I think one of the paragraphs in the Programming Guidelines ("Explicitly pass resources to child processes") basically explains everything already. I just didn't notice it until @noxdafox pointed it out to me on SO. Close please.

Changes by Davin Potts <python@discontinuity.net>:

----------
resolution:  -> works for me
stage: needs patch -> resolved
status: open -> closed
participants (2)
- Davin Potts
- Max