On Sat, Apr 12, 2014 at 12:07 AM, Sturla Molden wrote:
On 12/04/14 00:39, Nathaniel Smith wrote:
The spawn mode is fine and all, but (a) the presence of something in 3.4 helps only a minority of users, (b) "spawn" is not a full replacement for fork;
It basically does the same as on Windows. If you want portability to Windows, you must abide by these restrictions anyway.
Yes, but "sorry Unix guys, we've decided to take away this nice feature from you because it doesn't work on Windows" is a really terrible argument. If it can't be made to work, then fine, but fork safety is just not *that* much to ask.
with large read-mostly data sets it can be a *huge* win to load them into the parent process and then let them be COW-inherited by forked children.
The thing is that Python's reference counting breaks COW fork. This has been discussed several times on the Python-dev list. What happens is that as soon as the child process updates a refcount, the OS copies the page that holds the object. And because of how Python behaves, this copying of COW-marked pages quickly gets excessive. Effectively, the performance of os.fork in Python will be close to that of a non-COW fork. A suggested solution is to move the refcounts out of the PyObject struct and perhaps keep them in a dedicated heap. But doing so would be unfriendly to the cache.
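To illustrate the refcount point: in CPython the reference count lives in the object header itself, so even read-only use of an object writes to its memory. A minimal sketch, assuming CPython (the `payload` list is a hypothetical stand-in for data loaded before a fork):

```python
import sys

# Hypothetical stand-in for a large read-mostly object loaded before fork.
payload = ["some", "read-mostly", "data"]

before = sys.getrefcount(payload)

# Merely binding another name to the object increments its refcount,
# which is a write into the object's own memory.
alias = payload

after = sys.getrefcount(payload)

# Number of extra references created by the "read-only" aliasing above.
refcount_delta = after - before
print(refcount_delta)
```

In a forked child, a write like this dirties the page that holds the object, and the OS then has to copy that page.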
Yes, it's limited, but again this is not a reason to break it in the cases where it *does* work. The case where I ran into this was loading a big language model using SRILM:
  http://www.speech.sri.com/projects/srilm/
  https://github.com/njsmith/pysrilm
This produces a single Python object that references an opaque, tens-of-gigabytes mess of C++ objects. For this case explicit shared mem is useless, but fork worked brilliantly.

-n

--
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org
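A minimal sketch of the pattern described above, assuming a Unix system where os.fork is available. `OpaqueModel` and `score_in_child` are hypothetical names invented for illustration; they are not part of SRILM or pysrilm. The class refuses to be pickled, standing in for an object whose state lives in native code and therefore cannot be sent to a spawned worker.

```python
import os
import pickle


class OpaqueModel:
    """Hypothetical stand-in for an object wrapping opaque native state."""

    def __init__(self):
        self._table = {"hello": 0.25, "world": 0.75}

    def __reduce__(self):
        # Mimic an extension object that cannot be serialized.
        raise pickle.PicklingError("opaque native state cannot be pickled")

    def score(self, word):
        return self._table.get(word, 0.0)


def score_in_child(model, word):
    """Fork, score `word` in the child with the inherited model, return the result."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: `model` is already here via fork; nothing was pickled.
        status = 1
        try:
            os.close(read_fd)
            os.write(write_fd, repr(model.score(word)).encode())
            status = 0
        finally:
            os._exit(status)  # leave without running the parent's cleanup
    # Parent: read the child's answer from the pipe, then reap the child.
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as reader:
        data = reader.read()
    os.waitpid(pid, 0)
    return float(data.decode())


if __name__ == "__main__":
    model = OpaqueModel()  # loaded once, in the parent
    try:
        pickle.dumps(model)
    except pickle.PicklingError as exc:
        print("cannot pickle:", exc)
    print(score_in_child(model, "world"))
```

The child never receives the model through a pipe or queue. It simply finds it in its inherited address space, which is why this works even when the object cannot be pickled.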