On 12/04/14 00:39, Nathaniel Smith wrote:
> The spawn mode is fine and all, but (a) the presence of something in 3.4 helps only a minority of users, (b) "spawn" is not a full replacement for fork;
Spawn basically does the same thing that multiprocessing already does on Windows. If you want portability to Windows, you must abide by its restrictions anyway.
> with large read-mostly data sets it can be a *huge* win to load them into the parent process and then let them be COW-inherited by forked children.
The thing is that Python's reference counting breaks COW fork. This has been discussed several times on the python-dev list. What happens is that as soon as the child process updates a refcount, the OS copies the whole page. And because of how Python behaves — even just touching an object updates its refcount — this copying of COW-marked pages quickly gets excessive. Effectively, the performance of os.fork in Python will be close to that of a non-COW fork. A suggested solution is to move the refcounts out of the PyObject struct, perhaps into a dedicated heap, but doing so would be unfriendly to the cache.
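To see why read-mostly access still dirties pages, note that merely binding a name to an object writes to the refcount field stored inside the object itself (a minimal sketch; `data` and `alias` are just illustrative names):

```python
import sys

data = [object() for _ in range(3)]

before = sys.getrefcount(data[0])
alias = data[0]   # a "read-only" use of the object...
after = sys.getrefcount(data[0])

# ...still wrote to the ob_refcnt field in the PyObject header, so in a
# forked child the OS would have to copy the page holding that object.
print(after - before)  # 1
```

Multiply this by every object a child process so much as looks at, and most of the supposedly shared pages end up copied.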
> ATM the only other way to work with a data set that's larger than memory-divided-by-numcpus is to explicitly set up shared memory, and this is *really* hard for anything more complicated than a single flat array.
Not difficult. You just go to my GitHub site and grab the code ;) (I have some problems running it on my MBP, though, not sure why, but it used to work on Linux and Windows, and possibly still does.)

https://github.com/sturlamolden/sharedmem-numpy

Sturla
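For reference, the single-flat-array case the quoted text mentions can be set up with the stdlib alone — a minimal sketch, assuming a Unix host where the fork start method is available (the `fill` worker is a hypothetical example; anything richer than a flat buffer of C scalars means laying out your own structures on raw bytes, which is where it gets hard):

```python
import multiprocessing as mp

def fill(shm, lo, hi):
    # Mutate a slice of the shared buffer in place; no data is copied
    # between processes, both see the same memory.
    for i in range(lo, hi):
        shm[i] += 1.0

ctx = mp.get_context("fork")        # assumes a Unix host
shared = ctx.RawArray("d", 1000)    # 1000 C doubles in shared memory, zeroed
p = ctx.Process(target=fill, args=(shared, 0, 500))
p.start()
p.join()
print(shared[0], shared[999])       # 1.0 0.0
```

A NumPy view can be layered on top with `np.frombuffer(shared, dtype=np.float64)`, which is essentially the flat-array case; nested or variable-sized structures have no such easy mapping.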