
Trent Nelson <trent@snakebite.org> wrote:
> The situation Ryan describes is literally the exact situation that PyParallel excels at: large reference data structures accessible in parallel contexts.
Back in 2009 I solved this for multiprocessing using a NumPy array that used shared memory as its backend (Sys V IPC, not BSD mmap, on Mac and Linux). By monkey-patching the pickling of numpy.ndarray, the contents of the shared-memory buffer were not pickled, only the metadata needed to reopen the shared memory segment. After a while it stopped working on Mac (I haven't had time to fix it -- maybe I should), but it still works on Windows. :( A sketch of the idea with today's stdlib is in the P.S. below.

Anyway, there is another library that does something similar, called joblib. It is used for parallel computing in scikit-learn. It creates shared memory by mmap'ing files from /tmp, which means it is only truly shared memory on Linux. On Mac and Windows there is no tmpfs, so it ends up using a physical file on disk instead. :-( A second sketch in the P.S. shows that mechanism too.

Sturla
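
P.S. For the curious, here is roughly what the pickling trick looks like if you rebuild it on today's stdlib. This is a minimal sketch, not my 2009 code: it uses multiprocessing.shared_memory (Python 3.8+, which is POSIX/Windows shared memory rather than Sys V IPC), and the names SharedArray and _reattach are made up for the example.

    import numpy as np
    from multiprocessing import shared_memory

    class SharedArray(np.ndarray):
        """ndarray view on a named shared-memory block; pickles by name."""

        def __new__(cls, shape, dtype=np.float64, shm=None):
            if shm is None:
                # Allocate a new block big enough to hold the array.
                size = int(np.prod(shape)) * np.dtype(dtype).itemsize
                shm = shared_memory.SharedMemory(create=True, size=size)
            obj = np.ndarray.__new__(cls, shape, dtype=dtype, buffer=shm.buf)
            obj._shm = shm  # keep the mapping alive as long as the array
            return obj

        def __reduce__(self):
            # The whole point: pickle only the metadata needed to
            # reattach to the segment, never the buffer contents.
            return (_reattach, (self._shm.name, self.shape, self.dtype.str))

    def _reattach(name, shape, dtype):
        shm = shared_memory.SharedMemory(name=name)  # open existing block
        return SharedArray(shape, dtype=dtype, shm=shm)

Passing a SharedArray to a worker then costs a few bytes of metadata instead of a copy of the whole buffer. (Views and slices are not handled here; the real thing needs more care.)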
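
And here is the joblib-style mechanism in miniature, using a plain numpy.memmap backed by a temporary file. Again just a sketch of the underlying idea; joblib automates this when you pass large arrays to joblib.Parallel.

    import os
    import tempfile
    import numpy as np

    # On Linux the temp dir is typically /tmp (tmpfs), so this buffer
    # lives in RAM; on Mac and Windows the same code writes a real
    # file to disk, which is exactly the problem described above.
    path = os.path.join(tempfile.mkdtemp(), 'refdata.dat')
    ref = np.memmap(path, dtype=np.float64, mode='w+', shape=(1000, 1000))
    ref[:] = 1.0
    ref.flush()

    # Any worker process can reopen the same buffer read-only:
    view = np.memmap(path, dtype=np.float64, mode='r', shape=(1000, 1000))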