[Python-ideas] solving multi-core Python

Sturla Molden sturla.molden at gmail.com
Thu Jun 25 11:35:35 CEST 2015

Trent Nelson <trent at snakebite.org> wrote:

>     The situation Ryan describes is literally the exact situation
>     that PyParallel excels at: large reference data structures
>     accessible in parallel contexts.

Back in 2009 I solved this for multiprocessing using a NumPy array that
used shared memory as its backend (System V IPC, not BSD mmap, on Mac and
Linux). I monkey-patched the pickling of numpy.ndarray so that the contents
of the shared memory buffer were not pickled, only the metadata needed to
reopen the shared memory. After a while it stopped working on Mac (I haven't
had time to fix it -- maybe I should), but it still works on Windows. :(
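Here is a minimal sketch of the same idea. It assumes Python 3.8+ and uses `multiprocessing.shared_memory` (POSIX shared memory on Linux/Mac, a named file mapping on Windows) in place of the System V IPC backend described above. It also swaps the monkey-patching for a hypothetical `SharedArray` subclass whose `__reduce__` pickles only the segment name, shape and dtype. `create_shared`, `_wrap` and `_reopen` are illustrative names, not an existing API.

```python
import pickle
from multiprocessing import shared_memory

import numpy as np


class SharedArray(np.ndarray):
    """Hypothetical ndarray backed by a SharedMemory block; pickles only metadata."""

    def __array_finalize__(self, obj):
        # Views, slices and ufunc results do not own the whole block, so they
        # start without a handle and fall back to ordinary (copying) pickling.
        self._shm = None

    def __reduce__(self):
        shm = getattr(self, "_shm", None)
        if shm is None:
            return np.asarray(self).__reduce__()
        # Only the segment name, shape and dtype travel through the pickle.
        return _reopen, (shm.name, self.shape, self.dtype.str)


def _wrap(shm, shape, dtype):
    arr = np.ndarray(shape, dtype=dtype, buffer=shm.buf).view(SharedArray)
    arr._shm = shm  # keep the mapping alive as long as this array
    return arr


def _reopen(name, shape, dtype):
    """Attach to an existing block by name (runs in the unpickling process)."""
    return _wrap(shared_memory.SharedMemory(name=name), shape, dtype)


def create_shared(shape, dtype=np.float64):
    """Allocate a new shared block sized for `shape` and `dtype`."""
    dtype = np.dtype(dtype)
    nbytes = int(np.prod(shape)) * dtype.itemsize
    shm = shared_memory.SharedMemory(create=True, size=max(nbytes, 1))
    return _wrap(shm, shape, dtype)


if __name__ == "__main__":
    a = create_shared((1000,))
    a[:] = np.arange(1000)
    payload = pickle.dumps(a)  # metadata only, independent of a.size
    b = pickle.loads(payload)  # re-attaches to the same block
    b[0] = -1.0
    print(len(payload), a[0])  # a sees the write made through b
    shm_a, shm_b = a._shm, b._shm
    del a, b  # drop the arrays before closing the mappings
    shm_b.close()
    shm_a.close()
    shm_a.unlink()  # the creating process removes the block
```

The pickle size does not grow with the number of elements, so passing the array to a multiprocessing worker costs the same for 1 KB or 1 GB of data. The trade-off is manual lifetime management. `SharedMemory.close()` raises `BufferError` while an array still references the buffer, so the arrays have to go first, and the creating process is responsible for `unlink()`.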

Anyway, there is another library that does something similar, called joblib.
It is used for parallel computing in scikit-learn. It creates shared memory
by mmap-ing files in /tmp, which means it is only true shared memory on Linux.
On Mac and Windows there is no tmpfs, so it ends up using a physical file on
disk instead :-(
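For comparison, here is a sketch of the file-backed technique using `numpy.memmap` on a file in the system temp directory. This is not joblib's own code. It only illustrates the mmap-a-file approach, and the helpers `dump_to_memmap` and `open_memmap` are hypothetical. At the joblib level, the public entry points are `joblib.dump` and `joblib.load(..., mmap_mode="r")`. Whether the pages stay in RAM depends on the filesystem holding the folder. On a tmpfs mount they do; otherwise a real file is written.

```python
import os
import tempfile

import numpy as np


def dump_to_memmap(arr, folder=None):
    """Copy `arr` into a memory-mapped file and return the metadata to reopen it."""
    folder = folder or tempfile.gettempdir()  # e.g. /tmp on Linux
    fd, path = tempfile.mkstemp(suffix=".mmap", dir=folder)
    os.close(fd)  # np.memmap opens the file itself
    mm = np.memmap(path, dtype=arr.dtype, mode="w+", shape=arr.shape)
    mm[...] = arr
    mm.flush()  # write the data to the file before other processes map it
    del mm
    return path, arr.shape, arr.dtype.str


def open_memmap(path, shape, dtype):
    """Map the file read-only; workers only need (path, shape, dtype)."""
    return np.memmap(path, dtype=dtype, mode="r", shape=shape)
```

Only the small `(path, shape, dtype)` tuple needs to be sent to a worker. As the text notes, though, the data lives in an ordinary file, so on systems without tmpfs it is backed by disk rather than by pure shared memory.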


