[Python-ideas] solving multi-core Python
Sturla Molden
sturla.molden at gmail.com
Wed Jun 24 18:58:01 CEST 2015
On 24/06/15 13:43, M.-A. Lemburg wrote:
> That said, I still think the multiple-process is a better one (more
> robust, more compatible, fewer problems). We'd just need a way more
> efficient approach to sharing objects between the Python processes
> than using pickle and shared memory or pipes :-)
It is hard to get around shared memory, Unix domain sockets, or pipes.
There must be some sort of IPC, regardless.
One idea I have played with is to use a specialized queue instead of the
current multiprocessing.Queue. In scientific computing we often need to
pass arrays, so it would make sense to have a queue that could bypass
pickle for NumPy arrays, scalars and dtypes, simply by using the NumPy C
API to process the data. It could also have specialized code for a
number of other objects -- at least str, int, float, complex, and PEP
3118 buffers, but perhaps also simple lists, tuples, and dicts containing
these types. I think it should be possible to make a queue that avoids
the pickle issue for 99% of scientific computing. It would be very easy
to write such a queue with Cython and e.g. have it as a part of NumPy or
SciPy.
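To illustrate the idea (this is a hypothetical sketch, not an existing
queue API): instead of pickling a NumPy array, one can send a tiny
header holding the dtype string and shape, followed by the raw data
buffer. The `pack_array`/`unpack_array` names below are made up for
illustration.

```python
# Hypothetical sketch: serialize a NumPy array as a small header
# (dtype, shape) plus the raw data buffer, bypassing pickle entirely.
import struct
import numpy as np

def pack_array(arr):
    """Encode an array as header + raw bytes; no pickle involved."""
    arr = np.ascontiguousarray(arr)
    dtype = arr.dtype.str.encode("ascii")            # e.g. b'<f8'
    header = struct.pack("!B", len(dtype)) + dtype
    header += struct.pack("!B", arr.ndim)
    header += struct.pack("!%dq" % arr.ndim, *arr.shape)
    return header + arr.tobytes()

def unpack_array(buf):
    """Rebuild the array from the header and the raw bytes."""
    n = buf[0]
    dtype = np.dtype(buf[1:1 + n].decode("ascii"))
    pos = 1 + n
    ndim = buf[pos]
    pos += 1
    shape = struct.unpack("!%dq" % ndim, buf[pos:pos + 8 * ndim])
    pos += 8 * ndim
    return np.frombuffer(buf[pos:], dtype=dtype).reshape(shape)

a = np.arange(12, dtype=np.float64).reshape(3, 4)
b = unpack_array(pack_array(a))
assert np.array_equal(a, b)
```

The bytes produced by `pack_array` could be written straight to a pipe
or socket, which is essentially what a specialized Cython queue would
do, only without the intermediate Python-level copies.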
One thing I did some years ago was to make NumPy arrays that stored
their data in shared memory. When passed to a multiprocessing.Queue,
they would pickle only the metadata, not the data buffer. However, this
did not improve performance, because the pickle overhead was still
there, and passing large amounts of binary data over a pipe was not
comparatively expensive. So while it saved memory, it did not make
programs using multiprocessing and NumPy any faster.
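For the record, the same scheme can be sketched today with the standard
multiprocessing.shared_memory module (added in Python 3.8, well after
this post; the original used a custom extension). Only the block name,
shape, and dtype need to cross the pipe; the data buffer itself stays in
shared memory. The helper names are made up for illustration.

```python
# Sketch: a NumPy array backed by shared memory, where "pickling"
# reduces to a tiny metadata tuple (name, shape, dtype).
from multiprocessing import shared_memory
import numpy as np

def create_shared_array(shape, dtype=np.float64):
    """Allocate a shared-memory block and view it as an ndarray."""
    size = int(np.prod(shape)) * np.dtype(dtype).itemsize
    shm = shared_memory.SharedMemory(create=True, size=size)
    return shm, np.ndarray(shape, dtype=dtype, buffer=shm.buf)

def attach_shared_array(name, shape, dtype):
    """Reattach to an existing block by name (e.g. in another process)."""
    shm = shared_memory.SharedMemory(name=name)
    return shm, np.ndarray(shape, dtype=dtype, buffer=shm.buf)

shm, a = create_shared_array((4,))
a[:] = [1.0, 2.0, 3.0, 4.0]
meta = (shm.name, a.shape, a.dtype)   # all that needs to cross the pipe
shm2, b = attach_shared_array(*meta)
result = b.tolist()
del a, b                              # release buffer views before closing
shm2.close()
shm.close()
shm.unlink()
```

Note that this only removes the cost of copying the data; the fixed
per-object pickle and IPC overhead remains, which is exactly why the
experiment above did not speed things up.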
Sturla