[Python-ideas] solving multi-core Python
Sturla Molden
sturla.molden at gmail.com
Wed Jun 24 18:58:01 CEST 2015
On 24/06/15 13:43, M.-A. Lemburg wrote:
> That said, I still think the multiple-process is a better one (more
> robust, more compatible, fewer problems). We'd just need a way more
> efficient approach to sharing objects between the Python processes
> than using pickle and shared memory or pipes :-)
It is hard to get around shared memory, Unix domain sockets, or pipes.
There must be some sort of IPC, regardless.
One idea I have played with is to use a specialized queue instead of the
current multiprocessing.Queue. In scientific computing we often need to
pass arrays, so it would make sense to have a queue that could bypass
pickle for NumPy arrays, scalars and dtypes, simply by using the NumPy C
API to process the data. It could also have specialized code for a
number of other objects -- at least str, int, float, complex, and PEP
3118 buffers, but perhaps also simple lists, tuples, and dicts containing
these types. I think it should be possible to make a queue that avoids
the pickle issue for 99% of scientific computing. It would be very easy
to write such a queue with Cython and e.g. have it as a part of NumPy or
SciPy.
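To illustrate the idea (this is a hypothetical sketch, not an existing
queue API): instead of pickling a NumPy array, one can send a tiny
header holding the dtype string and shape, followed by the raw data
buffer. The `pack_array`/`unpack_array` names below are made up for
illustration.

```python
# Hypothetical sketch: serialize a NumPy array as a small header
# (dtype, shape) plus the raw data buffer, bypassing pickle entirely.
import struct
import numpy as np

def pack_array(arr):
    """Encode an array as header + raw bytes; no pickle involved."""
    arr = np.ascontiguousarray(arr)
    dtype = arr.dtype.str.encode("ascii")            # e.g. b'<f8'
    header = struct.pack("!B", len(dtype)) + dtype
    header += struct.pack("!B", arr.ndim)
    header += struct.pack("!%dq" % arr.ndim, *arr.shape)
    return header + arr.tobytes()

def unpack_array(buf):
    """Rebuild the array from the header and the raw bytes."""
    n = buf[0]
    dtype = np.dtype(buf[1:1 + n].decode("ascii"))
    pos = 1 + n
    ndim = buf[pos]
    pos += 1
    shape = struct.unpack("!%dq" % ndim, buf[pos:pos + 8 * ndim])
    pos += 8 * ndim
    return np.frombuffer(buf[pos:], dtype=dtype).reshape(shape)

a = np.arange(12, dtype=np.float64).reshape(3, 4)
b = unpack_array(pack_array(a))
assert np.array_equal(a, b)
```

The bytes produced by `pack_array` could be written straight to a pipe
or socket, which is essentially what a specialized Cython queue would
do, only without the intermediate Python-level copies.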
One thing I did some years ago was to make NumPy arrays that stored
their data in shared memory. When passed to a multiprocessing.Queue,
they would pickle only the metadata, not the data buffer. However, this
did not improve performance, because the pickle overhead was still
there, and passing large amounts of binary data over a pipe was not
comparatively expensive. So while it saved memory, it did not make
programs using multiprocessing and NumPy any faster.
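For the record, the same scheme can be sketched today with the standard
multiprocessing.shared_memory module (added in Python 3.8, well after
this post; the original used a custom extension). Only the block name,
shape, and dtype need to cross the pipe; the data buffer itself stays in
shared memory. The helper names are made up for illustration.

```python
# Sketch: a NumPy array backed by shared memory, where "pickling"
# reduces to a tiny metadata tuple (name, shape, dtype).
from multiprocessing import shared_memory
import numpy as np

def create_shared_array(shape, dtype=np.float64):
    """Allocate a shared-memory block and view it as an ndarray."""
    size = int(np.prod(shape)) * np.dtype(dtype).itemsize
    shm = shared_memory.SharedMemory(create=True, size=size)
    return shm, np.ndarray(shape, dtype=dtype, buffer=shm.buf)

def attach_shared_array(name, shape, dtype):
    """Reattach to an existing block by name (e.g. in another process)."""
    shm = shared_memory.SharedMemory(name=name)
    return shm, np.ndarray(shape, dtype=dtype, buffer=shm.buf)

shm, a = create_shared_array((4,))
a[:] = [1.0, 2.0, 3.0, 4.0]
meta = (shm.name, a.shape, a.dtype)   # all that needs to cross the pipe
shm2, b = attach_shared_array(*meta)
result = b.tolist()
del a, b                              # release buffer views before closing
shm2.close()
shm.close()
shm.unlink()
```

Note that this only removes the cost of copying the data; the fixed
per-object pickle and IPC overhead remains, which is exactly why the
experiment above did not speed things up.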
Sturla