
On 24/06/15 13:43, M.-A. Lemburg wrote:
> That said, I still think the multiple-process approach is a better one (more robust, more compatible, fewer problems). We'd just need a way more efficient approach to sharing objects between the Python processes than using pickle and shared memory or pipes :-)
It is hard to get around shared memory, Unix domain sockets, or pipes: there must be some sort of IPC regardless.

One idea I have played with is a specialized queue to replace the current multiprocessing.Queue. In scientific computing we often need to pass arrays, so it would make sense to have a queue that bypasses pickle for NumPy arrays, scalars, and dtypes, simply by using the NumPy C API to process the data. It could also have specialized code for a number of other objects -- at least str, int, float, complex, and PEP 3118 buffers, and perhaps also simple lists, tuples, and dicts of these types. I think such a queue could avoid the pickle issue for 99% of scientific computing. It would be very easy to write with Cython, and it could live in NumPy or SciPy. A sketch of the idea follows below.

One thing I did some years ago was to implement NumPy arrays that stored their data in shared memory. When passed to multiprocessing.Queue they would pickle only the metadata, not the data buffer. However, this did not improve performance: the pickle overhead was still there, and passing the binary data over a pipe was not the dominant cost by comparison. So while it saved memory, it did not make programs that use multiprocessing and NumPy more efficient. The second sketch below shows the mechanism.
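Roughly what I have in mind, as a pure-Python sketch of the idea (the real implementation would use the NumPy C API from Cython; the class name ArrayChannel is made up for illustration): only a small header is pickled, and the raw buffer goes straight down the pipe.

import multiprocessing as mp
import numpy as np

class ArrayChannel:
    """One-way channel that special-cases contiguous NumPy arrays."""

    def __init__(self):
        # Pipe(duplex=False) returns (receive end, send end).
        self._recv_conn, self._send_conn = mp.Pipe(duplex=False)

    def put(self, obj):
        if isinstance(obj, np.ndarray) and obj.dtype != object:
            arr = np.ascontiguousarray(obj)
            # Tiny header (tag, dtype, shape) -- cheap to pickle.
            self._send_conn.send(("ndarray", arr.dtype.str, arr.shape))
            # The raw buffer goes down the pipe without pickling.
            self._send_conn.send_bytes(arr)
        else:
            # Fallback: ordinary pickling for everything else.
            self._send_conn.send(("pickled", obj))

    def get(self):
        header = self._recv_conn.recv()
        if header[0] == "ndarray":
            _, dtype_str, shape = header
            buf = self._recv_conn.recv_bytes()
            # The result is read-only; copy() it if you need to write.
            return np.frombuffer(buf, dtype=dtype_str).reshape(shape)
        return header[1]

A C-API version would additionally handle scalars, dtypes, and the other simple types mentioned above, but the pattern is the same: a cheap header plus a raw buffer instead of a full pickle.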
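And here is a minimal sketch of the shared memory mechanism, using the stdlib multiprocessing.shared_memory module as the segment backend (a stand-in here for the wrapper I used back then; SharedArray and _attach are made-up names, and cleanup via close()/unlink() is elided for brevity):

import numpy as np
from multiprocessing import shared_memory

def _attach(name, shape, dtype_str):
    # Reconstructor used on unpickling: re-attach to the existing segment.
    return SharedArray(shape, dtype_str, _name=name)

class SharedArray:
    """NumPy array whose data lives in a named shared-memory segment."""

    def __init__(self, shape, dtype=np.float64, *, _name=None):
        nbytes = int(np.prod(shape)) * np.dtype(dtype).itemsize
        if _name is None:
            self._shm = shared_memory.SharedMemory(create=True,
                                                   size=max(nbytes, 1))
        else:
            self._shm = shared_memory.SharedMemory(name=_name)
        self.array = np.ndarray(shape, dtype=dtype, buffer=self._shm.buf)

    def __reduce__(self):
        # Only the metadata is pickled; the data buffer itself never
        # passes through pickle or the pipe.
        return (_attach,
                (self._shm.name, self.array.shape, self.array.dtype.str))

Put through multiprocessing.Queue, only the (name, shape, dtype) tuple is pickled, and the consumer re-attaches to the same segment -- which is exactly why it saved memory but not time.

Sturla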