[Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)

Elliot Hallmark Permafacture at gmail.com
Wed May 11 10:41:46 EDT 2016


Sturla, this sounds brilliant!  To be clear, you're talking about
serializing the numpy array and reconstructing it in a way that's faster
than pickle? Or using shared memory and signaling array creation around
that shared memory rather than using pickle?

For what it's worth, I have used shared memory with numpy arrays as IPC (no
queue), with one process writing to it and one process reading from it, and
liked it.  Your point #5 did not apply because I was reusing the shared
memory.
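
In case it's useful, the pattern I used was roughly the sketch below (from
memory and illustrative only: a RawArray allocated up front, an Event for
signaling, and numpy viewing the shared buffer directly):

    import multiprocessing as mp
    import numpy as np

    N = 1000000  # number of float64 elements in the shared buffer

    def writer(raw, ready):
        arr = np.frombuffer(raw, dtype=np.float64)  # zero-copy view of the buffer
        arr[:] = np.random.random(arr.size)         # fill it in place
        ready.set()                                 # signal that the data is there

    def reader(raw, ready):
        arr = np.frombuffer(raw, dtype=np.float64)
        ready.wait()
        print("reader sees mean =", arr.mean())

    if __name__ == "__main__":
        shared = mp.RawArray('d', N)  # anonymous shared memory, made before spawning
        ready = mp.Event()            # plain signaling, no Queue involved
        procs = [mp.Process(target=writer, args=(shared, ready)),
                 mp.Process(target=reader, args=(shared, ready))]
        for p in procs:
            p.start()
        for p in procs:
            p.join()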

Do you have a public repo where you are working on this?

Thanks!
  Elliot

On Wed, May 11, 2016 at 3:29 AM, Sturla Molden <sturla.molden at gmail.com>
wrote:

> I did some work on this some years ago. I have more or less concluded that
> it was a waste of effort. But first let me explain why the suggested
> approach does not work. Because it uses memory mapping to create shared
> memory (i.e. the shared segments are not named), the segments must be
> created ahead of spawning processes. But if you really want this to work
> smoothly, you want named shared memory (Sys V IPC or POSIX shm_open), so
> that shared arrays can be created in the spawned processes and passed back.
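>
> To make the distinction concrete, here is a rough sketch of what named
> shared memory buys you (illustrative only: it fakes shm_open with a file
> under /dev/shm, so it is Linux-specific, and it skips error handling):
>
>     import os
>     import multiprocessing as mp
>     import numpy as np
>
>     def worker(q):
>         # The child creates the "named" segment itself and sends back only
>         # the name and the metadata needed to map it -- not the data.
>         name = "/dev/shm/demo_segment"      # stand-in for shm_open()
>         arr = np.memmap(name, dtype=np.float64, mode="w+", shape=(1000,))
>         arr[:] = 42.0
>         arr.flush()
>         q.put((name, arr.dtype.str, arr.shape))
>
>     if __name__ == "__main__":
>         q = mp.Queue()
>         p = mp.Process(target=worker, args=(q,))
>         p.start()
>         name, dtype, shape = q.get()        # parent attaches by name afterwards
>         result = np.memmap(name, dtype=dtype, mode="r", shape=shape)
>         print(result[:5])
>         p.join()
>         os.unlink(name)
>
> With anonymous memory mapping there is no name to send, so the segment has
> to exist before the worker is spawned and be inherited by it.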
>
> Now for the reason I don't care about shared memory arrays anymore, and
> what I am currently working on instead:
>
> 1. I have come across very few cases where threaded code cannot be used in
> numerical computing. In fact, multithreading nearly always happens in code
> I write in pure C or Fortran anyway. Most often it happens in library code
> that is already multithreaded (Intel MKL, Apple Accelerate Framework,
> OpenBLAS, etc.), which means using it requires no extra effort from my
> side. A multithreaded LAPACK library is no less multithreaded if I call it
> from Python.
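>
> For instance, a single BLAS call from Python already runs in parallel if
> NumPy is linked against such a library (a sketch; how many threads are used
> depends on the build and on settings such as OMP_NUM_THREADS):
>
>     import numpy as np
>
>     # No Python-level threads or processes here. If NumPy is linked against
>     # a multithreaded BLAS (MKL, OpenBLAS, Accelerate), the matrix product
>     # runs in parallel inside the library, with the GIL released meanwhile.
>     a = np.random.random((4000, 4000))
>     b = np.random.random((4000, 4000))
>     c = np.dot(a, b)   # watch the CPU monitor: all cores, no multiprocessing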
>
> 2. Getting shared memory right can be difficult because of hierarchical
> memory and false sharing. You might not see it if you only have a multicore
> CPU with a shared cache. But your code might not scale up on computers with
> more than one physical processor. False sharing acts like the GIL, except
> it happens in hardware and affects your C code invisibly without any
> explicit locking you can pinpoint. This is also why MPI code tends to scale
> much better than OpenMP code. If nothing is shared there will be no false
> sharing.
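>
> As a rough illustration of the layout issue (in pure Python the interpreter
> overhead will largely hide the timing effect, so treat this as a sketch of
> the pattern rather than a benchmark): with a packed layout the counters
> below share one cache line, while padding gives each counter its own line.
>
>     import ctypes
>     import multiprocessing as mp
>     import numpy as np
>
>     NWORKERS = 4
>     CACHE_LINE = 64          # bytes, typical for x86
>     ITERS = 1000000
>
>     def bump(raw, index, stride):
>         counters = np.frombuffer(raw, dtype=np.int64)
>         for _ in range(ITERS):
>             counters[index * stride] += 1    # each worker writes its own slot
>
>     def run(stride):
>         # stride = 1: counters are adjacent and share a cache line (false
>         # sharing); stride = CACHE_LINE // 8: one counter per cache line.
>         raw = mp.RawArray(ctypes.c_int64, NWORKERS * stride)
>         ps = [mp.Process(target=bump, args=(raw, i, stride))
>               for i in range(NWORKERS)]
>         for p in ps:
>             p.start()
>         for p in ps:
>             p.join()
>
>     if __name__ == "__main__":
>         run(1)                  # packed layout
>         run(CACHE_LINE // 8)    # padded layout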
>
> 3. Raw C level IPC is cheap – very, very cheap. Even if you use pipes or
> sockets instead of shared memory it is cheap. There are very few cases
> where the IPC itself is the bottleneck.
>
> 4. The reason IPC appears expensive with NumPy is that multiprocessing
> pickles the arrays. It is pickle that is slow, not the IPC. Some would say
> that the pickle overhead is an integral part of the IPC overhead, but I
> will argue that it is not. The slowness of pickle is a separate problem
> altogether.
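>
> A crude way to see points 3 and 4 together (sketch only; absolute numbers
> depend on the platform and pickle protocol) is to push the same array
> through a pipe once pickled and once as raw bytes:
>
>     import time
>     import multiprocessing as mp
>     import numpy as np
>
>     def sink(conn):
>         conn.recv()          # receives and unpickles a full ndarray
>         conn.recv_bytes()    # receives the same payload as a plain byte blob
>         conn.close()
>
>     if __name__ == "__main__":
>         a = np.random.random(5 * 10**6)      # about 40 MB of float64
>         parent, child = mp.Pipe()
>         p = mp.Process(target=sink, args=(child,))
>         p.start()
>         t0 = time.time(); parent.send(a)        # pickled by multiprocessing
>         t1 = time.time(); parent.send_bytes(a)  # raw buffer, no pickle
>         t2 = time.time()
>         p.join()
>         print("pickled send:   %.3f s" % (t1 - t0))
>         print("raw bytes send: %.3f s" % (t2 - t1))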
>
> 5. Shared memory does not remove the pickle overhead, because NumPy arrays
> backed by shared memory must be pickled as well. Multiprocessing can bypass
> pickling the RawArray object, but the rest of the NumPy array is still
> pickled. Using shared memory arrays therefore has no speed advantage over
> normal NumPy arrays when we use multiprocessing.
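>
> This is easy to check: pickling an ndarray that merely views a RawArray
> still produces a blob about as large as the array itself, because the data
> buffer is serialized and the fact that it lives in shared memory is lost
> (sketch):
>
>     import pickle
>     import multiprocessing as mp
>     import numpy as np
>
>     raw = mp.RawArray('d', 10**6)               # 8 MB shared buffer
>     arr = np.frombuffer(raw, dtype=np.float64)  # ndarray view on that buffer
>
>     blob = pickle.dumps(arr, protocol=pickle.HIGHEST_PROTOCOL)
>     print(arr.nbytes)   # 8000000
>     print(len(blob))    # roughly the same: the data is copied into the pickle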
>
> 6. It is much easier to write concurrent code that uses queues for message
> passing than anything else. That is why using a Queue object has been the
> popular Pythonic approach to both multithreading and multiprocessing. I
> would like this to continue.
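>
> For comparison, this is the plain Queue pattern I mean; there is nothing
> NumPy-specific about it (minimal example):
>
>     import multiprocessing as mp
>
>     def worker(inbox, outbox):
>         for task in iter(inbox.get, None):   # None is the shutdown sentinel
>             outbox.put(task * task)
>
>     if __name__ == "__main__":
>         inbox, outbox = mp.Queue(), mp.Queue()
>         p = mp.Process(target=worker, args=(inbox, outbox))
>         p.start()
>         for x in range(5):
>             inbox.put(x)
>         inbox.put(None)                      # tell the worker to stop
>         print([outbox.get() for _ in range(5)])
>         p.join()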
>
> I am therefore focusing my effort on the multiprocessing.Queue object. If
> you understand the six points I listed, you will see where this is going:
> what we really need is a specialized queue that has knowledge about NumPy
> arrays and can bypass pickle. That is what I am working on: a NumPy-aware
> queue object.
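>
> To sketch the direction (a toy illustration of the idea only, not the
> actual implementation): a channel that pickles nothing but a small header
> and ships the array payload as raw bytes already avoids pickling the bulk
> data. A real NumPy-aware Queue would also have to provide proper queue
> semantics and, ideally, avoid the extra copies as well.
>
>     import multiprocessing as mp
>     import numpy as np
>
>     class ArrayChannel(object):
>         """Toy one-way channel: a tiny pickled header (dtype, shape) plus
>         the array payload as raw bytes, so the bulk data bypasses pickle."""
>
>         def __init__(self):
>             self._recv_end, self._send_end = mp.Pipe(duplex=False)
>
>         def put(self, arr):
>             arr = np.ascontiguousarray(arr)
>             self._send_end.send((arr.dtype.str, arr.shape))  # small header
>             self._send_end.send_bytes(arr)                   # raw payload
>
>         def get(self):
>             dtype, shape = self._recv_end.recv()
>             data = self._recv_end.recv_bytes()
>             # frombuffer over bytes gives a read-only array; copy if needed
>             return np.frombuffer(data, dtype=dtype).reshape(shape)
>
>     def worker(chan):
>         a = chan.get()
>         print("worker got", a.shape, a.dtype, "sum =", a.sum())
>
>     if __name__ == "__main__":
>         chan = ArrayChannel()
>         p = mp.Process(target=worker, args=(chan,))
>         p.start()
>         chan.put(np.arange(10**6, dtype=np.float64))
>         p.join()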
>
> We are not doing the users a favor by encouraging the use of shared memory
> arrays. They help with nothing.
>
>
> Sturla Molden
>
>

