
Hi, I've been thinking about and exploring this for some time. If we are to start some effort, I'd like to help. Here are my comments, mostly in response to Sturla's:

1. If we are talking about shared memory and copy-on-write inheritance, then we are using 'fork'. If we are free to use fork, a large chunk of the concerns about the Python standard library multiprocessing is no longer relevant. In particular, the limitation that worker functions must be defined at module level, which tends to impose a special requirement on the software design, goes away.

2. Pickling an inherited shared memory array can be done minimally by pickling just the array interface and the pointer address. This works because the child process and the parent share the same address space layout, guaranteed by the fork call.

3. The RawArray and RawValue implementations in the standard multiprocessing module have their own memory allocator for managing small variables. That is huge overkill (in terms of implementation) if we only care about very large memory chunks.

4. Hidden synchronization costs on multi-CPU (NUMA?) systems. One choice is to defer the responsibility for avoiding races to the developer. Simple constructs for working on slices of an array in parallel can cover a huge fraction of use cases and avoid this issue entirely.

5. Whether to delegate parallelism to the underlying low-level implementation, or to implement the parallelism in Python while keeping the underlying low-level implementation sequential, probably depends on the problem. Given the current state of parallelism support in Python it may be convenient to delegate, but will that always be the case? For example, after the MPI FFTW binding was stuck for a long time, someone wrote a parallel Python FFT package (https://github.com/spectralDNS/mpiFFT4py) that uses FFTW for the sequential parts, writes all of the parallel semantics in Python with mpi4py, and uses a more efficient domain decomposition.

6. If we are to define a set of operations, I would recommend taking a look at OpenMP as a reference -- it has been out there for decades and is widely used. An equivalent of the 'omp parallel for' construct in Python would be a very good starting point and immediately useful (a rough sketch follows below).

- Yu
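As a rough sketch of the 'parallel for' idea in point 6 (combined with the fork-based sharing from points 1 and 4): each worker writes only to its own slice of an anonymous shared mapping that the children inherit through fork, so nothing is pickled and no locking is needed. The parallel_for helper and the chunking below are illustrative assumptions, not an existing API, and the whole thing assumes the Unix 'fork' start method:

    import mmap
    import multiprocessing as mp

    import numpy as np

    def parallel_for(body, n, nworkers=4):
        # Run body(lo, hi) on nworkers disjoint index ranges covering [0, n),
        # roughly what 'omp parallel for' with a static schedule would do.
        edges = np.linspace(0, n, nworkers + 1).astype(int)
        workers = [mp.Process(target=body, args=(lo, hi))
                   for lo, hi in zip(edges[:-1], edges[1:])]
        for w in workers:
            w.start()
        for w in workers:
            w.join()

    n = 10 ** 7

    # Anonymous shared mapping: forked children see the same pages, so their
    # writes are visible to the parent without any pickling or copying back.
    out = np.frombuffer(mmap.mmap(-1, n * 8), dtype=np.float64)
    src = np.random.rand(n)        # read-only input, inherited copy-on-write

    def body(lo, hi):
        # Each worker touches only its own slice, so no synchronization is needed.
        out[lo:hi] = np.sqrt(src[lo:hi])

    parallel_for(body, n)
    print(np.allclose(out, np.sqrt(src)))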
On Wed, May 11, 2016 at 11:22 AM, Benjamin Root <ben.v.root@gmail.com> wrote:

Oftentimes, if one needs to share numpy arrays for multiprocessing, I would imagine that it is because the array is huge, right? So, the pickling approach would copy that array for each process, which defeats the purpose, right?
Ben Root
On Wed, May 11, 2016 at 2:01 PM, Allan Haldane <allanhaldane@gmail.com> wrote:
On 05/11/2016 04:29 AM, Sturla Molden wrote:
4. The reason IPC appears expensive with NumPy is because multiprocessing pickles the arrays. It is pickle that is slow, not the IPC. Some would say that the pickle overhead is an integral part of the IPC overhead, but I will argue that it is not. The slowness of pickle is a separate problem altogether.
That's interesting. I've also used multiprocessing with numpy and didn't realize that. Is this true in python3 too?
In python2 it appears that multiprocessing uses pickle protocol 0, which must cause a big slowdown (about a factor of 100) relative to protocol 2, and it uses pickle instead of cPickle.
a = np.arange(40*40)
%timeit pickle.dumps(a)
1000 loops, best of 3: 1.63 ms per loop

%timeit cPickle.dumps(a)
1000 loops, best of 3: 1.56 ms per loop

%timeit cPickle.dumps(a, protocol=2)
100000 loops, best of 3: 18.9 µs per loop
Python 3 uses protocol 3 by default:
%timeit pickle.dumps(a)
10000 loops, best of 3: 20 µs per loop
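For reference, a self-contained version of this comparison (a rough sketch assuming Python 3, where cPickle has been folded into pickle; it uses the timeit module instead of the %timeit magic):

    import pickle
    import timeit

    import numpy as np

    a = np.arange(40 * 40)

    # Protocol 0 is the old text-based format; protocol 2 is binary;
    # DEFAULT_PROTOCOL is what Python 3 picks when none is given.
    for proto in (0, 2, pickle.DEFAULT_PROTOCOL):
        t = timeit.timeit(lambda: pickle.dumps(a, protocol=proto), number=1000)
        print("protocol %d: %.1f us per dumps() call" % (proto, t / 1000 * 1e6))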
5. Shared memory does not improve on the pickle overhead, because NumPy arrays backed by shared memory must also be pickled. Multiprocessing can bypass pickling the RawArray object, but the rest of the NumPy array is pickled. Using shared memory arrays has no speed advantage over normal NumPy arrays when we use multiprocessing.
6. It is much easier to write concurrent code that uses queues for message passing than anything else. That is why using a Queue object has been the popular Pythonic approach to both multithreading and multiprocessing. I would like this to continue.
I am therefore focusing my effort on the multiprocessing.Queue object. If you understand the six points I listed, you will see where this is going: what we really need is a specialized queue that has knowledge about NumPy arrays and can bypass pickle -- that is, a NumPy-aware queue object, which is what I am working on.
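As a rough illustration of what "bypassing pickle" for the data could look like (a sketch only, not Sturla's implementation; the ArrayQueue name and layout are made up): ship a small (shape, dtype) header through an ordinary Queue and push the raw buffer through a Pipe, so the array data itself never goes through pickle:

    import multiprocessing as mp

    import numpy as np

    class ArrayQueue(object):
        # Toy single-producer/single-consumer queue: the header travels
        # through a normal Queue, the data buffer travels as raw bytes.
        def __init__(self):
            self._meta = mp.Queue()
            self._recv, self._send = mp.Pipe(duplex=False)

        def put(self, arr):
            arr = np.ascontiguousarray(arr)
            self._meta.put((arr.shape, arr.dtype.str))
            self._send.send_bytes(arr)           # raw buffer, no pickle

        def get(self):
            shape, dtype = self._meta.get()
            data = self._recv.recv_bytes()
            # frombuffer returns a read-only view of the received bytes;
            # add .copy() if a writable array is needed.
            return np.frombuffer(data, dtype=dtype).reshape(shape)

    def producer(q):
        q.put(np.arange(10 ** 6, dtype=np.float64))

    if __name__ == '__main__':
        q = ArrayQueue()
        p = mp.Process(target=producer, args=(q,))
        p.start()
        a = q.get()        # parent receives while the child is sending
        p.join()
        print(a.shape, a.dtype)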
We are not doing the users a favor by encouraging the use of shared memory arrays. They help with nothing.
Sturla Molden