[Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)

Feng Yu rainwoodman at gmail.com
Thu May 12 16:46:14 EDT 2016


> Again, not everyone uses Unix.
>
> And on Unix it is not trivial to pass data back from the child process. I
> solved that problem with Sys V IPC (pickling the name of the segment).
>

I wonder whether it is necessary to insist on being able to pass large amounts
of data back from the child to the parent process.

In most (half?) situations the result can be written directly into a shared
array that is preallocated before the workers are spawned. Then there is no
need to pass data back with named segments.
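
To make the preallocation idea concrete, here is a minimal sketch. It assumes
a Unix system where the 'fork' start method is available. The helper name
copy_with_workers and the choice of multiprocessing's RawArray as the shared
backing store are only assumptions for illustration, not something taken from
the PR. Each worker writes its own slice of the preallocated array, and the
parent reads the result without anything being pickled or sent back.

```python
import multiprocessing as mp

import numpy as np


def copy_with_workers(src, num_workers=4):
    """Copy src into a shared array, one contiguous slice per worker."""
    ctx = mp.get_context("fork")  # assumption: fork is available (Unix)

    # Preallocate the result in shared memory *before* spawning workers.
    raw = ctx.RawArray("d", src.size)
    dst = np.frombuffer(raw, dtype=np.float64)

    def work(tid):
        # Each worker writes only its own slice, so no locking is needed.
        sl = slice(tid * src.size // num_workers,
                   (tid + 1) * src.size // num_workers)
        dst[sl] = src[sl]

    procs = [ctx.Process(target=work, args=(tid,))
             for tid in range(num_workers)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    if any(p.exitcode != 0 for p in procs):
        raise RuntimeError("a worker process failed")

    # The parent sees the children's writes; nothing was sent back.
    return dst


src = np.arange(10000, dtype=np.float64)
result = copy_with_workers(src)
```

The closure `work` can be used as a process target here only because fork
does not pickle the target; with a spawn-based start method this sketch would
need a module-level function and an explicit way to hand over the buffer.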

Here I am just doodling some possible use cases along the lines of OpenMP.
The sample just copies the data from s to r, in two different
ways. On systems that do not support multiprocessing + fork, the
semantics are still well preserved if threading is used.

```
import numpy
import ...... as mp

# the access attribute of inherited variables is at least 'privatecopy',
# but with the threading backend it becomes 'shared'
s = numpy.arange(10000)

with mp.parallel(num_threads=8) as section:
    # variables defined via section.empty will always be 'shared'
    r = section.empty(10000)

    def work():
        # variables defined in the body are 'private'
        tid = section.get_thread_num()
        size = section.get_num_threads()
        sl = slice(tid * r.size // size, (tid + 1) * r.size // size)
        r[sl] = s[sl]

    status = section.run(work)
    assert not any(status.errors)

    # support for the following could be implemented with section.run

    chunksize = 1000

    def work(i):
        sl = slice(i, i + chunksize)
        r[sl] = s[sl]
        return s[sl].sum()

    status = section.loop(work, range(0, r.size, chunksize), schedule='static')
    assert not any(status.errors)
    total = sum(status.results)
```
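
To illustrate the threading fallback, here is a rough sketch of how the
`section` object might be backed by plain threads. Everything in it
(ThreadSection, Status, the dtype argument of empty, and the round-robin
reading of schedule='static') is a hypothetical design of my own, not an
existing API. With threads, every array is shared by construction, so
section.empty can simply return an ordinary array.

```python
import threading

import numpy as np


class Status:
    """Per-task results and errors (None where nothing was raised)."""

    def __init__(self, results, errors):
        self.results = results
        self.errors = errors


class ThreadSection:
    """Hypothetical threading backend for the `section` object."""

    def __init__(self, num_threads):
        self._num_threads = num_threads
        self._local = threading.local()  # holds the per-thread id

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        return False  # do not swallow exceptions

    def empty(self, shape, dtype=float):
        # Threads share one address space, so a plain array is 'shared'.
        return np.empty(shape, dtype=dtype)

    def get_num_threads(self):
        return self._num_threads

    def get_thread_num(self):
        return self._local.tid

    def run(self, work):
        n = self._num_threads
        results, errors = [None] * n, [None] * n

        def target(tid):
            self._local.tid = tid
            try:
                results[tid] = work()
            except Exception as exc:
                errors[tid] = exc

        threads = [threading.Thread(target=target, args=(tid,))
                   for tid in range(n)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        return Status(results, errors)

    def loop(self, work, iterable, schedule="static"):
        if schedule != "static":
            raise NotImplementedError(schedule)
        items = list(iterable)
        results, errors = [None] * len(items), [None] * len(items)

        def body():
            # Round-robin assignment of iterations to threads.
            for k in range(self.get_thread_num(), len(items),
                           self.get_num_threads()):
                try:
                    results[k] = work(items[k])
                except Exception as exc:
                    errors[k] = exc

        self.run(body)
        return Status(results, errors)


s = np.arange(10000)
chunksize = 1000
with ThreadSection(num_threads=8) as section:
    r = section.empty(10000, dtype=s.dtype)

    def work(i):
        sl = slice(i, i + chunksize)
        r[sl] = s[sl]
        return s[sl].sum()

    status = section.loop(work, range(0, r.size, chunksize))
    total = sum(status.results)
```

Note that loop is built on top of run, as suggested in the comment in the
doodle; a fork-based backend would only need to replace empty and run.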

>> 6. If we are to define a set of operations I would recommend taking a
>> look at OpenMP as a reference -- it has been out there for decades and
>> is widely used. An equivalent of the 'omp parallel for' construct in
>> Python would be a very good starting point and immediately useful.
>
> If you are on Unix, you can just use a context manager. Call os.fork in
> __enter__ and os.waitpid in __exit__.
>
> Sturla
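
For reference, a minimal sketch of such a context manager might look like
the following. It is Unix only, and the class name ForkTeam, its rank and
failed attributes, and the use of an anonymous mmap as the shared result
buffer are all assumptions made for illustration. Every process, parent
included, runs the body of the with-block; the children exit in __exit__
and the parent waits for them there.

```python
import mmap
import os

import numpy as np


class ForkTeam:
    """Hypothetical helper: os.fork in __enter__, os.waitpid in __exit__."""

    def __init__(self, num_workers):
        self.num_workers = num_workers
        self.rank = 0        # the parent takes part as rank 0
        self.failed = 0      # children that did not exit cleanly
        self._children = []
        self._is_child = False

    def __enter__(self):
        for rank in range(1, self.num_workers):
            pid = os.fork()
            if pid == 0:     # in the child: remember the rank, stop forking
                self.rank = rank
                self._is_child = True
                self._children = []
                break
            self._children.append(pid)
        return self

    def __exit__(self, exc_type, exc, tb):
        if self._is_child:
            # Children never run past the with-block.
            os._exit(0 if exc_type is None else 1)
        for pid in self._children:
            _, status = os.waitpid(pid, 0)
            if not (os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0):
                self.failed += 1
        return False


s = np.arange(10000, dtype=np.int64)

# An anonymous mmap is MAP_SHARED by default on Unix, so after fork the
# parent and all children see the same memory.
buf = mmap.mmap(-1, s.nbytes)
r = np.frombuffer(buf, dtype=np.int64)

with ForkTeam(num_workers=4) as team:
    sl = slice(team.rank * r.size // team.num_workers,
               (team.rank + 1) * r.size // team.num_workers)
    r[sl] = s[sl]
```

The result array has to be allocated before the with-block, which is the
same preallocation pattern as above; anything a child assigns to an ordinary
variable inside the block is lost when it exits.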
>


