[Numpy-discussion] Numpy arrays shareable among related processes (PR #7533)

Matěj Týč matej.tyc at gmail.com
Tue May 17 05:04:07 EDT 2016


On 11.5.2016 10:29, Sturla Molden wrote:
> I did some work on this some years ago. ...
>
I am sorry; I missed this discussion when it started.

There are two cases where I felt I had to use this functionality:

 - parallel processing of HUGE data, and

 - using parallel processing in an application that had plug-ins
operating on one shared array (which was updated every now and then -
a producer-consumer pattern, roughly like the sketch below). Once
everything was set up, it worked like a charm.
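
For illustration, here is a minimal sketch of such a producer-consumer
setup using only the standard library (multiprocessing.Array wrapped
with numpy.frombuffer). The names consumer/ready and the array shape
are made up for the example; this is not the API of the proposed module:

    import multiprocessing as mp
    import numpy as np

    def consumer(shared, shape, ready):
        # Re-wrap the shared buffer as a numpy array - no copy is made.
        arr = np.frombuffer(shared.get_obj()).reshape(shape)
        ready.wait()           # wait until the producer has written
        print(arr.sum())       # sees the values written by the parent

    if __name__ == '__main__':
        shape = (1000, 1000)
        shared = mp.Array('d', shape[0] * shape[1])  # float64 buffer
        ready = mp.Event()
        p = mp.Process(target=consumer, args=(shared, shape, ready))
        p.start()
        arr = np.frombuffer(shared.get_obj()).reshape(shape)
        arr[:] = 1.0           # producer updates the shared array in place
        ready.set()            # the Event orders the accesses here,
        p.join()               # so the Array's lock is not needed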

What I especially like about the proposed module is its lack of
external dependencies, plus the fact that it works once one knows how
to use it.

The bad thing about it is its fragility - I admit that using it as it
stands is not particularly intuitive. Unlike Sturla, I do not think
this is a dead end, but it does feel clumsy. However, for the reasons I
have mentioned, I dislike the necessity of writing Cython or C to get
true multithreading - what if you want to run high-level Python
functions in parallel?
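
To be clear, high-level Python functions can already run in parallel
via multiprocessing - just not on shared data. A minimal sketch,
assuming a made-up analyze function and arbitrary chunk sizes:

    import multiprocessing as mp
    import numpy as np

    def analyze(chunk):
        # Arbitrary high-level Python/numpy code; each call runs in
        # its own process, so the GIL is not an obstacle.
        return chunk.mean()

    if __name__ == '__main__':
        data = np.random.rand(4, 1000000)
        with mp.Pool(processes=4) as pool:
            # Each row is pickled and sent to a worker - convenient,
            # but the data is copied, unlike with shared memory.
            print(pool.map(analyze, data))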

So, what I would really like to see is some numpy documentation on how
to approach parallel computing with numpy arrays, depending on the kind
of task one wants to achieve. Maybe just using a queue is good enough
(see the sketch below), or perhaps one of the third-party modules with
known limitations is the way to go? Plenty of people start off with
numpy, so some kind of overview should be part of the numpy docs.
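
For reference, the queue-based approach I have in mind looks roughly
like this (worker and the toy array are purely illustrative). Note that
multiprocessing.Queue pickles every array, so the data is copied -
exactly the cost that shared memory avoids:

    import multiprocessing as mp
    import numpy as np

    def worker(q_in, q_out):
        arr = q_in.get()       # the array arrives as a pickled copy
        q_out.put(arr * 2)     # the result is pickled back again

    if __name__ == '__main__':
        q_in, q_out = mp.Queue(), mp.Queue()
        p = mp.Process(target=worker, args=(q_in, q_out))
        p.start()
        q_in.put(np.arange(10))
        print(q_out.get())
        p.join()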




