On 30 September 2015 at 18:20, Nathaniel Smith <njs@pobox.com> wrote:

> - parallel code in general is not very composable. If someone is calling a numpy operation from one thread, great, transparently using multiple threads internally is a win. If they're exploiting some higher-level structure in their problem to break it into pieces and process each in parallel, and then using numpy on each piece, then numpy spawning threads internally will probably destroy performance. And numpy is too low-level to know which case it's in. This problem exists to some extent already with multi-threaded BLAS, so people use various BLAS-specific knobs to manage it in ad hoc ways, but this doesn't scale.

One idea: what about creating a "parallel NumPy"? There are a few algorithms that would benefit from parallelisation. This library would mirror NumPy's API, and the user would be responsible for choosing between the single-threaded and the parallel version simply by changing np.function(x, y) to pnp.function(x, y).
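To make the idea concrete, here is a minimal sketch of what such an opt-in wrapper could look like, using a thread pool to chunk a binary ufunc along axis 0. The names `parallel_ufunc` and the `workers` parameter are purely illustrative, not an existing library:

```python
# Hypothetical sketch: caller opts in to parallelism explicitly,
# keeping the same call shape as the NumPy operation it wraps.
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def parallel_ufunc(func, x, y, workers=4):
    """Apply a binary NumPy ufunc to x and y, chunked along axis 0."""
    chunks_x = np.array_split(x, workers)
    chunks_y = np.array_split(y, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(func, chunks_x, chunks_y))
    return np.concatenate(parts)

# Usage: the caller switches np.add(x, y) to the parallel variant.
x = np.arange(1_000_000, dtype=np.float64)
y = np.ones_like(x)
out = parallel_ufunc(np.add, x, y)
```

Because the caller makes the choice at the call site, the composability problem above goes away: code that already parallelises at a higher level simply keeps using np directly.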

If that were deemed a good idea, what would be the best parallelisation scheme? OpenMP? Threads?