On 30.09.2015 19:20, Nathaniel Smith wrote:
The challenges to providing transparent multithreading in numpy generally are:
- gcc + OpenMP on Linux still breaks multiprocessing. There's a patch to fix this, but they still haven't applied it; alternatively there's a workaround you can use in multiprocessing (not using fork mode), but this requires every user to update their code, and the workaround has other limitations. We're unlikely to use OpenMP while this is the case.
Ah, I didn't know this. Thanks.
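For anyone following along, here is a minimal sketch of what the "not using fork mode" workaround could look like, assuming it refers to the start methods available in multiprocessing since Python 3.4. The helper names (spawn_context, square, run_with_spawn) are made up for illustration and are not part of any library.

```python
import multiprocessing as mp


def spawn_context():
    # Assumption: "not using fork mode" means choosing a non-fork start method.
    # "spawn" launches a fresh interpreter per worker instead of fork()ing,
    # so workers do not inherit runtime state built up in the parent.
    return mp.get_context("spawn")


def square(x):
    # Stand-in for real work; a NumPy call would go here.
    return x * x


def run_with_spawn(values, processes=2):
    # Call this from under `if __name__ == "__main__":` in a script,
    # because spawn re-imports the main module in every worker.
    ctx = spawn_context()
    with ctx.Pool(processes=processes) as pool:
        return pool.map(square, values)
```

This also hints at the limitations mentioned above: the worker function and its arguments have to be picklable, and every script that creates a pool needs the main-module guard, which is exactly the kind of change each user would have to make.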
- parallel code in general is not very composable. If someone is calling a numpy operation from one thread, great, transparently using multiple threads internally is a win. If they're exploiting some higher-level structure in their problem to break it into pieces and process each in parallel, and then using numpy on each piece, then numpy spawning threads internally will probably destroy performance. And numpy is too low-level to know which case it's in. This problem exists to some extent already with multi-threaded BLAS, so people use various BLAS-specific knobs to manage it in ad hoc ways, but this doesn't scale.
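To make the "BLAS-specific knobs" concrete: in practice these are mostly environment variables that the threading runtimes typically read once, when they initialise. Below is a rough sketch of the ad hoc approach, assuming the top-level code knows how many workers it is going to start. The helper names and the 8-core, 4-worker figures are made up for illustration.

```python
import os

# Variables read by OpenMP runtimes, OpenBLAS and MKL respectively.
THREAD_VARS = ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS")


def threads_per_worker(total_cores, n_workers):
    # Split the cores evenly between workers, never going below one thread.
    return max(1, total_cores // n_workers)


def limit_library_threads(n):
    # Should run before the numerical library is first imported,
    # since the variables are usually only consulted at start-up.
    for name in THREAD_VARS:
        os.environ[name] = str(n)


# Hypothetical setup: 8 cores shared by 4 worker processes.
limit_library_threads(threads_per_worker(total_cores=8, n_workers=4))
# import numpy as np  # import only after the limits are in place
```

Note that only the outermost program can sensibly make this call; a reusable library sitting in the middle of the stack has no idea what thread budget it has been given.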
Very good point. I've had both kinds of use cases myself.

It would be nice if there were some way to tell NumPy whether or not to use additional threads, but that adds complexity. It's also not a good solution, considering that any higher-level code building on NumPy, if it is designed to be at all reusable, may find *itself* in either role. Only the code that happens to form the top level at a given point in a project's development has the required context...

Then again, the matter is further complicated by considering codes that run on a single machine versus codes that run on a cluster. Since threads are local to each node in a cluster, it may make sense for a solver targeting a cluster to split the problem at the process level, distribute the processes across the network, and use threading to accelerate the computation on each node.

A complex issue with probably no easy solutions :)

 -J