Re: [Numpy-discussion] Cython-based OpenMP-accelerated quartic polynomial solver

Oct. 1, 2015

      On 01.10.2015 01:04, Sturla Molden wrote:
...
There are two principal problems with using OpenMP in NumPy:
One is that the GNU OpenMP threadpool is not fork-safe, and can break 
multiprocessing on some platforms (e.g. when using Python 2.7 on 
Linux). Anything that uses GCD has this nasty effect on Apple and 
FreeBSD as well. Note that the problem is actually in multiprocessing. 
It is not present on Windows (there is no fork system call) and it is 
avoidable even on Linux with Python 3.4 or later. Also the default 
builds of NumPy and SciPy on MacOSX uses GCD by default.
The other problem is that NumPy with its iterator objects and gufuncs 
is not particularly fit for multithreading. There will be a lot of 
contention for the iterator object. Without a major redesign, one 
thread would do useful work while most of the others would be 
busy-waiting on a spinlock and fighting for iterator access. Not to 
mention that the iterator object would be false shared between the 
processors, which would trash the performance if you have more than 
one CPU, even when compared to using a single thread. This means that 
for multithreading to be useful in NumPy, the loops will have to be 
redesigned so that the work sharing between the threads is taken care 
of ahead of creating the iterator, i.e. allowing us to use one 
iterator per thread. Each iterator would then iterate over a slice of 
the original array. This is ok, but we cannot simply do this by adding 
OpenMP pragmas in the C code.
Thanks for the details.
...
Given the two problems mentioned above, it would likely be better to 
use a fork-safe threadpool instead of OpenMP.
Sounds good. Out of curiosity, are there any standard fork-safe 
threadpools, or would this imply rolling our own?

Also, after Chris's, Nathaniel's and your replies, I'm no longer sure an 
attempt at transparent multithreading belongs in the polynomial solver. 
Although the additional performance would be (very) nice when running it 
from a single-threaded program - after all, quad-cores are common these 
days - it is not just this specific solver that encounters the issue, 
but instead the same considerations apply to basically all NumPy operations.

So maybe it would be better, at least at first, to make a pure-Cython 
version with no attempt at multithreading?

  -J

Re: [Numpy-discussion] Cython-based OpenMP-accelerated quartic polynomial solver

Juha Jeronen