[Numpy-discussion] Cython-based OpenMP-accelerated quartic polynomial solver

Juha Jeronen juha.jeronen at jyu.fi
Fri Oct 2 05:58:49 EDT 2015


On 01.10.2015 03:52, Sturla Molden wrote:
> On 01/10/15 02:32, Juha Jeronen wrote:
>
>> Sounds good. Out of curiosity, are there any standard fork-safe
>> threadpools, or would this imply rolling our own?
>
> You have to roll your own.
>
> Basically use pthread_atfork to register a callback that shuts down 
> the threadpool before a fork and another callback that restarts it. 
> Python's threading module does not expose the pthread_atfork 
> function, so you must call it from Cython.
>
> I believe there is also a tiny atfork module in PyPI.

Ok. Thanks. This approach fixes the issue of the threads not being there 
for the child process.
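
For reference, a rough sketch of the shut-down-and-restart idea in plain
Python. It assumes os.register_at_fork (in the standard library since
Python 3.7, POSIX only) as a stand-in for calling pthread_atfork from
Cython. The class name ForkSafePool and its methods are made up for
illustration and are not part of the solver.

```python
import os
from concurrent.futures import ThreadPoolExecutor


class ForkSafePool:
    """Hypothetical wrapper: stops its workers before fork(), restarts after."""

    def __init__(self, max_workers):
        self.max_workers = max_workers
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        # os.register_at_fork exists on POSIX only (Python 3.7+).
        # Handlers cannot be unregistered, so this keeps the object alive
        # for the lifetime of the process.
        if hasattr(os, "register_at_fork"):
            os.register_at_fork(
                before=self._shutdown,
                after_in_parent=self._restart,
                after_in_child=self._restart,
            )

    def _shutdown(self):
        # Wait for queued work and join the worker threads, so that no
        # pool thread exists at the moment of the fork.
        if self._pool is not None:
            self._pool.shutdown(wait=True)
            self._pool = None

    def _restart(self):
        # Runs in both parent and child; each gets a fresh, empty pool.
        if self._pool is None:
            self._pool = ThreadPoolExecutor(max_workers=self.max_workers)

    def submit(self, fn, *args):
        return self._pool.submit(fn, *args)
```

Waiting for queued work in the "before" handler means a fork can block
until the pool is idle; whether that is acceptable depends on the caller.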

I think it still leaves open the issue of creating the correct number of 
threads in each process's pool when the pool is restarted, so that the 
total number of threads across all processes matches the number of cores 
(physical or virtual, whichever the user desires). But this is again 
something that requires context...
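
To illustrate what the missing context would be: if the caller could tell
each pool how many processes share the machine, the per-process thread
count is simple arithmetic. The helper below is only a sketch under that
assumption; threads_per_process is a made-up name, and an even split is
just one possible policy.

```python
import os


def threads_per_process(n_processes, total_cores=None):
    """Split the available cores evenly; each process gets at least one thread."""
    if n_processes < 1:
        raise ValueError("n_processes must be at least 1")
    if total_cores is None:
        # os.cpu_count() reports logical cores and may return None.
        total_cores = os.cpu_count() or 1
    return max(1, total_cores // n_processes)
```

The hard part remains that a library-level pool has no way of knowing
n_processes on its own.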


>> So maybe it would be better, at least at first, to make a pure-Cython
>> version with no attempt at multithreading?
>
> I would start by making a pure Cython version that works correctly. 
> The next step would be to ensure that it releases the GIL. After that 
> you can worry about parallel processing, or just tell the user to use 
> threads or joblib.

First version done and uploaded:

https://yousource.it.jyu.fi/jjrandom2/miniprojects/trees/master/misc/polysolve_for_numpy

OpenMP support removed; this version uses only Cython.

The example program has been renamed to main.py, and setup.py has been 
cleaned up by removing the irrelevant module.

This folder contains only the files for the polynomial solver.


As I suspected, removing OpenMP support only required changing a few 
lines and dropping the import of cython.parallel. Each "prange" has 
been replaced with a "with nogil" block around a plain "range" loop.

Note that both the original version and this version release the GIL 
when running the processing loops.


It may be better to leave this single-threaded for now. Using Python 
threads isn't that difficult and joblib sounds nice, too.
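
A minimal sketch of what the user-side threading could look like, using
only the standard library. Here solve_chunk is a hypothetical stand-in
for a compiled routine that releases the GIL (it just solves a*x + b = 0
for each row), and solve_threaded is likewise a made-up name. With a
pure-Python stand-in like this there is no speedup, since the GIL is
held; the pattern only pays off once the real work runs under "nogil".

```python
from concurrent.futures import ThreadPoolExecutor


def solve_chunk(rows):
    # Stand-in for a GIL-releasing compiled solver (assumption):
    # returns the root of a*x + b = 0 for each (a, b) row.
    return [-b / a for a, b in rows]


def solve_threaded(rows, n_threads=4):
    # Split the input into contiguous chunks, at most one per thread.
    chunk = max(1, -(-len(rows) // n_threads))  # ceiling division
    pieces = [rows[i:i + chunk] for i in range(0, len(rows), chunk)]
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        # map() yields results in input order, so the output lines up
        # with the original rows.
        results = pool.map(solve_chunk, pieces)
        return [root for part in results for root in part]
```

Usage would be along the lines of
solve_threaded([(1.0, -2.0), (2.0, -8.0)], n_threads=2).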

What's the next step?

  -J



