[Numpy-discussion] Cython-based OpenMP-accelerated quartic polynomial solver

Nathaniel Smith njs at pobox.com
Thu Oct 1 03:05:22 EDT 2015


On Wed, Sep 30, 2015 at 11:54 PM, Daπid <davidmenhur at gmail.com> wrote:
>
>
> On 30 September 2015 at 18:20, Nathaniel Smith <njs at pobox.com> wrote:
>>
>> On Sep 30, 2015 2:28 AM, "Daπid" <davidmenhur at gmail.com> wrote:
>> [...]
>> > Is there a nice way to ship both versions? After all, most
>> > implementations of BLAS and friends do spawn OpenMP threads, so I don't
>> > think it would be outrageous to take advantage of it in more places;
>> > provided there is a nice way to fallback to a serial version when it is not
>> > available.
>>
>> This is incorrect -- the only common implementation of BLAS that uses
>> *OpenMP* threads is OpenBLAS, and even then it's not the default -- it only
>> happens if you run it in a special non-default configuration.
>
> Right, sorry. I wanted to say they spawn parallel threads. What do you mean
> by a non-default configuration? Setting the OMP_NUM_THREADS?

I don't remember the details -- I think it might be a special setting
you have to enable when you build OpenBLAS.

>> The challenges to providing transparent multithreading in numpy generally
>> are:
>>
>> - gcc + OpenMP on linux still breaks multiprocessing. There's a patch to
>> fix this but they still haven't applied it; alternatively there's a
>> workaround you can use in multiprocessing (not using fork mode), but this
>> requires every user update their code and the workaround has other
>> limitations. We're unlikely to use OpenMP while this is the case.
>
> Any idea when this is going to be released?

Which? The gcc patch? I spent 2 full release cycles nagging them and
they still can't be bothered to make a decision either way, so :-(. If
anyone has some ideas for how to get traction in gcc-land then I'm
happy to pass on details...

> As I understand it, OpenBLAS doesn't have this problem, am I right?

Right, in its default configuration OpenBLAS uses its own
internal thread pool code, and that code has the fixes needed to work
with fork-based multiprocessing. Of course, if you configure OpenBLAS
to use OpenMP instead of its internal thread code, then this no longer
applies...
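
For reference, the "not using fork mode" workaround mentioned above
could look roughly like the sketch below. It assumes Python 3.4 or
later, where multiprocessing lets you pick a start method per context.
The helper name fork_free_context is made up for illustration, and
this only shows the shape of the workaround, not something numpy does
for you.

```python
import math
import multiprocessing as mp


def fork_free_context(preferred=("spawn", "forkserver")):
    """Return a multiprocessing context that does not use plain fork().

    Hypothetical helper: picks the first start method from `preferred`
    that this platform supports.
    """
    available = mp.get_all_start_methods()
    for method in preferred:
        if method in available:
            return mp.get_context(method)
    raise RuntimeError("no fork-free start method available")


if __name__ == "__main__":
    # Workers are started as fresh interpreters, so they do not inherit
    # the parent's OpenMP thread-pool state the way a forked child does.
    ctx = fork_free_context()
    with ctx.Pool(processes=2) as pool:
        print(pool.map(math.sqrt, [1.0, 4.0, 9.0]))
```

This is also where the limitations show up: with these start methods
the main module gets re-imported in each worker (hence the
`if __name__ == "__main__":` guard), and everything sent to the
workers has to be picklable.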

-n

-- 
Nathaniel J. Smith -- http://vorpus.org


