[Numpy-discussion] Cython-based OpenMP-accelerated quartic polynomial solver

Daπid davidmenhur at gmail.com
Thu Oct 1 02:54:14 EDT 2015

On 30 September 2015 at 18:20, Nathaniel Smith <njs at pobox.com> wrote:

> On Sep 30, 2015 2:28 AM, "Daπid" <davidmenhur at gmail.com> wrote:
> [...]
> > Is there a nice way to ship both versions? After all, most
> > implementations of BLAS and friends do spawn OpenMP threads, so I don't
> > think it would be outrageous to take advantage of it in more places;
> > provided there is a nice way to fall back to a serial version when it is
> > not available.
>
> This is incorrect -- the only common implementation of BLAS that uses
> *OpenMP* threads is OpenBLAS, and even then it's not the default -- it only
> happens if you run it in a special non-default configuration.
Right, sorry. I meant to say that they spawn parallel threads. What do you
mean by a non-default configuration? Setting OMP_NUM_THREADS?

> The challenges to providing transparent multithreading in numpy generally
> are:
> - gcc + OpenMP on Linux still breaks multiprocessing. There's a patch to
> fix this but they still haven't applied it; alternatively there's a
> workaround you can use in multiprocessing (not using fork mode), but this
> requires every user update their code and the workaround has other
> limitations. We're unlikely to use OpenMP while this is the case.
Any idea when this is going to be released?

As I understand it, OpenBLAS doesn't have this problem, am I right?
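
For reference, the multiprocessing workaround mentioned above (not using fork mode) might look roughly like the sketch below. It assumes Python 3.4 or later, where multiprocessing.get_context() is available; parallel_sqrt is a made-up name for illustration, and math.sqrt stands in for whatever real work the workers would do.

```python
import math
import multiprocessing


def parallel_sqrt(values, processes=2, start_method="spawn"):
    """Map math.sqrt over `values` in worker processes that are not forked.

    `parallel_sqrt` is a hypothetical helper used only for illustration.
    """
    # get_context() returns a context bound to a single start method and
    # leaves the process-wide default start method untouched.
    ctx = multiprocessing.get_context(start_method)
    with ctx.Pool(processes=processes) as pool:
        return pool.map(math.sqrt, values)


if __name__ == "__main__":
    # With "spawn", each worker starts a fresh interpreter and re-imports
    # the main module, so the entry point must be guarded like this.
    print(parallel_sqrt([1.0, 4.0, 9.0]))
```

The cost is the one pointed out in the quoted text: every caller has to opt in explicitly, the functions sent to the workers must be picklable, and starting fresh interpreters is slower than forking.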

> - parallel code in general is not very composable. If someone is calling a
> numpy operation from one thread, great, transparently using multiple
> threads internally is a win. If they're exploiting some higher-level
> structure in their problem to break it into pieces and process each in
> parallel, and then using numpy on each piece, then numpy spawning threads
> internally will probably destroy performance. And numpy is too low-level to
> know which case it's in. This problem exists to some extent already with
> multi-threaded BLAS, so people use various BLAS-specific knobs to manage it
> in ad hoc ways, but this doesn't scale.
> (Ironically OpenMP is more composable than most approaches to threading,
> but only if everyone is using it and, as per above, not everyone is and we
> currently can't.)
That is what I meant by also providing a single-threaded version. The user
can choose whether they want the parallel or the serial one, depending on
the case.
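
A minimal sketch of what shipping both versions could look like from the Python side. It is only an illustration under stated assumptions: `_quartic_omp` is a hypothetical name for an optional OpenMP-compiled extension module, `evaluate_many` and `has_parallel` are made-up helpers, and the serial kernel is a plain Horner evaluation standing in for the actual solver.

```python
# Hypothetical names throughout: `_quartic_omp` stands in for an optional
# extension module built with OpenMP; it is not a real package.
try:
    from _quartic_omp import evaluate_many as _evaluate_parallel
except ImportError:
    # The parallel build is not installed, so only the serial path exists.
    _evaluate_parallel = None


def _evaluate_serial(polys, x):
    """Stand-in serial kernel: Horner evaluation of each polynomial at x.

    Each polynomial is a sequence of coefficients, highest degree first.
    """
    results = []
    for coeffs in polys:
        acc = 0.0
        for c in coeffs:
            acc = acc * x + c
        results.append(acc)
    return results


def has_parallel():
    """Report whether the optional parallel kernel could be imported."""
    return _evaluate_parallel is not None


def evaluate_many(polys, x, parallel=False):
    """Use the parallel kernel only when requested and available."""
    if parallel and _evaluate_parallel is not None:
        return _evaluate_parallel(polys, x)
    return _evaluate_serial(polys, x)


if __name__ == "__main__":
    # x**4 - 1 evaluated at x = 2
    print(evaluate_many([[1, 0, 0, 0, -1]], 2.0))
```

Here parallel defaults to False, so code that already parallelises at a higher level is not affected unless it asks for the threaded version, and parallel=True quietly falls back to the serial kernel when the extension is missing.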
