[Numpy-discussion] IDL vs Python parallel computing

Wed May 7 20:48:20 EDT 2014

Just a quick question/possibility.

What about parallelizing only the ufuncs that take a single input, such as
the trigonometric functions, when that input is C or Fortran contiguous? Is
there a fast path in the ufunc mechanism when the input is Fortran/C
contiguous? If so, it would be relatively easy to add an OpenMP pragma to
parallelize that loop, conditional on a minimum number of elements.

Anyway, I won't do it. I'm just outlining what I think is the easiest case
to implement (depending on NumPy internals that I don't know well enough)
and, I think, the most frequent one (so possibly a quick fix for someone
who knows that code).

In Theano, we found on a few CPUs that addition needs a minimum of 200k
elements for the parallelization of elemwise to be useful. We use that
number by default for all operations to keep it simple. It is user
configurable. This ensures that, on current-generation CPUs, threading
doesn't slow things down. I think that is the more important point: don't
give users a slowdown by default with a new version.
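
To make the idea concrete, here is a rough Python-level sketch. It is not the
C/OpenMP change inside the ufunc loop, and it is not how Theano implements
it. It assumes the ufunc releases the GIL on large inputs. The helper name
parallel_unary, the n_threads default and the chunking scheme are made up for
illustration; only the 200k default comes from the number above.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

MIN_ELEMENTS = 200_000  # default threshold from the text; tune per machine


def parallel_unary(ufunc, x, n_threads=4, min_elements=MIN_ELEMENTS):
    """Apply a one-input ufunc in threads, but only on the easy fast path.

    Hypothetical helper: falls back to the plain serial call when the input
    is small or not C/Fortran contiguous, so small arrays never pay for
    thread start-up.
    """
    x = np.asarray(x)
    contiguous = x.flags.c_contiguous or x.flags.f_contiguous
    if not contiguous or x.size < min_elements or n_threads < 2:
        return ufunc(x)

    # Probe the result dtype on a tiny array (e.g. sin of ints gives floats).
    out_dtype = ufunc(np.zeros(1, dtype=x.dtype)).dtype
    out = np.empty_like(x, dtype=out_dtype)  # keeps the memory layout of x

    # Get C-contiguous views so that reshape(-1) is a view, never a copy.
    xv, ov = (x, out) if x.flags.c_contiguous else (x.T, out.T)
    flat_in, flat_out = xv.reshape(-1), ov.reshape(-1)

    # Split the flat buffer into one contiguous chunk per thread.
    bounds = np.linspace(0, x.size, n_threads + 1, dtype=np.intp)

    def work(i):
        lo, hi = bounds[i], bounds[i + 1]
        ufunc(flat_in[lo:hi], out=flat_out[lo:hi])

    with ThreadPoolExecutor(n_threads) as pool:
        list(pool.map(work, range(n_threads)))  # list() re-raises errors
    return out
```

Something like parallel_unary(np.sin, a) then behaves like np.sin(a), and
whether it is actually faster depends on the CPU and on the threshold, which
is why the threshold should stay configurable.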

Fred

On Wed, May 7, 2014 at 2:27 PM, Julian Taylor <jtaylor.debian at googlemail.com> wrote:

> On 07.05.2014 20:11, Sturla Molden wrote:
> > On 03/05/14 23:56, Siegfried Gonzi wrote:
> >
> > A more technical answer is that NumPy's internals do not play very
> > nicely with multithreading. For example, the array iterators used in
> > ufuncs store an internal state. Multithreading would imply an excessive
> > contention for this state, as well as induce false sharing of the
> > iterator object. Therefore, a multithreaded NumPy would have performance
> > problems due to synchronization as well as hierarchical memory
> > collisions. Adding multithreading support to the current NumPy core
> > would just degrade the performance. NumPy will not be able to use
> > multithreading efficiently unless we redesign the iterators in NumPy
> > core. That is a massive undertaking which probably means rewriting most
> > of NumPy's core C code. A better strategy would be to monkey-patch some
> > of the more common ufuncs with multithreaded versions.
>
>
> I wouldn't say that the iterator is a problem, the important iterator
> functions are threadsafe and there is support for multithreaded
> iteration using NpyIter_Copy so no data is shared between threads.
>
> I'd say the main issue is that there simply aren't many functions worth
> parallelizing in numpy. Most of the commonly used stuff is already memory
> bandwidth bound with only one or two threads.
> The only things I can think of that would profit are sorting/partition
> and the special functions like sqrt, exp, log, etc.
>
> Generic efficient parallelization would require merging of operations
> to improve the FLOPS/loads ratio. E.g. numexpr and theano are able to do
> so and thus also have builtin support for multithreading.
>
> That being said you can use Python threads with numpy as (especially in
> 1.9) most expensive functions release the GIL. But unless you are doing
> very flop intensive stuff you will probably have to manually block your
> operations to the last level cache size if you want to scale beyond one
> or two threads.