[Numpy-discussion] OT: performance in C extension; OpenMP, or SSE ?
seb.haase at gmail.com
Thu Feb 17 04:39:38 EST 2011
On Thu, Feb 17, 2011 at 10:29 AM, Matthieu Brucher
<matthieu.brucher at gmail.com> wrote:
>> Do you think, one could get even better ?
>> And, where does the 7% slow-down (for single thread) come from ?
>> Is it possible to have the OpenMP option in a code, without _any_
>> penalty for 1 core machines ?
> There will always be a penalty for parallel code that runs on one core. You
> have at least the overhead for splitting the data.
I was referring to when
num_threads=1; // and
is explicitly called.
Then, where does the overhead come from ? --
The call to omp_set_dynamic(dynamic);
#pragma omp parallel for private(j, i,ax,ay, dif_x, dif_y)
or some magic done by
gcc ... -fopenmp
(I'm referring to Eric Carlson's earlier post in this thread)
I'm wondering if one could have a C "if"-statement, e.g.
if(num_threads == 0), that then skips all of the omp_xxx() calls.
Obviously, the #pragma would first have to be replaceable by some omp_xxx() call.
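
A minimal sketch of that idea, under stated assumptions: the pragma does not need to become an omp_xxx() call if the parallel loop lives in its own function and a plain C "if" chooses between it and a serial twin. The names row_sum, sums_serial, sums_parallel and sums are hypothetical, the kernel is only a stand-in for the real loop body, and the "num_threads <= 1" threshold is an assumption. Whether this actually recovers the 7% would have to be measured.

```c
/* Hypothetical kernel: for point i, sum of squared distances to all points.
 * Stands in for the real loop body (ax, ay, dif_x, dif_y as in the pragma). */
static double row_sum(const double *x, const double *y, int n, int i)
{
    double ax = x[i], ay = y[i], acc = 0.0;
    int j;
    for (j = 0; j < n; j++) {
        double dif_x = ax - x[j];
        double dif_y = ay - y[j];
        acc += dif_x * dif_x + dif_y * dif_y;
    }
    return acc;
}

/* Serial path: contains no OpenMP construct at all. */
static void sums_serial(const double *x, const double *y, int n, double *out)
{
    int i;
    for (i = 0; i < n; i++)
        out[i] = row_sum(x, y, n, i);
}

/* Parallel path: the pragma is ignored when built without -fopenmp. */
static void sums_parallel(const double *x, const double *y, int n,
                          double *out, int num_threads)
{
    int i;
    #pragma omp parallel for num_threads(num_threads)
    for (i = 0; i < n; i++)
        out[i] = row_sum(x, y, n, i);
}

/* Dispatcher: an ordinary C "if" decides whether OpenMP is entered. */
void sums(const double *x, const double *y, int n, double *out,
          int num_threads)
{
    if (num_threads <= 1)
        sums_serial(x, y, n, out);
    else
        sums_parallel(x, y, n, out, num_threads);
}
```

Calling sums(x, y, n, out, 1) then never reaches an OpenMP construct, while sums(x, y, n, out, 4) requests four threads. OpenMP also offers an if(...) clause on the parallel directive, e.g. "#pragma omp parallel for if(num_threads > 1)", which runs the region on a single thread when the condition is false; the loop body still goes through the OpenMP-generated code in that case, so it may not remove all of the overhead.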