[Numpy-discussion] OT: performance in C extension; OpenMP, or SSE ?

Thu Feb 17 04:39:38 EST 2011

On Thu, Feb 17, 2011 at 10:29 AM, Matthieu Brucher
<matthieu.brucher at gmail.com> wrote:
>
>> Do you think one could do even better?
>> And where does the 7% slowdown (for a single thread) come from?
>> Is it possible to have the OpenMP option in the code without _any_
>> penalty on single-core machines?
>
> There will always be a penalty for parallel code that runs on one core. You
> have at least the overhead for splitting the data.
>
I was referring to the case where
    num_threads = 1;
    omp_set_num_threads(num_threads);
is called explicitly.

Then where does the overhead come from? Is it the call to
    omp_set_dynamic(dynamic);
or the
    #pragma omp parallel for private(j, i, ax, ay, dif_x, dif_y)
or some magic done by
    gcc ... -fopenmp
?
(I'm referring to Eric Carlson's post earlier in this thread.)
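
One way to narrow this down might be to time the same loop with and without the pragma while the thread count is pinned to 1. The following is only a rough sketch under assumptions: the kernel is a hypothetical stand-in (not the code from this thread), and the names now_seconds, pairs_serial, pairs_omp and single_thread_ratio are made up for illustration. Built without -fopenmp, the pragma is ignored and both variants should time about the same, which gives a baseline for the comparison.

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

#ifdef _OPENMP
#include <omp.h>
#endif

/* Timer: omp_get_wtime() when built with -fopenmp, clock() otherwise. */
static double now_seconds(void)
{
#ifdef _OPENMP
    return omp_get_wtime();
#else
    return (double)clock() / CLOCKS_PER_SEC;
#endif
}

/* Hypothetical stand-in kernel: sum of squared differences over all
   ordered pairs (i, j). Plain loop, no OpenMP involved. */
static double pairs_serial(const double *x, int n)
{
    double total = 0.0, dif_x;
    int i, j;
    for (i = 0; i < n; i++) {
        for (j = 0; j < n; j++) {
            dif_x = x[i] - x[j];
            total += dif_x * dif_x;
        }
    }
    return total;
}

/* Same loop with the pragma. The loop variable i is private
   automatically; j and dif_x are listed explicitly. */
static double pairs_omp(const double *x, int n)
{
    double total = 0.0, dif_x;
    int i, j;
    #pragma omp parallel for private(j, dif_x) reduction(+:total)
    for (i = 0; i < n; i++) {
        for (j = 0; j < n; j++) {
            dif_x = x[i] - x[j];
            total += dif_x * dif_x;
        }
    }
    return total;
}

/* Time both variants with one thread and return omp_time / serial_time
   (0.0 if the serial time was too short to measure). */
static double single_thread_ratio(const double *x, int n, int repeats)
{
    double t0, t_serial, t_omp, sink = 0.0;
    int r;

    assert(repeats > 0);
#ifdef _OPENMP
    omp_set_dynamic(0);       /* do not let the runtime adjust the team size */
    omp_set_num_threads(1);   /* the case asked about: exactly one thread */
#endif

    t0 = now_seconds();
    for (r = 0; r < repeats; r++)
        sink += pairs_serial(x, n);
    t_serial = now_seconds() - t0;

    t0 = now_seconds();
    for (r = 0; r < repeats; r++)
        sink += pairs_omp(x, n);
    t_omp = now_seconds() - t0;

    /* Printing the checksum keeps the compiler from dropping the loops. */
    printf("serial %.6f s, omp (1 thread) %.6f s, checksum %g\n",
           t_serial, t_omp, sink);
    return t_serial > 0.0 ? t_omp / t_serial : 0.0;
}
```

Calling single_thread_ratio(x, n, repeats) from main() on a reasonably large array, once compiled with -fopenmp and once without, would show whether the difference appears only when the flag is present. It would not by itself say which part of the runtime is responsible.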

I'm wondering whether one could use a C "if" statement, e.g.
    if (num_threads == 0)
to skip all of the omp_xxx() calls in that case.
Obviously, the #pragma would first have to be replaceable by some omp_xxx() call.
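
To make the idea concrete, here is a minimal sketch of such a branch. It may turn out that the #pragma does not need replacing at all: as far as I can tell, a plain C "if" can choose between a loop without the pragma and a loop with it. Assumptions: the kernel is a hypothetical stand-in (not the code from this thread), the names row_sum and pair_sums are made up, and "num_threads <= 1" is used as the "no OpenMP" condition instead of "num_threads == 0".

```c
#include <assert.h>

/* Hypothetical kernel, not the code from this thread: for point i,
   add up the squared distances to all n points. */
static double row_sum(const double *x, const double *y, int n, int i)
{
    double ax = x[i], ay = y[i], dif_x, dif_y, s = 0.0;
    int j;
    for (j = 0; j < n; j++) {
        dif_x = ax - x[j];
        dif_y = ay - y[j];
        s += dif_x * dif_x + dif_y * dif_y;
    }
    return s;
}

/* With num_threads <= 1 the plain loop runs, so no OpenMP construct is
   entered on that path. Larger values take the OpenMP loop. */
static void pair_sums(const double *x, const double *y, double *out,
                      int n, int num_threads)
{
    int i;

    assert(n >= 0);

    if (num_threads <= 1) {
        for (i = 0; i < n; i++)
            out[i] = row_sum(x, y, n, i);
    } else {
        /* Ignored by the compiler when built without -fopenmp. */
        #pragma omp parallel for num_threads(num_threads)
        for (i = 0; i < n; i++)
            out[i] = row_sum(x, y, n, i);
    }
}
```

OpenMP also offers an if(...) clause on the parallel construct that is evaluated at run time, but as far as I understand the runtime is still entered in that case, so the explicit branch seems the more direct way to test for a zero-penalty serial path. Whether the compiler optimizes the serial branch as well as a build without -fopenmp is something I have not measured.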

Thanks,
- Sebastian