[Numpy-discussion] OT: performance in C extension; OpenMP, or SSE ?

Matthieu Brucher matthieu.brucher at gmail.com
Thu Feb 17 10:31:13 EST 2011


It may also be the size of the chunks OMP uses. You can/should specify
them in the OMP pragma so that each chunk is a multiple of the cache line
size, or something close to that.
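
For concreteness, a minimal sketch of what that could look like on the loop
from the_lib.c below (the chunk value 64 is only an illustrative guess, to be
tuned so one chunk of rows lines up with the cache line / working-set size):

#pragma omp parallel for schedule(static, 64) private(j, ax, ay, dif_x, dif_y)
for (i = 0; i < na; i++)
{
    ax = a_ps[i*nx1];
    ay = a_ps[i*nx1+1];
    for (j = 0; j < nb; j++)
    {
        dif_x = ax - b_ps[j*nx2];
        dif_y = ay - b_ps[j*nx2+1];
        /* write into the na x nb distance matrix */
        dist[i*nb+j] = sqrt(dif_x*dif_x + dif_y*dif_y);
    }
}

schedule(static, 64) hands each thread contiguous blocks of 64 outer
iterations instead of whatever default chunking the runtime chooses, which
keeps neighbouring writes to dist on the same thread.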

Matthieu

2011/2/17 Sebastian Haase <seb.haase at gmail.com>

> Hi,
> More surprises:
> shaase@iris:~/code/SwiggedDistOMP: gcc -O3 -c the_lib.c -fPIC -fopenmp
> -ffast-math
> shaase@iris:~/code/SwiggedDistOMP: gcc -shared -o the_lib.so the_lib.o
> -lgomp -lm
> shaase@iris:~/code/SwiggedDistOMP: priithon the_python_prog.py
> c_threads 0  time  0.000437839031219    # this is now, without
> #pragma omp parallel for ...
> c_threads 1  time  0.000865449905396
> c_threads 2  time  0.000520548820496
> c_threads 3  time  0.00033704996109
> c_threads 4  time  0.000620169639587
> c_threads 5  time  0.000465350151062
> c_threads 6  time  0.000696349143982
>
> This corrects the timing: max OpenMP speed (3 threads) vs. no
> OpenMP now gives a speedup of (only!) 1.3x,
> not 2.33x (which was the number I got when comparing OpenMP to the
> cdist function).
> The c code is now:
>
> the_lib.c
>
> ------------------------------------------------------------------------------------------
> #include <stdio.h>
> #include <time.h>
> #include <omp.h>
> #include <math.h>
>
> void dists2d(double *a_ps, int na,
>              double *b_ps, int nb,
>              double *dist, int num_threads)
> {
>     int i, j;
>     double ax, ay, dif_x, dif_y;
>     int nx1 = 2;
>     int nx2 = 2;
>
>     if (num_threads > 0)
>     {
>         int dynamic = 0;
>         omp_set_dynamic(dynamic);
>         omp_set_num_threads(num_threads);
>
> #pragma omp parallel for private(j, i, ax, ay, dif_x, dif_y)
>         for (i = 0; i < na; i++)
>         {
>             ax = a_ps[i*nx1];
>             ay = a_ps[i*nx1+1];
>             for (j = 0; j < nb; j++)
>             {
>                 dif_x = ax - b_ps[j*nx2];
>                 dif_y = ay - b_ps[j*nx2+1];
>                 dist[i*nb+j] = sqrt(dif_x*dif_x + dif_y*dif_y);
>             }
>         }
>     } else {
>         for (i = 0; i < na; i++)
>         {
>             ax = a_ps[i*nx1];
>             ay = a_ps[i*nx1+1];
>             for (j = 0; j < nb; j++)
>             {
>                 dif_x = ax - b_ps[j*nx2];
>                 dif_y = ay - b_ps[j*nx2+1];
>                 dist[i*nb+j] = sqrt(dif_x*dif_x + dif_y*dif_y);
>             }
>         }
>     }
> }
> ------------------------------------------------------------------
> $ gcc -O3 -c the_lib.c -fPIC -fopenmp -ffast-math
> $ gcc -shared -o the_lib.so the_lib.o -lgomp -lm
>
> So, I guess I found a way of getting rid of the OpenMP overhead when
> run with 1 thread,
> and found that - if measured correctly, using the same compiler settings
> and so on - the speedup is so small that there is no point in doing
> OpenMP - again.
> (For my case, having (only) 4 cores.)
>
>
> Cheers,
> Sebastian.
>
>
>
> On Thu, Feb 17, 2011 at 10:57 AM, Matthieu Brucher
> <matthieu.brucher at gmail.com> wrote:
> >
> >> Then, where does the overhead come from ? --
> >> The call to    omp_set_dynamic(dynamic);
> >> Or the
> >> #pragma omp parallel for private(j, i,ax,ay, dif_x, dif_y)
> >
> > It may be this. You initialize a thread pool, even if it has only one
> > thread, and there is the dynamic part, so OpenMP may create several
> > chunks instead of one big chunk.
> >
> > Matthieu
> > --
> > Information System Engineer, Ph.D.
> > Blog: http://matt.eifelle.com
> > LinkedIn: http://www.linkedin.com/in/matthieubrucher
> >
>
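
The the_python_prog.py driver that produced the timings above isn't included
in the thread; as a hedged sketch (made-up array sizes, plain ctypes + NumPy
rather than whatever priithon does), it could look roughly like this:

------------------------------------------------------------------
import time
import ctypes
import numpy as np
from scipy.spatial.distance import cdist

# Hypothetical driver: load the shared library built above and time
# dists2d() for different thread counts, then compare against cdist.
lib = ctypes.CDLL("./the_lib.so")
lib.dists2d.restype = None
lib.dists2d.argtypes = [
    np.ctypeslib.ndpointer(np.float64, flags="C_CONTIGUOUS"), ctypes.c_int,
    np.ctypeslib.ndpointer(np.float64, flags="C_CONTIGUOUS"), ctypes.c_int,
    np.ctypeslib.ndpointer(np.float64, flags="C_CONTIGUOUS"), ctypes.c_int,
]

na, nb = 1000, 1000                  # illustrative sizes only
a = np.random.rand(na, 2)
b = np.random.rand(nb, 2)
dist = np.empty((na, nb))

for c_threads in range(7):           # 0 = serial branch, 1..6 = OpenMP
    t0 = time.time()
    lib.dists2d(a, na, b, nb, dist, c_threads)
    print("c_threads %d  time  %r" % (c_threads, time.time() - t0))

t0 = time.time()
ref = cdist(a, b)                    # SciPy reference for timing/correctness
print("cdist        time  %r" % (time.time() - t0))
print("max abs difference: %r" % np.abs(dist - ref).max())
------------------------------------------------------------------

With a wrapper like this, the same timing loop can be run with and without
OpenMP and the result checked against cdist in one place.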



-- 
Information System Engineer, Ph.D.
Blog: http://matt.eifelle.com
LinkedIn: http://www.linkedin.com/in/matthieubrucher