[Numpy-discussion] OT: performance in C extension; OpenMP, or SSE ?
seb.haase at gmail.com
Tue Feb 15 11:19:45 EST 2011
using __restrict__ with g++ did not change anything. How do I use
valgrind with C extensions?
I don't know what "PAPI profil" is ...?
On Tue, Feb 15, 2011 at 4:54 PM, Matthieu Brucher
<matthieu.brucher at gmail.com> wrote:
> My first move would be to add a restrict keyword to dist (i.e. dist is the
> only pointer to the specific memory location), and then declare dist_ inside
> the first loop also with a restrict.
> Then, I would run valgrind or a PAPI profil on your code to see what causes
> the issue (false sharing, ...)
> 2011/2/15 Sebastian Haase <seb.haase at gmail.com>
>> I assume that someone here could maybe help me, and I'm hoping it's
>> not too much off topic.
>> I have 2 arrays of 2d point coordinates and would like to calculate
>> all pairwise distances as fast as possible.
>> Going from Python/Numpy to a (Swigged) C extension already gave me a
>> 55x speedup.
>> (.9ms vs. 50ms for arrays of length 329 and 340).
>> I'm using gcc on Linux.
>> Now I'm wondering if I could go even faster !?
>> My hope that the compiler might automagically do some SSE2
>> optimization got disappointed.
>> Since I have a 4 core CPU I thought OpenMP might be an option;
>> I never used that, and after some playing around I managed to get
>> (only) 50% slowdown(!) :-(
>> My code in short is this:
>> (My SWIG typemaps use obj_to_array_no_conversion() from numpy.i)
>> -------<Ccode> ----------
>> void dists2d(
>> double *a_ps, int nx1, int na,
>> double *b_ps, int nx2, int nb,
>> double *dist, int nx3, int ny3) throw (char*)
>> if(nx1 != 2) throw (char*) "a must be of shape (n,2)";
>> if(nx2 != 2) throw (char*) "b must be of shape (n,2)";
>> if(nx3 != nb || ny3 != na) throw (char*) "c must be of shape (na,nb)";
>> double *dist_;
>> int i, j;
>> #pragma omp parallel private(dist_, j, i)
>> #pragma omp for nowait
>> //num_threads=omp_get_num_threads(); --> 4
>> dist_ = dist+i*nb; // dists_ is only
>> introduced for OpenMP
>> *dist_++ = sqrt( sq(a_ps[i*nx1] - b_ps[j*nx2]) +
>> - b_ps[j*nx2+1]) );
>> -------</Ccode> ----------
>> There is probably a simple mistake in this code - as I said I never
>> used OpenMP before.
>> It should be not too difficult to use OpenMP correctly here
>> or - maybe better -
>> is there a simple SSE(2,3,4) version that might be even better than
>> OpenMP... !?
>> I supposed, that I did not get the #pragma omp lines right - any idea ?
>> Or is it in general not possible to speed this kind of code up using
>> OpenMP !?
>> Sebastian Haase
>> NumPy-Discussion mailing list
>> NumPy-Discussion at scipy.org
> Information System Engineer, Ph.D.
> Blog: http://matt.eifelle.com
> LinkedIn: http://www.linkedin.com/in/matthieubrucher
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
More information about the NumPy-Discussion