[Numpy-discussion] OT: performance in C extension; OpenMP, or SSE ?

Tue Feb 15 16:42:00 EST 2011

Hi Eat,
I will surely try these routines tomorrow,
but I still think that neither scipy function does the complete
distance calculation of all possible pairs as done by my C code.
For 2 arrays, X and Y, of nX and nY 2d coordinates respectively, I
need to get nX times nY distances computed.
>From the online documentation I understand that
pdist calculates something like nX square distances for a single array
X of nX coordinates,
and cdist would calculate nX distances, where nX is required to equal nY.

Thanks for looking into this,
Sebastian

On Tue, Feb 15, 2011 at 9:11 PM, eat <e.antero.tammi at gmail.com> wrote:
> Hi,
>
> On Tue, Feb 15, 2011 at 5:50 PM, Sebastian Haase <seb.haase at gmail.com>
> wrote:
>>
>> Hi,
>> I assume that someone here could maybe help me, and I'm hoping it's
>> not too much off topic.
>> I have 2 arrays of 2d point coordinates and would like to calculate
>> all pairwise distances as fast as possible.
>> Going from Python/Numpy to a (Swigged) C extension already gave me a
>> 55x speedup.
>> (.9ms vs. 50ms for arrays of length 329 and 340).
>
> With my very modest machine (Intel Dual CPU E2200, 2.2 Ghz) utilizing
> scipy.spatial.distance.pdist will take some 1.5 ms for such arrays. Even the
> simple linear algebra based full matrix calculations can be done less than 5
> ms.
>
> My two cents,
> eat
>>
>> I'm using gcc on Linux.
>> Now I'm wondering if I could go even faster !?
>> My hope that the compiler might automagically do some SSE2
>> optimization got disappointed.
>> Since I have a 4 core CPU I thought OpenMP might be an option;
>> I never used that, and after some playing around I managed to get
>> (only) 50% slowdown(!) :-(
>>
>> My code in short is this:
>> (My SWIG typemaps use obj_to_array_no_conversion() from numpy.i)
>> -------<Ccode> ----------
>> void dists2d(
>>                   double *a_ps, int nx1, int na,
>>                   double *b_ps, int nx2, int nb,
>>                   double *dist, int nx3, int ny3)  throw (char*)
>> {
>>  if(nx1 != 2)  throw (char*) "a must be of shape (n,2)";
>>  if(nx2 != 2)  throw (char*) "b must be of shape (n,2)";
>>  if(nx3 != nb || ny3 != na)    throw (char*) "c must be of shape (na,nb)";
>>
>>  double *dist_;
>>  int i, j;
>>
>> #pragma omp parallel private(dist_, j, i)
>>  {
>> #pragma omp for nowait
>>        for(i=0;i<na;i++)
>>          {
>>                //num_threads=omp_get_num_threads();  --> 4
>>                dist_ = dist+i*nb;                 // dists_  is  only
>> introduced for OpenMP
>>                for(j=0;j<nb;j++)
>>                  {
>>                        *dist_++  = sqrt( sq(a_ps[i*nx1]   - b_ps[j*nx2]) +
>>                                                          sq(a_ps[i*nx1+1]
>> - b_ps[j*nx2+1]) );
>>                  }
>>          }
>>  }
>> }
>> -------</Ccode> ----------
>> There is probably a simple mistake in this code - as I said I never
>> used OpenMP before.
>> It should be not too difficult to use OpenMP correctly here
>>  or -  maybe better -
>> is there a simple SSE(2,3,4) version that might be even better than
>> OpenMP... !?
>>
>> I supposed, that I did not get the #pragma omp lines right - any idea ?
>> Or is it in general not possible to speed this kind of code up using
>> OpenMP !?
>>
>> Thanks,
>> Sebastian Haase
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion at scipy.org
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>