Fastest distance matrix calc
I think someone posted some timings about this before, but I don't recall. The task is to compute the matrix D from two sets of vectors x (M,d) and y (N,d). The output should be D, where D[i,j] is norm(x[i]-y[j]).

The Matlab NetLab toolkit uses something like this to compute it:

    d2 = (x*x).sum(1)[:,numpy.newaxis] + (y*y).sum(1) - 2*mult(x,y.T)

and then does d2[d2<0] = 0, because roundoff in the above can sometimes create negative values, and finally d = sqrt(d2). But it just doesn't seem like the best way to do it. Whatever was posted before, I don't remember anything about a subtract.outer solution. Seems like something along the lines of

    (subtract.outer(x,y)**2).sum(axis=0)

might be faster, and would also avoid the need to check for negatives.

I'll do some timings if no one's already done it. Just wanted to check first. --bb
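For concreteness, the NetLab-style expansion can be sketched in numpy like so (here mult is taken to be a matrix multiply, i.e. numpy.dot; this is a sketch of the idea, not NetLab's actual code):

```python
import numpy as np

def dist_netlab_style(x, y):
    """||x[i] - y[j]|| via the expansion ||x||^2 + ||y||^2 - 2 x.y"""
    d2 = (x * x).sum(1)[:, np.newaxis] + (y * y).sum(1) - 2 * np.dot(x, y.T)
    d2[d2 < 0] = 0  # roundoff can leave tiny negative values
    return np.sqrt(d2)
```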
On 4/13/07, Bill Baxter wrote:
I think someone posted some timings about this before but I don't recall.
The task is to compute the matrix D from two sets of vectors x (M,d) and y (N,d). The output should be D, where D[i,j] is norm(x[i]-y[j]).
The Matlab NetLab toolkit uses something like this to compute it:
d2 = (x*x).sum(1)[:,numpy.newaxis] + (y*y).sum(1) - 2*mult(x,y.T)
And then it does d2[d2<0] = 0, because roundoff in the above can sometimes create negative values, and finally d = sqrt(d2).
But it just doesn't seem like the best way to do it. Whatever was posted before, I don't remember anything about a subtract.outer solution. Seems like something along the lines of (subtract.outer(x,y)**2).sum(axis=0) might be faster, and would also avoid the need to check for negatives.
I'll do some timings if no one's already done it. Just wanted to check first.
I'm going to go out on a limb and contend, without running any timings, that for large M and N, a solution using a for loop will beat either of those. For example (untested):

    results = empty([M, N], float)
    # You could be fancy and swap axes depending on which array is larger,
    # but I'll leave that for someone else
    for i, v in enumerate(x):
        results[i] = sqrt(sum((v-y)**2, axis=-1))

Or something like that. The reason I suspect this will be faster is that it has better locality, completely finishing a computation on a relatively small working set before moving on to the next one. The one-liners have to pull the potentially large MxN array into the processor repeatedly. Or you could use numexpr.

-- //=][=\\ tim.hochberg@ieee.org
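Tim's untested sketch, filled out into a runnable form (assuming numpy imported as np; the function name is just for illustration):

```python
import numpy as np

def dist_rowloop(x, y):
    M, N = x.shape[0], y.shape[0]
    results = np.empty((M, N), float)
    for i, v in enumerate(x):
        # finish one row's N-by-d working set before moving on: better locality
        results[i] = np.sqrt(((v - y) ** 2).sum(axis=-1))
    return results
```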
Timothy Hochberg wrote:
    results = empty([M, N], float)
    # You could be fancy and swap axes depending on which array is larger,
    # but I'll leave that for someone else
    for i, v in enumerate(x):
        results[i] = sqrt(sum((v-y)**2, axis=-1))
you can probably use numpy.hypot(v-y) to speed this up more...

-Chris

--
Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959 voice
7600 Sand Point Way NE   (206) 526-6329 fax
Seattle, WA 98115        (206) 526-6317 main reception
Chris.Barker@noaa.gov
Matthieu Brucher wrote:
you can probably use numpy.hypot(v-y) to speed this up more...
Tried it today; hypot takes two arguments :( Is there a function that does the square root of the sum of squares?
then maybe you want numpy.hypot(v-y, v-y), though you should probably make a temporary v-y, so you don't make two of them. I've been assuming that hypot is written in C, rather than just a convenience function, but if it's the latter, then it won't help here.

-Chris
Christopher Barker wrote:
Matthieu Brucher wrote:
you can probably use numpy.hypot(v-y) to speed this up more...
Tried it today, hypot takes two arguments :( Is there a function that does the square root of the sum of squares ?
then maybe you want:
numpy.hypot(v-y, v-y), though you should probably make a temporary v-y, so you don't make two of them.
I've been assuming that hypot is written in C, rather than just a convenience function, but if it's the latter, then it won't help here.
It is written in C, but it doesn't do what you want. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
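To make Robert's point concrete: hypot is an elementwise binary ufunc, computing sqrt(a*a + b*b) pairwise, not a reduction over an axis, so it can't sum squares across a row by itself:

```python
import numpy as np

a = np.array([3.0, 5.0])
b = np.array([4.0, 12.0])
# elementwise: sqrt(3**2 + 4**2) = 5, sqrt(5**2 + 12**2) = 13
print(np.hypot(a, b))

# a root-of-sum-of-squares over an axis needs an explicit reduction instead:
v = np.array([[3.0, 4.0], [5.0, 12.0]])
print(np.sqrt((v * v).sum(axis=1)))
```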
On 4/13/07, Timothy Hochberg wrote:
On 4/13/07, Bill Baxter wrote:
I think someone posted some timings about this before but I don't recall.
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/498246 [snip]
I'm going to go out on a limb and contend, without running any timings, that for large M and N, a solution using a for loop will beat either of those. For example (untested):
    results = empty([M, N], float)
    # You could be fancy and swap axes depending on which array is larger,
    # but I'll leave that for someone else
    for i, v in enumerate(x):
        results[i] = sqrt(sum((v-y)**2, axis=-1))

Or something like that. The reason I suspect this will be faster is that it has better locality, completely finishing a computation on a relatively small working set before moving on to the next one. The one-liners have to pull the potentially large MxN array into the processor repeatedly.
In my experience, it is indeed the case that the for loop version is faster. The fastest of the three versions offered in the above url is the last:

    from numpy import array, zeros, newaxis, sqrt
    def calcDistanceMatrixFastEuclidean2(nDimPoints):
        nDimPoints = array(nDimPoints)
        n, m = nDimPoints.shape
        delta = zeros((n, n), 'd')
        for d in xrange(m):
            data = nDimPoints[:, d]
            delta += (data - data[:, newaxis])**2
        return sqrt(delta)

This is easily extended to two different nDimPoints matrices.

Cheers, Keir
Here's a bunch of dist matrix implementations and their timings.
The upshot is that for most purposes this seems to be the best or at
least not too far off (basically the cookbook solution Keir posted):
def dist2hd(x,y):
"""Generate a 'coordinate' of the solution at a time"""
d = npy.zeros((x.shape[0],y.shape[0]),dtype=x.dtype)
for i in xrange(x.shape[1]):
diff2 = x[:,i,None] - y[:,i]
diff2 **= 2
d += diff2
npy.sqrt(d,d)
return d
The only place where it's far from the best is for a small number of
points (~10) with high dimensionality (~100), which does come up in
machine learning contexts. For those cases this does much better
(factor of …):
def dist2b3(x,y):
d = npy.dot(x,y.T)
d *= -2.0
d += (x*x).sum(1)[:,None]
d += (y*y).sum(1)
# Rounding errors occasionally cause negative entries in d
d[d<0] = 0
# in place sqrt
npy.sqrt(d,d)
return d
So given that, the obvious solution (if you don't want to delve into
non-numpy solutions) is to use a hybrid that just switches between
the two. Not sure what the proper switch is since it seems kind of
complicated, and probably depends some on cache specifics. But just
switching based on the dimension of the points seems to be pretty
effective:
def dist2hy(x,y):
if x.shape[1]<5:
d = npy.zeros((x.shape[0],y.shape[0]),dtype=x.dtype)
for i in xrange(x.shape[1]):
diff2 = x[:,i,None] - y[:,i]
diff2 **= 2
d += diff2
npy.sqrt(d,d)
return d
else:
d = npy.dot(x,y.T)
d *= -2.0
d += (x*x).sum(1)[:,None]
d += (y*y).sum(1)
# Rounding errors occasionally cause negative entries in d
d[d<0] = 0
# in place sqrt
npy.sqrt(d,d)
return d
All of this assumes 'C' contiguous data. All bets are off if you have
non-contiguous or 'F' ordered data. And maybe if x and y have very
different numbers of points.
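For anyone who wants to reproduce the comparison, a minimal harness along these lines (sizes are arbitrary choices, not Bill's; only numpy and the stdlib timeit are used) checks that both branches agree and times them:

```python
import timeit
import numpy as np

def dist2_loop(x, y):
    # per-coordinate accumulation (good for low dimensionality)
    d = np.zeros((x.shape[0], y.shape[0]), dtype=x.dtype)
    for i in range(x.shape[1]):
        diff2 = x[:, i, None] - y[:, i]
        diff2 **= 2
        d += diff2
    np.sqrt(d, d)
    return d

def dist2_dot(x, y):
    # dot-product expansion (good for high dimensionality)
    d = np.dot(x, y.T)
    d *= -2.0
    d += (x * x).sum(1)[:, None]
    d += (y * y).sum(1)
    d[d < 0] = 0  # clamp roundoff negatives before the in-place sqrt
    np.sqrt(d, d)
    return d

x = np.random.rand(1000, 3)
y = np.random.rand(1000, 3)
for fn in (dist2_loop, dist2_dot):
    t = timeit.timeit(lambda: fn(x, y), number=10)
    print(fn.__name__, t)
```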
--bb
On 4/17/07, Keir Mierle wrote:
[snip]
Oops. Looks like I forgot to attach the test program that generated
that output so you can tell what dist2g actually does.
Funny thing is -- despite being written in C, hypot doesn't actually
win any of the test cases for which it's applicable.
--bb
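Bill's attachment isn't included in the archive, but a hypot-based distance-matrix variant along these lines (a guess at the flavor of dist2g, not his actual code) shows how hypot can at least be applied to this problem: since hypot(d, t) == sqrt(d**2 + t**2), accumulating one coordinate at a time keeps the running value a valid distance throughout.

```python
import numpy as np

def dist2_hypot(x, y):
    # running value after coordinate i is sqrt(sum of the first i+1 squared diffs)
    d = np.zeros((x.shape[0], y.shape[0]), dtype=x.dtype)
    for i in range(x.shape[1]):
        d = np.hypot(d, x[:, i, None] - y[:, i])
    return d
```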
On 4/17/07, Bill Baxter wrote:
[snip]
participants (8)
- Bill Baxter
- Christopher Barker
- Keir Mierle
- Markus Rosenstihl
- Matthieu Brucher
- Robert Kern
- Sturla Molden
- Timothy Hochberg