[Numpy-discussion] Fastest distance matrix calc

Bill Baxter wbaxter at gmail.com
Fri Apr 13 06:43:44 EDT 2007


I think someone posted some timings about this before but I don't recall.

The task is to compute the matrix D from two sets of vectors x (M,d)
and y (N,d).
The output should be  D where D[i,j] is norm(x[i]-y[j])

The Matlab NetLab toolkit uses something like this to compute it:

    d2 = (x*x).sum(1)[:,numpy.newaxis] + (y*y).sum(1) - 2*mult(x,y.T)

And then does
  d2[d2<0]=0
because roundoff in the above can sometimes create negative values.  And finally
  d = sqrt(d2)

But it just doesn't seem like the best way to do it.  Whatever was
posted before I don't remember anything about a subtract.outer
solution.  Seems like something along the lines of
     (subtract.outer(x,y)**2).sum(axis=0)
might be faster, and also avoid the need to check for negatives.

I'll do some timings if no one's already done it.  Just wanted to check first.

--bb



More information about the NumPy-Discussion mailing list