Fastest distance matrix calc
I think someone posted some timings about this before, but I don't recall. The task is to compute the matrix D from two sets of vectors x (M,d) and y (N,d). The output should be D, where D[i,j] is norm(x[i]-y[j]).

The Matlab NetLab toolkit uses something like this to compute it:

    d2 = (x*x).sum(1)[:,numpy.newaxis] + (y*y).sum(1) - 2*mult(x,y.T)

and then does d2[d2<0] = 0, because roundoff in the above can sometimes create negative values, and finally d = sqrt(d2). But it just doesn't seem like the best way to do it. Whatever was posted before, I don't remember anything about a subtract.outer solution. Seems like something along the lines of

    (subtract.outer(x,y)**2).sum(axis=0)

might be faster, and would also avoid the need to check for negatives.

I'll do some timings if no one's already done it. Just wanted to check first. --bb
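For concreteness, the NetLab-style expansion can be sketched in numpy like so (here mult is taken to be a matrix multiply, i.e. numpy.dot; this is a sketch of the idea, not NetLab's actual code):

```python
import numpy as np

def dist_netlab_style(x, y):
    """||x[i] - y[j]|| via the expansion ||x||^2 + ||y||^2 - 2 x.y"""
    d2 = (x * x).sum(1)[:, np.newaxis] + (y * y).sum(1) - 2 * np.dot(x, y.T)
    d2[d2 < 0] = 0  # roundoff can leave tiny negative values
    return np.sqrt(d2)
```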
On 4/13/07, Bill Baxter wrote:
I think someone posted some timings about this before but I don't recall.
The task is to compute the matrix D from two sets of vectors x (M,d) and y (N,d). The output should be D, where D[i,j] is norm(x[i]-y[j]).
The Matlab NetLab toolkit uses something like this to compute it:
d2 = (x*x).sum(1)[:,numpy.newaxis] + (y*y).sum(1) - 2*mult(x,y.T)
And then it does d2[d2<0] = 0, because roundoff in the above can sometimes create negative values, and finally d = sqrt(d2).
But it just doesn't seem like the best way to do it. Whatever was posted before, I don't remember anything about a subtract.outer solution. Seems like something along the lines of (subtract.outer(x,y)**2).sum(axis=0) might be faster, and would also avoid the need to check for negatives.
I'll do some timings if no one's already done it. Just wanted to check first.
I'm going to go out on a limb and contend, without running any timings, that for large M and N, a solution using a for loop will beat either of those. For example (untested):

    results = empty([M, N], float)
    # You could be fancy and swap axes depending on which array is larger,
    # but I'll leave that for someone else
    for i, v in enumerate(x):
        results[i] = sqrt(sum((v-y)**2, axis=-1))

Or something like that. The reason I suspect this will be faster is that it has better locality, completely finishing a computation on a relatively small working set before moving on to the next one. The one-liners have to pull the potentially large MxN array into the processor repeatedly. Or you could use numexpr.

-- //=][=\\ tim.hochberg@ieee.org
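Tim's untested sketch, filled out into a runnable form (assuming numpy imported as np; the function name is just for illustration):

```python
import numpy as np

def dist_rowloop(x, y):
    M, N = x.shape[0], y.shape[0]
    results = np.empty((M, N), float)
    for i, v in enumerate(x):
        # finish one row's N-by-d working set before moving on: better locality
        results[i] = np.sqrt(((v - y) ** 2).sum(axis=-1))
    return results
```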
Timothy Hochberg wrote:
    results = empty([M, N], float)
    # You could be fancy and swap axes depending on which array is larger,
    # but I'll leave that for someone else
    for i, v in enumerate(x):
        results[i] = sqrt(sum((v-y)**2, axis=-1))
you can probably use numpy.hypot(v-y) to speed this up more...

-Chris

--
Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959 voice
7600 Sand Point Way NE   (206) 526-6329 fax
Seattle, WA 98115        (206) 526-6317 main reception
Chris.Barker@noaa.gov
Matthieu Brucher wrote:
you can probably use numpy.hypot(v-y) to speed this up more...
Tried it today; hypot takes two arguments :( Is there a function that does the square root of the sum of squares?
then maybe you want numpy.hypot(v-y, v-y), though you should probably make a temporary v-y, so you don't make two of them. I've been assuming that hypot is written in C, rather than just a convenience function, but if it's the latter, then it won't help here.

-Chris
Christopher Barker wrote:
Matthieu Brucher wrote:
you can probably use numpy.hypot(v-y) to speed this up more...
Tried it today, hypot takes two arguments :( Is there a function that does the square root of the sum of squares ?
then maybe you want:
numpy.hypot(v-y, v-y), though you should probably make a temporary v-y, so you don't make two of them.
I've been assuming that hypot is written in C, rather than just a convenience function, but if it's the latter, then it won't help here.
It is written in C, but it doesn't do what you want. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
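To make Robert's point concrete: hypot is an elementwise binary ufunc, computing sqrt(a*a + b*b) pairwise, not a reduction over an axis, so it can't sum squares across a row by itself:

```python
import numpy as np

a = np.array([3.0, 5.0])
b = np.array([4.0, 12.0])
# elementwise: sqrt(3**2 + 4**2) = 5, sqrt(5**2 + 12**2) = 13
print(np.hypot(a, b))

# a root-of-sum-of-squares over an axis needs an explicit reduction instead:
v = np.array([[3.0, 4.0], [5.0, 12.0]])
print(np.sqrt((v * v).sum(axis=1)))
```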
On 4/13/07, Timothy Hochberg wrote:
On 4/13/07, Bill Baxter wrote:
I think someone posted some timings about this before but I don't recall.
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/498246 [snip]
I'm going to go out on a limb and contend, without running any timings, that for large M and N, a solution using a for loop will beat either of those. For example (untested):
    results = empty([M, N], float)
    # You could be fancy and swap axes depending on which array is larger,
    # but I'll leave that for someone else
    for i, v in enumerate(x):
        results[i] = sqrt(sum((v-y)**2, axis=-1))

Or something like that. The reason I suspect this will be faster is that it has better locality, completely finishing a computation on a relatively small working set before moving on to the next one. The one-liners have to pull the potentially large MxN array into the processor repeatedly.
In my experience, it is indeed the case that the for loop version is faster. The fastest of the three versions offered in the above url is the last:

    from numpy import array, zeros, newaxis, sqrt
    def calcDistanceMatrixFastEuclidean2(nDimPoints):
        nDimPoints = array(nDimPoints)
        n, m = nDimPoints.shape
        delta = zeros((n, n), 'd')
        for d in xrange(m):
            data = nDimPoints[:, d]
            delta += (data - data[:, newaxis])**2
        return sqrt(delta)

This is easily extended to two different nDimPoints matrices.

Cheers, Keir
Here's a bunch of dist matrix implementations and their timings.
The upshot is that for most purposes this seems to be the best or at
least not too far off (basically the cookbook solution Keir posted):
def dist2hd(x,y):
"""Generate a 'coordinate' of the solution at a time"""
d = npy.zeros((x.shape[0],y.shape[0]),dtype=x.dtype)
for i in xrange(x.shape[1]):
diff2 = x[:,i,None] - y[:,i]
diff2 **= 2
d += diff2
npy.sqrt(d,d)
return d
The only place where it's far from the best is for a small number of
points (~10) with high dimensionality (~100), which does come up in
machine learning contexts. For those cases this does much better
(factor of …):
def dist2b3(x,y):
d = npy.dot(x,y.T)
d *= -2.0
d += (x*x).sum(1)[:,None]
d += (y*y).sum(1)
# Rounding errors occasionally cause negative entries in d
d[d<0] = 0
# in place sqrt
npy.sqrt(d,d)
return d
So given that, the obvious solution (if you don't want to delve into
non-numpy solutions) is to use a hybrid that just switches between
the two. Not sure what the proper switch is since it seems kind of
complicated, and probably depends some on cache specifics. But just
switching based on the dimension of the points seems to be pretty
effective:
def dist2hy(x,y):
if x.shape[1]<5:
d = npy.zeros((x.shape[0],y.shape[0]),dtype=x.dtype)
for i in xrange(x.shape[1]):
diff2 = x[:,i,None] - y[:,i]
diff2 **= 2
d += diff2
npy.sqrt(d,d)
return d
else:
d = npy.dot(x,y.T)
d *= -2.0
d += (x*x).sum(1)[:,None]
d += (y*y).sum(1)
# Rounding errors occasionally cause negative entries in d
d[d<0] = 0
# in place sqrt
npy.sqrt(d,d)
return d
All of this assumes 'C' contiguous data. All bets are off if you have
non-contiguous or 'F' ordered data. And maybe if x and y have very
different numbers of points.
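For anyone who wants to reproduce the comparison, a minimal harness along these lines (sizes are arbitrary choices, not Bill's; only numpy and the stdlib timeit are used) checks that both branches agree and times them:

```python
import timeit
import numpy as np

def dist2_loop(x, y):
    # per-coordinate accumulation (good for low dimensionality)
    d = np.zeros((x.shape[0], y.shape[0]), dtype=x.dtype)
    for i in range(x.shape[1]):
        diff2 = x[:, i, None] - y[:, i]
        diff2 **= 2
        d += diff2
    np.sqrt(d, d)
    return d

def dist2_dot(x, y):
    # dot-product expansion (good for high dimensionality)
    d = np.dot(x, y.T)
    d *= -2.0
    d += (x * x).sum(1)[:, None]
    d += (y * y).sum(1)
    d[d < 0] = 0  # clamp roundoff negatives before the in-place sqrt
    np.sqrt(d, d)
    return d

x = np.random.rand(1000, 3)
y = np.random.rand(1000, 3)
for fn in (dist2_loop, dist2_dot):
    t = timeit.timeit(lambda: fn(x, y), number=10)
    print(fn.__name__, t)
```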
--bb
On 4/17/07, Keir Mierle wrote:
[snip]
Oops. Looks like I forgot to attach the test program that generated
that output so you can tell what dist2g actually does.
Funny thing is -- despite being written in C, hypot doesn't actually
win any of the test cases for which it's applicable.
--bb
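Bill's attachment isn't included in the archive, but a hypot-based distance-matrix variant along these lines (a guess at the flavor of dist2g, not his actual code) shows how hypot can at least be applied to this problem: since hypot(d, t) == sqrt(d**2 + t**2), accumulating one coordinate at a time keeps the running value a valid distance throughout.

```python
import numpy as np

def dist2_hypot(x, y):
    # running value after coordinate i is sqrt(sum of the first i+1 squared diffs)
    d = np.zeros((x.shape[0], y.shape[0]), dtype=x.dtype)
    for i in range(x.shape[1]):
        d = np.hypot(d, x[:, i, None] - y[:, i])
    return d
```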
On 4/17/07, Bill Baxter wrote:
[snip]
participants (8)
- Bill Baxter
- Christopher Barker
- Keir Mierle
- Markus Rosenstihl
- Matthieu Brucher
- Robert Kern
- Sturla Molden
- Timothy Hochberg