[Numpy-discussion] faster way for calculating distance matrix...

Jeffery D. Collins jcollins at boulder.net
Wed May 21 18:03:02 EDT 2003


Kasper Souren wrote:

>Hi,
>
>I was happy to see solutions coming up for Cliff's problems and it made me 
>think: maybe I can mention my problem here as well.
>
>>From an array of feature vectors I want to calculate it's distance matrix. 
>Something like [1]. Currently it can take quite a while to calculate the 
>stuff for a long array.
>
>Some questions:
>
>1) Is there a smart speed up possible? Like, a way to avoid the double loop? 
>It's no problem if this would lead to less generality (like the choice for a 
>distance function). I know there is a little speed up to be gained by leaving 
>out the array(f) thing, but that's not what I'm looking for.
>
>2) Is it possible (in Numeric or numarray) to define a class DiagonalMatrix 
>that at least saves half of the memory?
>
>3) If 1) is not possible, what would be the way to go for speeding it up by 
>writing it in C? weave because of its availability in scipy or would pyrex be 
>more interesting, or are there even more options..?
>
>bye,
>Kasper
>
>
>
>[1] Example program:
>
>import Numeric
>
>def euclidean_dist(a, b):
>	diff = a - b
>	return Numeric.dot(diff, diff)
>
>def calc_dist_matrix(f, distance_function):
>	W = Numeric.array(f)
>	length = W.shape[0]
>	S = Numeric.zeros((length, length)) * 1.0
>
>	for i in range(length):
>		for j in range(i):
>			S[j, i] = S[i, j] = distance_function(W[i], W[j])
>	return S
>
>print calc_dist_matrix(Numeric.sin(Numeric.arange(30)), euclidean_dist) 
>  
>

The following is a bit faster (at the cost of some memory):

def euclidean_dist(a, b):
    diff = a - b
    return diff**2

def calc_dist_matrix2(f, distance_function):
    W = Numeric.array(f)
    i,j = indices((len(W),)*2)
    S = distance_function(take(W,i), take(W,j))
    return S


-- 
Jeffery Collins (http://www.boulder.net/~jcollins)






More information about the NumPy-Discussion mailing list