On 4/13/07, Timothy Hochberg <tim.hochberg@ieee.org> wrote:
On 4/13/07, Bill Baxter <wbaxter@gmail.com> wrote:
I think someone posted some timings about this before but I don't recall.
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/498246 [snip]
I'm going to go out on a limb and contend, without running any timings, that for large M and N, a solution using a for loop will beat either of those. For example (untested):
results = empty([M, N], float)
# You could be fancy and swap axes depending on which array is larger, but
# I'll leave that for someone else
for i, v in enumerate(x):
    results[i] = sqrt(sum((v - y)**2, axis=-1))

Or something like that. The reason that I suspect this will be faster is that it has better locality, completely finishing a computation on a relatively small working set before moving onto the next one. The one-liners have to pull the potentially large MxN array into the processor repeatedly.
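A self-contained version of that sketch (the function name `pairwise_distances` is my own, and I am assuming `x` and `y` are arrays of shape (M, d) and (N, d)) might look like:

```python
import numpy as np

def pairwise_distances(x, y):
    # Row-by-row Euclidean distances between x (M, d) and y (N, d).
    # Each iteration touches only an N x d working set, which is the
    # locality argument made above.
    M, N = x.shape[0], y.shape[0]
    results = np.empty((M, N), float)
    for i, v in enumerate(x):
        results[i] = np.sqrt(((v - y) ** 2).sum(axis=-1))
    return results
```

Swapping the roles of `x` and `y` when N > M, as suggested, would keep the inner working set as small as possible.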
In my experience, it is indeed the case that the for loop version is faster. The fastest of the three versions offered at the above URL is the last:

from numpy import array, zeros, newaxis, sqrt

def calcDistanceMatrixFastEuclidean2(nDimPoints):
    nDimPoints = array(nDimPoints)
    n, m = nDimPoints.shape
    delta = zeros((n, n), 'd')
    for d in xrange(m):
        data = nDimPoints[:, d]
        delta += (data - data[:, newaxis])**2
    return sqrt(delta)

This is easily extended to two different nDimPoints matrices.

Cheers,
Keir
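A sketch of that two-matrix extension (the function name `calcDistanceMatrixFastEuclidean2_xy` is my own; the loop-per-dimension structure follows the recipe above):

```python
import numpy as np

def calcDistanceMatrixFastEuclidean2_xy(x, y):
    # Euclidean distance matrix between two different point sets,
    # accumulating squared differences one dimension at a time so the
    # temporaries stay small, as in the single-set version.
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n, m = x.shape
    p = y.shape[0]
    delta = np.zeros((n, p), 'd')
    for d in range(m):
        # (n, 1) minus (p,) broadcasts to an (n, p) slice per dimension
        delta += (x[:, d, np.newaxis] - y[:, d]) ** 2
    return np.sqrt(delta)
```

The broadcast per dimension produces only one n x p temporary at a time, rather than the n x p x m intermediate that a fully vectorized one-liner would create.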