On 4/13/07, Timothy Hochberg <tim.hochberg@ieee.org> wrote:
On 4/13/07, Bill Baxter <wbaxter@gmail.com> wrote:
I think someone posted some timings about this before but I don't recall.
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/498246 [snip]
I'm going to go out on a limb and contend, without running any timings, that for large M and N, a solution using a for loop will beat either of those. For example (untested):
results = empty([M, N], float)
# You could be fancy and swap axes depending on which array is larger, but
# I'll leave that for someone else
for i, v in enumerate(x):
    results[i] = sqrt(sum((v - y)**2, axis=-1))

Or something like that. The reason that I suspect this will be faster is that it has better locality, completely finishing a computation on a relatively small working set before moving onto the next one. The one-liners have to pull the potentially large MxN array into the processor repeatedly.
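A self-contained version of that sketch (the function name `pairwise_distances` is my own, and I am assuming `x` and `y` are arrays of shape (M, d) and (N, d)) might look like:

```python
import numpy as np

def pairwise_distances(x, y):
    # Row-by-row Euclidean distances between x (M, d) and y (N, d).
    # Each iteration touches only an N x d working set, which is the
    # locality argument made above.
    M, N = x.shape[0], y.shape[0]
    results = np.empty((M, N), float)
    for i, v in enumerate(x):
        results[i] = np.sqrt(((v - y) ** 2).sum(axis=-1))
    return results
```

Swapping the roles of `x` and `y` when N > M, as suggested, would keep the inner working set as small as possible.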
In my experience, it is indeed the case that the for loop version is faster. The fastest of the three versions offered at the above URL is the last:

from numpy import array, zeros, newaxis, sqrt

def calcDistanceMatrixFastEuclidean2(nDimPoints):
    nDimPoints = array(nDimPoints)
    n, m = nDimPoints.shape
    delta = zeros((n, n), 'd')
    for d in xrange(m):
        data = nDimPoints[:, d]
        delta += (data - data[:, newaxis])**2
    return sqrt(delta)

This is easily extended to two different nDimPoints matrices.

Cheers,
Keir
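A sketch of that two-matrix extension (the function name `calcDistanceMatrixFastEuclidean2_xy` is my own; the loop-per-dimension structure follows the recipe above):

```python
import numpy as np

def calcDistanceMatrixFastEuclidean2_xy(x, y):
    # Euclidean distance matrix between two different point sets,
    # accumulating squared differences one dimension at a time so the
    # temporaries stay small, as in the single-set version.
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n, m = x.shape
    p = y.shape[0]
    delta = np.zeros((n, p), 'd')
    for d in range(m):
        # (n, 1) minus (p,) broadcasts to an (n, p) slice per dimension
        delta += (x[:, d, np.newaxis] - y[:, d]) ** 2
    return np.sqrt(delta)
```

The broadcast per dimension produces only one n x p temporary at a time, rather than the n x p x m intermediate that a fully vectorized one-liner would create.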