[Numpy-discussion] Faster

Keith Goodman kwgoodman at gmail.com
Fri May 2 22:02:44 EDT 2008


On Fri, May 2, 2008 at 6:29 PM, Charles R Harris
<charlesr.harris at gmail.com> wrote:
> Isn't the lengthy part finding the distance between clusters?  I can think
> of several ways to do that, but I think you will get a real speedup by doing
> that in C or C++. I have a module made in Boost.Python that holds clusters
> and returns a list of lists containing their elements. Clusters are joined
> by joining any two elements, one from each. It wouldn't take much to add a
> distance function, but you could use the list of indices in each cluster to
> pull a subset out of the distance matrix and then find the minimum of
> that. This also reminds me of Huffman codes.
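
For illustration, the subset idea in the quoted text could be sketched in
NumPy roughly as follows. The function name cluster_distance and the
arguments d, a and b are hypothetical, and taking the minimum over the
subset is only one reading of the suggestion; this is not the Boost.Python
module mentioned above.

```python
import numpy as np

def cluster_distance(d, a, b):
    """Smallest distance between any element of cluster a and any of cluster b.

    d is the full NxN distance matrix; a and b are lists of element indices.
    All three names are assumptions made for this sketch.
    """
    # np.ix_ builds an open mesh so that d[...] selects rows a and columns b.
    sub = d[np.ix_(a, b)]
    return sub.min()

# Small hand-made symmetric distance matrix for four elements.
d = np.array([[0.0, 1.0, 4.0, 5.0],
              [1.0, 0.0, 2.0, 6.0],
              [4.0, 2.0, 0.0, 3.0],
              [5.0, 6.0, 3.0, 0.0]])

# Clusters {0, 1} and {2, 3}: the closest pair is (1, 2) at distance 2.
closest = cluster_distance(d, [0, 1], [2, 3])
```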

You're right. Finding the distance is slow. Is there any way to speed
up the function below? It returns the row and column indices of the
minimum off-diagonal value of the NxN array x.

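import numpy as np

def dist(x):
    # Add a large value to the diagonal so it is never picked as the minimum.
    x = x + 1e10 * np.eye(x.shape[0])
    # Indices of every entry equal to the minimum; return the first match.
    i, j = np.where(x == x.min())
    return i[0], j[0]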

>> x = np.random.rand(500,500)
>> timeit dist(x)
100 loops, best of 3: 14.1 ms per loop
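
One possible direction, offered as a sketch and not something from this
thread: dist() builds an NxN identity matrix, a boolean comparison array,
and scans the data more than once. Masking the diagonal on a copy and
taking a single argmin avoids most of that. The name dist_argmin is
hypothetical, and no timing is claimed here; it would have to be measured
with timeit on the same data.

```python
import numpy as np

def dist_argmin(x):
    """Row and column index of the smallest off-diagonal value of x.

    Hypothetical alternative to dist(); assumes x is a square float array.
    """
    x = x.copy()                     # leave the caller's array untouched
    np.fill_diagonal(x, np.inf)      # the diagonal can never be the minimum
    flat = np.argmin(x)              # index of the first minimum, row-major
    return np.unravel_index(flat, x.shape)

rng = np.random.default_rng(0)
x = rng.random((500, 500))
i, j = dist_argmin(x)
```

Like dist(), this returns the first minimum in row-major order when the
minimum occurs more than once.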

If the clustering gives me useful results, I'll ask you about your
boost code. I'll also take a look at Damian Eads's scipy-cluster.


