[Numpy-discussion] distance matrix and (weighted) p-norm

Emanuele Olivetti emanuele at relativita.com
Wed Sep 3 09:13:41 EDT 2008


David Cournapeau wrote:
> Emanuele Olivetti wrote:
>> Hi,
>>
>> I'm trying to compute the distance matrix (weighted p-norm [*])
>> between two sets of vectors (data1 and data2). Example:
>>   
>
> You may want to look at scipy.cluster.distance, which has a bunch of
> distance matrix implementations. I believe most of them have an optional
> compiled version, for fast execution.

Thanks for the pointer, but the distance subpackage in cluster computes
the distance matrix among the vectors of a single set, so the resulting
matrix is square and symmetric. I need to compute distances between two
different sets of vectors (i.e. a rectangular, non-symmetric distance
matrix), and it is not clear to me how to use the subpackage in my case.

Also, cluster.distance offers only:
1) a slow Python double for loop that computes each entry of the matrix,
2) or a fast C implementation (numpy/cluster/distance/src/distance.c).

I guess I need to extend distance.c, then work on the wrapper and then
on distance.py. But after that it would be meaningless to have those
distances under 'cluster', since clustering doesn't need distances between
two sets of vectors.

In my original post I was looking for a fast Python/NumPy implementation
of my code. For special cases (such as p==2, i.e. the standard weighted
Euclidean distance) there are very fast implementations (see e.g. the
2007 thread "Fastest distance matrix calc"), but I am not able to find
anything similar for the general case.
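
To make the question concrete, here is a minimal sketch of the two cases,
assuming the weighted p-norm distance is defined as
d(x, y) = (sum_k w_k * |x_k - y_k|**p) ** (1/p). The function names and
the argument layout (one vector per row) are only illustrative, not taken
from any library. The general version relies on broadcasting and builds
an intermediate array of shape (n1, n2, d), so it trades memory for
speed. The p==2 version uses the usual expansion of the squared distance
into dot products.

```python
import numpy as np


def weighted_pnorm_distances(data1, data2, weights, p):
    """General case: D[i, j] = (sum_k w[k] * |data1[i, k] - data2[j, k]|**p) ** (1/p).

    Assumed definition of the weighted p-norm; adapt it if yours differs.
    """
    data1 = np.asarray(data1, dtype=float)
    data2 = np.asarray(data2, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # Broadcasting gives an (n1, n2, d) array of absolute differences.
    diff = np.abs(data1[:, np.newaxis, :] - data2[np.newaxis, :, :])
    return ((diff ** p) * weights).sum(axis=-1) ** (1.0 / p)


def weighted_euclidean_distances(data1, data2, weights):
    """Special case p == 2, using |x - y|^2 = |x|^2 + |y|^2 - 2 * x.y (all weighted)."""
    data1 = np.asarray(data1, dtype=float)
    data2 = np.asarray(data2, dtype=float)
    weights = np.asarray(weights, dtype=float)
    sq1 = (weights * data1 ** 2).sum(axis=1)   # shape (n1,)
    sq2 = (weights * data2 ** 2).sum(axis=1)   # shape (n2,)
    cross = np.dot(data1 * weights, data2.T)   # shape (n1, n2)
    squared = sq1[:, np.newaxis] + sq2[np.newaxis, :] - 2.0 * cross
    # Rounding can make tiny squared distances slightly negative; clip them.
    return np.sqrt(np.maximum(squared, 0.0))
```

The result has one row per vector of data1 and one column per vector of
data2, so it is rectangular in general. For large inputs the (n1, n2, d)
temporary in the general version may not fit in memory, in which case
looping over blocks of rows of data1 would be one way around it.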

Any other suggestions on how to speed up my example?

Thanks,

Emanuele



