[SciPy-user] Calculating a lot of (squared) Mahalanobis distances

Damian Eads eads at soe.ucsc.edu
Fri Nov 7 12:58:44 EST 2008


Have you looked at the cdist function in scipy.spatial.distance? It takes
two sets of vectors, S1 (n1 vectors) and S2 (n2 vectors), and returns an
n1 by n2 rectangular array whose ij-th entry is the distance between S1[i]
and S2[j].
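
A minimal sketch of how that might look for the problem below, assuming the
data are laid out as in the question (a DxN array x, a length-D mean mu, and
a DxD inverse covariance sigmainv). The random test data and the name
sq_dist are made up for illustration. Note that cdist expects one vector per
row, so x is transposed, and that the 'mahalanobis' metric returns the
unsquared distance, so the result is squared at the end.

```python
import numpy as np
from scipy.spatial.distance import cdist

# Hypothetical test data, laid out as in the question: D x N, one vector per column.
rng = np.random.default_rng(0)
D, N = 3, 5
x = rng.normal(size=(D, N))
mu = rng.normal(size=D)

# Build a symmetric positive definite covariance and invert it.
A = rng.normal(size=(D, D))
sigma = np.dot(A, A.T) + D * np.eye(D)
sigmainv = np.linalg.inv(sigma)

# cdist wants one observation per row, so pass x.T (N x D) and mu as a 1 x D array.
# VI is the inverse covariance matrix used by the 'mahalanobis' metric.
d = cdist(x.T, mu[np.newaxis, :], metric='mahalanobis', VI=sigmainv)  # shape (N, 1)

# cdist returns the distance itself; square it to get the quadratic form.
sq_dist = d[:, 0] ** 2
```

With a single mean, only one column of the n1 by n2 result is needed, so this
is mostly a convenience; cdist pays off more when there are several means.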



On Thu, Nov 6, 2008 at 7:36 PM, David Warde-Farley <dwf at cs.toronto.edu> wrote:
> Hi folks,
>
> I'm trying to calculate a lot of squared Mahalanobis distances (in
> essence, applying a positive definite quadratic form x.T * A * x to a
> lot of vectors x) and trying to think of the fastest way to do it with
> numpy.
>
> If I've got a single vector x and a 2D array sigmainv, then I've got
> something like this.
>
> import numpy as np
> ...
> xmmu = x - mu
> dist = np.dot(xmmu, np.dot(sigmainv, xmmu))
>
> However if I've got a DxN 2d array of N different vectors for which I
> want this quantity, it seems I can either use a loop or do something
> like
>
> xmmu = x - mu[:,np.newaxis]
> dist = np.diag(np.dot(xmmu.T, np.dot(sigmainv, xmmu)))
>
> It seems like a lot of wasted computation to throw out the off-
> diagonals. One thought I've had would be to diagonalize sigmainv and
> then do something tricky with scalar products and broadcasting the
> diagonal, but I am not sure whether that would save me much.
>
> Does anyone have any other tricks up their sleeve?
>
> Thanks,
>
> David
>
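
On the wasted off-diagonals in the quoted question: one way to avoid forming
the full NxN product is to multiply xmmu elementwise by sigmainv applied to
xmmu and sum over the D axis, which yields only the diagonal. The sketch
below assumes the same DxN layout as the question; the test data and the
names sq_dist and sq_dist_einsum are made up for illustration, and np.einsum
is shown only as an equivalent spelling, not as a claim about which is
faster.

```python
import numpy as np

# Hypothetical test data: D x N array of column vectors, as in the question.
rng = np.random.default_rng(1)
D, N = 4, 1000
x = rng.normal(size=(D, N))
mu = rng.normal(size=D)

# Symmetric positive definite inverse covariance for the example.
A = rng.normal(size=(D, D))
sigmainv = np.linalg.inv(np.dot(A, A.T) + D * np.eye(D))

# Center every column on the mean.
xmmu = x - mu[:, np.newaxis]

# np.dot(sigmainv, xmmu) is D x N; multiplying elementwise by xmmu and
# summing over axis 0 gives x_n.T * sigmainv * x_n for each column n,
# without ever building the N x N matrix.
sq_dist = (xmmu * np.dot(sigmainv, xmmu)).sum(axis=0)

# The same contraction written with einsum.
sq_dist_einsum = np.einsum('in,ij,jn->n', xmmu, sigmainv, xmmu)
```

This costs on the order of D*D*N operations, compared with D*N*N for the
full product followed by np.diag.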



-- 
-----------------------------------------------------
Damian Eads                             Ph.D. Student
Jack Baskin School of Engineering, UCSC        E2-489
1156 High Street                 Machine Learning Lab
Santa Cruz, CA 95064    http://www.soe.ucsc.edu/~eads


