[SciPy-user] Characteristic distance in a cloud of points

Anne Archibald peridot.faceted at gmail.com
Sat Feb 24 16:13:57 EST 2007


On 24/02/07, Gael Varoquaux <gael.varoquaux at normalesup.org> wrote:
> I have a cloud of points (for instance given as a (n,3) shaped array,
> with columns formed by the x, y and z column vectors).
>
> I would like to find the mean distance in this cloud of points. I do not
> need an exact value, I am just interested in a typical distance.

This is actually quite tricky, depending on what you mean by a
"typical" distance: distances can have all sorts of distributions.
Imagine, for example, a cloud that is actually two small clouds a long
way apart, a cloud with a few very distant outliers, or a Julia set
(for which the distribution of pairwise distances behaves like a power
law whose exponent is related to the fractal dimension)... well, you
get the point.
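To make the two-cloud case concrete, here is a small sketch with hypothetical data: two Gaussian blobs of 50 points each, centred 100 units apart (the sizes, spacing and seed are arbitrary choices for illustration). The helper `pair_distances` is likewise just an illustrative name, not a library function.

```python
import numpy as np

def pair_distances(pts):
    """Euclidean distance for every unordered pair of rows in pts."""
    diffs = pts[np.newaxis, :, :] - pts[:, np.newaxis, :]
    d = np.sqrt((diffs ** 2).sum(axis=-1))
    # Strict upper triangle: each pair once, self-distances excluded.
    return d[np.triu_indices(len(pts), k=1)]

# Hypothetical data: two blobs with unit spread, 100 units apart along x.
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=0.0, scale=1.0, size=(50, 3))
blob_b = rng.normal(loc=[100.0, 0.0, 0.0], scale=1.0, size=(50, 3))
cloud = np.vstack([blob_a, blob_b])

all_pairs = pair_distances(cloud)
within_a = pair_distances(blob_a)

print("mean over all pairs:  ", all_pairs.mean())
print("median over all pairs:", np.median(all_pairs))
print("mean within one blob: ", within_a.mean())
```

Slightly more than half of the 4950 unordered pairs (2500 of them) straddle the two blobs, so the median lands on an inter-blob distance near 100, while the mean sits roughly halfway between the two scales, a distance that almost no actual pair has. Neither says anything about the within-blob scale of a couple of units.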

> I could do it in a brute force way:

This can be tidied slightly:
> ++++++++++++++++++++++++++++++++++++++++++
> from scipy import *
> x = arange(1, 5)
>
> points = c_[x, x, x]
> diffs = abs(points[newaxis, :] - points[:, newaxis])
There's no need for an absolute value here, since the differences are squared anyway.
> dists = sqrt(diffs[..., 0]**2 + diffs[..., 1]**2 + diffs[..., 2]**2).ravel()
sqrt(sum(diffs**2,axis=2)).ravel() will do the same.
> dists = dists[dists>0]
> mean(dists)
> ++++++++++++++++++++++++++++++++++++++++++
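For reference, here is the brute-force snippet with both tidy-ups applied (no `abs`, and a sum over the coordinate axis), written against NumPy directly instead of relying on `from scipy import *`. It is a sketch of the same computation, not a different method:

```python
import numpy as np

x = np.arange(1, 5)
points = np.c_[x, x, x]  # shape (4, 3): four points on the main diagonal

# diffs[i, j] = points[j] - points[i], shape (4, 4, 3); no abs() needed because we square.
diffs = points[np.newaxis, :] - points[:, np.newaxis]
# Euclidean length along the coordinate axis, flattened to 16 values.
dists = np.sqrt((diffs ** 2).sum(axis=2)).ravel()
# Drop zero distances: the self-pairs (and any coincident points).
dists = dists[dists > 0]
print(dists.mean())
```

For these four points the twelve remaining ordered pairs are 1, 2 or 3 diagonal steps of length sqrt(3) apart (6, 4 and 2 pairs respectively), so the mean is 5/sqrt(3), about 2.887. Each pair is counted twice, as (i, j) and (j, i), which does not change the mean.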

> Are there any better ways of doing this ?

Well, depending on what you want from a "typical distance", the median
might do a better job (or not). Or you might be satisfied with a random
sample of 100 points (say):

p = points[random.randint(shape(points)[0],size=100)]
and then use the above procedure.
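A sketch of the subsampling idea that reports both the mean and the median, assuming a modern NumPy `Generator`. The function name `sampled_distance_stats` and the test data are hypothetical; unlike the `randint` line above, it draws indices without replacement, so the same point is never picked twice.

```python
import numpy as np

def sampled_distance_stats(points, n_sample=100, seed=None):
    """Mean and median pairwise distance over a random subset of the rows of points.

    Draws without replacement; requires n_sample <= len(points).
    """
    rng = np.random.default_rng(seed)
    idx = rng.choice(points.shape[0], size=n_sample, replace=False)
    p = points[idx]
    diffs = p[np.newaxis, :] - p[:, np.newaxis]
    dists = np.sqrt((diffs ** 2).sum(axis=-1)).ravel()
    dists = dists[dists > 0]  # drop the zero self-distances
    return dists.mean(), np.median(dists)

# Hypothetical data: 100000 points drawn uniformly from the unit cube.
cloud = np.random.default_rng(1).random((100_000, 3))
mean_d, median_d = sampled_distance_stats(cloud, n_sample=100, seed=2)
print(mean_d, median_d)
```

With 100 sampled points this touches about 5000 distinct pairs instead of roughly 5e9, and the mean should land near 0.66, the known mean distance between two random points in a unit cube.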

Alternatively, if you're willing to be crude:

lwh = ptp(points,axis=0) # size of the bounding box
d = sqrt(sum((lwh/2)**2)) # half the diagonal of the bounding box
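The same crude estimate as a self-contained function (the name `bbox_half_diagonal` is hypothetical), applied to the four diagonal points from the brute-force example:

```python
import numpy as np

def bbox_half_diagonal(points):
    """Crude size estimate: half the diagonal of the axis-aligned bounding box."""
    lwh = np.ptp(points, axis=0)  # extent along x, y and z
    return np.sqrt(np.sum((lwh / 2) ** 2))

x = np.arange(1, 5)
points = np.c_[x, x, x]
print(bbox_half_diagonal(points))
```

Here the extents are 3 along each axis, giving 1.5*sqrt(3), about 2.598, against a mean pairwise distance of 5/sqrt(3), about 2.887, for the same points. Because the box is axis-aligned, the estimate depends on how the cloud is oriented, and a single outlier stretches it.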

I end up using sqrt(sum(X**2,axis=Y)) rather often; I wonder if
there's a tidy idiom for it. It's the L2 norm, after all...
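Later NumPy releases cover this idiom: `numpy.linalg.norm` accepts an `axis` argument and computes the L2 norm along it by default. A quick check that it matches the explicit form:

```python
import numpy as np

x = np.arange(1, 5)
points = np.c_[x, x, x]
diffs = points[np.newaxis, :] - points[:, np.newaxis]

# Explicit form used in this thread.
d1 = np.sqrt(np.sum(diffs ** 2, axis=2))
# Same Euclidean (L2) norm, taken along the last axis.
d2 = np.linalg.norm(diffs, axis=-1)

print(np.allclose(d1, d2))
```

For the whole pairwise-distance step, `scipy.spatial.distance.pdist` is another option: it returns each unordered pair's distance once, as a condensed 1-D array, so neither the self-distances nor the duplicate (j, i) pairs need filtering.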

Anne


