[Numpy-discussion] finding close together points.

Anne Archibald peridot.faceted at gmail.com
Tue Nov 10 19:48:14 EST 2009

2009/11/10 Christopher Barker <Chris.Barker at noaa.gov>:
> Hi all,
> I have a bunch of points in 2-d space, and I need to find out which
> pairs of points are within a certain distance of one-another (regular
> old Euclidean norm).

This is an eminently reasonable thing to want, and KDTree should
support it. Unfortunately it doesn't.

> scipy.spatial.KDTree.query_ball_tree() seems like it's built for this.
> However, I'm a bit confused. The first argument is a kdtree, but I'm
> calling it as a method of a kdtree -- I want to know which points in the
> tree I already have are closer that some r from each-other.
> If I call it as:
> tree.query_ball_tree(tree, r)
> I get a big list, that has all the points in it (some of them paired up
> with close neighbors.) It appears I'm getting the distances between all
> the points in the tree and itself, as though they were different trees.
> This is slow, takes a bunch of memory, and I then have to parse out the
> list to find the ones that are paired up.
> Is there a way to get just the close ones from the single tree?

Unfortunately not at the moment.

It's slow both because you're using the python implementation rather
than the C, and because you're getting all "pairs" where "pair"
includes pairing a point with itself (and also each distinct pair in
both orders). The tree really should allow self-queries that don't
return the point and itself.

The one good thing about using the python implementation rather than
the Cython one is that you can subclass it, providing a new method.
There's still a certain amount of boilerplate code to write, but it
shouldn't be too bad.

If this is still too slow, I have no problem incorporating additional
code into cKDTree; the only reason none of the ball queries are in
there is because ball queries must return variable-size results, so
you lose a certain amount of speed because you're forced to manipulate
python objects. But if there are relatively few result points, this
need not be much of a slowdown.


> thanks,
> -Chris
> --
> Christopher Barker, Ph.D.
> Oceanographer
> Emergency Response Division
> NOAA/NOS/OR&R            (206) 526-6959   voice
> 7600 Sand Point Way NE   (206) 526-6329   fax
> Seattle, WA  98115       (206) 526-6317   main reception
> Chris.Barker at noaa.gov
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion

More information about the NumPy-Discussion mailing list