[Numpy-discussion] finding close together points.

josef.pktd at gmail.com josef.pktd at gmail.com
Tue Nov 10 21:30:47 EST 2009

On Tue, Nov 10, 2009 at 7:48 PM, Anne Archibald
<peridot.faceted at gmail.com> wrote:
> 2009/11/10 Christopher Barker <Chris.Barker at noaa.gov>:
>> Hi all,
>> I have a bunch of points in 2-d space, and I need to find out which
>> pairs of points are within a certain distance of one-another (regular
>> old Euclidean norm).
> This is an eminently reasonable thing to want, and KDTree should
> support it. Unfortunately it doesn't.
>> scipy.spatial.KDTree.query_ball_tree() seems like it's built for this.
>> However, I'm a bit confused. The first argument is a kdtree, but I'm
>> calling it as a method of a kdtree -- I want to know which points in the
>> tree I already have are closer that some r from each-other.
>> If I call it as:
>> tree.query_ball_tree(tree, r)
>> I get a big list, that has all the points in it (some of them paired up
>> with close neighbors.) It appears I'm getting the distances between all
>> the points in the tree and itself, as though they were different trees.
>> This is slow, takes a bunch of memory, and I then have to parse out the
>> list to find the ones that are paired up.
>> Is there a way to get just the close ones from the single tree?

I used sparse_distance_matrix for the distance of a kdtree to itself
in the past.

Since it's a matrix it should be possible to just get the lower or
upper triangle, and it's sparse so memory is not so much of a problem.
But I remember it was also slow and only worth using if the matrix is


> Unfortunately not at the moment.
> It's slow both because you're using the python implementation rather
> than the C, and because you're getting all "pairs" where "pair"
> includes pairing a point with itself (and also each distinct pair in
> both orders). The tree really should allow self-queries that don't
> return the point and itself.
> The one good thing about using the python implementation rather than
> the Cython one is that you can subclass it, providing a new method.
> There's still a certain amount of boilerplate code to write, but it
> shouldn't be too bad.
> If this is still too slow, I have no problem incorporating additional
> code into cKDTree; the only reason none of the ball queries are in
> there is because ball queries must return variable-size results, so
> you lose a certain amount of speed because you're forced to manipulate
> python objects. But if there are relatively few result points, this
> need not be much of a slowdown.
> Anne
>> thanks,
>> -Chris
>> --
>> Christopher Barker, Ph.D.
>> Oceanographer
>> Emergency Response Division
>> NOAA/NOS/OR&R            (206) 526-6959   voice
>> 7600 Sand Point Way NE   (206) 526-6329   fax
>> Seattle, WA  98115       (206) 526-6317   main reception
>> Chris.Barker at noaa.gov
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion at scipy.org
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion

More information about the NumPy-Discussion mailing list