[Numpy-discussion] finding close together points.

Anne Archibald peridot.faceted at gmail.com
Thu Nov 12 14:19:58 EST 2009

2009/11/12 Lou Pecora <lou_boog2000 at yahoo.com>:
> ----- Original Message ----
> From: Christopher Barker <Chris.Barker at noaa.gov>
> To: Discussion of Numerical Python <numpy-discussion at scipy.org>
> Sent: Thu, November 12, 2009 12:37:37 PM
> Subject: Re: [Numpy-discussion] finding close together points.
> Lou Pecora wrote:
>> a KD tree for 2D nearest neighbor seems like over kill.  You
>> might want to try the simple approach of using boxes of points to
>> narrow things down by sorting on the first component.
> yeah, we'll probably do something like that if we have to write the code
> ourselves. At the moment, we're using geohash:
> http://pypi.python.org/pypi/Geohash/
> (this is for points on the earth)
> and it's working OK. I was just hoping kdtree would work out of the box!
>> where for a
>> static data set it can match KD trees in speed
> Why does it have to be static -- it doesn't look hard to insert/remove
> points.
> --- Lou answers ----------------------
> It doesn't have to be static, but if I remember correctly, when you add points you have to resort or do a "smart" insert (which may require a lot of data "shifting").  Not hard, but the original paper claimed that if there are a lot of changes to the data set this becomes more expensive than the kd-tree.  If you can get the original paper I mentioned, it will have more info.  It's pretty clearly written and contains some nice comparisons to kd-trees for finding near neighbors.
> Bottom line is that you can still try it with changing data.  See what the run times are like.  I've used the approach for up to eight dimensions with about 10^5 data points.  It worked nicely.  I remember looking for a kd-tree library that would work out of the box (C or C++), but everything I found was not as simple as I would have liked.  The slice searching was a nice solution.  But maybe I just didn't know where to look or I'm not understanding some of the libraries I did find.
> Let us know how it goes.

Just a quick comment: none of the KDTrees in scipy.spatial support any
kind of modification right now, so if your data set changes you will
need to rebuild them completely. Construction is fairly cheap, but
it's very unlikely to compete with a data structure that supports
modification algorithms.


More information about the NumPy-Discussion mailing list