[SciPy-User] efficient computation of point cloud nearest neighbors

Anne Archibald aarchiba at physics.mcgill.ca
Mon May 30 14:20:09 EDT 2011


Hi,

In fact, I wrote KDTree and cKDTree based on a description of the
algorithm used in ANN, so it should have the same O behaviour. cKDTree
has certainly not seen the amount of tuning and polishing ANN has, but
it should be pretty clean.

For the OP's problem, the easiest way to get an answer is to make a
cKDTree of the point cloud and then just call query with an array of
all the points. This runs a separate query for each point, but each
query uses the same tree and the looping is done in C, so it should be
fairly fast.

If this is not fast enough, it might be worth trying a two-tree query
- that is, putting both the query points and the potential neighbours
in kd-trees. Then there's an algorithm that saves a lot of tree
traversal by using the spatial structure of the query points. (In this
case the two trees are even the same.) Such an algorithm is even
implemented, but unfortunately only in the pure python KDTree. If the
OP really needs this to be fast, then the best thing to do would
probably be to port KDTree.query_tree to cython. The algorithm is a
little messy but not too complicated.

I don't know how the FLANN optimizations fit into all this.

Anne

On 29 May 2011 14:32, Ralf Gommers <ralf.gommers at googlemail.com> wrote:
>
>
> On Sun, May 29, 2011 at 8:15 PM, Gael Varoquaux
> <gael.varoquaux at normalesup.org> wrote:
>>
>> On Sun, May 29, 2011 at 07:59:37PM +0200, Ralf Gommers wrote:
>> >    This is the second issue with ANN bindings reported in a week, so I
>> > had a
>> >    look at scikits.ann. Then I found
>> >    [2]http://blog.physionconsulting.com/?p=17. So it looks like there
>> > should
>> >    be a big "this is deprecated" warning on scikits.ann. It would be
>> > helpful
>> >    if someone can confirm that KdTree/cKdTree in scikits.spatial is
>> > about as
>> >    fast as ANN/FLANN.
>>
>> Regarding speed of nearest neighbors, a big caveat is that it depends a
>> lot on the dimensionality of the search space.
>>
>> For low dimensionality, KDTree is super fast. It breaks down at around
>> 10d, because the space starts becoming too 'empty': splitting it with
>> plane to separate in half-spaces with equal partitions of points ends up
>> quickly creating as many planes as they are points. The next thing is the
>> BallTree, that creates nested balls. In low dimensions it is as fast as
>> the KDTree, and it scales a bit higher, up to 20d is a good guess. A
>> BallTree implementation, contributed by Jake VanderPlas, can be found in
>> the scikits.learn. For references, see
>>
>> http://scikit-learn.sourceforge.net/modules/neighbors.html#efficient-implementation-the-ball-tree
>>
>> Above d ~ 20, a brute force search is quicker if you want exact nearest
>> neighbor. The scikits.learn's nearest neighbor search implements by
>> default an automatic switch.
>>
>> I have never tried ANN (approximate nearest neighbor) but I wouldn't be
>> surprised if it were faster than a brute force in this regime.
>>
>> All that to say the ANN probably has strong usecases and cannot always be
>> replaced by a KDTree.
>
> scikits.ann exposes a single class, kdtree. scipy.spatial.KDTree.query seems
> to be able to do approximate nearest neighbors:
>
>     KDTree.query(self, x, k=1, eps=0, p=2, distance_upper_bound=inf)
>     .....
>     eps : nonnegative float
>         Return approximate nearest neighbors; the kth returned value
>         is guaranteed to be no further than (1+eps) times the
>         distance to the real kth nearest neighbor.
>
> So it looks to me like scipy.spatial has things covered. Which is also what
> Barry Wark (scikits.ann author) seems to say in the blog post I linked to.
>
> Barry, could you confirm this and if appropriate put up some deprecation
> warnings?
>
> Thanks,
> Ralf
>
>
> _______________________________________________
> SciPy-User mailing list
> SciPy-User at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user
>
>



More information about the SciPy-User mailing list