[SciPy-User] efficient computation of point cloud nearest neighbors

Ralf Gommers ralf.gommers at googlemail.com
Sun May 29 14:32:58 EDT 2011


On Sun, May 29, 2011 at 8:15 PM, Gael Varoquaux <
gael.varoquaux at normalesup.org> wrote:

> On Sun, May 29, 2011 at 07:59:37PM +0200, Ralf Gommers wrote:
> >    This is the second issue with ANN bindings reported in a week, so I
> had a
> >    look at scikits.ann. Then I found
> >    [2]http://blog.physionconsulting.com/?p=17. So it looks like there
> should
> >    be a big "this is deprecated" warning on scikits.ann. It would be
> helpful
> >    if someone can confirm that KdTree/cKdTree in scikits.spatial is about
> as
> >    fast as ANN/FLANN.
>
> Regarding speed of nearest neighbors, a big caveat is that it depends a
> lot on the dimensionality of the search space.
>
> For low dimensionality, KDTree is super fast. It breaks down at around
> 10d, because the space starts becoming too 'empty': splitting it with
> plane to separate in half-spaces with equal partitions of points ends up
> quickly creating as many planes as they are points. The next thing is the
> BallTree, that creates nested balls. In low dimensions it is as fast as
> the KDTree, and it scales a bit higher, up to 20d is a good guess. A
> BallTree implementation, contributed by Jake VanderPlas, can be found in
> the scikits.learn. For references, see
>
> http://scikit-learn.sourceforge.net/modules/neighbors.html#efficient-implementation-the-ball-tree
>
> Above d ~ 20, a brute force search is quicker if you want exact nearest
> neighbor. The scikits.learn's nearest neighbor search implements by
> default an automatic switch.
>
> I have never tried ANN (approximate nearest neighbor) but I wouldn't be
> surprised if it were faster than a brute force in this regime.
>
> All that to say the ANN probably has strong usecases and cannot always be
> replaced by a KDTree.
>

scikits.ann exposes a single class, kdtree. scipy.spatial.KDTree.query seems
to be able to do approximate nearest neighbors:

    KDTree.query(self, x, k=1, eps=0, p=2, distance_upper_bound=inf)
    .....
    eps : nonnegative float
        Return approximate nearest neighbors; the kth returned value
        is guaranteed to be no further than (1+eps) times the
        distance to the real kth nearest neighbor.

So it looks to me like scipy.spatial has things covered. Which is also what
Barry Wark (scikits.ann author) seems to say in the blog post I linked to.

Barry, could you confirm this and if appropriate put up some deprecation
warnings?

Thanks,
Ralf
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.scipy.org/pipermail/scipy-user/attachments/20110529/8939fc11/attachment.html>


More information about the SciPy-User mailing list