[SciPy-User] Finding closest point in array - inverse of KDTree

Robert Kern robert.kern at gmail.com
Wed Oct 11 19:24:59 EDT 2017

On Wed, Oct 11, 2017 at 10:01 AM, Ant <antoniy.py at gmail.com> wrote:
> Hello all,
> I have the same question I posted on stack overflow:
> I have a very large ndarray A, and a sorted list of points k (a small
list, about 30 points).
> For every element of A, I want to determine the closest element in the
list of points k, together with the index. So something like:
>     >>> A = np.asarray([3, 4, 5, 6])
>     >>> k = np.asarray([4.1, 3])
>     >>> values, indices
>     [3, 4.1, 4.1, 4.1], [1, 0, 0, 0]
> Now, the problem is that A is very very large. So I can't do something
inefficient like adding one dimension to A, take the abs difference to k,
and then take the minimum of each column.
> For now I have been using np.searchsorted, as shown in the second answer
but even this is too slow.
> I thought of using scipy.spatial.KDTree:
>     >>> d = scipy.spatial.KDTree(k)
>     >>> d.query(A)
> This turns out to be much slower than the searchsorted solution.
> On the other hand, the array A is always the same, only k changes. So it
would be beneficial to use some auxiliary structure (like a "inverse
KDTree") on A, and then query the results on the small array k.
> Is there something like that?

The KDTree and BallTree implementations in scikit-learn have
implementations for querying with other trees. Unfortunately, these
implementations are hidden behind an interface that builds the query tree
on demand and then throws it away. You'd have to subclass in Cython and
expose the `dualtree` implementations as a Python-exposed method.


Robert Kern
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/scipy-user/attachments/20171011/b144f292/attachment-0001.html>

More information about the SciPy-User mailing list