<div dir="ltr">Would you guys consider it in scope for scipy to have implementations of nearest neighbor search methods that are faster than KDTree? <div><br></div><div>Some methods are fairly simple, e.g. the principal axis tree, which uses the principal direction of the dataset to split it into smaller subsets.  As soon as the intrinsic dimensionality is significantly smaller than the dimension of the space, it is significantly faster than a KD-tree. </div><div><br></div><div>Besides, computing only the principal axis (or an approximation of it) is much faster than doing a full PCA.<br></div><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Sep 6, 2016 at 4:14 AM, Jacob Vanderplas <span dir="ltr"><<a href="mailto:jakevdp@cs.washington.edu" target="_blank">jakevdp@cs.washington.edu</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">From my own casual benchmarks, the new scipy cKDTree is much faster than any of the scikit-learn options, though it still only supports axis-aligned Euclidean-like metrics (whereas sklearn's BallTree supports dozens of additional metrics). The cKDTree also has a limited range of query types compared to scikit-learn's trees.<div>   Jake</div></div><div class="gmail_extra"><br clear="all"><div><div data-smartmail="gmail_signature"><div dir="ltr"><div><div dir="ltr"> <font size="1">Jake VanderPlas</font></div><div dir="ltr"><font size="1"> Senior Data Science Fellow<br></font><div><div><font size="1"> Director of Research in Physical Sciences</font></div><div><font size="1"> </font><span style="font-size:x-small">University of Washington </span><span style="font-size:x-small">eScience Institute</span></div></div></div></div></div></div></div><div><div class="h5">

<br><div class="gmail_quote">On Mon, Sep 5, 2016 at 12:46 AM, Daπid <span dir="ltr"><<a href="mailto:davidmenhur@gmail.com" target="_blank">davidmenhur@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span>On 4 September 2016 at 23:00, Robert Lucente <<a href="mailto:rlucente@pipeline.com" target="_blank">rlucente@pipeline.com</a>> wrote:<br>

> Please note that I am a newbie and just a lurker.<br>

><br>

> I noticed in a recent email that cKDTree was mentioned.<br>

><br>

> Q: What is the relationship, if any, between SciPy and scikit-learn when it comes to cKDTree?<br>

><br>

> The reason I ask is the following two links:<br>

><br>

> <a href="https://jakevdp.github.io/blog/2013/04/29/benchmarking-nearest-neighbor-searches-in-python/" rel="noreferrer" target="_blank">https://jakevdp.github.io/blog<wbr>/2013/04/29/benchmarking-neare<wbr>st-neighbor-searches-in-python<wbr>/</a><br>

><br>

> <a href="https://github.com/scikit-learn/scikit-learn/issues/3682" rel="noreferrer" target="_blank">https://github.com/scikit-lear<wbr>n/scikit-learn/issues/3682</a><br>

<br>

</span>Note that these benchmarks are from 2013 and 2014. SciPy's KDTree has<br>

had its performance improved twice recently. Scikit-learn's last update to<br>

its KDTree was in 2015. So, we need to run the benchmarks again.<br>

<br>

/David.<br>

<div><div>______________________________<wbr>_________________<br>

SciPy-Dev mailing list<br>

<a href="mailto:SciPy-Dev@scipy.org" target="_blank">SciPy-Dev@scipy.org</a><br>

<a href="https://mail.scipy.org/mailman/listinfo/scipy-dev" rel="noreferrer" target="_blank">https://mail.scipy.org/mailman<wbr>/listinfo/scipy-dev</a><br>

</div></div></blockquote></div><br></div></div></div>


<br></blockquote></div><br></div></div>