<div dir="ltr"><div>Thanks for the feedback. PR for 'sorted' keyword is here</div><div><a href="https://github.com/scipy/scipy/pull/8908">https://github.com/scipy/scipy/pull/8908</a></div><div><br></div><div>With no sorting, count_ball_point() vs. len(query_ball_point()) is</div><div>~3x faster for 1d problems,</div><div>~25% faster for 3d problems, and</div><div>~10% faster for 7d problems.<br></div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">import scipy.spatial as ss<br><br><div>x = np.random.randn(10000, 1)</div>xtree = ss.cKDTree(x)<br><br>%timeit xtree.count_ball_point(x, .02, p=np.inf)<br># 21.8 ms ± 190 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)<br><br>%timeit [len(q) for q in xtree.query_ball_point(x, .02, p=np.inf)]<br># 66.9 ms ± 967 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)<br><br><br>x = np.random.randn(10000, 3)<br>xtree = ss.cKDTree(x)<br><br>%timeit xtree.count_ball_point(x, .2, p=np.inf)<br># 42.3 ms ± 1.4 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)<br><br>%timeit [len(q) for q in xtree.query_ball_point(x, .2, p=np.inf)]<br># 53.1 ms ± 407 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)<br><br><br>x = np.random.randn(10000, 7)<br>xtree = ss.cKDTree(x)<br><br>%timeit xtree.count_ball_point(x, 1., p=np.inf)<br># 537 ms ± 857 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)<br><br>%timeit [len(q) for q in xtree.query_ball_point(x, 1., p=np.inf)]<br># 587 ms ± 6.17 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)</blockquote><span style="font-family:monospace,monospace"><div><span style="font-family:monospace,monospace"></span></div></span></div><div class="gmail_extra"><br clear="all"><div><div class="gmail_signature" data-smartmail="gmail_signature"><div dir="ltr"><div><div dir="ltr"><div dir="ltr">--</div><div dir="ltr">Jesse Livezey<br><div>he/him/his</div></div></div></div></div></div></div>
<br><div class="gmail_quote">On Sat, Jun 9, 2018 at 1:12 PM, Ralf Gommers <span dir="ltr"><<a href="mailto:ralf.gommers@gmail.com" target="_blank">ralf.gommers@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div>Hi Jesse,</div><div><br></div><div><div class="gmail_extra"><br><div class="gmail_quote"><span class="">On Tue, May 22, 2018 at 10:56 AM, Jesse Livezey <span dir="ltr"><<a href="mailto:jesse.livezey@gmail.com" target="_blank">jesse.livezey@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div>Hi everyone,</div><div><br></div><div>I'm using cKDTrees for a project and noticed two potential ways to improve the code and have written one additional count method that might be useful to others.</div><div><br></div><div>I have written code and some tests for all three items and can contribute if there is interest.</div><div><br></div><div>1) Allowing an array of rs in cKDTree.query_ball_point(). I started a PR <a href="https://github.com/scipy/scipy/pull/8818" target="_blank">here</a>. In principle, this should speed up multiple queries with different rs, but see 2.</div><div><br></div><div>2) I noticed that for the cases when cKDTree.query_ball_point() returns large numbers of points (>100), it is faster to loop over queries in python. This is true both with and without my PR. This is largely because single point queries do not sort the return indices, but multi-point queries DO sort them (see details <a href="https://github.com/scipy/scipy/issues/8838" target="_blank">here</a>). Removing the sorted() leads to considerable speedups, but is not backwards compatible. However, the sorted behavior is not in the method description and is not even internally consistent, so maybe it could just be removed or made optional?<br></div></div></blockquote><div><br></div></span><div>I agree with Sturla's reply on the issue; adding a keyword that defaults to the current behavior and allows turning off sorting is probably the way to go.</div><span class=""><div> <br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div></div><div><br></div><div>3) I have written a cKDTree.count_ball_point() method that behaves like query_ball_point() but just returns the number of points rather than their indices. This is much faster than calling len(query_ball_point()).</div></div></blockquote><div><br></div></span><div>How much faster is it after adding the keyword for (2)?</div><div><br></div><div>Cheers,<br></div><div>Ralf</div><div><br></div></div></div></div></div>
<br>______________________________<wbr>_________________<br>
SciPy-Dev mailing list<br>
<a href="mailto:SciPy-Dev@python.org">SciPy-Dev@python.org</a><br>
<a href="https://mail.python.org/mailman/listinfo/scipy-dev" rel="noreferrer" target="_blank">https://mail.python.org/<wbr>mailman/listinfo/scipy-dev</a><br>
<br></blockquote></div><br></div>