Interest in improvements to the cKDTree codebase?
Hi everyone, I'm using cKDTrees for a project and noticed two potential ways to improve the code and have written one additional count method that might be useful to others. I have written code and some tests for all three items and can contribute if there is interest. 1) Allowing an array of rs in cKDTree.query_ball_point(). I started a PR here <https://github.com/scipy/scipy/pull/8818>. In principle, this should speed up multiple queries with different rs, but see 2. 2) I noticed that for the cases when cKDTree.query_ball_point() returns large numbers of points (>100), it is faster to loop over queries in python. This is true both with and without my PR. This is largely because single point queries do not sort the return indices, but multi-point queries DO sort them (see details here <https://github.com/scipy/scipy/issues/8838>). Removing the sorted() leads to considerable speedups, but is not backwards compatible. However, the sorted behavior is not in the method description and is not even internally consistent, so maybe it could just be removed or made optional? 3) I have written a cKDTree.count_ball_point() method that behaves like query_ball_point() but just returns the number of points rather than their indices. This is much faster than calling len(query_ball_point()). Let me know if any of this is of interest. Jesse -- Jesse Livezey he/him/his
Hi Jesse, On Tue, May 22, 2018 at 10:56 AM, Jesse Livezey <jesse.livezey@gmail.com> wrote:
Hi everyone,
I'm using cKDTrees for a project and noticed two potential ways to improve the code and have written one additional count method that might be useful to others.
I have written code and some tests for all three items and can contribute if there is interest.
1) Allowing an array of rs in cKDTree.query_ball_point(). I started a PR here <https://github.com/scipy/scipy/pull/8818>. In principle, this should speed up multiple queries with different rs, but see 2.
2) I noticed that for the cases when cKDTree.query_ball_point() returns large numbers of points (>100), it is faster to loop over queries in python. This is true both with and without my PR. This is largely because single point queries do not sort the return indices, but multi-point queries DO sort them (see details here <https://github.com/scipy/scipy/issues/8838>). Removing the sorted() leads to considerable speedups, but is not backwards compatible. However, the sorted behavior is not in the method description and is not even internally consistent, so maybe it could just be removed or made optional?
I agree with Sturla's reply on the issue; adding a keyword that defaults to the current behavior and allows turning off sorting is probably the way to go.
3) I have written a cKDTree.count_ball_point() method that behaves like query_ball_point() but just returns the number of points rather than their indices. This is much faster than calling len(query_ball_point()).
How much faster is it after adding the keyword for (2)? Cheers, Ralf
Thanks for the feedback. PR for 'sorted' keyword is here https://github.com/scipy/scipy/pull/8908 With no sorting, count_ball_point() vs. len(query_ball_point()) is ~3x faster for 1d problems, ~25% faster for 3d problems, and ~10% faster for 7d problems. import scipy.spatial as ss
x = np.random.randn(10000, 1) xtree = ss.cKDTree(x)
%timeit xtree.count_ball_point(x, .02, p=np.inf) # 21.8 ms ± 190 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit [len(q) for q in xtree.query_ball_point(x, .02, p=np.inf)] # 66.9 ms ± 967 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
x = np.random.randn(10000, 3) xtree = ss.cKDTree(x)
%timeit xtree.count_ball_point(x, .2, p=np.inf) # 42.3 ms ± 1.4 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit [len(q) for q in xtree.query_ball_point(x, .2, p=np.inf)] # 53.1 ms ± 407 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
x = np.random.randn(10000, 7) xtree = ss.cKDTree(x)
%timeit xtree.count_ball_point(x, 1., p=np.inf) # 537 ms ± 857 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit [len(q) for q in xtree.query_ball_point(x, 1., p=np.inf)] # 587 ms ± 6.17 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
-- Jesse Livezey he/him/his On Sat, Jun 9, 2018 at 1:12 PM, Ralf Gommers <ralf.gommers@gmail.com> wrote:
Hi Jesse,
On Tue, May 22, 2018 at 10:56 AM, Jesse Livezey <jesse.livezey@gmail.com> wrote:
Hi everyone,
I'm using cKDTrees for a project and noticed two potential ways to improve the code and have written one additional count method that might be useful to others.
I have written code and some tests for all three items and can contribute if there is interest.
1) Allowing an array of rs in cKDTree.query_ball_point(). I started a PR here <https://github.com/scipy/scipy/pull/8818>. In principle, this should speed up multiple queries with different rs, but see 2.
2) I noticed that for the cases when cKDTree.query_ball_point() returns large numbers of points (>100), it is faster to loop over queries in python. This is true both with and without my PR. This is largely because single point queries do not sort the return indices, but multi-point queries DO sort them (see details here <https://github.com/scipy/scipy/issues/8838>). Removing the sorted() leads to considerable speedups, but is not backwards compatible. However, the sorted behavior is not in the method description and is not even internally consistent, so maybe it could just be removed or made optional?
I agree with Sturla's reply on the issue; adding a keyword that defaults to the current behavior and allows turning off sorting is probably the way to go.
3) I have written a cKDTree.count_ball_point() method that behaves like query_ball_point() but just returns the number of points rather than their indices. This is much faster than calling len(query_ball_point()).
How much faster is it after adding the keyword for (2)?
Cheers, Ralf
_______________________________________________ SciPy-Dev mailing list SciPy-Dev@python.org https://mail.python.org/mailman/listinfo/scipy-dev
participants (2)
-
Jesse Livezey -
Ralf Gommers