[Numpy-discussion] record arrays initialization
Keith Goodman
kwgoodman at gmail.com
Thu May 3 13:00:11 EDT 2012
On Wed, May 2, 2012 at 4:46 PM, Kevin Jacobs <jacobs at bioinformed.com>
<bioinformed at gmail.com> wrote:
> The cKDTree implementation is more than 4 times faster than the brute-force
> approach:
>
> T = scipy.spatial.cKDTree(targets)
>
> In [11]: %timeit foo1(element, targets) # Brute force
> 1000 loops, best of 3: 385 us per loop
>
> In [12]: %timeit foo2(element, T) # cKDTree
> 10000 loops, best of 3: 83.5 us per loop
>
> In [13]: 385/83.5
> Out[13]: 4.610778443113772
Brute force plus cython beats cKDTree for knn with k=1 and Euclidean distance:
I[5] targets = np.random.uniform(0, 8, (5000, 7))
I[6] element = np.arange(1, 8, dtype=np.float64)
I[7] T = scipy.spatial.cKDTree(targets)
I[8] timeit T.query(element)
10000 loops, best of 3: 36.1 us per loop
I[9] timeit nn(targets, element)
10000 loops, best of 3: 28.5 us per loop
What about lower dimensions (2 instead of 7) where cKDTree gets faster?
I[18] element = np.arange(1,3, dtype=np.float64)
I[19] targets = np.random.uniform(0,8,(5000,2))
I[20] T = scipy.spatial.cKDTree(targets)
I[21] timeit T.query(element)
10000 loops, best of 3: 27.5 us per loop
I[22] timeit nn(targets, element)
100000 loops, best of 3: 11.6 us per loop
I could add nn to the bottleneck package. Is the k=1, Euclidean
distance case too specialized?
Prototype (messy) code is here:
https://github.com/kwgoodman/bottleneck/issues/45
For a smaller speed up (~2x) foo1 could use bottleneck's sum or
squares function, bn.ss().
More information about the NumPy-Discussion
mailing list