# [Numpy-discussion] record arrays initialization

Keith Goodman kwgoodman at gmail.com
Thu May 3 13:00:11 EDT 2012

```
On Wed, May 2, 2012 at 4:46 PM, Kevin Jacobs <bioinformed at gmail.com> wrote:

> The cKDTree implementation is more than 4 times faster than the brute-force
> approach:
>
> T = scipy.spatial.cKDTree(targets)
>
> In : %timeit foo1(element, targets)   # Brute force
> 1000 loops, best of 3: 385 us per loop
>
> In : %timeit foo2(element, T)         # cKDTree
> 10000 loops, best of 3: 83.5 us per loop
>
> In : 385/83.5
> Out: 4.610778443113772

Brute force plus Cython beats cKDTree for kNN with k=1 and Euclidean distance:

In : targets = np.random.uniform(0, 8, (5000, 7))
In : element = np.arange(1, 8, dtype=np.float64)

In : T = scipy.spatial.cKDTree(targets)
In : timeit T.query(element)
10000 loops, best of 3: 36.1 us per loop

In : timeit nn(targets, element)
10000 loops, best of 3: 28.5 us per loop

What about lower dimensions (2 instead of 7) where cKDTree gets faster?

In : element = np.arange(1, 3, dtype=np.float64)
In : targets = np.random.uniform(0, 8, (5000, 2))

In : T = scipy.spatial.cKDTree(targets)
In : timeit T.query(element)
10000 loops, best of 3: 27.5 us per loop

In : timeit nn(targets, element)
100000 loops, best of 3: 11.6 us per loop

I could add nn to the bottleneck package. Is the k=1, Euclidean
distance case too specialized?

Prototype (messy) code is here:
https://github.com/kwgoodman/bottleneck/issues/45

For a smaller speedup (~2x), foo1 could use bottleneck's sum-of-squares
function, bn.ss().

```
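For reference, a pure-NumPy sketch of the brute-force k=1 Euclidean search that `nn()` performs. This is an assumption of the same logic, not the actual code — the real `nn` is the Cython prototype at the issue link above, which is what the timings measure. Like `cKDTree.query`, it returns the distance to, and index of, the nearest row:

```python
import numpy as np

def nn_numpy(targets, element):
    # Hypothetical pure-NumPy stand-in for the Cython nn() in the thread.
    # Squared Euclidean distance from element to every row of targets;
    # the sqrt is deferred to the single winning distance.
    d2 = ((targets - element) ** 2).sum(axis=1)
    i = int(d2.argmin())
    return np.sqrt(d2[i]), i
```

The Cython version presumably wins by fusing the subtract/square/sum/argmin passes into one loop over the data, avoiding the temporary arrays this NumPy form allocates.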
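The bn.ss() suggestion amounts to swapping the generic squared-sum reduction inside the brute-force function for a specialized sum-of-squares kernel. A minimal sketch of the two forms, assuming foo1 computes plain squared Euclidean distances (its body is not shown in the thread); `np.einsum` stands in for `bn.ss()` here:

```python
import numpy as np

def foo1_baseline(targets, element):
    # Generic reduction: allocates a temporary array of squares, then sums.
    d2 = ((targets - element) ** 2).sum(axis=1)
    return int(d2.argmin())

def foo1_ss(targets, element):
    # Fused sum-of-squares via einsum: one pass over the differences, no
    # intermediate squared array -- the role a dedicated kernel like
    # bn.ss() would play.
    d = targets - element
    d2 = np.einsum('ij,ij->i', d, d)
    return int(d2.argmin())
```

Both return the index of the nearest row; only the reduction strategy differs, which is where the ~2x headroom mentioned above would come from.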