[Numpy-discussion] record arrays initialization

Keith Goodman kwgoodman at gmail.com
Thu May 3 13:00:11 EDT 2012


On Wed, May 2, 2012 at 4:46 PM, Kevin Jacobs <jacobs at bioinformed.com>
<bioinformed at gmail.com> wrote:

> The cKDTree implementation is more than 4 times faster than the brute-force
> approach:
>
> T = scipy.spatial.cKDTree(targets)
>
> In [11]: %timeit foo1(element, targets)   # Brute force
> 1000 loops, best of 3: 385 us per loop
>
> In [12]: %timeit foo2(element, T)         # cKDTree
> 10000 loops, best of 3: 83.5 us per loop
>
> In [13]: 385/83.5
> Out[13]: 4.610778443113772

Brute force plus cython beats cKDTree for knn with k=1 and Euclidean distance:

    I[5] targets = np.random.uniform(0, 8, (5000, 7))
    I[6] element = np.arange(1, 8, dtype=np.float64)

    I[7] T = scipy.spatial.cKDTree(targets)
    I[8] timeit T.query(element)
    10000 loops, best of 3: 36.1 us per loop

    I[9] timeit nn(targets, element)
    10000 loops, best of 3: 28.5 us per loop

What about lower dimensions (2 instead of 7) where cKDTree gets faster?

    I[18] element = np.arange(1,3, dtype=np.float64)
    I[19] targets = np.random.uniform(0,8,(5000,2))

    I[20] T = scipy.spatial.cKDTree(targets)
    I[21] timeit T.query(element)
    10000 loops, best of 3: 27.5 us per loop

    I[22] timeit nn(targets, element)
    100000 loops, best of 3: 11.6 us per loop

I could add nn to the bottleneck package. Is the k=1, Euclidean
distance case too specialized?

Prototype (messy) code is here:
https://github.com/kwgoodman/bottleneck/issues/45

For a smaller speed up (~2x) foo1 could use bottleneck's sum or
squares function, bn.ss().



More information about the NumPy-Discussion mailing list