
Partly as an excuse to learn cython, and partly because I need to eke some extra performance out of a neighborhood search, I tried to code up a brute force neighborhood search in cython around an N-dimensional point p. I need to incrementally add a point, do a search, add another point, do another search, so some of the algorithms like those in scipy.stats.spatial which assume a static data structure with lots of searches over it probably won't help me.

I wrote some cython code to grow an Npoints x Ndimensions numpy array (doubling in size every time I exceed Npoints) and then do a brute force request for all points within a radius r using a euclidean distance. The code is working fine, but it is still slower than a simple numpy implementation (damn you numpy performance!) Here is the numpy code::

    def find_neighbors_numpy(data, point, radius):
        """
        do a plain ol numpy lookup to compare performance and output

        *data* is a numpoints x numdims array
        *point* is a numdims length vector
        *radius* is the max distance

        return an array of indices into data which are within radius
        """
        numpoints, n = data.shape
        distance = data - point
        r = np.sqrt((distance*distance).sum(axis=1))
        return np.nonzero(r<=radius)[0]

This requires 6 passes through the data (the subtraction, the squaring, the sum, the sqrt, the comparison, and the nonzero), so I should be able to beat it with a properly crafted cython algorithm. I have a class NNBF (Nearest Neighbor Brute Force) with an interface like::

    NUMDIM = 6
    nn = nnbf.NNBF(NUMDIM)

    print 'loading data... this could take a while'
    for i in range(200000):
        x = np.random.rand(NUMDIM)
        nn.add(x)

    x = np.random.rand(NUMDIM)
    radius = 0.2
    ind = nn.find_neighbors(x, radius)

(in my real use case I would be doing a search after every add)

When I run this vs numpy, numpy is a little faster::

    testing nnbf...
      10 trials: mean=0.0420, min=0.0400
    testing numpy...
      10 trials: mean=0.0420, min=0.0300

You can grab the code, the python prototype, the test case and setup file at http://matplotlib.sf.net/nnbf.zip. You will need a fairly recent cython to build it: http://www.cython.org/Cython-0.10.3.tar.gz::

    # build the extension and run the test code
    wget http://matplotlib.sf.net/nnbf.zip
    unzip nnbf.zip
    cd nnbf
    python setup.py build_ext --inplace
    python test_nnbf.py

I'm pasting the cython code nnbf.pyx below. If anyone can advise me as to how to remove the bottlenecks (I've decorated the code with some XXX questions) that would be great. I've read the tutorial at http://wiki.cython.org/tutorials/numpy but I think I am still missing something important. I tried to follow Anne's code in ckdtree.pyx, which makes use of a raw_data structure, but I am not sure how/if to incorporate this into my code -- perhaps there is some juice there....

    """
    A brute force nearest neighbor routine with incremental add.
    The internal array data structure grows as you add points.
    """

    import numpy as np
    cimport numpy as np

    cdef extern from "math.h":
        float sqrt(float)

    cdef inline int is_neighbor(int n, double* row, double* pp, double d2max):
        """
        return 1 if the sum-of-squares of the n length array row[j]-pp[j] <= d2max
        """
        cdef int j
        cdef double d, d2

        d2 = 0.
        for j in range(n):
            d = row[j] - pp[j]
            d2 += d*d
            if d2>d2max:
                return 0
        return 1

    cdef class NNBF:
        cdef readonly object data
        #cdef double* raw_data
        cdef readonly int n, numrows, numpoints

        def __init__(self, n):
            """
            create a buffer to hold n dimensional points
            """
            #cdef np.ndarray[double, ndim=2] inner_data
            self.n = n
            self.numrows = 100
            # XXX how to create empty as contiguous w/o copy?
            self.data = np.empty((self.numrows, self.n), dtype=np.float)
            #inner_data = self.data
            #self.raw_data = <double*>inner_data.data
            self.numpoints = 0

        def add(NNBF self, object point):
            """
            add a point to the buffer, grow if necessary
            """
            #cdef np.ndarray[double, ndim=2] inner_data
            cdef np.ndarray[double, ndim=1] pp
            pp = np.asarray(point).astype(np.float)
            self.data[self.numpoints] = pp
            self.numpoints += 1
            if self.numpoints==self.numrows:
                ## XXX do I need to do memory management here, eg free
                ## raw_data if I were using it?
                self.numrows *= 2
                newdata = np.empty((self.numrows, self.n), np.float)
                newdata[:self.numpoints] = self.data
                self.data = newdata
                #self.raw_data = <double*>inner_data.data

        def get_data(NNBF self):
            """
            return a copy of the data added so far as a numpoints x n array
            """
            return self.data[:self.numpoints]

        def find_neighbors(NNBF self, object point, double radius):
            """
            return a list of indices into data which are within radius
            from point
            """
            cdef int i, neighbor
            cdef double d2max
            cdef np.ndarray[double, ndim=1] pp
            cdef np.ndarray[double, ndim=1] row

            if len(point)!=self.n:
                raise ValueError('Expected a length %d vector'%self.n)

            pp = np.asarray(point).astype(np.float)
            d2max = radius*radius
            neighbors = []
            for i in range(self.numpoints):
                # XXX: is there a more efficient way to access the row
                # data?  Can/should we be using raw_data here?
                row = self.data[i]
                neighbor = is_neighbor(
                    self.n,
                    <double*>row.data,
                    <double*>pp.data,
                    d2max)

                # if the number of points in the cluster is small, the
                # python list performance should not kill us
                if neighbor:
                    neighbors.append(i)

            return neighbors
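[Editor's note: for anyone who wants to reproduce the comparison without grabbing the zip, a minimal timing harness along the lines of what test_nnbf.py does might look like the following. This is a sketch, not the actual test file; the run_trials helper, the trial count, and the setup are made up for illustration.]::

    import time
    import numpy as np
    import nnbf   # the compiled extension from the zip

    NUMDIM = 6
    NPTS = 200000

    def run_trials(func, ntrials=10):
        # time ntrials calls of func and report mean/min wall-clock seconds
        times = []
        for _ in range(ntrials):
            t0 = time.time()
            func()
            times.append(time.time() - t0)
        print('  %d trials: mean=%.4f, min=%.4f' % (ntrials, np.mean(times), np.min(times)))

    print('loading data... this could take a while')
    data = np.random.rand(NPTS, NUMDIM)
    nn = nnbf.NNBF(NUMDIM)
    for row in data:
        nn.add(row)

    point = np.random.rand(NUMDIM)
    radius = 0.2

    def numpy_search():
        # same brute-force search as find_neighbors_numpy above
        distance = data - point
        r = np.sqrt((distance*distance).sum(axis=1))
        return np.nonzero(r <= radius)[0]

    print('testing nnbf...')
    run_trials(lambda: nn.find_neighbors(point, radius))
    print('testing numpy...')
    run_trials(numpy_search)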

John Hunter wrote:
Partly as an excuse to learn cython, and partly because I need to eke some extra performance out of a neighborhood search, I tried to code up a brute force neighborhood search in cython around an N-dimensional point p. I need to incrementally add a point, do a search, add another point, do another search, so some of the algorithms like those in scipy.stats.spatial which assume a static data structure with lots of searches over it probably won't help me.
I wrote some cython code to grow an Npoints x Ndimensions numpy array (doubling in size every time I exceed Npoints) and then do a brute force request for all points within a radius r using a euclidean distance. The code is working fine, but it is still slower than a simple numpy implementation (damn you numpy performance!)
A couple small changes speed it up quite a bit::

    efiring@manini:~/temp/nnbf$ python test_nnbf.py
    loading data... this could take a while
    testing nnbf...
      10 trials: mean=0.0150, min=0.0100
    testing numpy...
      10 trials: mean=0.0660, min=0.0600

It is all a matter of keeping Python objects and function calls out of inner loops. I suspect there is quite a bit more that could be done in that regard, but I haven't looked.

Eric

    """
    A brute force nearest neighbor routine with incremental add.
    The internal array data structure grows as you add points.
    """

    import numpy as np
    cimport numpy as np

    cdef extern from "math.h":
        float sqrt(float)

    cdef inline int is_neighbor(int n, double* row, double* pp, double d2max):
        """
        return 1 if the sum-of-squares of the n length array row[j]-pp[j] <= d2max
        """
        cdef int j
        cdef double d, d2

        d2 = 0.
        for j in range(n):
            d = row[j] - pp[j]
            d2 += d*d
            if d2>d2max:
                return 0
        return 1

    cdef class NNBF:
        cdef readonly object data
        #cdef double* raw_data
        cdef readonly int n, numrows, numpoints

        def __init__(self, n):
            """
            create a buffer to hold n dimensional points
            """
            #cdef np.ndarray[double, ndim=2] inner_data
            self.n = n
            self.numrows = 100
            # XXX how to create empty as contiguous w/o copy?
            self.data = np.empty((self.numrows, self.n), dtype=np.float)
            #inner_data = self.data
            #self.raw_data = <double*>inner_data.data
            self.numpoints = 0

        def add(NNBF self, object point):
            """
            add a point to the buffer, grow if necessary
            """
            #cdef np.ndarray[double, ndim=2] inner_data
            cdef np.ndarray[double, ndim=1] pp
            pp = np.asarray(point).astype(np.float)
            self.data[self.numpoints] = pp
            self.numpoints += 1
            if self.numpoints==self.numrows:
                ## XXX do I need to do memory management here, eg free
                ## raw_data if I were using it?
                self.numrows *= 2
                newdata = np.empty((self.numrows, self.n), np.float)
                newdata[:self.numpoints] = self.data
                self.data = newdata
                #self.raw_data = <double*>inner_data.data

        def get_data(NNBF self):
            """
            return a copy of the data added so far as a numpoints x n array
            """
            return self.data[:self.numpoints]

        def find_neighbors(NNBF self, object point, double radius):
            """
            return a list of indices into data which are within radius
            from point
            """
            cdef int i, neighbor, n
            cdef double d2max
            cdef np.ndarray[double, ndim=1] pp
            #cdef np.ndarray[double, ndim=1] row
            cdef double * dataptr

            dataptr = <double*> self.data.data

            if len(point)!=self.n:
                raise ValueError('Expected a length %d vector'%self.n)

            pp = np.asarray(point).astype(np.float)
            d2max = radius*radius
            neighbors = []
            n = self.n
            for i in range(self.numpoints):
                # XXX: is there a more efficient way to access the row
                # data?  Can/should we be using raw_data here?
                #row = self.data[i]
                neighbor = is_neighbor(
                    n,
                    #<double*>row.data,
                    dataptr + i,
                    <double*>pp.data,
                    d2max)

                # if the number of points in the cluster is small, the
                # python list performance should not kill us
                if neighbor:
                    neighbors.append(i)

            return neighbors
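[Editor's note: to make the point about inner loops concrete, the change amounts to replacing per-row Python indexing with raw pointer access. Schematically -- these lines are excerpted from the code above, not a standalone module]::

    # before: self.data[i] builds a new 1-D ndarray (a Python object)
    # on every iteration just to reach that row's memory
    row = self.data[i]
    neighbor = is_neighbor(self.n, <double*>row.data, <double*>pp.data, d2max)

    # after: a single C pointer is taken once, before the loop, so each
    # iteration is pure C arithmetic (the correct per-row offset is what
    # the follow-ups below are about)
    dataptr = <double*> self.data.data     # taken once, outside the loop
    neighbor = is_neighbor(n, dataptr + i, <double*>pp.data, d2max)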

On Wed, Jan 7, 2009 at 5:37 PM, Eric Firing <efiring@hawaii.edu> wrote:
A couple small changes speed it up quite a bit:
efiring@manini:~/temp/nnbf$ python test_nnbf.py
loading data... this could take a while
testing nnbf...
  10 trials: mean=0.0150, min=0.0100
testing numpy...
  10 trials: mean=0.0660, min=0.0600
It is all a matter of keeping Python objects and function calls out of inner loops. I suspect there is quite a bit more that could be done in that regard, but I haven't looked.
Much faster, but no longer correct, as you'll see if you uncomment the nose test test_neighbors that compares actual vs desired.

Is the pointer arithmetic correct:

    dataptr + i

I would have thought perhaps:

    dataptr + i*n

but this is segfaulting. Do we need to use a stride?

JDH
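[Editor's note: for a C-contiguous numrows x n array of doubles, element (i, j) sits i*n + j doubles from the start of the buffer, so dataptr + i*n is the natural row pointer; if the array might not be contiguous you would instead advance by the byte stride of axis 0. A sketch -- the helper names are mine, not from nnbf.pyx, and assume a typed ndarray so that .data, .shape and .strides are the C-level fields]::

    cimport numpy as np

    cdef inline double* row_ptr_contig(np.ndarray[double, ndim=2] arr, int i):
        # C-contiguous case: row i begins i*ncols doubles into the buffer
        return <double*>arr.data + i*arr.shape[1]

    cdef inline double* row_ptr_strided(np.ndarray[double, ndim=2] arr, int i):
        # general case: advance by the byte stride of the first axis
        # (elements within a row are only adjacent if the last axis is
        # itself contiguous)
        return <double*>(arr.data + i*arr.strides[0])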

John Hunter wrote:
On Wed, Jan 7, 2009 at 5:37 PM, Eric Firing <efiring@hawaii.edu> wrote:
A couple small changes speed it up quite a bit:
efiring@manini:~/temp/nnbf$ python test_nnbf.py
loading data... this could take a while
testing nnbf...
  10 trials: mean=0.0150, min=0.0100
testing numpy...
  10 trials: mean=0.0660, min=0.0600
It is all a matter of keeping Python objects and function calls out of inner loops. I suspect there is quite a bit more that could be done in that regard, but I haven't looked.
Much faster, but no longer correct, as you'll see if you uncomment the nose test test_neighbors that compares actual vs desired.
Is the pointer arithmetic correct:
dataptr + i
I would have thought perhaps:
dataptr + i*n
but this is segfaulting. Do we need to use a stride?
Sorry, I was too hasty. Yes, it seems like i*n should be correct, but it isn't; we are missing something simple and fundamental here. I don't see it immediately, and won't be able to look at it for a while.

Eric
JDH

On Thu, Jan 8, 2009 at 12:34 PM, Eric Firing <efiring@hawaii.edu> wrote:
John Hunter wrote:
On Wed, Jan 7, 2009 at 5:37 PM, Eric Firing <efiring@hawaii.edu> wrote:
A couple small changes speed it up quite a bit:
efiring@manini:~/temp/nnbf$ python test_nnbf.py
loading data... this could take a while
testing nnbf...
  10 trials: mean=0.0150, min=0.0100
testing numpy...
  10 trials: mean=0.0660, min=0.0600
It is all a matter of keeping Python objects and function calls out of inner loops. I suspect there is quite a bit more that could be done in that regard, but I haven't looked.
Much faster, but no longer correct, as you'll see if you uncomment the nose test test_neighbors that compares actual vs desired.
Is the pointer arithmetic correct:
dataptr + i
I would have thought perhaps:
dataptr + i*n
but this is segfaulting. Do we need to use a stride?
Sorry, I was too hasty. Yes, it seems like i*n should be correct, but it isn't; we are missing something simple and fundamental here. I don't see it immediately, and won't be able to look at it for a while.
OK, the code at
svn co https://matplotlib.svn.sourceforge.net/svnroot/matplotlib/trunk/py4science/e...
now passes the correctness tests and is approx ten times faster than the numpy version. I borrowed the "raw_data" idiom from Anne's ckdtree.pyx, though I don't really understand why it is different from what we were doing with the data.data ptr. I am also not sure if I need to be doing any memory management when I resize the buffer and reset raw_data....

    """
    A brute force nearest neighbor routine with incremental add.
    The internal array data structure grows as you add points.
    """

    import numpy as np
    cimport numpy as np

    cdef extern from "math.h":
        float sqrt(float)

    cdef inline int is_neighbor(int n, double* row, double* pp, double d2max):
        """
        return 1 if the sum-of-squares of the n length array row[j]-pp[j] <= d2max
        """
        cdef int j
        cdef double d, d2

        d2 = 0.
        for j in range(n):
            d = row[j] - pp[j]
            d2 += d*d
            if d2>d2max:
                return 0
        return 1

    cdef class NNBF:
        cdef readonly object data
        cdef double* raw_data
        cdef readonly int n, numrows, numpoints

        def __init__(self, n):
            """
            create a buffer to hold n dimensional points
            """
            cdef np.ndarray[double, ndim=2] inner_data

            self.n = n
            self.numrows = 10000
            # XXX how to create empty as contiguous w/o copy?
            data = np.empty((self.numrows, self.n), dtype=np.float)
            self.data = np.ascontiguousarray(data, dtype=np.float)
            inner_data = self.data
            self.raw_data = <double*>inner_data.data
            self.numpoints = 0

        def add(NNBF self, object point):
            """
            add a point to the buffer, grow if necessary
            """
            cdef np.ndarray[double, ndim=2] inner_data
            cdef np.ndarray[double, ndim=1] pp
            pp = np.array(point).astype(np.float)

            self.data[self.numpoints] = pp
            self.numpoints += 1
            if self.numpoints==self.numrows:
                ## XXX do I need to do memory management here, eg free
                ## raw_data if I were using it?
                self.numrows *= 2
                newdata = np.empty((self.numrows, self.n), np.float)
                newdata[:self.numpoints] = self.data
                self.data = np.ascontiguousarray(newdata, dtype=np.float)
                inner_data = self.data
                self.raw_data = <double*>inner_data.data

        def get_data(NNBF self):
            """
            return a copy of the data added so far as a numpoints x n array
            """
            return self.data[:self.numpoints]

        def find_neighbors(NNBF self, object point, double radius):
            """
            return a list of indices into data which are within radius
            from point
            """
            cdef int i, neighbor, n
            cdef double d2max
            cdef np.ndarray[double, ndim=1] pp

            # avoid python array indexing in the inner loop
            if len(point)!=self.n:
                raise ValueError('Expected a length %d vector'%self.n)

            pp = np.asarray(point).astype(np.float)
            d2max = radius*radius
            neighbors = []

            # don't do a python lookup inside the loop
            n = self.n
            for i in range(self.numpoints):
                neighbor = is_neighbor(
                    n,
                    self.raw_data + i*n,
                    <double*>pp.data,
                    d2max)

                # if the number of points in the cluster is small, the
                # python list performance should not kill us
                if neighbor:
                    neighbors.append(i)

            return neighbors

        def find_neighbors_numpy(self, point, radius):
            """
            do a plain ol numpy lookup to compare performance and output

            *data* is a numpoints x numdims array
            *point* is a numdims length vector
            *radius* is the max distance

            return an array of indices into data which are within radius
            """
            data = self.get_data()
            distance = data - point
            r = np.sqrt((distance*distance).sum(axis=1))
            return np.nonzero(r<=radius)[0]
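[Editor's note, my reading rather than anything stated in the thread: the raw_data idiom works because inner_data is declared as a typed np.ndarray, so inner_data.data is the C-level char* buffer field of the array struct, whereas self.data is a plain object, so self.data.data goes through a Python attribute lookup and does not hand you the raw buffer. On the memory-management question: no explicit freeing should be needed, since raw_data only aliases memory owned by the numpy array that self.data keeps alive, and the old buffer is released when the old array's refcount drops; the one invariant to maintain is that raw_data is refreshed whenever self.data is rebound. A sketch of a hypothetical helper method (not in nnbf.pyx) that keeps the two in sync]::

    cdef _set_data(NNBF self, object arr):
        # hypothetical helper: rebind the Python-level array and refresh the
        # raw pointer in one place so they cannot get out of sync.  No free()
        # is needed because the ndarray bound to self.data owns the buffer.
        cdef np.ndarray[double, ndim=2] inner_data
        self.data = np.ascontiguousarray(arr, dtype=np.float)
        inner_data = self.data
        self.raw_data = <double*>inner_data.data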

Some of the problems you encounter could probably be remedied by better support in Cython for some situations. I've filed two feature request tickets for myself, but I have no idea when or if I'll get around to them.

http://trac.cython.org/cython_trac/ticket/177
http://trac.cython.org/cython_trac/ticket/178

Dag Sverre

John Hunter wrote:
On Thu, Jan 8, 2009 at 12:34 PM, Eric Firing <efiring@hawaii.edu> wrote:
John Hunter wrote:
On Wed, Jan 7, 2009 at 5:37 PM, Eric Firing <efiring@hawaii.edu> wrote:
A couple small changes speed it up quite a bit:
efiring@manini:~/temp/nnbf$ python test_nnbf.py
loading data... this could take a while
testing nnbf...
  10 trials: mean=0.0150, min=0.0100
testing numpy...
  10 trials: mean=0.0660, min=0.0600
It is all a matter of keeping Python objects and function calls out of inner loops. I suspect there is quite a bit more that could be done in that regard, but I haven't looked.

Much faster, but no longer correct, as you'll see if you uncomment the nose test test_neighbors that compares actual vs desired.
Is the pointer arithmetic correct:
dataptr + i
I would have thought perhaps:
dataptr + i*n
but this is segfaulting. Do we need to use a stride?

Sorry, I was too hasty. Yes, it seems like i*n should be correct, but it isn't; we are missing something simple and fundamental here. I don't see it immediately, and won't be able to look at it for a while.
OK, the code at
svn co https://matplotlib.svn.sourceforge.net/svnroot/matplotlib/trunk/py4science/e...
now passes the correctness tests and is approx ten times faster than the numpy version.
I borrowed the "raw_data" idiom from Anne's ckdtree.pyx, though I don't really understand why it is different from what we were doing with the data.data ptr. I am also not sure if I need to be doing any memory management when I resize the buffer and reset raw_data....
""" A brute force nearest neighbor routine with incremental add. The internal array data structure grows as you add points """
import numpy as np cimport numpy as np
cdef extern from "math.h": float sqrt(float)
cdef inline int is_neighbor(int n, double*row, double*pp, double d2max): """ return 1 if the sum-of-squares of n length array row[j]-pp[j] <= d2max """ cdef int j cdef double d, d2
d2 = 0.
for j in range(n): d = row[j] - pp[j] d2 += d*d if d2>d2max: return 0 return 1
cdef class NNBF: cdef readonly object data cdef double* raw_data cdef readonly int n, numrows, numpoints
def __init__(self, n): """ create a buffer to hold n dimensional points """ cdef np.ndarray[double, ndim=2] inner_data
self.n = n self.numrows = 10000 # XXX how to create empty as contiguous w/o copy? data = np.empty((self.numrows, self.n), dtype=np.float) self.data = np.ascontiguousarray(data, dtype=np.float) inner_data = self.data self.raw_data = <double*>inner_data.data self.numpoints = 0
def add(NNBF self, object point): """ add a point to the buffer, grow if necessary """ cdef np.ndarray[double, ndim=2] inner_data cdef np.ndarray[double, ndim=1] pp pp = np.array(point).astype(np.float)
self.data[self.numpoints] = pp self.numpoints += 1 if self.numpoints==self.numrows: ## XXX do I need to do memory management here, eg free ## raw_data if I were using it? self.numrows *= 2 newdata = np.empty((self.numrows, self.n), np.float) newdata[:self.numpoints] = self.data self.data = np.ascontiguousarray(newdata, dtype=np.float) inner_data = self.data self.raw_data = <double*>inner_data.data
def get_data(NNBF self): """ return a copy of data added so far as a numpoints x n array """ return self.data[:self.numpoints]
def find_neighbors(NNBF self, object point, double radius): """ return a list of indices into data which are within radius from point """ cdef int i, neighbor, n cdef double d2max cdef np.ndarray[double, ndim=1] pp
# avoid python array indexing in the inner loop if len(point)!=self.n: raise ValueError('Expected a length %d vector'%self.n)
pp = np.asarray(point).astype(np.float)
d2max = radius*radius neighbors = []
# don't do a python lookup inside the loop n = self.n
for i in range(self.numpoints): neighbor = is_neighbor( n, self.raw_data + i*n, <double*>pp.data, d2max)
# if the number of points in the cluster is small, the # python list performance should not kill us if neighbor: neighbors.append(i)
return neighbors
def find_neighbors_numpy(self, point, radius): """ do a plain ol numpy lookup to compare performance and output
*data* is a numpoints x numdims array *point* is a numdims length vector radius is the max distance distance
return an array of indices into data which are within radius """ data = self.get_data() distance = data - point r = np.sqrt((distance*distance).sum(axis=1)) return np.nonzero(r<=radius)[0] _______________________________________________ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
-- Dag Sverre
participants (3)

- Dag Sverre Seljebotn
- Eric Firing
- John Hunter