[Numpy-discussion] my cython is slow

Thu Jan 8 14:56:02 EST 2009

On Thu, Jan 8, 2009 at 12:34 PM, Eric Firing <efiring at hawaii.edu> wrote:
> John Hunter wrote:
>> On Wed, Jan 7, 2009 at 5:37 PM, Eric Firing <efiring at hawaii.edu> wrote:
>>
>>> A couple small changes speed it up quite a bit:
>>>
>>> efiring at manini:~/temp/nnbf$   python test_nnbf.py
>>> loading data... this could take a while
>>> testing nnbf...
>>>    10 trials: mean=0.0150, min=0.0100
>>> testing numpy...
>>>    10 trials: mean=0.0660, min=0.0600
>>>
>>> It is all a matter of keeping Python objects and function calls out of inner
>>> loops.  I suspect there is quite a bit more that could be done in that
>>> regard, but I haven't looked.
>>
>> Much  faster, but no longer correct, as you'll see if you uncomment
>> out the nose test test_neighbors that compare actual vs desired.
>>
>> Is the pointer arithmetic correct:
>>
>>   dataptr + i
>>
>> I would have thought perhaps:
>>
>>   dataptr + i*n
>>
>> but this is segfaulting.  Do we need to use a stride?
>
> Sorry, I was too hasty.  Yes, it seems like i*n should be correct, but
> it isn't; we are missing something simple and fundamental here.  I don't
> see it immediately, and won't be able to look at it for a while.

OK, the code at

  > svn co https://matplotlib.svn.sourceforge.net/svnroot/matplotlib/trunk/py4science/examples/pyrex/nnbf

now passes the correctness tests and is approx ten time faster than
the numpy version.

I borrowed the "raw_data" idiom from Anne's  ckdtree.pyx, though I
don't really understand why it is different that what we were doing
with the data.data ptr.  I am also not sure if I need to be doing any
memory management when I resize the buffer and reset raw_data....

"""
A brute force nearest neighbor routine with incremental add.  The
internal array data structure grows as you add points
"""

import numpy as np
cimport numpy as np

cdef extern from "math.h":
     float sqrt(float)

cdef inline int is_neighbor(int n, double*row, double*pp, double d2max):
    """
    return 1 if the sum-of-squares of n length array row[j]-pp[j] <= d2max
    """
    cdef int j
    cdef double d, d2

    d2 = 0.

    for j in range(n):
        d = row[j] - pp[j]
        d2 += d*d
        if d2>d2max:
            return 0
    return 1

cdef class NNBF:
    cdef readonly object data
    cdef double* raw_data
    cdef readonly int n, numrows, numpoints

    def __init__(self, n):
        """
        create a buffer to hold n dimensional points
        """
        cdef np.ndarray[double, ndim=2] inner_data

        self.n = n
        self.numrows = 10000
        #  XXX how to create empty as contiguous w/o copy?
        data = np.empty((self.numrows, self.n), dtype=np.float)
        self.data = np.ascontiguousarray(data, dtype=np.float)
        inner_data = self.data
        self.raw_data = <double*>inner_data.data
        self.numpoints = 0

    def add(NNBF self, object point):
        """
        add a point to the buffer, grow if necessary
        """
        cdef np.ndarray[double, ndim=2] inner_data
        cdef np.ndarray[double, ndim=1] pp
        pp = np.array(point).astype(np.float)

        self.data[self.numpoints] = pp
        self.numpoints += 1
        if self.numpoints==self.numrows:
            ## XXX do I need to do memory management here, eg free
            ## raw_data if I were using it?
            self.numrows *= 2
            newdata = np.empty((self.numrows, self.n), np.float)
            newdata[:self.numpoints] = self.data
            self.data = np.ascontiguousarray(newdata, dtype=np.float)
            inner_data = self.data
            self.raw_data = <double*>inner_data.data

    def get_data(NNBF self):
        """
        return a copy of data added so far as a numpoints x n array
        """
        return self.data[:self.numpoints]

    def find_neighbors(NNBF self, object point, double radius):
        """
        return a list of indices into data which are within radius
        from point
        """
        cdef int i, neighbor, n
        cdef double d2max
        cdef np.ndarray[double, ndim=1] pp

        # avoid python array indexing in the inner loop
        if len(point)!=self.n:
            raise ValueError('Expected a length %d vector'%self.n)

        pp = np.asarray(point).astype(np.float)

        d2max = radius*radius
        neighbors = []

        # don't do a python lookup inside the loop
        n = self.n

        for i in range(self.numpoints):
            neighbor = is_neighbor(
                n,
                self.raw_data + i*n,
                <double*>pp.data,
                d2max)

            # if the number of points in the cluster is small, the
            # python list performance should not kill us
            if neighbor:
                neighbors.append(i)

        return neighbors

    def find_neighbors_numpy(self, point, radius):
        """
        do a plain ol numpy lookup to compare performance and output

          *data* is a numpoints x numdims array
          *point* is a numdims length vector
          radius is the max distance distance

        return an array of indices into data which are within radius
        """
        data = self.get_data()
        distance = data - point
        r = np.sqrt((distance*distance).sum(axis=1))
        return np.nonzero(r<=radius)[0]