On 21 Jul 2014, at 10:09, Moritz Emanuel Beber wrote:

Dear all,

My basic problem is that I would like to compute distances between vectors with missing values. You can find more detail in my question on SO (http://stackoverflow.com/questions/24781461/compute-the-pairwise-distance-in-scipy-with-missing-values). Since this does not seem to be directly possible with scipy at the moment, I started to Cythonize my function. Currently, the function below is not much faster than my pure Python implementation, so I thought I'd ask the experts here. Note that even though I'm computing the Euclidean distance here, I'd like to be able to use other distance metrics as well.

So my current attempt at Cythonizing is:

import numpy
cimport numpy
cimport cython
from numpy.linalg import norm

numpy.import_array()

@cython.boundscheck(False)
@cython.wraparound(False)
def masked_euclidean(numpy.ndarray[numpy.double_t, ndim=2] data):
    cdef Py_ssize_t m = data.shape[0]
    cdef Py_ssize_t i = 0
    cdef Py_ssize_t j = 0
    cdef Py_ssize_t k = 0
    cdef numpy.ndarray[numpy.double_t] dm = numpy.zeros(m * (m - 1) // 2, dtype=numpy.double)
    cdef numpy.ndarray[numpy.uint8_t, ndim=2, cast=True] mask = numpy.isfinite(data) # boolean
    for i in range(m - 1):
        for j in range(i + 1, m):
            curr = numpy.logical_and(mask[i], mask[j])
            u = data[i][curr]
            v = data[j][curr]
            dm[k] = norm(u - v)
            k += 1
    return dm

Maybe the lack of speed-up is due to the Python function 'norm'? So my question is, how to improve the Cython implementation? Or is there a completely different way of approaching this problem?

Thanks in advance,

I would suggest using the cython --annotate option (or the -a option of the %%cython magic in the IPython notebook). It generates a syntax-highlighted HTML page showing the generated C code, with hints about which lines are slow and why.

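You are right that `norm` is slow, but apparently so are the `__getitem__` calls on data[] and numpy.logical_and. Indexing such as data[i][curr] uses an array rather than integer indices, so it goes through the Python object protocol instead of the fast typed buffer access.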
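
As for a completely different approach: below is a minimal sketch in plain NumPy, not a tested replacement for your Cython function. It assumes the goal is the same condensed distance vector as in your code (pairs ordered i < j, non-finite entries ignored pairwise) and only handles the Euclidean case. The idea is to keep the loop over i but vectorize the loop over j, so that `norm`, `logical_and` and the fancy indexing are each replaced by one array operation per row.

```python
import numpy as np


def masked_euclidean(data):
    """Condensed pairwise Euclidean distances over jointly finite columns."""
    data = np.asarray(data, dtype=float)
    m = data.shape[0]
    mask = np.isfinite(data)
    # Zero out non-finite entries so the subtraction never sees NaN or inf.
    filled = np.where(mask, data, 0.0)
    dm = np.zeros(m * (m - 1) // 2, dtype=float)
    k = 0
    for i in range(m - 1):
        # Columns that are finite both in row i and in each later row j > i.
        both = mask[i] & mask[i + 1:]
        # Differences on shared columns; everything else contributes zero.
        diff = np.where(both, filled[i] - filled[i + 1:], 0.0)
        n = m - 1 - i
        dm[k:k + n] = np.sqrt((diff ** 2).sum(axis=1))
        k += n
    return dm


data = np.array([[1.0, 2.0, np.nan],
                 [4.0, np.nan, 6.0],
                 [1.0, 2.0, 3.0]])
print(masked_euclidean(data))
```

For this small example the three distances should come out as roughly 3.0, 0.0 and 4.243. As in your version, a pair of rows with no shared finite column gets distance 0. Other metrics would need their own reduction in place of the square-root-of-sum line.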

-- 
M

Moritz
_______________________________________________
SciPy-Dev mailing list
SciPy-Dev@scipy.org
http://mail.scipy.org/mailman/listinfo/scipy-dev