Sturla Molden wrote:
We recently has a discussion regarding an optimization of NumPy's median to average O(n) complexity. After some searching, I found out there is a selection algorithm competitive in speed with Hoare's quick select. It has the advantage of being a lot simpler to implement. In plain Python:
import numpy as np
def wirthselect(array, k):
""" Niklaus Wirth's selection algortithm """
a = np.ascontiguousarray(array) if (a is array): a = a.copy()
l = 0 m = a.shape[0] - 1 while l < m: x = a[k] i = l j = m while 1: while a[i] < x: i += 1 while x < a[j]: j -= 1 if i <= j: tmp = a[i] a[i] = a[j] a[j] = tmp i += 1 j -= 1 if i > j: break if j < k: l = i if k < i: m = j
return a
Now, the median can be obtained in average O(n) time as:
def median(x):
""" median in average O(n) time """
n = x.shape[0] k = n >> 1 s = wirthselect(x, k) if n & 1: return s[k] else: return 0.5*(s[k]+s[:k].max())
The beauty of this is that Wirth select is extremely easy to migrate to Cython:
import numpy ctypedef numpy.double_t T # or whatever
def wirthselect(numpy.ndarray[T, ndim=1] array, int k):
cdef int i, j, l, m cdef T x, tmp cdef T *a
_array = np.ascontiguousarray(array) if (_array is array): _array = _array.copy() a = <T *> _array.data
l = 0 m = <int> a.shape[0] - 1
Nitpick: This will fail on large arrays. I guess numpy.npy_intp is the right type to use in this case? -- Dag Sverre