Sturla Molden wrote:

We recently has a discussion regarding an optimization of NumPy's median to average O(n) complexity. After some searching, I found out there is a selection algorithm competitive in speed with Hoare's quick select. It has the advantage of being a lot simpler to implement. In plain Python:

import numpy as np

def wirthselect(array, k):

""" Niklaus Wirth's selection algortithm """

a = np.ascontiguousarray(array) if (a is array): a = a.copy()

l = 0 m = a.shape[0] - 1 while l < m: x = a[k] i = l j = m while 1: while a[i] < x: i += 1 while x < a[j]: j -= 1 if i <= j: tmp = a[i] a[i] = a[j] a[j] = tmp i += 1 j -= 1 if i > j: break if j < k: l = i if k < i: m = j

return a

Now, the median can be obtained in average O(n) time as:

def median(x):

""" median in average O(n) time """

n = x.shape[0] k = n >> 1 s = wirthselect(x, k) if n & 1: return s[k] else: return 0.5*(s[k]+s[:k].max())

The beauty of this is that Wirth select is extremely easy to migrate to Cython:

import numpy ctypedef numpy.double_t T # or whatever

def wirthselect(numpy.ndarray[T, ndim=1] array, int k):

cdef int i, j, l, m cdef T x, tmp cdef T *a

_array = np.ascontiguousarray(array) if (_array is array): _array = _array.copy() a = <T *> _array.data

l = 0 m = <int> a.shape[0] - 1

Nitpick: This will fail on large arrays. I guess numpy.npy_intp is the right type to use in this case? -- Dag Sverre