efficient running median

Wed Oct 14 10:53:00 EDT 2009

Janto Dreijer wrote:

> On Oct 13, 6:12 pm, Peter Otten <__pete... at web.de> wrote:
>> Janto Dreijer wrote:
>> > I'm looking for code that will calculate the running median of a
>> > sequence, efficiently. (I'm trying to subtract the running median from
>> > a signal to correct for gradual drift).
>>
>> > My naive attempt (taking the median of a sliding window) is
>> > unfortunately too slow as my sliding windows are quite large (~1k) and
>> > so are my sequences (~50k). On my PC it takes about 18 seconds per
>> > sequence. 17 of those seconds is spent in sorting the sliding windows.
>>
>> > I've googled around and it looks like there are some recent journal
>> > articles on it, but no code. Any suggestions?
>>
>> If you aren't using numpy, try that. Showing your code might also be a
>> good idea...
>>
>> Peter
> 
> I placed the test code and its output here:
> http://bitbucket.org/janto/snippets/src/tip/running_median.py

That gives me something to tinker ;)

> I also have a version that uses numpy. On random data it seems to be
> about twice as fast as the pure python one. It spends half the time
> sorting and the other half converting the windows from python lists to
> numpy arrays.
> If the data is already sorted, the version using python's builtin sort
> outperforms the numpy convert-and-sort by about 5 times. Strangely
> satisfying :)

I was thinking of using as many of numpy's bulk operations as possible:

def running_median_numpy(seq):
	data = array(seq, dtype=float)
	result = []
	for i in xrange(1, window_size):
		window = data[:i]
		result.append(median(window))
	for i in xrange(len(data)-window_size+1):
		window = data[i:i+window_size]
		result.append(median(window))
	return result

But it didn't help as much as I had hoped.
The fastest I came up with tries hard to keep the data sorted:

def running_median_insort(seq):
	seq = iter(seq)
	d = deque()
	s = []
	result = []
	for item in islice(seq, window_size):
		d.append(item)
		insort(s, item)
		result.append(s[len(d)//2])
	m = window_size // 2
	for item in seq:
		old = d.popleft()
		d.append(item)
		del s[bisect_left(s, old)]
		insort(s, item)
		result.append(s[m])
	return result

Some numbers:

10.197 seconds for running_median_scipy_medfilt
25.043 seconds for running_median_python
13.040 seconds for running_median_python_msort
14.280 seconds for running_median_python_scipy_median
4.024 seconds for running_median_numpy
0.221 seconds for running_median_insort

What would be an acceptable performance, by the way?

Peter

PS: code not tested for correctness