efficient running median
Peter Otten
__peter__ at web.de
Wed Oct 14 10:53:00 EDT 2009
Janto Dreijer wrote:
> On Oct 13, 6:12 pm, Peter Otten <__pete... at web.de> wrote:
>> Janto Dreijer wrote:
>> > I'm looking for code that will calculate the running median of a
>> > sequence, efficiently. (I'm trying to subtract the running median from
>> > a signal to correct for gradual drift).
>>
>> > My naive attempt (taking the median of a sliding window) is
>> > unfortunately too slow as my sliding windows are quite large (~1k) and
>> > so are my sequences (~50k). On my PC it takes about 18 seconds per
>> > sequence. 17 of those seconds is spent in sorting the sliding windows.
>>
>> > I've googled around and it looks like there are some recent journal
>> > articles on it, but no code. Any suggestions?
>>
>> If you aren't using numpy, try that. Showing your code might also be a
>> good idea...
>>
>> Peter
>
> I placed the test code and its output here:
> http://bitbucket.org/janto/snippets/src/tip/running_median.py
That gives me something to tinker ;)
> I also have a version that uses numpy. On random data it seems to be
> about twice as fast as the pure python one. It spends half the time
> sorting and the other half converting the windows from python lists to
> numpy arrays.
> If the data is already sorted, the version using python's builtin sort
> outperforms the numpy convert-and-sort by about 5 times. Strangely
> satisfying :)
I was thinking of using as many of numpy's bulk operations as possible:
def running_median_numpy(seq):
data = array(seq, dtype=float)
result = []
for i in xrange(1, window_size):
window = data[:i]
result.append(median(window))
for i in xrange(len(data)-window_size+1):
window = data[i:i+window_size]
result.append(median(window))
return result
But it didn't help as much as I had hoped.
The fastest I came up with tries hard to keep the data sorted:
def running_median_insort(seq):
seq = iter(seq)
d = deque()
s = []
result = []
for item in islice(seq, window_size):
d.append(item)
insort(s, item)
result.append(s[len(d)//2])
m = window_size // 2
for item in seq:
old = d.popleft()
d.append(item)
del s[bisect_left(s, old)]
insort(s, item)
result.append(s[m])
return result
Some numbers:
10.197 seconds for running_median_scipy_medfilt
25.043 seconds for running_median_python
13.040 seconds for running_median_python_msort
14.280 seconds for running_median_python_scipy_median
4.024 seconds for running_median_numpy
0.221 seconds for running_median_insort
What would be an acceptable performance, by the way?
Peter
PS: code not tested for correctness
More information about the Python-list
mailing list