[SciPy-User] Moving median code
Wes McKinney
wesmckinn at gmail.com
Sun May 1 15:17:39 EDT 2011
On Sun, May 1, 2011 at 3:11 PM, Wes McKinney <wesmckinn at gmail.com> wrote:
> On Sun, May 1, 2011 at 3:08 PM, Keith Goodman <kwgoodman at gmail.com> wrote:
>> On Mon, Apr 25, 2011 at 8:31 AM, J. David Lee <johnl at cs.wisc.edu> wrote:
>>> In working on a project recently, I wrote a moving median code that is
>>> about 10x faster than scipy.ndimage.median_filter. It utilizes a linked
>>> list structure to store values and track the median value. If anyone is
>>> interested, I've posted the code here (about 150 lines):
>>>
>>> http://pages.cs.wisc.edu/~johnl/median_code/median_code.c
>>
>> I wrapped your moving window median code in cython. Here is how the
>> timing scales with window size:
>>
>>>> a = np.random.rand(1e6)
>>>> timeit move_median(a, window=10)
>> 10 loops, best of 3: 44.4 ms per loop
>>>> timeit move_median(a, window=100)
>> 10 loops, best of 3: 179 ms per loop
>>>> timeit move_median(a, window=1000)
>> 1 loops, best of 3: 1.96 s per loop
>>>> timeit move_median(a, window=10000)
>> 1 loops, best of 3: 48.3 s per loop
>>
>> It would be interesting to compare the timings with a double heap
>> algorithm [1] and with a C version of Wes's skiplist. I'll try to wrap
>> the double heap next when I get a chance. BTW, what license are you
>> using for your code?
>>
>> Here's the code I used. It's the first wrap I've done so suggestions
>> welcomed. Now that I sort of know how, I'm going to wrap everything I
>> see!
>>
>> import numpy as np
>> cimport numpy as np
>> import cython
>>
>> cdef extern from "cmove_median.c":
>> struct mm_node
>> struct mm_list:
>> np.npy_int64 len
>> mm_node *head
>> mm_node *tail
>> mm_node *min_node
>> mm_node *med_node
>> void mm_init_median(mm_list *mm)
>> void mm_insert_init(mm_list *mm, np.npy_float64 val)
>> void mm_update(mm_list *mm, np.npy_float64 val)
>> np.npy_float64 mm_get_median(mm_list *mm)
>> void mm_free(mm_list *mm)
>> np.npy_float64 mm_get_median(mm_list *mm)
>> mm_list mm_new(np.npy_int64 len)
>>
>> def move_median(np.ndarray[np.float64_t, ndim=1] a, int window):
>> cdef int n = a.size, i
>> if window > n:
>> raise ValueError("`window` must be less than a.size.")
>> if window < 2:
>> raise ValueError("I get a segfault when `window` is 1.")
>> cdef np.ndarray[np.float64_t, ndim=1] y = np.empty(n)
>> cdef mm_list mm = mm_new(window)
>> for i in range(window):
>> mm_insert_init(cython.address(mm), a[i])
>> y[i] = np.nan
>> mm_init_median(cython.address(mm))
>> y[window-1] = mm_get_median(cython.address(mm))
>> for i in range(window, n):
>> mm_update(cython.address(mm), a[i])
>> y[i] = mm_get_median(cython.address(mm))
>> mm_free(cython.address(mm))
>> return y
>>
>> [1] http://home.tiac.net/~cri_a/source_code/index.html#winmedian_pkg
>> _______________________________________________
>> SciPy-User mailing list
>> SciPy-User at scipy.org
>> http://mail.scipy.org/mailman/listinfo/scipy-user
>>
>
> assuming a compatible license do you want to make a little github repo
> like you mentioned? I have been interested in this problem for a while
>
you can see that the skiplist method is very much O(N log W). a lot of
python overhead though, obviously
In [16]: timeit rolling_median(arr, 10)
1 loops, best of 3: 2.95 s per loop
In [17]: timeit rolling_median(arr, 100)
1 loops, best of 3: 4.06 s per loop
In [18]: timeit rolling_median(arr, 1000)
1 loops, best of 3: 5.56 s per loop
In [19]: timeit rolling_median(arr, 10000)
1 loops, best of 3: 7.78 s per loop
probably the most optimal solution would be to have 2 algorithms: one
for small window sizes (where linear updates are OK) and then switch
over when your big-O constants "cross over" for large enough window
size
More information about the SciPy-User
mailing list