[SciPy-User] Moving median code

Wes McKinney wesmckinn at gmail.com
Mon Apr 25 13:08:12 EDT 2011


On Mon, Apr 25, 2011 at 1:03 PM, Keith Goodman <kwgoodman at gmail.com> wrote:
> On Mon, Apr 25, 2011 at 9:41 AM, Wes McKinney <wesmckinn at gmail.com> wrote:
>> On Mon, Apr 25, 2011 at 12:27 PM, Wes McKinney <wesmckinn at gmail.com> wrote:
>>> On Mon, Apr 25, 2011 at 12:25 PM, J. David Lee <johnl at cs.wisc.edu> wrote:
>>>> On 04/25/2011 11:01 AM, Keith Goodman wrote:
>>>>> On Mon, Apr 25, 2011 at 8:31 AM, J. David Lee<johnl at cs.wisc.edu>  wrote:
>>>>>> In working on a project recently, I wrote a moving median code that is
>>>>>> about 10x faster than scipy.ndimage.median_filter. It utilizes a linked
>>>>>> list structure to store values and track the median value. If anyone is
>>>>>> interested, I've posted the code here (about 150 lines):
>>>>>>
>>>>>> http://pages.cs.wisc.edu/~johnl/median_code/median_code.c
>>>>>
>>>>> So your code takes 1d input where the window size is the length of the
>>>>> input and then you move a step forward by dropping the oldest data
>>>>> point and adding a new one and then update the median? Just making
>>>>> sure I understand.
>>>>
>>>> Your understanding is correct, although I'm reusing the oldest node:
>>>> moving it to the beginning of the list (by index), updating its value,
>>>> and then moving it to the appropriate place in the value list using its
>>>> new value. Any time two nodes are swapped I'm checking if one is the
>>>> median value, and then swapping the median pointer if it is.
>>>
>>> I wonder how this compares speedwise with the indexable skiplist approach:
>>>
>>> http://rhettinger.wordpress.com/2010/02/06/lost-knowledge/
>>>
>>> I converted to Cython and put in pandas here last year:
>>>
>>> https://github.com/wesm/pandas/blob/master/pandas/lib/src/skiplist.pyx
>>> https://github.com/wesm/pandas/blob/master/pandas/lib/src/moments.pyx#L442
>>>
>>> This code definitely has "too much Python", I think, so a pure C
>>> version would be much faster.
>>
>> I took a peak at the C code-- looks like it's O(N * W). Maybe we
>> should invest in a pure C O(N log W) approach using the skiplist?
>
> Seems like there is a fair amount of interest in a moving median of
> numpy arrays.
>
> Due to my ignorance of the relative merits of each algorithm, could we
> wrap the C code of all three versions in cython and see which is
> fastest? Two algorithms are already coded in C. Making a cython
> function that only works on 1d, float64 input might serve as a good
> test bed.
>
> I can set up a temp github repo for a moving window median prototype
> if anyone is interested in working on it together.

Go for it...not sure when I'll have time to help but probably in a few weeks.

> I've never cython wrapped C code. If I had, I'd already have a moving
> window median.

It's very simple actually-- Cython docs have everything you need to know.

> _______________________________________________
> SciPy-User mailing list
> SciPy-User at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user
>



More information about the SciPy-User mailing list