A faster median (Wirth's method)
We recently has a discussion regarding an optimization of NumPy's median to average O(n) complexity. After some searching, I found out there is a selection algorithm competitive in speed with Hoare's quick select. It has the advantage of being a lot simpler to implement. In plain Python:
import numpy as np
def wirthselect(array, k):
""" Niklaus Wirth's selection algortithm """
a = np.ascontiguousarray(array) if (a is array): a = a.copy()
l = 0 m = a.shape[0]  1 while l < m: x = a[k] i = l j = m while 1: while a[i] < x: i += 1 while x < a[j]: j = 1 if i <= j: tmp = a[i] a[i] = a[j] a[j] = tmp i += 1 j = 1 if i > j: break if j < k: l = i if k < i: m = j
return a
Now, the median can be obtained in average O(n) time as:
def median(x):
""" median in average O(n) time """
n = x.shape[0] k = n >> 1 s = wirthselect(x, k) if n & 1: return s[k] else: return 0.5*(s[k]+s[:k].max())
The beauty of this is that Wirth select is extremely easy to migrate to Cython:
import numpy ctypedef numpy.double_t T # or whatever
def wirthselect(numpy.ndarray[T, ndim=1] array, int k):
cdef int i, j, l, m cdef T x, tmp cdef T *a
_array = np.ascontiguousarray(array) if (_array is array): _array = _array.copy() a = <T *> _array.data
l = 0 m = <int> a.shape[0]  1 with nogil: while l < m: x = a[k] i = l j = m while 1: while a[i] < x: i += 1 while x < a[j]: j = 1 if i <= j: tmp = a[i] a[i] = a[j] a[j] = tmp i += 1 j = 1 if i > j: break if j < k: l = i if k < i: m = j
return _array
For example, we could have a small script that generates withselect for all NumPy dtypes (T as template), and use a dict as jump table.
Chad, you can continue to write quick select using NumPy's C quick sort in numpy/core/src/_sortmodule.c.src. When you are done, it might be about 10% faster than this. :)
Reference: http://ndevilla.free.fr/median/median.pdf
Best regards, Sturla Molden
Sturla Molden skrev:
We recently has a discussion regarding an optimization of NumPy's median to average O(n) complexity. After some searching, I found out there is a selection algorithm competitive in speed with Hoare's quick select.
Reference: http://ndevilla.free.fr/median/median.pdf
After som more googling, I found this text by Wirth himself:
http://www.oberon2005.ru/book/ads2004.pdf
The method is described on page 61 (57 in the PDF) as Hoare's quick select. So it seems it's just a less optimized version than that of Numerical Receipes, and the first reference (Devillard) was confused. Anyhow, it still has the advantage of looking nice in Cython and being very different from the Numerical Receipes code. We can rename wirthselect to quickselect then. Sorry for the confusion. I should have checked the source better.
Sturla Molden
Sturla Molden wrote:
We recently has a discussion regarding an optimization of NumPy's median to average O(n) complexity. After some searching, I found out there is a selection algorithm competitive in speed with Hoare's quick select. It has the advantage of being a lot simpler to implement. In plain Python:
import numpy as np
def wirthselect(array, k):
""" Niklaus Wirth's selection algortithm """ a = np.ascontiguousarray(array) if (a is array): a = a.copy() l = 0 m = a.shape[0]  1 while l < m: x = a[k] i = l j = m while 1: while a[i] < x: i += 1 while x < a[j]: j = 1 if i <= j: tmp = a[i] a[i] = a[j] a[j] = tmp i += 1 j = 1 if i > j: break if j < k: l = i if k < i: m = j return a
Now, the median can be obtained in average O(n) time as:
def median(x):
""" median in average O(n) time """ n = x.shape[0] k = n >> 1 s = wirthselect(x, k) if n & 1: return s[k] else: return 0.5*(s[k]+s[:k].max())
The beauty of this is that Wirth select is extremely easy to migrate to Cython:
import numpy ctypedef numpy.double_t T # or whatever
def wirthselect(numpy.ndarray[T, ndim=1] array, int k):
cdef int i, j, l, m cdef T x, tmp cdef T *a _array = np.ascontiguousarray(array) if (_array is array): _array = _array.copy() a = <T *> _array.data l = 0 m = <int> a.shape[0]  1
Nitpick: This will fail on large arrays. I guess numpy.npy_intp is the right type to use in this case?
Dag Sverre Seljebotn skrev:
Nitpick: This will fail on large arrays. I guess numpy.npy_intp is the right type to use in this case?
By the way, here is a more polished version, does it look ok?
http://projects.scipy.org/numpy/attachment/ticket/1213/generate_qselect.py http://projects.scipy.org/numpy/attachment/ticket/1213/quickselect.pyx
Cython needs something like Java's generics by the way :)
Regards, Sturla Molden
Sturla Molden skrev:
http://projects.scipy.org/numpy/attachment/ticket/1213/generate_qselect.py http://projects.scipy.org/numpy/attachment/ticket/1213/quickselect.pyx
My suggestion for a new median function is here: http://projects.scipy.org/numpy/attachment/ticket/1213/median.py
The quickselect extension module can be compiled from C source: http://projects.scipy.org/numpy/attachment/ticket/1213/quickselect.c.gz
Feel free to look at it when you have time. :)
Regards, Sturla Molden
Hello Sturla, I had a quick look at your code. Looks fine.
A few notes...
In "select" you should replace numpy with np.
In "_median" how can you, if n==2, use s[] if s is not defined? What if n==1? Also, I think when returning an empty array, it should be of the same type you would get in the other cases.
You could replace _median with the following.
Best, Luca
def _median(x, inplace): assert(x.ndim == 1) n = x.shape[0] if n > 2: k = n >> 1 s = select(x, k, inplace=inplace) if n & 1: return s[k] else: return 0.5*(s[k]+s[:k].max()) elif n == 0: return np.empty(0, dtype=x.dtype) elif n == 2: return 0.5*(x[0]+x[1]) else: # n == 1 return x[0]
Citi, Luca skrev:
Hello Sturla, In "_median" how can you, if n==2, use s[] if s is not defined? What if n==1?
That was a typo.
Also, I think when returning an empty array, it should be of the same type you would get in the other cases.
Currently median returns numpy.nan for empty input arrays. I'll do that instead. I want it to behave exactly as the current implementation, except for the sorting.
Sturla
Sturla Molden wrote:
Dag Sverre Seljebotn skrev:
Nitpick: This will fail on large arrays. I guess numpy.npy_intp is the right type to use in this case?
By the way, here is a more polished version, does it look ok?
http://projects.scipy.org/numpy/attachment/ticket/1213/generate_qselect.py http://projects.scipy.org/numpy/attachment/ticket/1213/quickselect.pyx
I didn't look at the algorithm, but the types look OK (except for the gil as you say). Comments:
a) Is the cast to numpy.npy_intp really needed? I'm pretty sure shape is defined as numpy.npy_intp*. b) If you want higher performance with contiguous arrays (which occur a lot as inplace=False is default I guess) you can do
np.ndarray[T, ndim=1, mode="c"]
to tell the compiler the array is contiguous. That doubles the number of function instances though...
Cython needs something like Java's generics by the way :)
Yes, we all long for that. It will come as soon as somebody volunteers I suppose  it shouldn't be all that difficult, but I don't think any of the existing devs will be up for it any time soon.
Dag Sverre
Dag Sverre Seljebotn skrev:
a) Is the cast to numpy.npy_intp really needed? I'm pretty sure shape is
defined as numpy.npy_intp*.
I don't know Cython internals in detail but you do, I so take your word for it. I thought shape was a tuple of Python ints.
b) If you want higher performance with contiguous arrays (which occur a lot as inplace=False is default I guess) you can do
np.ndarray[T, ndim=1, mode="c"]
to tell the compiler the array is contiguous. That doubles the number of function instances though...
Thanks. I could either double the number of specialized select functions, or I could make a local copy using numpy.ascontiguousarray in the select function.
Quickselect touch the discontiguous array on average 2*n times, whereas numpy.ascontiguousarray touch the discontiguous array n times (but in orderly). Then there is the question of cache use: Contiguous arrays are the more friendly case, and numpy.ascontiguousarray is more friendly than quickselect. Also if quickselect is not done inplace (the common case for medians), we always have contigous arrays, so mode="c" is almost always wanted. And when quickselect is done inplace, we usually have a contiguous input. This is also why I used a C pointer instead of your buffer syntax in the first version. Then I changed my mind, not sure why. So I'll try with a local copy first then. I don't think we want close to a megabyte of Cython generated gibberish C just for the median.
Sturla Molden
On Wed, 2 Sep 2009, Dag Sverre Seljebotn wrote:
Sturla Molden wrote:
Dag Sverre Seljebotn skrev:
Nitpick: This will fail on large arrays. I guess numpy.npy_intp is the right type to use in this case?
By the way, here is a more polished version, does it look ok?
http://projects.scipy.org/numpy/attachment/ticket/1213/generate_qselect.py http://projects.scipy.org/numpy/attachment/ticket/1213/quickselect.pyx
I didn't look at the algorithm, but the types look OK (except for the gil as you say). Comments:
a) Is the cast to numpy.npy_intp really needed? I'm pretty sure shape is defined as numpy.npy_intp*. b) If you want higher performance with contiguous arrays (which occur a lot as inplace=False is default I guess) you can do
np.ndarray[T, ndim=1, mode="c"]
to tell the compiler the array is contiguous. That doubles the number of function instances though...
Cython needs something like Java's generics by the way :)
Yes, we all long for that. It will come as soon as somebody volunteers I suppose  it shouldn't be all that difficult, but I don't think any of the existing devs will be up for it any time soon.
Danilo's C++ project has some baby steps in that direction, though it'll need to be expanded quite a bit to handle this.
 Robert
On Tue, Sep 1, 2009 at 2:37 PM, Sturla Molden sturla@molden.no wrote:
Dag Sverre Seljebotn skrev:
Nitpick: This will fail on large arrays. I guess numpy.npy_intp is the right type to use in this case?
By the way, here is a more polished version, does it look ok?
http://projects.scipy.org/numpy/attachment/ticket/1213/generate_qselect.py http://projects.scipy.org/numpy/attachment/ticket/1213/quickselect.pyx
This is my favorite numpy/scipy ticket. So I am happy that I can contribute in a small way by pointing out a bug. The search for the kth smallest element is only done over the first k elements (that's the bug) instead of over the entire array. Specifically "while l < k" should be "while l < r".
I added a median function to the Bottleneck package: https://github.com/kwgoodman/bottleneck
Timings:
import bottleneck as bn arr = np.random.rand(100, 100) timeit np.median(arr)
1000 loops, best of 3: 762 us per loop
timeit bn.median(arr)
10000 loops, best of 3: 198 us per loop
What other functions could be built from a selection algorithm?
nanmedian scoreatpercentile quantile knn select others?
But before I add more functions to the package I need to figure out how to make a cython apply_along_axis function. For the first release I am hand coding the 1d, 2d, and 3d cases. Boring to write, hard to maintain, and doesn't solve the nd case.
Does anyone have a cython apply_along_axis that takes a cython reducing function as input? The ticket has an example but I couldn't get it to run. If no one has one (the horror!) I'll begin to work on one sometime after the first release.
I am very interested in this result. I have wanted to know how to do an apply_along_axis function for a while now.
On Tue, Nov 30, 2010 at 11:21 AM, Keith Goodman kwgoodman@gmail.com wrote:
On Tue, Sep 1, 2009 at 2:37 PM, Sturla Molden sturla@molden.no wrote:
Dag Sverre Seljebotn skrev:
Nitpick: This will fail on large arrays. I guess numpy.npy_intp is the right type to use in this case?
By the way, here is a more polished version, does it look ok?
http://projects.scipy.org/numpy/attachment/ticket/1213/generate_qselect.py
http://projects.scipy.org/numpy/attachment/ticket/1213/quickselect.pyx
This is my favorite numpy/scipy ticket. So I am happy that I can contribute in a small way by pointing out a bug. The search for the kth smallest element is only done over the first k elements (that's the bug) instead of over the entire array. Specifically "while l < k" should be "while l < r".
I added a median function to the Bottleneck package: https://github.com/kwgoodman/bottleneck
Timings:
import bottleneck as bn arr = np.random.rand(100, 100) timeit np.median(arr)
1000 loops, best of 3: 762 us per loop
timeit bn.median(arr)
10000 loops, best of 3: 198 us per loop
What other functions could be built from a selection algorithm?
nanmedian scoreatpercentile quantile knn select others?
But before I add more functions to the package I need to figure out how to make a cython apply_along_axis function. For the first release I am hand coding the 1d, 2d, and 3d cases. Boring to write, hard to maintain, and doesn't solve the nd case.
Does anyone have a cython apply_along_axis that takes a cython reducing function as input? The ticket has an example but I couldn't get it to run. If no one has one (the horror!) I'll begin to work on one sometime after the first release. _______________________________________________ NumPyDiscussion mailing list NumPyDiscussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpydiscussion
On Tue, Nov 30, 2010 at 11:25 AM, John Salvatier jsalvati@u.washington.edu wrote:
I am very interested in this result. I have wanted to know how to do an
My first thought was to write the reducing function like this
cdef np.float64_t namean(np.ndarray[np.float64_t, ndim=1] a):
but cython doesn't allow np.ndarray in a cdef.
That's why the ticket (URL earlier in the thread) uses a (the?) buffer interface to the array. The particular example in the ticket is not a reducing function, it works on the data in place. We can change that.
I can set up a sandbox in the Bottleneck project if anyone (John!) is interested in working on this. I plan to get a first (preview) release out soon and then take a break. The first project for the second release is this; second project is templating. Then the third release is just turning the crank and adding more functions.
Hi,
On Tue, Nov 30, 2010 at 11:35 AM, Keith Goodman kwgoodman@gmail.com wrote:
On Tue, Nov 30, 2010 at 11:25 AM, John Salvatier jsalvati@u.washington.edu wrote:
I am very interested in this result. I have wanted to know how to do an
My first thought was to write the reducing function like this
cdef np.float64_t namean(np.ndarray[np.float64_t, ndim=1] a):
but cython doesn't allow np.ndarray in a cdef.
Sorry for the illconsidered hasty reply, but do you mean that this:
import numpy as np cimport numpy as cnp
cdef cnp.float64_t namean(cnp.ndarray[cnp.float64_t, ndim=1] a): return np.nanmean(a) # just a placeholder
is not allowed? It works for me. Is it a cython version thing? (I've got 0.13),
See you,
Matthew
On Tue, Nov 30, 2010 at 11:58 AM, Matthew Brett matthew.brett@gmail.com wrote:
Hi,
On Tue, Nov 30, 2010 at 11:35 AM, Keith Goodman kwgoodman@gmail.com wrote:
On Tue, Nov 30, 2010 at 11:25 AM, John Salvatier jsalvati@u.washington.edu wrote:
I am very interested in this result. I have wanted to know how to do an
My first thought was to write the reducing function like this
cdef np.float64_t namean(np.ndarray[np.float64_t, ndim=1] a):
but cython doesn't allow np.ndarray in a cdef.
Sorry for the illconsidered hasty reply, but do you mean that this:
import numpy as np cimport numpy as cnp
cdef cnp.float64_t namean(cnp.ndarray[cnp.float64_t, ndim=1] a): return np.nanmean(a) # just a placeholder
is not allowed? It works for me. Is it a cython version thing? (I've got 0.13),
Oh, that's nice! I'm using 0.11.2. OK, time to upgrade.
I think last time I looked into how to apply a function along an axis I thought that the PyArray_IterAllButAxis would not work for that task ( http://docs.scipy.org/doc/numpy/reference/capi.array.html#PyArray_IterAllBu...), but I think perhaps I misunderstood it. I'm looking into how to use it.
On Tue, Nov 30, 2010 at 12:06 PM, Keith Goodman kwgoodman@gmail.com wrote:
On Tue, Nov 30, 2010 at 11:58 AM, Matthew Brett matthew.brett@gmail.com wrote:
Hi,
On Tue, Nov 30, 2010 at 11:35 AM, Keith Goodman kwgoodman@gmail.com
wrote:
On Tue, Nov 30, 2010 at 11:25 AM, John Salvatier jsalvati@u.washington.edu wrote:
I am very interested in this result. I have wanted to know how to do an
My first thought was to write the reducing function like this
cdef np.float64_t namean(np.ndarray[np.float64_t, ndim=1] a):
but cython doesn't allow np.ndarray in a cdef.
Sorry for the illconsidered hasty reply, but do you mean that this:
import numpy as np cimport numpy as cnp
cdef cnp.float64_t namean(cnp.ndarray[cnp.float64_t, ndim=1] a): return np.nanmean(a) # just a placeholder
is not allowed? It works for me. Is it a cython version thing? (I've got 0.13),
Oh, that's nice! I'm using 0.11.2. OK, time to upgrade. _______________________________________________ NumPyDiscussion mailing list NumPyDiscussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpydiscussion
import numpy as np cimport numpy as cnp
cdef cnp.float64_t namean(cnp.ndarray[cnp.float64_t, ndim=1] a): return np.nanmean(a) # just a placeholder
is not allowed? It works for me. Is it a cython version thing? (I've got 0.13),
Oh, that's nice! I'm using 0.11.2. OK, time to upgrade.
Oh wow, does that mean that http://trac.cython.org/cython_trac/ticket/177 is fixed? I couldn't find anything in the release notes about that, but it would be great news. Does the cdef function acquire and hold the buffer?
Felix
@Keith Goodman
I think I figured it out. I believe something like the following will do what you want, iterating across one axis specially, so it can apply a median function along an axis. This code in particular is for calculating a moving average and seems to work (though I haven't checked my math). Let me know if you find any problems.
def ewma(a, d, int axis = 1):
out = np.empty(a.shape, dtype)
cdef np.flatiter ita, ito
ita = np.PyArray_IterAllButAxis(a, &axis) ito = np.PyArray_IterAllButAxis(out, &axis)
cdef int i cdef int axis_length = a.shape[axis] cdef int a_axis_stride = a.strides[axis]/a.itemsize cdef int o_axis_stride = out.strides[axis]/out.itemsize
cdef double avg = 0.0 cdef double weight = 1.0  np.exp(d)
while np.PyArray_ITER_NOTDONE(ita):
avg = 0.0 for i in range(axis_length):
avg += (<dtype_t*>np.PyArray_ITER_DATA (ita))[i * a_axis_stride ] * weight + avg * (1  weight) (<dtype_t*>np.PyArray_ITER_DATA (ito))[i * o_axis_stride ] = avg
np.PyArray_ITER_NEXT(ita) np.PyArray_ITER_NEXT(ito)
return out
On Tue, Nov 30, 2010 at 9:46 PM, Felix Schlesinger schlesin@cshl.eduwrote:
import numpy as np cimport numpy as cnp
cdef cnp.float64_t namean(cnp.ndarray[cnp.float64_t, ndim=1] a): return np.nanmean(a) # just a placeholder
is not allowed? It works for me. Is it a cython version thing? (I've got 0.13),
Oh, that's nice! I'm using 0.11.2. OK, time to upgrade.
Oh wow, does that mean that http://trac.cython.org/cython_trac/ticket/177 is fixed? I couldn't find anything in the release notes about that, but it would be great news. Does the cdef function acquire and hold the buffer?
Felix
NumPyDiscussion mailing list NumPyDiscussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpydiscussion
On Mon, Aug 31, 2009 at 9:06 PM, Sturla Moldensturla@molden.no wrote:
We recently has a discussion regarding an optimization of NumPy's median to average O(n) complexity. After some searching, I found out there is a selection algorithm competitive in speed with Hoare's quick select. It has the advantage of being a lot simpler to implement. In plain Python:
Chad, you can continue to write quick select using NumPy's C quick sort in numpy/core/src/_sortmodule.c.src. When you are done, it might be about 10% faster than this. :)
I was sick for a bit last week, so got stalled on my version, but I'll be working on it this weekend. I'm going for a more general partition function, that could have slightly more general use cases than just a median. Nevertheless, its good to see there could be several options, hopefully at least one of which can be put into numpy.
By the way, as far as I can tell, the above algorithm is exactly the same idea as a nonrecursive Hoare (ie. quicksort) selection: Do the partition, then only proceed to the subpartition that must contain the nth element. My version is a bit more general, allowing partitioning on a range of elements rather than just one, but the concept is the same. The numpy quicksort already does non recursive sorting.
I'd also like to, if possible, have a specialized 2D version, since image media filtering is one of my interests, and the C version works on 1D (raveled) arrays only.
C
On Wed, Sep 2, 2009 at 1:25 PM, Chad Netzer chad.netzer@gmail.com wrote:
On Mon, Aug 31, 2009 at 9:06 PM, Sturla Moldensturla@molden.no wrote:
We recently has a discussion regarding an optimization of NumPy's median to average O(n) complexity. After some searching, I found out there is a selection algorithm competitive in speed with Hoare's quick select. It has the advantage of being a lot simpler to implement. In plain Python:
Chad, you can continue to write quick select using NumPy's C quick sort in numpy/core/src/_sortmodule.c.src. When you are done, it might be about 10% faster than this. :)
I was sick for a bit last week, so got stalled on my version, but I'll be working on it this weekend. I'm going for a more general partition function, that could have slightly more general use cases than just a median. Nevertheless, its good to see there could be several options, hopefully at least one of which can be put into numpy.
By the way, as far as I can tell, the above algorithm is exactly the same idea as a nonrecursive Hoare (ie. quicksort) selection: Do the partition, then only proceed to the subpartition that must contain the nth element. My version is a bit more general, allowing partitioning on a range of elements rather than just one, but the concept is the same. The numpy quicksort already does non recursive sorting.
I'd also like to, if possible, have a specialized 2D version, since image media filtering is one of my interests, and the C version works on 1D (raveled) arrays only.
There are special hardwired medians for 2,3,5,9 elements, which covers a lot of image processing. They aren't in numpy, though ;) David has implemented a NeighborhoodIter that could help extract the elements if you want to deal with images.
Chuck
Chad Netzer skrev:
By the way, as far as I can tell, the above algorithm is exactly the same idea as a nonrecursive Hoare (ie. quicksort) selection: Do the partition, then only proceed to the subpartition that must contain the nth element. My version is a bit more general, allowing partitioning on a range of elements rather than just one, but the concept is the same. The numpy quicksort already does non recursive sorting.
I'd also like to, if possible, have a specialized 2D version, since image media filtering is one of my interests, and the C version works on 1D (raveled) arrays only.
I agree. NumPy (or SciPy) could have a select module similar to the sort module. If the select function takes an axis argument similar to the sort functions, only a small change to the current np.median would needed.
Take a look at this:
http://projects.scipy.org/numpy/attachment/ticket/1213/_selectmodule.pyx
Here is a select function that takes an axis argument. There are specialized versions for 1D, 2D, and 3D. Input can be contiguous or not. For 4D and above, axes are found by recursion on the shape array. Thus it should be fast regardless of dimensions.
I haven't tested the Cython code /thoroughly/, but at least it does compile.
Sturla Molden
On Thu, Sep 3, 2009 at 00:09, Sturla Moldensturla@molden.no wrote:
Chad Netzer skrev:
I'd also like to, if possible, have a specialized 2D version, since image media filtering is one of my interests, and the C version works on 1D (raveled) arrays only.
I agree. NumPy (or SciPy) could have a select module similar to the sort module. If the select function takes an axis argument similar to the sort functions, only a small change to the current np.median would needed.
Take a look at this:
http://projects.scipy.org/numpy/attachment/ticket/1213/_selectmodule.pyx
Here is a select function that takes an axis argument. There are specialized versions for 1D, 2D, and 3D. Input can be contiguous or not. For 4D and above, axes are found by recursion on the shape array. Thus it should be fast regardless of dimensions.
When he is talking about 2D, I believe he is referring to median filtering rather than computing the median along an axis. I.e., replacing each pixel with the median of a specified neighborhood around the pixel.
On Wed, Sep 2, 2009 at 10:28 PM, Robert Kernrobert.kern@gmail.com wrote:
When he is talking about 2D, I believe he is referring to median filtering rather than computing the median along an axis. I.e., replacing each pixel with the median of a specified neighborhood around the pixel.
That's right, Robert. Basically, I meant doing a median on a square (or rectangle) "view" of an array, without first having to ravel(), thus generally saving a copy. But actually, since my selection based median overwrites the source array, it may not save a copy anyway. But Charles Harris's earlier suggestion of some hard coded medians for common filter template sizes (ie 3x3, 5x5, etc.) may be a nice addition to scipy, especially if it can be generalized somewhat to other filters.
C
Chad Netzer wrote:
But Charles Harris's earlier suggestion of some hard coded medians for common filter template sizes (ie 3x3, 5x5, etc.) may be a nice addition to scipy, especially if it can be generalized somewhat to other filters.
For 2D images try looking into PIL : ImageFilter.MedianFilter
Cheers,
Jon
Chad Netzer skrev:
That's right, Robert. Basically, I meant doing a median on a square (or rectangle) "view" of an array, without first having to ravel(), thus generally saving a copy. But actually, since my selection based median overwrites the source array, it may not save a copy anyway.
Avoiding copies of timy buffers is futile optimization.
QuickSelect is overkill for tiny buffers like common filter kernels. Insertion sort is fine.
Getting rid of loops and multiple function calls in Python helps a lot.
If memory is not an issue, with np.median you can actully create an quite an efficient median filter using a 3D ndarray. For example if you use an image of 640 x 480 pixels and want a 9 pixel median filter, you can put shifted images in an 640 x 480 x 9 ndarray, and call median with axis=2.
Sturla Molden
On Thu, Sep 3, 2009 at 12:14 AM, Sturla Molden sturla@molden.no wrote:
Chad Netzer skrev:
That's right, Robert. Basically, I meant doing a median on a square (or rectangle) "view" of an array, without first having to ravel(), thus generally saving a copy. But actually, since my selection based median overwrites the source array, it may not save a copy anyway.
Avoiding copies of timy buffers is futile optimization.
QuickSelect is overkill for tiny buffers like common filter kernels. Insertion sort is fine.
Shell sort is nice for sizes in the awkward range.
Chuck
Robert Kern skrev:
When he is talking about 2D, I believe he is referring to median filtering rather than computing the median along an axis. I.e., replacing each pixel with the median of a specified neighborhood around the pixel.
That's not something numpy's median function should be specialized to do. IMHO, median filtering belongs to scipy.
Sturla
participants (12)

Chad Netzer

Charles R Harris

Citi, Luca

Dag Sverre Seljebotn

Felix Schlesinger

John Salvatier

Jon Wright

Keith Goodman

Matthew Brett

Robert Bradshaw

Robert Kern

Sturla Molden