On Tue, Sep 1, 2009 at 2:37 PM, Sturla Molden <sturla@molden.no> wrote:
Dag Sverre Seljebotn wrote:
Nitpick: This will fail on large arrays. I guess numpy.npy_intp is the right type to use in this case?
By the way, here is a more polished version, does it look ok?
http://projects.scipy.org/numpy/attachment/ticket/1213/generate_qselect.py
http://projects.scipy.org/numpy/attachment/ticket/1213/quickselect.pyx
This is my favorite numpy/scipy ticket. So I am happy that I can contribute in a small way by pointing out a bug. The search for the k-th smallest element is only done over the first k elements (that's the bug) instead of over the entire array. Specifically "while l < k" should be "while l < r".

I added a median function to the Bottleneck package: https://github.com/kwgoodman/bottleneck

Timings:
    import numpy as np
    import bottleneck as bn
    arr = np.random.rand(100, 100)
    timeit np.median(arr)
    1000 loops, best of 3: 762 us per loop
    timeit bn.median(arr)
    10000 loops, best of 3: 198 us per loop
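
To make the loop-condition fix above concrete, here is a minimal pure-Python sketch of a Hoare/Wirth-style selection loop. It is not the code from the ticket; the function name wirth_select is made up for illustration, and only the variable names l, r and k are borrowed from the discussion. The outer loop runs "while l < r", so the window [l, r] keeps shrinking around index k until a[k] is in its sorted position. In this sketch, changing that condition to "while l < k" would skip the loop entirely for k = 0 and return an element that was never partitioned.

```python
import numpy as np


def wirth_select(values, k):
    """Return the k-th smallest element (0-based) of a 1-D sequence.

    Illustrative sketch only: the function name is hypothetical and the
    input is copied so that the caller's data is left untouched.
    """
    a = np.array(values, dtype=float)  # np.array copies by default
    n = a.shape[0]
    if not 0 <= k < n:
        raise IndexError("k must satisfy 0 <= k < len(values)")

    l, r = 0, n - 1
    while l < r:  # the corrected condition; "while l < k" can stop too early
        x = a[k]  # pivot value taken from the target position
        i, j = l, r
        while True:
            while a[i] < x:
                i += 1
            while x < a[j]:
                j -= 1
            if i <= j:
                a[i], a[j] = a[j], a[i]
                i += 1
                j -= 1
            if i > j:
                break
        # Shrink the window to the side that still contains index k.
        if j < k:
            l = i
        if k < i:
            r = j
    return a[k]
```

A median for odd-length input would then be wirth_select(arr, len(arr) // 2); even-length input needs the two middle order statistics averaged.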
What other functions could be built from a selection algorithm?

    nanmedian
    scoreatpercentile
    quantile
    knn
    select
    others?

But before I add more functions to the package I need to figure out how to make a cython apply_along_axis function. For the first release I am hand coding the 1d, 2d, and 3d cases. Boring to write, hard to maintain, and doesn't solve the nd case.

Does anyone have a cython apply_along_axis that takes a cython reducing function as input? The ticket has an example but I couldn't get it to run. If no one has one (the horror!) I'll begin to work on one sometime after the first release.
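
As a rough illustration of the nd problem described above, one common way to avoid hand coding each dimensionality is to move the reduction axis to the end, flatten all other axes into one, and call a 1d reducer on each row. The sketch below does this in plain Python with NumPy rather than Cython, so it shows only the shape handling, not the speed. The name reduce_along_axis is hypothetical, and the sketch assumes the reducer returns one float per 1d slice and that the reduced axis is not empty.

```python
import numpy as np


def reduce_along_axis(func1d, arr, axis=-1):
    """Apply a 1-D reducing function along one axis of an n-d array.

    Hypothetical helper for illustration; assumes func1d maps a 1-D
    array to a single float.
    """
    arr = np.asarray(arr)
    # Put the axis to reduce last, then collapse every other axis into one.
    moved = np.moveaxis(arr, axis, -1)
    out_shape = moved.shape[:-1]
    rows = moved.reshape(-1, moved.shape[-1])

    out = np.empty(rows.shape[0], dtype=np.float64)
    for i in range(rows.shape[0]):
        out[i] = func1d(rows[i])  # one reduction per 1-D slice

    # Restore the shape of the non-reduced axes.
    return out.reshape(out_shape)
```

For example, reduce_along_axis(np.median, arr, axis=0) should agree with np.median(arr, axis=0). A Cython version could keep the same reshape logic and replace the Python-level loop and function call with a typed loop over the 2d view.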