[Numpy-discussion] k maximal elements

Keith Goodman kwgoodman at gmail.com
Mon Jun 6 09:58:11 EDT 2011


On Mon, Jun 6, 2011 at 6:44 AM, Keith Goodman <kwgoodman at gmail.com> wrote:
> On Sun, Jun 5, 2011 at 11:15 PM, Alex Ter-Sarkissov <ater1980 at gmail.com> wrote:
>> I have a vector of positive integers length n. Is there a simple (i.e.
>> without sorting/ranking) of 'pulling out' k larrgest (or smallest) values.
>> Something like
>>
>> sum(x[sum(x,1)>(max(sum(x,1)+min(sum(x,1))))/2,])
>>
>> but smarter
>
> You could use bottleneck [1]. It has a partial sort written in cython:
>
>    >> a = np.arange(10)
>    >> np.random.shuffle(a)
>    >> a
>       array([6, 3, 8, 5, 4, 2, 0, 1, 9, 7])
>    >> import bottleneck as bn
>    >> k = 3
>    >> b = bn.partsort(a, a.size-k)
>    >> b[-k:]
>       array([8, 9, 7])
>
> Or you could get the indices:
>
>    >> idx = bn.argpartsort(a, a.size-k)
>    >> idx[-k:]
>       array([2, 8, 9])
>
> [1] The partial sort will be in Bottleneck 0.5.0. There is a beta2 at
> https://github.com/kwgoodman/bottleneck/downloads. Release version
> will be posted at http://pypi.python.org/pypi/Bottleneck

Oh, and here are some timings:

    >> a = np.random.rand(1e6)
    >> timeit a.sort()
    10 loops, best of 3: 18.3 ms per loop
    >> timeit bn.partsort(a, 100)
    100 loops, best of 3: 3.71 ms per loop
    >>
    >> a = np.random.rand(1e4)
    >> timeit a.sort()
    1000 loops, best of 3: 170 us per loop
    >> timeit bn.partsort(a, 10)
    10000 loops, best of 3: 19.4 us per loop



More information about the NumPy-Discussion mailing list