[Numpy-discussion] k maximal elements
Keith Goodman
kwgoodman at gmail.com
Mon Jun 6 09:58:11 EDT 2011
On Mon, Jun 6, 2011 at 6:44 AM, Keith Goodman <kwgoodman at gmail.com> wrote:
> On Sun, Jun 5, 2011 at 11:15 PM, Alex Ter-Sarkissov <ater1980 at gmail.com> wrote:
>> I have a vector of positive integers length n. Is there a simple (i.e.
>> without sorting/ranking) of 'pulling out' k larrgest (or smallest) values.
>> Something like
>>
>> sum(x[sum(x,1)>(max(sum(x,1)+min(sum(x,1))))/2,])
>>
>> but smarter
>
> You could use bottleneck [1]. It has a partial sort written in cython:
>
> >> a = np.arange(10)
> >> np.random.shuffle(a)
> >> a
> array([6, 3, 8, 5, 4, 2, 0, 1, 9, 7])
> >> import bottleneck as bn
> >> k = 3
> >> b = bn.partsort(a, a.size-k)
> >> b[-k:]
> array([8, 9, 7])
>
> Or you could get the indices:
>
> >> idx = bn.argpartsort(a, a.size-k)
> >> idx[-k:]
> array([2, 8, 9])
>
> [1] The partial sort will be in Bottleneck 0.5.0. There is a beta2 at
> https://github.com/kwgoodman/bottleneck/downloads. Release version
> will be posted at http://pypi.python.org/pypi/Bottleneck
Oh, and here are some timings:
>> a = np.random.rand(1e6)
>> timeit a.sort()
10 loops, best of 3: 18.3 ms per loop
>> timeit bn.partsort(a, 100)
100 loops, best of 3: 3.71 ms per loop
>>
>> a = np.random.rand(1e4)
>> timeit a.sort()
1000 loops, best of 3: 170 us per loop
>> timeit bn.partsort(a, 10)
10000 loops, best of 3: 19.4 us per loop
More information about the NumPy-Discussion
mailing list