Keeping track of the N largest values
Robert Kern
robert.kern at gmail.com
Sat Dec 25 12:09:09 EST 2010
On 12/25/10 10:42 AM, Roy Smith wrote:
> I'm processing a stream of N numbers and want to keep track of the K
> largest. There's too many numbers in the stream (i.e. N is too large)
> to keep in memory at once. K is small (100 would be typical).
>
>> From a theoretical point of view, I should be able to do this in N log K
> time. What I'm doing now is essentially:
>
> top = [-1] # Assume all x are>= 0
> for x in input():
> if x<= top[0]:
> continue
> top.append(x)
> if len(top)> K:
> top.sort()
> top.pop(0)
>
> I can see pathological cases (say, all input values the same) where
> running time would be N K log K, but on average (N>> K and random
> distribution of values), this should be pretty close to N.
>
> Is there a better way to do this, either from a theoretical running time
> point of view, or just a nicer way to code this in Python?
heapq.nlargest()
http://docs.python.org/library/heapq#heapq.nlargest
--
Robert Kern
"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco
More information about the Python-list
mailing list