Keeping track of the N largest values

Roy Smith roy at panix.com
Sat Dec 25 14:16:27 EST 2010


In article <Xns9E59A44D7CC49duncanbooth at 127.0.0.1>,
 Duncan Booth <duncan.booth at invalid.invalid> wrote:

> Roy Smith <roy at panix.com> wrote:
> 
> > 
> > I'm processing a stream of N numbers and want to keep track of the K 
> > largest.  There's too many numbers in the stream (i.e. N is too large)
> > to keep in memory at once.  K is small (100 would be typical).
> > ... 
> > Is there a better way to do this, either from a theoretical running
> > time point of view, or just a nicer way to code this in Python?
> 
> How about:
> 
>   from heapq import nlargest
>   top = nlargest(K, input())
> 
> It uses a heap so avoids completely resorting each time.

Hmmm, that looks like it would solve the problem as stated, but there's 
another twist.  In addition to finding the K largest values, I *also* 
need to do some other processing on all the values (which I didn't show 
in the original example, incorrectly thinking it not germane to my 
question).  The problem with nlargest() is that it doesn't give me a 
hook to do that.

PS: I'm assuming heapq.nlargest(n, iterable) operates with memory 
proportional to n, and not to the iterator length.  That's the only 
reasonable conclusion, but the docs don't actually come out and say it.



More information about the Python-list mailing list