Keeping track of the N largest values
Roy Smith
roy at panix.com
Sat Dec 25 14:16:27 EST 2010
In article <Xns9E59A44D7CC49duncanbooth at 127.0.0.1>,
Duncan Booth <duncan.booth at invalid.invalid> wrote:
> Roy Smith <roy at panix.com> wrote:
>
> >
> > I'm processing a stream of N numbers and want to keep track of the K
> > largest. There's too many numbers in the stream (i.e. N is too large)
> > to keep in memory at once. K is small (100 would be typical).
> > ...
> > Is there a better way to do this, either from a theoretical running
> > time point of view, or just a nicer way to code this in Python?
>
> How about:
>
> from heapq import nlargest
> top = nlargest(K, input())
>
> It uses a heap so avoids completely resorting each time.
Hmmm, that looks like it would solve the problem as stated, but there's
another twist. In addition to finding the K largest values, I *also*
need to do some other processing on all the values (which I didn't show
in the original example, incorrectly thinking it not germane to my
question). The problem with nlargest() is that it doesn't give me a
hook to do that.
PS: I'm assuming heapq.nlargest(n, iterable) operates with memory
proportional to n, and not to the iterator length. That's the only
reasonable conclusion, but the docs don't actually come out and say it.
More information about the Python-list
mailing list