In article <email@example.com>, David Eppstein firstname.lastname@example.org wrote:
On the other hand, if you really want to find the n best items in a data stream large enough that you care about using only space O(n), it might also be preferable to take constant amortized time per item rather than the O(log n) that heapq would use, and it's not very difficult nor does it require any fancy data structures. Some time back I needed some Java code for this, haven't had an excuse to port it to Python. In case anyone's interested, it's online at http://www.ics.uci.edu/~eppstein/161/KBest.java.
BTW, the central idea here is to use a random quicksort pivot to shrink the list, when it grows too large.
In python, this could be done without randomization as simply as
def addToNBest(L,x,N): L.append(x) if len(L) > 2*N: L.sort() del L[N:]
It's not constant amortized time due to the sort, but that's probably more than made up for due to the speed of compiled sort versus interpreted randomized pivot.