[Python-Dev] Re: heaps
David Eppstein
eppstein@ics.uci.edu
Sun, 04 May 2003 00:54:21 -0700
In article <17342817.1052002018@[10.0.1.2]>,
David Eppstein <eppstein@ics.uci.edu> wrote:
> On the other hand, if you really want to find the n best items in a data
> stream large enough that you care about using only space O(n), it might
> also be preferable to take constant amortized time per item rather than the
> O(log n) that heapq would use, and it's not very difficult nor does it
> require any fancy data structures. Some time back I needed some Java code
> for this, haven't had an excuse to port it to Python. In case anyone's
> interested, it's online at
> <http://www.ics.uci.edu/~eppstein/161/KBest.java>.
BTW, the central idea here is to use a random quicksort pivot to shrink
the list, when it grows too large.
In python, this could be done without randomization as simply as
def addToNBest(L,x,N):
L.append(x)
if len(L) > 2*N:
L.sort()
del L[N:]
It's not constant amortized time due to the sort, but that's probably
more than made up for due to the speed of compiled sort versus
interpreted randomized pivot.
--
David Eppstein http://www.ics.uci.edu/~eppstein/
Univ. of California, Irvine, School of Information & Computer Science