[Python-Dev] Re: heaps

Sun, 04 May 2003 20:26:29 -0700

On 5/4/03 10:20 PM -0400 Tim Peters <tim.one@comcast.net> wrote:
> In practice, it's usually much faster than that.  Over time, it gets rarer
> and rarer for
>
>     person > q[0]
>
> to be true (the new person has to be larger than the N-th largest seen so
> far, and that bar gets raised whenever a new person manages to hurdle it),

Good point.  If any permutation of the input sequence is equally likely, 
and you're selecting the best k out of n items, the expected number of 
times you have to hit the data structure in your heapq solution is roughly 
k ln n, so the total expected time is O(n + k log k log n), with a really 
small constant factor on the O(n) term.  The sorting solution I suggested 
has total time O(n log k), and even though sorting is built-in and fast it 
can't compete when k is small.   Random pivoting is O(n + k), but with a 
larger constant factor, so your heapq solution looks like a winner.

For fairness, it might be interesting to try another run of your test in 
which the input sequence is sorted in increasing order rather than random. 
I.e., replace the random generation of seq by
    seq = range(M)
I'd try it myself, but I'm still running python 2.2 and haven't installed 
heapq.  I'd have to know more about your application to have an idea 
whether the sorted or randomly-permuted case is more representative.

-- 
David Eppstein                      http://www.ics.uci.edu/~eppstein/
Univ. of California, Irvine, School of Information & Computer Science