[Python-Dev] Re: heaps
David Eppstein
eppstein@ics.uci.edu
Sun, 04 May 2003 20:26:29 -0700
On 5/4/03 10:20 PM -0400 Tim Peters <tim.one@comcast.net> wrote:
> In practice, it's usually much faster than that. Over time, it gets rarer
> and rarer for
>
> person > q[0]
>
> to be true (the new person has to be larger than the N-th largest seen so
> far, and that bar gets raised whenever a new person manages to hurdle it),
Good point. If any permutation of the input sequence is equally likely,
and you're selecting the best k out of n items, the expected number of
times you have to hit the data structure in your heapq solution is roughly
k ln n, so the total expected time is O(n + k log k log n), with a really
small constant factor on the O(n) term. The sorting solution I suggested
has total time O(n log k), and even though sorting is built-in and fast it
can't compete when k is small. Random pivoting is O(n + k), but with a
larger constant factor, so your heapq solution looks like a winner.
For fairness, it might be interesting to try another run of your test in
which the input sequence is sorted in increasing order rather than random.
I.e., replace the random generation of seq by
seq = range(M)
I'd try it myself, but I'm still running python 2.2 and haven't installed
heapq. I'd have to know more about your application to have an idea
whether the sorted or randomly-permuted case is more representative.
--
David Eppstein http://www.ics.uci.edu/~eppstein/
Univ. of California, Irvine, School of Information & Computer Science