[Python-Dev] Re: heaps

Mon, 05 May 2003 23:00:24 -0700

In article <LNBBLJKPBEHFEDALKOLCIEAFEFAB.tim.one@comcast.net>,
 Tim Peters <tim.one@comcast.net> wrote:

> > For fairness, it might be interesting to try another run of your test
> > in which the input sequence is sorted in increasing order rather
> > than random.
> 
> Comparing the worst case of one against the best case of the other isn't my
> idea of fairness <wink>, but sure.

Well, it doesn't seem any fairer to use random data to compare an 
algorithm with an average time bound that depends on an assumption of 
randomness in the data...anyway, the point was more to understand the 
limiting cases.  If one algorithm is usually 3x faster than the other, 
and is never more than 10x slower, that's better than being usually 3x 
faster but sometimes 1000x slower, for instance.

> > I'd have to know more about your application to
> > have an idea whether the sorted or randomly-permuted case is more
> > representative.
> 
> Of course --  so would I <wink>.

My Java KBest code was written to make data subsets for a half-dozen web 
pages (same data selected according to different criteria).  Of these 
six instances, one is presented the data in roughly ascending order, one 
in descending order, and the other four are less clear but probably not 
random.

Robustness in the face of this sort of variation is why I prefer any 
average-case assumptions in my code's performance to depend only on 
randomness from a random number generator, and not arbitrariness in the 
actual input.  But I'm not sure I'd usually be willing to pay a 3x 
penalty for that robustness.

> Here's a surprise:  I coded a variant of the quicksort-like partitioning
> method, at the bottom of this mail.  On the largest-1000 of a million
> random-float case, times were remarkably steady across trials (i.e., using a
> different set of a million random floats each time):
> 
> heapq                    0.96 seconds
> sort (micro-optimized)   3.4  seconds
> KBest (below)            2.6  seconds

Huh.  You're almost convincing me that asymptotic analysis works even in 
the presence of Python's compiled-vs-interpreted anomalies.  The other 
surprise is that (unlike, say, the sort or heapq versions) your KBest 
doesn't look significantly more concise than my earlier Java 
implementation.

-- 
David Eppstein                      http://www.ics.uci.edu/~eppstein/
Univ. of California, Irvine, School of Information & Computer Science