[Python-Dev] Re: heaps
David Eppstein
eppstein@ics.uci.edu
Mon, 05 May 2003 23:00:24 -0700
In article <LNBBLJKPBEHFEDALKOLCIEAFEFAB.tim.one@comcast.net>,
Tim Peters <tim.one@comcast.net> wrote:
> > For fairness, it might be interesting to try another run of your test
> > in which the input sequence is sorted in increasing order rather
> > than random.
>
> Comparing the worst case of one against the best case of the other isn't my
> idea of fairness <wink>, but sure.
Well, it doesn't seem any fairer to use random data to compare an
algorithm with an average time bound that depends on an assumption of
randomness in the data...anyway, the point was more to understand the
limiting cases. If one algorithm is usually 3x faster than the other,
and is never more than 10x slower, that's better than being usually 3x
faster but sometimes 1000x slower, for instance.
> > I'd have to know more about your application to
> > have an idea whether the sorted or randomly-permuted case is more
> > representative.
>
> Of course -- so would I <wink>.
My Java KBest code was written to make data subsets for a half-dozen web
pages (same data selected according to different criteria). Of these
six instances, one is presented the data in roughly ascending order, one
in descending order, and the other four are less clear but probably not
random.
Robustness in the face of this sort of variation is why I prefer any
average-case assumptions in my code's performance to depend only on
randomness from a random number generator, and not arbitrariness in the
actual input. But I'm not sure I'd usually be willing to pay a 3x
penalty for that robustness.
> Here's a surprise: I coded a variant of the quicksort-like partitioning
> method, at the bottom of this mail. On the largest-1000 of a million
> random-float case, times were remarkably steady across trials (i.e., using a
> different set of a million random floats each time):
>
> heapq 0.96 seconds
> sort (micro-optimized) 3.4 seconds
> KBest (below) 2.6 seconds
Huh. You're almost convincing me that asymptotic analysis works even in
the presence of Python's compiled-vs-interpreted anomalies. The other
surprise is that (unlike, say, the sort or heapq versions) your KBest
doesn't look significantly more concise than my earlier Java
implementation.
--
David Eppstein http://www.ics.uci.edu/~eppstein/
Univ. of California, Irvine, School of Information & Computer Science