Heap Implementation

Steven D'Aprano steve+comp.lang.python at pearwood.info
Tue Feb 2 00:38:11 EST 2016

On Tuesday 02 February 2016 06:32, Sven R. Kunze wrote:

> On 31.01.2016 02:48, Steven D'Aprano wrote:
>> On Sunday 31 January 2016 09:47, Sven R. Kunze wrote:
>>> @all
>>> What's the best/standardized tool in Python to perform benchmarking?
>> timeit
> Thanks, Steven.
> Maybe, I am doing it wrong but I get some weird results:

You need to understand that in any modern desktop, server or laptop computer 
(embedded devices may be different) there is *constantly* a large amount of 
processing going on in the background. Your OS is constantly responding to 
network events, managing memory, skipping from one process to another, 
scheduling tasks; your anti-virus and other background programs are working; 
your window environment is watching the mouse, etc. So each time you run a 
test, it comes down to pure dumb luck how many other programs are busy at 
the same time. (You can, sometimes, influence that by quitting all other 
applications, unplugging the network cable, and keeping your hands off the 
keyboard and mouse while the test is running. But who does that?)

All that adds noise to the measured times. It's not unusual for timings to 
differ by a factor of ten from one run to the next. The speed will also 
differ depending on processor cache misses, and many other factors that are 
effectively impossible to predict with any certainty. This makes timing 
measurements on modern systems an inherently noisy process.

In effect, each measurement you take is made up of two components:

* the actual time that the code would take if it had exclusive 
  access to the machine with no other programs running, call it t;

* and the noise added by the system, call it Δt.

It is impossible to measure t alone, you always get t+Δt. The aim in timing 
is to reduce the effect of the noise as much as possible.

The way to do this with timeit is:

- don't run extraneous code as part of your test (just measure what 
  you care about, with as little scaffolding around it);

- run that code snippet as many times as you can bear in a loop, 
  *but* let timeit manage the loop; don't try doing it by hand;

- only care about "best of N", where N is as large a number as you 
  can stand;

- averages, standard deviation, and other statistics are pretty 
  much useless here, the only statistic that you care about is
  the minimum.

Out of those four guidelines, you missed three of them:

(1) Your test code includes scaffolding, the "for _ in range..." loop. 
You're timing how expensive it is to build a range object and loop over it.

(2) You picked a tiny number for the number of loops: ten. Timeit defaults 
to one million, which is good for *really* fast and tiny snippets.

I normally reduce that to 10000, but run a larger number of trials. If the 
timing from each trial is too small, I increase the number of loops, and if 
it takes too long (I am impatient) I decrease it, but never below 100.

(3) You then use the repeat() method to calculate the "best of one", which 
is pretty much pointless. There is a timeit() method for "best of one", if 
that's ever useful.

I normally run 5 or 7 trials, and report only the best (smallest).

Here's your code:

>  >>> min(timeit.Timer('for _ in range(10000): heappop(h)', 'from heapq
> import heappop; h=list(range(10000000))').repeat(10, 1)),
> min(timeit.Timer('for _ in range(10000): h.pop()', 'from xheap import
> Heap; h=Heap(range(10000000))').repeat(10, 1))
> (0.017267618000005314, 0.01615345600021101)

Your code would be easier to read and understand if it were split over 
multiple lines. This is not Perl and there's no advantage to one-liners.

Pulling your code apart and re-writing it in a way which I feel is more 

t1 = Timer(
    'for _ in range(10000): heappop(h)',  # code being tested
    'from heapq import heappop; h=list(range(10000000))'  # setup

t2 = Timer(
    'for _ in range(10000): h.pop()',
    'from xheap import Heap; h=Heap(range(10000000))'

min(t1.repeat(10, 1))
min(t2.repeat(10, 1))

Something like this will probably be less noisy:

setup = """
from heapq import heappop
from xheap import Heap
a = list(range(10000000))
h = Heap(a)

t1 = Timer("heappop(a)", setup)
t2 = Timer("h.pop()", setup)

# print best of 7 trials, where each trial runs the code snippet 10000 times
print(min(t1.repeat(10000, 7)))
print(min(t2.repeat(10000, 7)))

The times printed will be in seconds per 10000 runs; divide it by 10 to get 
the time in *milliseconds* per run.

Note that this will *not* eliminate all noise or variation from one timing 
test to the other, but it will (I hope!) reduce it. If you're still seeing a 
lot of variation, try turning the garbage collector off and quitting as many 
other applications as possible.

If anyone else can suggest any other techniques for reducing noise, I'd love 
to hear about them.

Also note that times generated by timeit with one version of Python may not 
be measuring the same thing as those using a different version. I believe 
that Python 3.4 and better will do a better job of measuring only the time 
the code snippet process is active, rather than wall-time. But that's also 
operating system dependent.

> How can it be that my wrapper is sometimes faster and sometimes slower
> than heapq? I wouldn't mind slower, but faster*?

Maybe your code genuinely is "about as fast" as heapq, so the differences 
you're seeing are just due to noise in the system.


More information about the Python-list mailing list