steve+comp.lang.python at pearwood.info
Tue Feb 2 00:38:11 EST 2016
On Tuesday 02 February 2016 06:32, Sven R. Kunze wrote:
> On 31.01.2016 02:48, Steven D'Aprano wrote:
>> On Sunday 31 January 2016 09:47, Sven R. Kunze wrote:
>>> What's the best/standardized tool in Python to perform benchmarking?
> Thanks, Steven.
> Maybe, I am doing it wrong but I get some weird results:
You need to understand that in any modern desktop, server or laptop computer
(embedded devices may be different) there is *constantly* a large amount of
processing going on in the background. Your OS is constantly responding to
network events, managing memory, skipping from one process to another,
scheduling tasks; your anti-virus and other background programs are working;
your window environment is watching the mouse, etc. So each time you run a
test, it comes down to pure dumb luck how many other programs are busy at
the same time. (You can, sometimes, influence that by quitting all other
applications, unplugging the network cable, and keeping your hands off the
keyboard and mouse while the test is running. But who does that?)
All that adds noise to the measured times. It's not unusual for timings to
differ by a factor of ten from one run to the next. The speed will also
differ depending on processor cache misses, and many other factors that are
effectively impossible to predict with any certainty. This makes timing
measurements on modern systems an inherently noisy process.
In effect, each measurement you take is made up of two components:
* the actual time that the code would take if it had exclusive
access to the machine with no other programs running, call it t;
* and the noise added by the system, call it Δt.
It is impossible to measure t alone, you always get t+Δt. The aim in timing
is to reduce the effect of the noise as much as possible.
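You can see why the minimum is the right statistic from that t+Δt model. Here's a toy simulation (the "true" time and the noise distribution are invented purely for illustration): the noise only ever *adds* time, so the mean drifts upwards with it, while the minimum of many samples stays close to t.

```python
import random

# Toy model: each measurement is the true time t plus non-negative noise.
t = 0.010  # assumed "true" time in seconds, for illustration only
samples = [t + random.expovariate(1 / 0.005) for _ in range(50)]

# The mean is inflated by the noise; the minimum stays close to t,
# because the noise can only add time, never subtract it.
print(f"mean: {sum(samples) / len(samples):.4f}  min: {min(samples):.4f}")
```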
The way to do this with timeit is:
- don't run extraneous code as part of your test (just measure what
you care about, with as little scaffolding around it);
- run that code snippet as many times as you can bear in a loop,
*but* let timeit manage the loop; don't try doing it by hand;
- only care about "best of N", where N is as large a number as you
  can manage;
- averages, standard deviation, and other statistics are pretty
  much useless here; the only statistic that you care about is
  the minimum.
Out of those four guidelines, you missed three of them:
(1) Your test code includes scaffolding, the "for _ in range..." loop.
You're timing how expensive it is to build a range object and loop over it.
(2) You picked a tiny number for the number of loops: ten. Timeit defaults
to one million, which is good for *really* fast and tiny snippets.
I normally reduce that to 10000, but run a larger number of trials. If the
timing from each trial is too small, I increase the number of loops, and if
it takes too long (I am impatient) I decrease it, but never below 100.
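If you don't want to tune the loop count by hand like that, newer Pythons (3.6+) can do it for you: Timer.autorange() keeps increasing the number of loops until one trial takes at least 0.2 seconds, and reports the count it settled on. A minimal sketch, using a throwaway snippet:

```python
from timeit import Timer

t = Timer("sorted(range(1000))")  # trivial snippet, for illustration

# autorange() (Python 3.6+) increases the loop count until the total
# time reaches at least 0.2 seconds; returns (number, total_time).
number, total = t.autorange()
print(f"{number} loops took {total:.3f}s "
      f"-> {total / number * 1e6:.2f} microseconds per loop")
```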
(3) You then use the repeat() method to calculate the "best of one", which
is pretty much pointless. There is a timeit() method for "best of one", if
that's ever useful.
I normally run 5 or 7 trials, and report only the best (smallest).
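That "best of 7 trials" pattern looks like this with timeit's repeat() method (the snippet being timed here is a trivial stand-in):

```python
from timeit import Timer

t = Timer("x = 2 ** 64")  # trivial snippet, just for illustration

# Run 7 trials of 10000 loops each and keep only the fastest trial;
# the minimum is the measurement least polluted by background noise.
trials = t.repeat(repeat=7, number=10000)
best = min(trials)
print(f"best of 7: {best:.6f} seconds per 10000 loops")
```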
Here's your code:
> >>> min(timeit.Timer('for _ in range(10000): heappop(h)', 'from heapq
> import heappop; h=list(range(10000000))').repeat(10, 1)),
> min(timeit.Timer('for _ in range(10000): h.pop()', 'from xheap import
> Heap; h=Heap(range(10000000))').repeat(10, 1))
> (0.017267618000005314, 0.01615345600021101)
Your code would be easier to read and understand if it were split over
multiple lines. This is not Perl and there's no advantage to one-liners.
Pulling your code apart and re-writing it in a way which I feel is more
readable:
t1 = Timer(
    'for _ in range(10000): heappop(h)',  # code being tested
    'from heapq import heappop; h=list(range(10000000))'  # setup
    )
t2 = Timer(
    'for _ in range(10000): h.pop()',
    'from xheap import Heap; h=Heap(range(10000000))'
    )
Something like this will probably be less noisy:
setup = """
from heapq import heappop
from xheap import Heap
a = list(range(10000000))
h = Heap(a)
"""
t1 = Timer("heappop(a)", setup)
t2 = Timer("h.pop()", setup)
# print best of 7 trials, where each trial runs the code snippet 10000 times
print(min(t1.repeat(repeat=7, number=10000)))
print(min(t2.repeat(repeat=7, number=10000)))

The times printed will be in seconds per 10000 runs; divide by 10 to get
the time in *milliseconds* per run.
Note that this will *not* eliminate all noise or variation from one timing
test to the other, but it will (I hope!) reduce it. If you're still seeing a
lot of variation, try turning the garbage collector off and quitting as many
other applications as possible.
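One note on the garbage collector: timeit already disables it during timing by default, so if anything you may want to measure with it *enabled*, to see the cost it adds to allocation-heavy code. A sketch (the snippet being timed is an invented example):

```python
from timeit import Timer

stmt = "[[] for _ in range(100)]"  # allocation-heavy snippet, for illustration

# timeit disables garbage collection during timing by default; re-enabling
# it in the setup string lets you measure the collector's impact.
with_gc = min(Timer(stmt, "import gc; gc.enable()").repeat(repeat=5, number=1000))
no_gc = min(Timer(stmt).repeat(repeat=5, number=1000))
print(f"gc on: {with_gc:.4f}s  gc off: {no_gc:.4f}s")
```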
If anyone else can suggest any other techniques for reducing noise, I'd love
to hear about them.
Also note that times generated by timeit with one version of Python may not
be measuring the same thing as those using a different version. I believe
that Python 3.4 and better will do a better job of measuring only the time
the code snippet process is active, rather than wall-time. But that's also
operating system dependent.
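If you want to control which clock is used rather than rely on the version's default, Timer accepts a timer argument: time.perf_counter is a wall-clock timer, while time.process_time counts only CPU time charged to your process, excluding time the OS spends running other programs. A sketch:

```python
import time
from timeit import Timer

# Wall-clock timing versus CPU-only timing of the same snippet.
wall = Timer("sum(range(10000))", timer=time.perf_counter)
cpu = Timer("sum(range(10000))", timer=time.process_time)

wall_t = min(wall.repeat(repeat=5, number=1000))
cpu_t = min(cpu.repeat(repeat=5, number=1000))
print("wall:", wall_t)
print("cpu: ", cpu_t)
```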
> How can it be that my wrapper is sometimes faster and sometimes slower
> than heapq? I wouldn't mind slower, but *faster*?
Maybe your code genuinely is "about as fast" as heapq, so the differences
you're seeing are just due to noise in the system.