oscar.j.benjamin at gmail.com
Tue Feb 2 09:53:02 EST 2016
On 2 February 2016 at 05:38, Steven D'Aprano
<steve+comp.lang.python at pearwood.info> wrote:
> In effect, each measurement you take is made up of two components:
> * the actual time that the code would take if it had exclusive
> access to the machine with no other programs running, call it t;
> * and the noise added by the system, call it Δt.
> It is impossible to measure t alone, you always get t+Δt. The aim in timing
> is to reduce the effect of the noise as much as possible.
That's a reasonable model in some cases but not all. In particular the
"actual time" part is itself not a constant but rather depends on all
sorts of things like RAM caching etc. On the CPython side there could
be all kinds of optimisations that would kick in at different times to
change the "actual time". Likewise there are optimisations at OS and
hardware levels that kick in at different times.
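A small Python-level analogue of that point (purely illustrative, using `functools.lru_cache` as a stand-in for any optimisation that kicks in after the first run): the "actual time" of the very same call changes once the cache is warm, so there is no single constant t to measure.

```python
import timeit
from functools import lru_cache

@lru_cache(maxsize=None)
def expensive(n):
    # Deliberately slow pure computation; the first call does the work,
    # later calls are served from the cache.
    return sum(i * i for i in range(n))

# Cold: bypass the cache via __wrapped__ to time the underlying function.
cold = timeit.timeit(lambda: expensive.__wrapped__(100_000), number=1)

expensive(100_000)  # warm the cache
warm = timeit.timeit(lambda: expensive(100_000), number=1)

print(f"cold: {cold:.6f}s  warm: {warm:.6f}s")
```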
> The way to do this with timeit is:
> - don't run extraneous code as part of your test (just measure what
> you care about, with as little scaffolding around it);
> - run that code snippet as many times as you can bear in a loop,
> *but* let timeit manage the loop; don't try doing it by hand;
There are many situations where running something in a loop over and
over triggers all kinds of optimisations. PyPy's jit warmup is an
example but the effect can occur at the hardware level as well.
Performance of code run repeatedly in a tight loop can differ from the
same code called once, or called sporadically in a long-running
process. This is especially true of code that operates on a data
structure. In real usage the data structure might be accessed rarely,
but a tight profiling loop forces the entire structure into the CPU's
cache.
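A rough sketch of the contrast (the sizes and the `bytearray` churn are hypothetical stand-ins for "other work"; the magnitude of the effect will vary by machine and may be small relative to interpreter overhead):

```python
import time

data = list(range(1_000_000))  # big enough to spill out of the fast CPU caches

def touch(d):
    # Strided access over the structure, as a stand-in for "real" usage.
    return sum(d[::1000])

# Tight loop: the structure stays cache-hot between calls.
start = time.perf_counter()
for _ in range(1000):
    touch(data)
tight = (time.perf_counter() - start) / 1000

# Sporadic calls: unrelated memory traffic between calls churns the caches.
total = 0.0
for _ in range(200):
    _junk = bytearray(4_000_000)  # hypothetical unrelated allocation
    start = time.perf_counter()
    touch(data)
    total += time.perf_counter() - start
sporadic = total / 200

print(f"tight: {tight:.3e} s/call  sporadic: {sporadic:.3e} s/call")
```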
> - only care about "best of N", where N is as large a number as you
> can stand;
> - averages, standard deviation, and other statistics are pretty
> much useless here, the only statistic that you care about is
> the minimum.
This argument makes sense if you adopt the t+Δt model described above,
but that is just a model. The minimum from micro-benchmarking
in a tight loop is not necessarily a more representative statistic
than the mean, median or anything else. Standard deviation is actually
useful for assessing the variability of the time taken, which can also
be important and gives additional information about performance: I
would like to minimise both the expected time taken and its
variability simultaneously, if possible.
You seem to be describing timeit's way of profiling, which is
reasonable, but it's not the only way.