[Python-Dev] Stop using timeit, use perf.timeit!
tjreedy at udel.edu
Fri Jun 10 12:55:22 EDT 2016
On 6/10/2016 9:20 AM, Steven D'Aprano wrote:
> On Fri, Jun 10, 2016 at 01:13:10PM +0200, Victor Stinner wrote:
>> In recent weeks, I did research on how to get stable and reliable
>> benchmarks, especially for the corner case of microbenchmarks. The
>> first result is a series of articles; here are the first three:
> Thank you for this! I am very interested in benchmarking.
> I strongly question your statement in the third:
> But how can we compare performances if results are random?
> Take the minimum?
> No! You must never (ever again) use the minimum for
> benchmarking! Compute the average and some statistics like
> the standard deviation:
> [end quote]
> While I'm happy to see a real-world use for the statistics module, I
> disagree with your logic.
> The problem is that random noise can only ever slow the code down, it
> cannot speed it up. To put it another way, the random errors in the
> timings are always positive.
> Suppose you micro-benchmark some code snippet and get a series of
> timings. We can model the measured times as:
> measured time t = T + ε
> where T is the unknown "true" timing we wish to estimate,
For comparative timings, we do not care about T. So arguments about the
best estimate of T miss the point.
What we do wish to estimate is the relationship between two Ts, T0 for
'control', and T1 for 'treatment', in particular T1/T0. I suspect
Victor is correct that mean(t1)/mean(t0) is better than min(t1)/min(t0)
as an estimate of the true ratio T1/T0 (for a particular machine).
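To illustrate the difference between the two estimators, here is a small simulation sketch. The "true" times (1.00 and 1.20), the noise model, and the scales are all made up for illustration; the only point is that both ratios can be computed from the same raw timings:

```python
import random
import statistics

def simulate(true_time, n=50, seed=0):
    """Simulate n noisy timings t = T + eps, with eps >= 0."""
    rng = random.Random(seed)
    # Hypothetical noise model: small positive lognormal delays.
    return [true_time + 0.05 * rng.lognormvariate(-3, 1) for _ in range(n)]

t0 = simulate(1.00, seed=1)   # 'control':   T0 = 1.00 (made-up number)
t1 = simulate(1.20, seed=2)   # 'treatment': T1 = 1.20, so T1/T0 = 1.2

mean_ratio = statistics.mean(t1) / statistics.mean(t0)
min_ratio = min(t1) / min(t0)
print(f"mean(t1)/mean(t0) = {mean_ratio:.3f}")
print(f"min(t1)/min(t0)   = {min_ratio:.3f}")
```

With noise this small both estimates land near the true 1.2; the interesting comparisons are how each behaves as the noise grows and becomes skewed.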
But given that we have matched pairs of measurements with the same
hashseed and address, it may be better yet to estimate T1/T0 from the
ratios t1i/t0i, where i indexes experimental conditions. But it has
been a long time since I have read about estimation of ratios. What I
remember is that this is a nasty subject.
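A sketch of the matched-pairs idea, assuming each pair shares one environment-driven noise term (same hash seed, same addresses) plus a small independent term; the true ratio of 1.2 and all scales are invented for illustration:

```python
import random
import statistics

rng = random.Random(42)

# Matched pairs: both runs of pair i share one common noise term.
pairs = []
for _ in range(30):
    shared = 0.05 * rng.lognormvariate(-3, 1)   # common to t0i and t1i
    t0i = 1.00 + shared + rng.uniform(0, 0.01)  # 'control' run
    t1i = 1.20 + shared + rng.uniform(0, 0.01)  # 'treatment' run
    pairs.append((t0i, t1i))

# Estimate T1/T0 from the per-pair ratios t1i/t0i ...
per_pair_ratios = [t1i / t0i for t0i, t1i in pairs]
paired_estimate = statistics.mean(per_pair_ratios)
# ... versus the pooled ratio of means.
pooled_estimate = (statistics.mean(t for _, t in pairs)
                   / statistics.mean(t for t, _ in pairs))
print(f"mean of per-pair ratios: {paired_estimate:.3f}")
print(f"ratio of means:          {pooled_estimate:.3f}")
```

Taking the mean of per-pair ratios is only one of several ratio estimators (the median, or a regression through the origin, are others), which is part of why the subject is nasty.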
It is also the case that while an individual with one machine wants the
best ratio for that machine, we need to make CPython patch decisions for
the universe of machines that run Python.
> and ε is some variable error due to noise in the system.
> But ε is always positive, never negative,
A lognormal distribution might be a first guess for ε. But what we
really have is contributions from multiple factors, each adding its own
non-negative delay.
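The multiple-factors picture can be sketched as follows. The factor scales (scheduler delays, cache misses, interrupts) are invented for illustration; the point is that a sum of non-negative contributions gives a right-skewed distribution whose mean sits well above its minimum:

```python
import random
import statistics

rng = random.Random(123)

def noisy_timing(true_time=1.0):
    # eps is a sum of several non-negative factor contributions;
    # the scales here are made up for illustration.
    eps = sum(rng.expovariate(1.0 / scale)
              for scale in (0.001, 0.002, 0.005))
    return true_time + eps

times = [noisy_timing() for _ in range(1000)]
print(f"min  = {min(times):.4f}")
print(f"mean = {statistics.mean(times):.4f}")
```

Because every contribution is positive, the minimum approaches the true time from above while the mean carries the full expected delay.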
Terry Jan Reedy