[Python-Dev] Stop using timeit, use perf.timeit!
tjreedy at udel.edu
Fri Jun 10 12:55:22 EDT 2016
On 6/10/2016 9:20 AM, Steven D'Aprano wrote:
> On Fri, Jun 10, 2016 at 01:13:10PM +0200, Victor Stinner wrote:
>> In recent weeks, I did research on how to get stable and reliable
>> benchmarks, especially for the corner case of microbenchmarks. The
>> first result is a series of articles; here are the first three:
> Thank you for this! I am very interested in benchmarking.
> I strongly question your statement in the third:
> But how can we compare performances if results are random?
> Take the minimum?
> No! You must never (ever again) use the minimum for
> benchmarking! Compute the average and some statistics like
> the standard deviation:
> [end quote]
> While I'm happy to see a real-world use for the statistics module, I
> disagree with your logic.
> The problem is that random noise can only ever slow the code down, it
> cannot speed it up. To put it another way, the random errors in the
> timings are always positive.
> Suppose you micro-benchmark some code snippet and get a series of
> timings. We can model the measured times as:
> measured time t = T + ε
> where T is the unknown "true" timing we wish to estimate,
For comparative timings, we do not care about T. So arguments about the
best estimate of T miss the point.
What we do wish to estimate is the relationship between two Ts, T0 for
'control', and T1 for 'treatment', in particular T1/T0. I suspect
Victor is correct that mean(t1)/mean(t0) is better than min(t1)/min(t0)
as an estimate of the true ratio T1/T0 (for a particular machine).
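To illustrate the difference between the two estimators, here is a small simulation sketch. The "true" times (1.00 and 1.20), the noise model, and the scales are all made up for illustration; the only point is that both ratios can be computed from the same raw timings:

```python
import random
import statistics

def simulate(true_time, n=50, seed=0):
    """Simulate n noisy timings t = T + eps, with eps >= 0."""
    rng = random.Random(seed)
    # Hypothetical noise model: small positive lognormal delays.
    return [true_time + 0.05 * rng.lognormvariate(-3, 1) for _ in range(n)]

t0 = simulate(1.00, seed=1)   # 'control':   T0 = 1.00 (made-up number)
t1 = simulate(1.20, seed=2)   # 'treatment': T1 = 1.20, so T1/T0 = 1.2

mean_ratio = statistics.mean(t1) / statistics.mean(t0)
min_ratio = min(t1) / min(t0)
print(f"mean(t1)/mean(t0) = {mean_ratio:.3f}")
print(f"min(t1)/min(t0)   = {min_ratio:.3f}")
```

With noise this small both estimates land near the true 1.2; the interesting comparisons are how each behaves as the noise grows and becomes skewed.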
But given that we have matched pairs of measurements with the same
hashseed and address, it may be better yet to estimate T1/T0 from the
ratios t1i/t0i, where i indexes experimental conditions. But it has
been a long time since I have read about estimation of ratios. What I
remember is that this is a nasty subject.
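A sketch of the matched-pairs idea, assuming each pair shares one environment-driven noise term (same hash seed, same addresses) plus a small independent term; the true ratio of 1.2 and all scales are invented for illustration:

```python
import random
import statistics

rng = random.Random(42)

# Matched pairs: both runs of pair i share one common noise term.
pairs = []
for _ in range(30):
    shared = 0.05 * rng.lognormvariate(-3, 1)   # common to t0i and t1i
    t0i = 1.00 + shared + rng.uniform(0, 0.01)  # 'control' run
    t1i = 1.20 + shared + rng.uniform(0, 0.01)  # 'treatment' run
    pairs.append((t0i, t1i))

# Estimate T1/T0 from the per-pair ratios t1i/t0i ...
per_pair_ratios = [t1i / t0i for t0i, t1i in pairs]
paired_estimate = statistics.mean(per_pair_ratios)
# ... versus the pooled ratio of means.
pooled_estimate = (statistics.mean(t for _, t in pairs)
                   / statistics.mean(t for t, _ in pairs))
print(f"mean of per-pair ratios: {paired_estimate:.3f}")
print(f"ratio of means:          {pooled_estimate:.3f}")
```

Taking the mean of per-pair ratios is only one of several ratio estimators (the median, or a regression through the origin, are others), which is part of why the subject is nasty.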
It is also the case that while an individual with one machine wants the
best ratio for that machine, we need to make CPython patch decisions for
the universe of machines that run Python.
> and ε is some variable error due to noise in the system.
> But ε is always positive, never negative,
A lognormal distribution might be a first guess for ε. But what we
really have is contributions from multiple factors, each adding its own
non-negative delay.
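The multiple-factors picture can be sketched as follows. The factor scales (scheduler delays, cache misses, interrupts) are invented for illustration; the point is that a sum of non-negative contributions gives a right-skewed distribution whose mean sits well above its minimum:

```python
import random
import statistics

rng = random.Random(123)

def noisy_timing(true_time=1.0):
    # eps is a sum of several non-negative factor contributions;
    # the scales here are made up for illustration.
    eps = sum(rng.expovariate(1.0 / scale)
              for scale in (0.001, 0.002, 0.005))
    return true_time + eps

times = [noisy_timing() for _ in range(1000)]
print(f"min  = {min(times):.4f}")
print(f"mean = {statistics.mean(times):.4f}")
```

Because every contribution is positive, the minimum approaches the true time from above while the mean carries the full expected delay.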
Terry Jan Reedy