On Sun, Sep 30, 2012 at 9:35 PM, Steven D'Aprano <steve@pearwood.info> wrote:

On Sun, Sep 30, 2012 at 07:12:47PM -0400, Brett Cannon wrote:

> > python3 perf.py -T --basedir ../benchmarks -f -b py3k
> ../cpython/builds/2.7-wide/bin/python ../cpython/builds/3.3/bin/python3.3

> ### call_method ###
> Min: 0.491433 -> 0.414841: 1.18x faster
> Avg: 0.493640 -> 0.416564: 1.19x faster
> Significant (t=127.21)
> Stddev: 0.00170 -> 0.00162: 1.0513x smaller

> I'm not sure if this is the right place to discuss this,

The speed mailing list would be best.

> but what is the
> justification for recording the average and std deviation of the
> benchmarks?

Because the benchmarks, when run in a more rigorous fashion, execute many more iterations (e.g. 50), and the average is used to even out bumps across those runs. The stddev is there to show how variable the results were in the end.
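
To make the point concrete, here is a rough sketch of how min, average, stddev and a significance score can be derived from repeated timings. It is not perf.py's actual code: the helper names, the sample timings, and the choice of Welch's t statistic are all assumptions made for illustration.

```python
import math
import statistics


def summarize(samples):
    """Return the min, average and sample stddev of a list of timings."""
    return {
        "min": min(samples),
        "avg": statistics.mean(samples),
        "stddev": statistics.stdev(samples),
    }


def welch_t(base, changed):
    """Welch's t statistic for the difference of two sample means.

    Assumption: perf.py may compute its t value differently.
    """
    std_err = math.sqrt(
        statistics.variance(base) / len(base)
        + statistics.variance(changed) / len(changed)
    )
    return (statistics.mean(base) - statistics.mean(changed)) / std_err


# Hypothetical timings in seconds, invented for this example.
base = [0.4914, 0.4936, 0.4951, 0.4929, 0.4940]
changed = [0.4148, 0.4166, 0.4181, 0.4159, 0.4170]

print(summarize(base))
print(summarize(changed))
print("t =", welch_t(base, changed))
```

A large t value relative to a small stddev is what lets the report call a difference "Significant" rather than noise.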

> If the benchmarks are based on timeit, the timeit docs warn against
> taking any statistic other than the minimum.

They don't use timeit.
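
For comparison only, the timeit advice referred to above amounts to something like the following sketch. The timed statement and the repeat/number values are arbitrary choices for illustration, and none of this reflects how the benchmark suite takes its measurements.

```python
import timeit

# Run the statement 1000 times per trial and collect 5 trials.
# Each entry in `trials` is the total time for one trial, in seconds.
trials = timeit.repeat("sum(range(100))", repeat=5, number=1000)

# The timeit docs suggest looking at the minimum: higher values are
# usually caused by other processes interfering, not by the code itself.
best = min(trials)
print("best of 5: %.6f s per 1000 loops" % best)
```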