
On Sun, Sep 30, 2012 at 9:35 PM, Steven D'Aprano <steve@pearwood.info> wrote:
On Sun, Sep 30, 2012 at 07:12:47PM -0400, Brett Cannon wrote:
python3 perf.py -T --basedir ../benchmarks -f -b py3k ../cpython/builds/2.7-wide/bin/python ../cpython/builds/3.3/bin/python3.3
### call_method ###
Min: 0.491433 -> 0.414841: 1.18x faster
Avg: 0.493640 -> 0.416564: 1.19x faster
Significant (t=127.21)
Stddev: 0.00170 -> 0.00162: 1.0513x smaller
I'm not sure if this is the right place to discuss this,
The speed mailing list would be best.
but what is the justification for recording the average and std deviation of the benchmarks?
Because the tests, when run in a more rigorous fashion, execute many more iterations (e.g. 50), so the average is used to even out the bumps. And the stddev is there to show how variable the results were in the end.
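
As a rough illustration of what those three figures capture, here is a minimal Python sketch. The function name `summarize` and the sample timings are made up for illustration; this is not the code perf.py uses.

```python
import statistics

def summarize(timings):
    """Return (min, mean, sample stddev) for a list of timings in seconds."""
    return min(timings), statistics.mean(timings), statistics.stdev(timings)

# Hypothetical timings from five runs of one benchmark (not real data).
samples = [0.50, 0.52, 0.51, 0.50, 0.57]
lo, avg, sd = summarize(samples)
print(f"Min: {lo:.6f}  Avg: {avg:.6f}  Stddev: {sd:.5f}")
```

A single slow run (0.57 here) leaves the minimum untouched but pulls the average up and widens the stddev, which is why the three numbers are reported together.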
If the benchmarks are based on timeit, the timeit docs warn against taking any statistic other than the minimum.
They don't use timeit.
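
For comparison, the approach the timeit docs recommend looks roughly like the sketch below: repeat the measurement and keep only the minimum. The statement being timed and the repeat/number values are arbitrary choices for illustration, not anything taken from the benchmark suite.

```python
import timeit

# timeit.repeat returns one total time per repetition; each repetition
# runs the statement `number` times.
number = 1000
totals = timeit.repeat("sum(range(100))", repeat=5, number=number)

# Per the timeit docs, the lowest value is the most useful one: higher
# values mostly reflect interference from other processes.
best = min(totals) / number  # seconds per call in the fastest repetition
print(f"best of 5: {best:.3e} s per call")
```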