[Python-Dev] Benchmarking Python 3.3 against Python 2.7 (wide build)

Brett Cannon brett at python.org
Mon Oct 1 03:51:32 CEST 2012


On Sun, Sep 30, 2012 at 9:35 PM, Steven D'Aprano <steve at pearwood.info> wrote:

> On Sun, Sep 30, 2012 at 07:12:47PM -0400, Brett Cannon wrote:
>
> > > python3 perf.py -T --basedir ../benchmarks -f -b py3k
> > ../cpython/builds/2.7-wide/bin/python ../cpython/builds/3.3/bin/python3.3
>
> > ### call_method ###
> > Min: 0.491433 -> 0.414841: 1.18x faster
> > Avg: 0.493640 -> 0.416564: 1.19x faster
> > Significant (t=127.21)
> > Stddev: 0.00170 -> 0.00162: 1.0513x smaller
>
> I'm not sure if this is the right place to discuss this,


The speed mailing list would be best.


> but what is the
> justification for recording the average and std deviation of the
> benchmarks?
>

Because the tests, when run in a more rigorous fashion, perform many more
iterations (e.g. 50), so the average is used to even out bumps across
executions. And the stddev is there to show how variable the results were
in the end.
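
To make that concrete, here is a minimal sketch of that kind of
aggregation. This is not the actual perf.py code; the timing function and
the run count of 50 are just assumptions for illustration:

import statistics
import time

def summarize(func, runs=50):
    # Time func once per run; many runs smooth out one-off bumps.
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    # Report the same statistics as the benchmark output above.
    return min(timings), statistics.mean(timings), statistics.stdev(timings)

print("Min/Avg/Stddev: %.6f %.6f %.5f" % summarize(lambda: sorted(range(10000))))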


>
> If the benchmarks are based on timeit, the timeit docs warn against
> taking any statistic other than the minimum.
>

They don't use timeit.
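
(For reference, the pattern the timeit docs recommend looks roughly like
the following sketch; the statement being timed and the repeat/number
counts are made up for illustration:)

import timeit

# repeat() returns one total time per repetition.  The timeit docs
# advise taking the min: higher values usually reflect interference
# from other processes, not variability in Python's own speed.
results = timeit.repeat("sorted(range(1000))", repeat=5, number=10000)
print(min(results))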