On Sat, Sep 1, 2012 at 7:21 PM, Brett Cannon <brett@python.org> wrote:
> Now that I can run benchmarks against Python 2.7 and 3.3 simultaneously, I'm
> ready to start updating the benchmarks. This involves two parts.
>
> One is moving benchmarks from PyPy over to the unladen repo on
> hg.python.org/benchmarks. But I wanted to first make sure people don't view
> the benchmarks as immutable (e.g. as Octane does:
> https://developers.google.com/octane/faq). Since the benchmark results are
> always relative between two interpreters, immutability isn't as critical as
> it would be if we reported some overall score. But it also means that any
> changes made would throw off historical comparisons. For instance, if I take
> PyPy's Mako benchmark (which does a lot more work), should it be named
> mako_v2, or should we just replace mako wholesale?
>
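A hypothetical sketch of the versioned-name option (all names below are
invented, nothing decided): register workloads under versioned names so
"mako" and "mako_v2" coexist, and every recorded result names the exact
workload it came from:

    # Hypothetical registry: an updated workload gets a new versioned name
    # instead of silently replacing the old one.
    def run_mako_v1(interp):
        """Stub standing in for the original unladen Mako workload."""
        return 1.0  # would return a timing measured under `interp`

    def run_mako_v2(interp):
        """Stub standing in for PyPy's heavier Mako workload."""
        return 1.0

    BENCHMARKS = {
        "mako": run_mako_v1,
        "mako_v2": run_mako_v2,
    }

    def compare(name, interp_a, interp_b):
        # Results stay relative between two interpreters; the versioned
        # name travels with the result, so historical numbers are only
        # ever compared against the same workload.
        bench = BENCHMARKS[name]
        return {"benchmark": name, "ratio": bench(interp_a) / bench(interp_b)}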
> And the second is the same question for libraries. For instance, the unladen
> benchmarks have Django 1.1a0 as the version, which is rather ancient. And
> with 1.5 coming out with provisional Python 3 support I obviously would like
> to update it. But the same questions as with benchmarks crop up in
> reference to immutability. Another thing is that the 2to3 benchmark can't
> actually be ported using 2to3 (http://bugs.python.org/issue15834) and so it
> will require two versions -- a 2.x version (probably from Python 2.7's
> stdlib) and a 3.x version (from the 3.2 stdlib) -- which already starts to
> add interesting issues for me in terms of comparing performance (e.g. I will
> have to probably update the 2.7 code to use io.BytesIO instead of
> StringIO.StringIO to be on more equal footing). A similar thing goes for
> html5lib, which has developed its Python 3 support separately from its
> Python 2 code.
>
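A minimal sketch of the equal-footing change, assuming the workload only
needs a writable byte buffer (io.BytesIO is available on Python 2.6+ and
3.x, unlike the 2.x-only StringIO.StringIO):

    import io

    # The same buffer code runs unchanged on 2.7 and 3.x, so both
    # interpreters exercise the same type rather than 2.x taking a
    # different (and differently-optimized) StringIO.StringIO path.
    buf = io.BytesIO()
    buf.write(b"output from the code under test")
    assert buf.getvalue() == b"output from the code under test"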
> If we can't find a reasonable way to handle all of this then what I will do
> is branch the unladen benchmarks for 2.x/3.x benchmarking, and then create
> another branch of the benchmark suite to just be for Python 3.x so that we
> can start fresh with a new set of benchmarks that will never change
> themselves for benchmarking Python 3 itself. That would also mean we could
> start off with whatever is needed from PyPy and unladen to have the optimal
> benchmark runner for speed.python.org.
>
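For reference, the fallback plan could be two Mercurial named branches
along these lines (branch names are hypothetical):

    hg branch 2n3-benchmarks   # the 2.x vs 3.x comparison suite lives here
    hg commit -m "branch for 2.x/3.x benchmarking"
    hg update default
    hg branch py3-benchmarks   # fresh, immutable Python 3-only suite
    hg commit -m "start the Python 3-only benchmark suite"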
Ideally I would like benchmarks to be immutable (having _v2-style names
is fine). However, updating libraries might not keep them immutable
(after all, you're interested in *the speed of running Django*), so maybe
we should mark this somehow in the history so we don't compare apples to
oranges.
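One possible way to mark it, as a sketch with invented field names: store
the library versions with every recorded run and only chart runs together
when they match:

    # Hypothetical result record: a dependency bump shows up as a break
    # in the series instead of being silently compared across.
    result = {
        "benchmark": "django",
        "libs": {"django": "1.5"},    # bumped when the library under test changes
        "times": [0.52, 0.51, 0.53],  # raw timings for one interpreter
    }

    def comparable(a, b):
        # Refuse apples-to-oranges comparisons: same benchmark, same libs.
        return a["benchmark"] == b["benchmark"] and a["libs"] == b["libs"]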
Cheers,
fijal