Re: [Speed] Are benchmarks and libraries mutable?
On Sat, 1 Sep 2012 13:21:36 -0400 Brett Cannon <brett@python.org> wrote:
> One is moving benchmarks from PyPy over to the unladen repo on hg.python.org/benchmarks. But I wanted to first make sure people don't view the benchmarks as immutable (e.g. as Octane does: https://developers.google.com/octane/faq). Since the benchmarks are always relative between two interpreters, their immutability isn't as critical as it would be if we reported some overall score. But it also means that any changes made would throw off historical comparisons. For instance, if I take PyPy's Mako benchmark (which does a lot more work), should it be named mako_v2, or should we just replace mako wholesale?
mako_v2 sounds fine to me. Mutating benchmarks makes things confusing: one person may report that interpreter A is faster than interpreter B on a given benchmark, and another person retort that no, interpreter B is faster than interpreter A.
Besides, if you want to have useful timelines on speed.p.o, you definitely need stable benchmarks.
> And the second is the same question for libraries. For instance, the unladen benchmarks have Django 1.1a0 as the version, which is rather ancient. And with 1.5 coming out with provisional Python 3 support I obviously would like to update it. But the same questions as with benchmarks crop up with respect to immutability.
django_v2 sounds fine too :)
> (e.g. I will probably have to update the 2.7 code to use io.BytesIO instead of StringIO.StringIO to be on more equal footing).
I disagree. If io.BytesIO is faster than StringIO.StringIO then it's normal for the benchmark results to reflect that (ditto if it's slower).
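The gap Antoine refers to is easy to check directly with timeit; the following is a minimal sketch intended for Python 2.7, not part of the benchmark suite, and exact numbers will vary by build and machine:

    # Quick, throwaway comparison of the two buffer types on Python 2.7.
    import timeit

    stringio_time = timeit.timeit(
        "s = StringIO(); s.write(b'x' * 100); s.getvalue()",
        setup="from StringIO import StringIO",
        number=100000)

    bytesio_time = timeit.timeit(
        "s = BytesIO(); s.write(b'x' * 100); s.getvalue()",
        setup="from io import BytesIO",
        number=100000)

    print("StringIO.StringIO: %.3fs" % stringio_time)
    print("io.BytesIO:        %.3fs" % bytesio_time)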
> If we can't find a reasonable way to handle all of this, then what I will do is branch the unladen benchmarks for 2.x/3.x benchmarking, and then create another branch of the benchmark suite just for Python 3.x, so that we can start fresh with a new set of benchmarks that will themselves never change, for benchmarking Python 3 itself.
Why not simply add Python 3-specific benchmarks to the mix? You can then create a "py3" benchmark suite in perf.py (and perhaps also a "py2" one).
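The suite idea could be expressed as a flat registry of benchmarks plus named groups that select subsets; the sketch below is illustrative only, and its names and structure are assumptions rather than the actual perf.py API:

    # Illustrative sketch only -- not the real perf.py structures, just the
    # shape of the idea: one registry of benchmarks plus named suites that
    # pick subsets, so a "py3" suite can grow without disturbing the
    # existing Python 2 timeline.
    BENCHMARKS = {
        "mako":      {"pythons": ("2.7", "3.3")},
        "mako_v2":   {"pythons": ("2.7", "3.3")},   # updated PyPy-derived harness
        "django":    {"pythons": ("2.7",)},          # old Django 1.1a0 harness
        "django_v2": {"pythons": ("2.7", "3.3")},   # hypothetical Django 1.5 harness
    }

    BENCH_GROUPS = {
        "py2": sorted(name for name, info in BENCHMARKS.items()
                      if "2.7" in info["pythons"]),
        "py3": sorted(name for name, info in BENCHMARKS.items()
                      if "3.3" in info["pythons"]),
    }

    print(BENCH_GROUPS["py3"])  # -> ['django_v2', 'mako', 'mako_v2']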
Regards
Antoine.
-- Software development and contracting: http://pro.pitrou.net
On Sat, Sep 1, 2012 at 2:57 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
> On Sat, 1 Sep 2012 13:21:36 -0400 Brett Cannon <brett@python.org> wrote:
>> One is moving benchmarks from PyPy over to the unladen repo on hg.python.org/benchmarks. But I wanted to first make sure people don't view the benchmarks as immutable (e.g. as Octane does: https://developers.google.com/octane/faq). Since the benchmarks are always relative between two interpreters, their immutability isn't as critical as it would be if we reported some overall score. But it also means that any changes made would throw off historical comparisons. For instance, if I take PyPy's Mako benchmark (which does a lot more work), should it be named mako_v2, or should we just replace mako wholesale?
> mako_v2 sounds fine to me. Mutating benchmarks makes things confusing: one person may report that interpreter A is faster than interpreter B on a given benchmark, and another person retort that no, interpreter B is faster than interpreter A.
> Besides, if you want to have useful timelines on speed.p.o, you definitely need stable benchmarks.
>> And the second is the same question for libraries. For instance, the unladen benchmarks have Django 1.1a0 as the version, which is rather ancient. And with 1.5 coming out with provisional Python 3 support I obviously would like to update it. But the same questions as with benchmarks crop up with respect to immutability.
> django_v2 sounds fine too :)
True, but having to carry around multiple copies of libraries just becomes a pain.
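For context, carrying multiple copies means each versioned benchmark pinning its own vendored library tree on sys.path; here is a minimal sketch of that pattern, with hypothetical paths rather than the repo's actual layout:

    # Illustrative sketch only -- the paths and layout are hypothetical, not
    # the repo's actual structure. Each versioned benchmark pins the library
    # version it was written against, so django and django_v2 never share
    # code (and both library trees have to live in the repo).
    import os
    import sys

    HERE = os.path.dirname(os.path.abspath(__file__))
    sys.path.insert(0, os.path.join(HERE, "lib", "django-1.5"))  # hypothetical path

    import django  # resolves to the pinned copy inserted above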
>> (e.g. I will probably have to update the 2.7 code to use io.BytesIO instead of StringIO.StringIO to be on more equal footing).
> I disagree. If io.BytesIO is faster than StringIO.StringIO then it's normal for the benchmark results to reflect that (ditto if it's slower).
>> If we can't find a reasonable way to handle all of this, then what I will do is branch the unladen benchmarks for 2.x/3.x benchmarking, and then create another branch of the benchmark suite just for Python 3.x, so that we can start fresh with a new set of benchmarks that will themselves never change, for benchmarking Python 3 itself.
> Why not simply add Python 3-specific benchmarks to the mix? You can then create a "py3" benchmark suite in perf.py (and perhaps also a "py2" one).
To avoid historical baggage and to start from a clean slate. I don't necessarily want to carry around Python 2 benchmarks forever. It's not a massive concern, just a nicety.