Re: [Speed] Are benchmarks and libraries mutable?

On Sat, Sep 1, 2012 at 7:21 PM, Brett Cannon <brett@python.org> wrote:
Now that I can run benchmarks against Python 2.7 and 3.3 simultaneously, I'm ready to start updating the benchmarks. This involves two parts.
One is moving benchmarks from PyPy over to the unladen repo on hg.python.org/benchmarks. But I wanted to first make sure people don't view the benchmarks as immutable (e.g. as Octane does: https://developers.google.com/octane/faq). Since the benchmarks are always relative between two interpreters their immutability isn't critical compared to if we were to report some overall score. But it also means that any changes made would throw off historical comparisons. For instance, if I take PyPy's Mako benchmark (which does a lot more work), should it be named mako_v2, or should we just replace mako wholesale?
And the second is the same question for libraries. For instance, the unladen benchmarks have Django 1.1a0 as the version, which is rather ancient. And with 1.5 coming out with provisional Python 3 support I obviously would like to update it. But the same questions as with benchmarks crop up in reference to immutability. Another thing is that 2to3 can't actually be ported using 2to3 (http://bugs.python.org/issue15834) and so that itself will require two versions -- a 2.x version (probably from Python 2.7's stdlib) and a 3.x version (from the 3.2 stdlib) -- which already starts to add interesting issues for me in terms of comparing performance (e.g. I will probably have to update the 2.7 code to use io.BytesIO instead of StringIO.StringIO to be on more equal footing). A similar thing goes for html5lib, which has developed its Python 3 support separately from its Python 2 code.
If we can't find a reasonable way to handle all of this then what I will do is branch the unladen benchmarks for 2.x/3.x benchmarking, and then create another branch of the benchmark suite to just be for Python 3.x so that we can start fresh with a new set of benchmarks that will never change themselves for benchmarking Python 3 itself. That would also mean we could start off with whatever is needed from PyPy and unladen to have the optimal benchmark runner for speed.python.org.
Speed mailing list Speed@python.org http://mail.python.org/mailman/listinfo/speed
Ideally I would like benchmarks to be immutable (having a _v suffix, as in mako_v2, is fine). Libraries, however, need not be immutable (after all, you're interested in *the speed of running django*), but maybe we should mark library updates somehow in the history so we don't compare apples to oranges.
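One way the marking could work, as a rough sketch only: store the benchmark version and the library version next to every timing, and refuse to compare results where either differs. The record layout and the function names below are hypothetical; nothing here is taken from the unladen or PyPy runners.

```python
from collections import namedtuple

# Hypothetical record layout (an assumption for illustration only).
# bench_version is bumped whenever the benchmark code itself changes;
# lib_version is whatever version of the library the benchmark exercised.
Result = namedtuple(
    "Result", "benchmark bench_version lib_version interpreter seconds")


def comparable(a, b):
    """True only if both results ran the same benchmark code
    against the same library version."""
    return (a.benchmark, a.bench_version, a.lib_version) == (
        b.benchmark, b.bench_version, b.lib_version)


def speedup(base, new):
    """Return base.seconds / new.seconds, rejecting mismatched pairs."""
    if not comparable(base, new):
        raise ValueError(
            "refusing to compare %s v%s (lib %s) with %s v%s (lib %s)" % (
                base.benchmark, base.bench_version, base.lib_version,
                new.benchmark, new.bench_version, new.lib_version))
    return base.seconds / new.seconds
```

With records like these, a django run against 1.1a0 and a django run against 1.5 would raise instead of silently producing a ratio, while two interpreters measured on the same benchmark and library version compare as usual.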
Cheers, fijal
participants (1): Maciej Fijalkowski