Brett Cannon, 01.02.2012 18:25:
To prevent this from ending up in a dead end, we first need to decide where the canonical set of Python VM benchmarks is going to live. I say hg.python.org/benchmarks for two reasons. One is that Antoine has already done work there to port some of the benchmarks, so at least some of them are ready to be run under Python 3 (and the tooling is in place to create separate Python 2 and Python 3 benchmark suites). Two, this can be a test of having the various VM contributors work out of hg.python.org if we are ever going to break the stdlib out for shared development. At worst we can simply take the changes made at pypy/benchmarks that apply to just the unladen benchmarks that exist, and at best merge the two sets (manually) into one benchmark suite, so that PyPy doesn't lose anything it has written for Python 2 measurements and CPython doesn't lose any of the Python 3 benchmarks it has created.
How does that sound?
FWIW, Cython currently uses both benchmark suites, that of PyPy (in Py2.7) and that of hg.python.org (in Py2.7 and 3.3), but without codespeed integration and also without a dedicated server for benchmark runs. So the results are unfortunately not accurate enough to spot minor changes even over time.
We would like to join in on speed.python.org, once it's clear how the benchmarks will be run and how the data uploads work and all that. It already proved a bit tricky to get Cython integrated with the benchmark runner on our side, and I'm planning to rewrite that integration at some point, but it should already be doable to get "something" to work now.
I should also note that we don't currently support the whole benchmark suite, so there must be a way to record individual benchmark results even in the face of failures in other benchmarks. Basically, speed.python.org would be useless for us if a failure in a single benchmark left us without any performance data at all, because it will still take us some time to get to 100% compliance and we would like to know if anything on that road has a performance impact. Currently, we apply a short patch that adds a try-except to the benchmark runner's main loop before starting the measurements, because otherwise it would just bail out completely on a single failure. Oh, and we also patch the benchmarks to remove references to __file__ because of CPython issue 13429, although we may be able to work around that at some point, specifically when doing on-the-fly compilation during imports.
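To illustrate the kind of failure isolation I mean, here is a minimal sketch. It is not the actual patch we apply and not the real runner's API: the run_all() function, the name-to-callable mapping and the repeat parameter are all made up for this example. The point is only that each benchmark's result or failure gets recorded separately, so one broken benchmark doesn't cost us the rest of the data.

```python
import time
import traceback


def run_all(benchmarks, repeat=3):
    """Run every benchmark and keep going when one of them fails.

    `benchmarks` maps a benchmark name to a zero-argument callable.
    That shape is an assumption made for this sketch only.
    """
    results = {}   # name -> best wall-clock time in seconds
    failures = {}  # name -> formatted traceback of the failure
    for name, func in benchmarks.items():
        try:
            timings = []
            for _ in range(repeat):
                start = time.perf_counter()
                func()
                timings.append(time.perf_counter() - start)
        except Exception:
            # Record the failure and move on to the next benchmark
            # instead of bailing out of the whole run.
            failures[name] = traceback.format_exc()
            continue
        results[name] = min(timings)
    return results, failures
```

An upload step could then send whatever is in `results` and report `failures` separately, rather than discarding the whole run.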
Also note that benchmarks that only test C-implemented stdlib modules (re, pickle, json) are useless for Cython because they would only end up timing the exact same code as plain CPython.
Another test that is useless for us is the "mako" benchmark, because most of what it does is to run generated code. There is currently no way for Cython to hook into that, so we're out of the game here.
We also don't care about program startup tests, obviously, because we know that Cython's compiler overhead plus an optimising gcc run will render them meaningless anyway. I like the fact that there's still an old hg_startup timing result lingering around from the time before I disabled that test, telling us that Cython runs it 99.68% slower than CPython. Got to beat that. 8-)