[Cython] Cython's view on a common benchmark suite (was: Re: [Speed] Buildbot Status)

Thu Feb 2 09:21:11 CET 2012

Brett Cannon, 01.02.2012 18:25:
> to prevent this from either ending up in a dead-end because of this, we
> need to first decide where the canonical set of Python VM benchmarks are
> going to live. I say hg.python.org/benchmarks for two reasons. One is that
> Antoine has already done work there to port some of the benchmarks so there
> is at least some there that are ready to be run under Python 3 (and the
> tooling is in place to create separate Python 2 and Python 3 benchmark
> suites). Two, this can be a test of having the various VM contributors work
> out of hg.python.org if we are ever going to break the stdlib out for
> shared development. At worst we can simply take the changes made at
> pypy/benchmarks that apply to just the unladen benchmarks that exist, and
> at best merge the two sets (manually) into one benchmark suite so PyPy
> doesn't lose anything for Python 2 measurements that they have written and
> CPython doesn't lose any of its Python 3 benchmarks that it has created.
> 
> How does that sound?

+1

FWIW, Cython currently uses both benchmark suites, that of PyPy (in Py2.7)
and that of hg.python.org (in Py2.7 and 3.3), but without codespeed
integration and also without a dedicated server for benchmark runs. So the
results are unfortunately not accurate enough to spot minor changes even
over time.

https://sage.math.washington.edu:8091/hudson/view/bench/

We would like to join in on speed.python.org, once it's clear how the
benchmarks will be run and how the data uploads work and all that. It
already proved a bit tricky to get Cython integrated with the benchmark
runner on our side, and I'm planning to rewrite that integration at some
point, but it should already be doable to get "something" to work now.

I should also note that we don't currently support the whole benchmark
suite, so there must be a way to record individual benchmark results even
in the face of failures in other benchmarks. Basically, speed.python.org
would be useless for us if a failure in a single benchmark left us without
any performance data at all, because it will still take us some time to get
to 100% compliance and we would like to know if anything on that road has a
performance impact. Currently, we apply a short patch that adds a
try-except to the benchmark runner's main loop before starting the
measurements, because otherwise it would just bail out completely on a
single failure. Oh, and we also patch the benchmarks to remove references
to __file__ because of CPython issue 13429, although we may be able to work
around that at some point, specifically when doing on-the-fly compilation
during imports.

http://bugs.python.org/issue13429
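
To illustrate the kind of try-except wrapping I mean, here is a minimal
sketch. It is not the actual patch and not the real runner's interface: the
function name run_all(), the name-to-callable mapping and the "repeat"
parameter are all made up for the example. The point is only that a failing
benchmark gets recorded and skipped instead of aborting the whole run.

```python
import time
import traceback


def run_all(benchmarks, repeat=3):
    """Run each benchmark, recording failures instead of aborting.

    `benchmarks` maps a benchmark name to a zero-argument callable.
    This interface is hypothetical, chosen only for the illustration.
    """
    results = {}   # name -> best wall-clock time in seconds
    failures = {}  # name -> formatted traceback of the failure

    for name, func in sorted(benchmarks.items()):
        try:
            timings = []
            for _ in range(repeat):
                start = time.perf_counter()
                func()
                timings.append(time.perf_counter() - start)
        except Exception:
            # Keep the traceback for later inspection and move on to the
            # next benchmark, so one failure does not lose all the data.
            failures[name] = traceback.format_exc()
            continue
        results[name] = min(timings)

    return results, failures
```

A runner built like this could upload whatever is in "results" and report
"failures" separately, which is all we would need on our side.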

Also note that benchmarks that only test C-implemented stdlib modules (re,
pickle, json) are useless for Cython because they would only end up timing
the exact same code that plain CPython runs.

Another test that is useless for us is the "mako" benchmark, because most
of what it does is to run generated code. There is currently no way for
Cython to hook into that, so we're out of the game here.

We also don't care about program startup tests, obviously, because we know
that Cython's compiler overhead plus an optimising gcc run will render them
meaningless anyway. I like the fact that there's still an old hg_startup
timing result lingering around from the time before I disabled that test,
telling us that Cython runs it 99.68% slower than CPython. Got to beat
that. 8-)

Stefan