With the planned move to GitHub, there is an opportunity to try to rework
the set of benchmarks -- and anything else -- in 2016 by starting a new
benchmark repo from scratch. That could mean modern numeric benchmarks,
long-running benchmarks that warm up JITs, using pip with pegged bugfix
versions so we stop shipping library code with the benchmarks, etc. We
could also standardize the results output -- e.g., should we just make
everything run under codespeed? -- so that, with a common benchmark
driver, the benchmarks are easy to run locally for one-off results as
well as under continuous benchmarking for trend data.
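
Just to make the driver idea concrete, here is a rough sketch (all names
are made up, not a proposal for an actual API) of the kind of thing I
mean: discard warm-up runs so JIT-based implementations reach steady
state, then emit the timings in one machine-readable format that both a
one-off local run and a continuous-benchmarking frontend could consume:

    import json
    import time

    def run_benchmark(name, workload, warmup=5, iterations=20):
        # Warm-up iterations are discarded so JIT-compiled runtimes
        # (PyPy, Jython, etc.) are measured in their steady state.
        for _ in range(warmup):
            workload()
        timings = []
        for _ in range(iterations):
            start = time.perf_counter()
            workload()
            timings.append(time.perf_counter() - start)
        # One standardized output format for every benchmark.
        return {"benchmark": name, "timings": timings}

    if __name__ == "__main__":
        result = run_benchmark("sum_of_squares",
                               lambda: sum(i * i for i in range(10000)))
        print(json.dumps(result))
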
Would people be interested and motivated enough to get representatives
from the various Python implementations together at PyCon for a BoF to
discuss what we want from a proper, unified, baseline benchmark suite,
and to see if we can pull one together -- or at least start one -- in 2016?