[pypy-dev] speed.pypy.org launched
stefan_ml at behnel.de
Fri Feb 26 11:30:11 CET 2010
Miquel Torres, 26.02.2010 11:05:
> You may also consider that a benchmark that varies greatly between
> runs may be a flawed benchmark.
> I think it should be considered, but only on the running side, and act
> accordingly (too high a deviation: discard run, reconsider benchmark,
> reconsider environment or whatever).
Right, there might even have been a cron job running at the same time.
There are various reasons why benchmark numbers can vary.
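The "too high a deviation: discard run" idea from the quote above can be sketched in a few lines. This is only an illustration, not the actual speed.pypy.org machinery; the function name and the 5% threshold are assumptions.

```python
import statistics

# Assumed threshold: discard a run whose relative standard
# deviation (stdev / mean) exceeds 5%.
MAX_RELATIVE_DEVIATION = 0.05

def run_is_stable(timings):
    """Return True if the timings vary little enough to keep the run."""
    mean = statistics.mean(timings)
    stdev = statistics.stdev(timings)
    return (stdev / mean) <= MAX_RELATIVE_DEVIATION

print(run_is_stable([1.00, 1.01, 0.99, 1.02]))  # small spread -> keep
print(run_is_stable([1.0, 1.9, 0.6, 1.4]))      # large spread -> discard
```

A run failing this check would then trigger the follow-up steps Miquel mentions: re-run it, or reconsider the benchmark or the environment.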
Especially in a JIT environment, you'd normally expect the benchmark
numbers to decrease over time, or to stay constantly high for a while,
then show a peak when the compiler kicks in, and then settle at a lower
level (e.g. with the Sun JVM's HotSpot JIT, or incremental JIT compilers
in general). I assume that the benchmarking machinery handles this, but
it's yet another reason why widely differing timings can occur within a
single run, and why it's only the best run that really matters.
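The warmup pattern described above suggests one common way to report a single number per run: drop the initial iterations (which include compilation overhead) and take the best remaining timing. A minimal sketch, where the function name and the warmup count are assumptions:

```python
def best_after_warmup(timings, warmup=3):
    """Discard the first `warmup` iterations, then take the fastest."""
    steady = timings[warmup:] or timings  # keep something if the run was short
    return min(steady)

# Interpreted start, a peak while the compiler kicks in,
# then a fast steady state once compiled code takes over:
timings = [2.0, 2.1, 3.5, 0.9, 0.85, 0.87]
print(best_after_warmup(timings))  # 0.85
```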
You could even go one step further: ignore deviating results in the history
graph and only present them when they are still reproducible (preferably
with the same source revision) an hour later.
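That "hold back until reproducible" policy could look roughly like this. Again a hedged sketch: the function name, the 10% tolerance, and the re-run protocol are all illustrative assumptions, not anything speed.pypy.org actually does.

```python
def should_plot(new_time, prev_time, rerun_time=None, tolerance=0.10):
    """Plot a new result immediately unless it deviates from the previous
    history point by more than `tolerance`; in that case, require a
    confirming re-run (ideally of the same source revision) later."""
    if prev_time is None or abs(new_time - prev_time) / prev_time <= tolerance:
        return True
    if rerun_time is None:
        return False  # deviating and not yet re-run: hold it back
    return abs(rerun_time - new_time) / new_time <= tolerance

print(should_plot(1.0, 1.02))                  # normal point -> plot
print(should_plot(2.0, 1.0))                   # big jump, no re-run -> hold
print(should_plot(2.0, 1.0, rerun_time=1.95))  # jump confirmed -> plot
```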