[pypy-dev] speed.pypy.org launched

Stefan Behnel stefan_ml at behnel.de
Fri Feb 26 11:30:11 CET 2010


Miquel Torres, 26.02.2010 11:05:
> You may also consider that a benchmark that varies greatly between
> runs may be a flawed benchmark.
> 
> I think it should be considered, but only on the running side, and act
> accordingly (too high a deviation: discard run, reconsider benchmark,
> reconsider environment or whatever).

Right, there might even have been a cron job running at the same time.
There are various reasons why benchmark numbers can vary.

Especially in a JIT environment, you'd normally expect the per-iteration
timings to decrease over time, or to stay constantly high for a
while, then show a peak when the compiler kicks in, and then continue at a
lower level (e.g. with the Sun JVM's HotSpot JIT or incremental JIT
compilers in general). I assume that the benchmarking machinery handles
this, but it's yet another reason why highly differing timings can occur
within a single run, and why it's only the best run that really matters.
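
To illustrate what I mean by handling warm-up and taking the best run, here
is a rough sketch. It is not what the speed.pypy.org machinery actually does;
the names time_runs and summarize and the default counts are made up for
this example, and it assumes a Python with time.perf_counter().

```python
import time


def time_runs(func, repeats=10, warmup=3):
    """Time `func` repeatedly, discarding the first `warmup` iterations.

    Hypothetical helper: the warm-up iterations are the ones most likely
    to include JIT compilation, so they are run but not recorded.
    """
    timings = []
    for i in range(warmup + repeats):
        start = time.perf_counter()
        func()
        elapsed = time.perf_counter() - start
        if i >= warmup:
            timings.append(elapsed)
    return timings


def summarize(timings):
    """Return the best timing and how far the worst one strays from it."""
    best = min(timings)
    worst = max(timings)
    # Relative spread: 0.0 means all runs were identical.
    spread = (worst - best) / best if best > 0 else 0.0
    return {"best": best, "worst": worst, "relative_spread": spread}
```

A large relative_spread would then be the hint to discard the run or to
reconsider the benchmark or the environment, while "best" is the number
worth reporting.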

You could even go one step further: ignore deviating results in the history
graph and only present them when they are still reproducible (preferably
with the same source revision) an hour later.
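
Something along these lines, as a sketch only. The 10% tolerance, the use
of the median as the baseline and the function names are assumptions I
picked for the example, not anything the site implements.

```python
import statistics


def is_deviating(value, history, tolerance=0.10):
    """True if `value` is more than `tolerance` away from the history median."""
    if not history:
        return False  # nothing to compare against yet
    baseline = statistics.median(history)
    return abs(value - baseline) > tolerance * baseline


def accept_result(first, second, history, tolerance=0.10):
    """Decide whether a new timing should appear in the history graph.

    first:  timing from the regular benchmark run
    second: timing from a later rerun of the same source revision,
            or None if no rerun has happened yet
    """
    if not is_deviating(first, history, tolerance):
        return True  # in line with previous results, show it
    if second is None:
        return False  # deviating and not yet confirmed, hold it back
    # Show the deviation only if the rerun reproduced it.
    return abs(second - first) <= tolerance * first
```

With a history of [1.0, 1.0, 1.0], a new timing of 1.5 would only be shown
if the later rerun also came out close to 1.5.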

Stefan



