[pypy-dev] speed.pypy.org launched

Fri Feb 26 13:13:15 CET 2010

On 02/26/2010 12:30 PM, Miquel Torres wrote:
> The paper is right, and the unladen swallow runner does the right thing.
>
> What I meant was: use the statistically right method (like we are
> doing now!), but don't show deviation bars if the deviation is
> acceptable. Check after the run whether the deviation is
> "acceptable". If it isn't, rerun later, check that nothing in the
> background is affecting performance, reevaluate reproducibility of the
> given benchmark, etc.

I think an important point is that the deviations don't need to come
from anything in the background. We don't use threads in the benchmarks
(yet), which would obviously introduce non-determinism, but even now
there is enough randomness in the interpreter itself: the GC can kick in
at different points, the JIT could decide (late in the process) that
something else should be compiled, there are cache effects, etc. This
randomness is not a bad thing, but I think we should at least try to
evaluate it by showing the error bars. We should do that even if the
errors are small, because that is a good result worth mentioning.
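To make "evaluate it by showing the error bars" concrete, here is a minimal
sketch in Python. It assumes each benchmark has been run n times and its
wall-clock timings collected in a list. The function name `summarize`, the
5% threshold for an "acceptable" deviation, and the use of a Student's t 95%
confidence interval are illustrative assumptions, not what speed.pypy.org or
the unladen swallow runner actually do.

```python
import math
import statistics

# Hypothetical threshold (not from this thread): a benchmark whose relative
# standard deviation (stdev / mean) exceeds 5% is flagged for a rerun.
MAX_RELATIVE_STDEV = 0.05

# Two-sided 95% Student's t critical values, keyed by degrees of freedom (n - 1).
T_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
        6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}


def summarize(timings):
    """Return (mean, stdev, ci_half_width, acceptable) for repeated run times."""
    n = len(timings)
    if n < 2:
        raise ValueError("need at least two runs to estimate the deviation")
    mean = statistics.mean(timings)
    stdev = statistics.stdev(timings)  # sample standard deviation (n - 1)
    # For more than 11 runs, fall back to the normal value 1.96; this is a
    # rough approximation that slightly understates the interval for moderate n.
    t = T_95.get(n - 1, 1.96)
    half_width = t * stdev / math.sqrt(n)  # error bar: mean +/- half_width
    acceptable = stdev / mean <= MAX_RELATIVE_STDEV
    return mean, stdev, half_width, acceptable


if __name__ == "__main__":
    mean, stdev, half, ok = summarize([1.02, 0.98, 1.01, 0.99, 1.00])
    print(f"{mean:.3f} s +/- {half:.3f} s (stdev {stdev:.4f}), acceptable={ok}")
```

Returning the standard deviation alongside the mean, rather than only the
mean, is what makes it possible to store the deviation data for later use and
to draw error bars even when they are small.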

I guess 20 or even 10 years ago you could attribute a single "correct"
running time to a program, but nowadays there is noise on every level of
the system and it is not really possible to ignore it. And there are a
lot more levels now, too :-).

> But it doesn't change the fact that speed could save the deviation
> data for later use.

Yip.

Cheers,

Carl Friedrich