[Python-checkins] r46505 - python/trunk/Tools/pybench/systimes.py
Steve Holden
steve at holdenweb.com
Wed Jun 7 12:21:50 CEST 2006
M.-A. Lemburg wrote:
> Fredrik Lundh wrote:
>
>>M.-A. Lemburg wrote:
[...]
>>>In fact, if you run time.time() vs. resource.getrusage() on
>>>a Linux box, you'll find that both more or less show the same
>>>flux in timings - with an error interval of around 15ms.
>>
>>which is easily explained by "cycle stealers", and is destroying the
>>benchmark's precision.
>
>
> Right, but there's nothing much you can do about it, I'm afraid.
>
And therein lies the crux of this discussion. It's pointless to talk
about "1.56% accuracy" under circumstances like this. It's also, in my
opinion, somewhat pointless to derive a notional "per operation" figure
for each class of operation, but let's let that slide.
Benchmarks are useful to discover whether one system is faster than
another for a given processing load. The days of determinism are pretty
much long gone, so we have to accept that.
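To make the flux concrete, here is a minimal sketch of the two timing methods mentioned above: wall-clock time via time.time() versus process CPU time via resource.getrusage(). The helper names (busy_work, measure) are illustrative, and the resource module is Unix-only; running this repeatedly shows the same jitter the thread is discussing.

```python
import time
import resource

def busy_work(n=200_000):
    # A simple CPU-bound loop to act as the timed workload.
    total = 0
    for i in range(n):
        total += i * i
    return total

def measure(func):
    """Time one call of func with both wall-clock and CPU time."""
    wall_start = time.time()
    cpu_start = resource.getrusage(resource.RUSAGE_SELF)
    func()
    wall = time.time() - wall_start
    cpu_end = resource.getrusage(resource.RUSAGE_SELF)
    # CPU time = user time + system time consumed by this process.
    cpu = ((cpu_end.ru_utime - cpu_start.ru_utime) +
           (cpu_end.ru_stime - cpu_start.ru_stime))
    return wall, cpu

wall, cpu = measure(busy_work)
print(f"wall: {wall:.4f}s  cpu: {cpu:.4f}s")
```

The gap between the two numbers (and their variation across runs) is the "cycle stealer" effect under discussion: wall-clock time includes everything else the machine is doing.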
>
>>and as usual, if you don't have precision, you
>>don't really have accuracy (unless you have a good statistical model,
>>and enough data to use it; see Andrew's posts for more on that).
>
>
> If you know how big your error interval is, then you are
> already in a very good position. If you can narrow down
> that interval, you're in an even better position. How
> this can be done depends on the method of timing you're
> using and whether you run the benchmark using many short
> runs, a few long ones or many long ones.
>
But if the benchmark gives radically different results under each set of
circumstances, then there isn't much point in providing comparison features.
This latest conversation all started because we observed at the Need For
Speed sprint that there didn't seem to be any reliable way to determine
whether a given change to the interpreter resulted in a speed increase.
When Tim timed things with the pystone benchmark he observed, among
other things, differences of up to 50% due simply to CPU core temperature.
Ultimately I suspect that the answer is to have more benchmarks
available, to persuade people to run them more frequently, and to place
less absolute trust in the output of any single run.
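The "many runs, less trust in any single one" approach can be sketched with the standard library's timeit and statistics modules (available in modern Python, not at the time of this thread). The bench helper and its repeat counts are illustrative assumptions, not anything from pybench itself.

```python
import statistics
import timeit

def bench(stmt, repeats=10, number=100_000):
    """Run stmt several times and summarize, rather than trusting one run."""
    timer = timeit.Timer(stmt)
    # repeat() returns one total time per batch of `number` executions.
    times = timer.repeat(repeat=repeats, number=number)
    return {
        "min": min(times),                 # least-disturbed run
        "mean": statistics.mean(times),
        "stdev": statistics.stdev(times),  # visible flux across runs
    }

stats = bench("sum(range(100))")
print(stats)
```

Reporting the minimum alongside the mean and standard deviation makes the error interval visible, which is exactly the position Marc-Andre describes as "already very good".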
regards
Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC/Ltd http://www.holdenweb.com
Love me, love my blog http://holdenweb.blogspot.com
Recent Ramblings http://del.icio.us/steve.holden