On 10 June 2016 at 15:34, David Malcolm <dmalcolm@redhat.com> wrote:
> The problem is that random noise can only ever slow the code down, it cannot speed it up. [...]
> Isn't it possible that under some circumstances the 2nd process could prefetch memory into the cache in such a way that the workload under test actually gets faster than if the 2nd process wasn't running?
My feeling is that it would be much rarer for random effects to speed up the benchmark under test - possible in the sort of circumstance you describe, but not common. The conclusion I draw is "be careful how you interpret summary statistics as an estimator of the value you are interested in if you don't know the distribution of the underlying data".

In the case of Victor's article, he's specifically trying to compensate for variation introduced by Python's hash randomisation. For that, you would expect both positive and negative effects on speed, so averaging makes sense - but only if you've already eliminated the other common sources of noise (other processes, etc.). In his articles it sounds like he's done this, but he's using very Linux-specific mechanisms, and I don't know if he's done the same for other platforms.

Also, the way people commonly use micro-benchmarks ("hey, look, this way of writing the expression goes faster than that way") doesn't really address questions like "is the difference statistically significant?".

Summary: micro-benchmarking is hard. Victor looks like he's done some really interesting work on it, but any "easy to use" timeit tool will typically get used in an over-simplistic way in practice, so you probably shouldn't read too much into timing figures quoted in isolation, no matter what tool was used to generate them.

Paul
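P.S. To illustrate what I mean about not reading too much into a single figure - this is just a rough sketch with an arbitrary statement under test, nothing to do with Victor's actual tooling - you can at least look at the spread of repeated timeit runs before drawing any conclusions:

    # Rough illustration only: sample the same snippet many times and
    # look at the spread, rather than trusting one timeit number.
    import statistics
    import timeit

    # Placeholder expression under test - substitute whatever you are
    # actually comparing.
    stmt = "sorted(range(1000), reverse=True)"

    # 30 independent samples, each timing 1000 executions of stmt.
    samples = timeit.repeat(stmt, repeat=30, number=1000)

    print("min   : %.6f s" % min(samples))
    print("mean  : %.6f s" % statistics.mean(samples))
    print("stdev : %.6f s" % statistics.stdev(samples))
    # If the stdev is a sizeable fraction of the mean, a single quoted
    # timing tells you very little about which variant is "faster".

Re-running the whole script under a few different PYTHONHASHSEED values would also give you some feel for the hash-randomisation effect Victor is compensating for, on top of the ordinary run-to-run noise.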