[Python-Dev] Stop using timeit, use perf.timeit!

Paul Moore p.f.moore at gmail.com
Fri Jun 10 11:09:44 EDT 2016


On 10 June 2016 at 15:34, David Malcolm <dmalcolm at redhat.com> wrote:
>> The problem is that random noise can only ever slow the code down, it
>> cannot speed it up.
[...]
> Isn't it possible that under some circumstances the 2nd process could
> prefetch memory into the cache in such a way that the workload under
> test actually gets faster than if the 2nd process wasn't running?

My feeling is that it would be much rarer for random effects to speed
up the benchmark under test - possible in the sort of circumstance you
describe, but not common.

The conclusion I draw is "be careful how you interpret summary
statistics as estimates of the value you're interested in if you
don't know the distribution of the underlying data".
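
To make that concrete, here's a minimal stdlib-only sketch of looking
at the whole distribution rather than one number (the statement being
timed is just a placeholder):

    import statistics
    import timeit

    # Collect many independent timing samples instead of a single figure.
    samples = timeit.repeat("sum(range(1000))", repeat=30, number=1000)

    # Noise mostly slows things down, so the distribution is typically
    # right-skewed: the mean sits above the minimum.
    print("min  :", min(samples))
    print("mean :", statistics.mean(samples))
    print("stdev:", statistics.stdev(samples))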

In the case of Victor's article, he's specifically trying to
compensate for variations introduced by Python's hash randomisation.
For that, you would get both positive and negative effects on code
speed, so taking the average makes sense. But only if you've already
eliminated the other common sources of noise (such as other
processes). In his articles, it sounds like Victor has done this, but
he's using very Linux-specific mechanisms, and I don't know if he's
done the same for other platforms. Also, the way people commonly use
micro-benchmarks ("hey, look, this way of writing the expression goes
faster than that way") doesn't really address questions like "is the
difference statistically significant?"
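
One crude way to ask that question with nothing but the stdlib (the
two statements are placeholders, and |t| well above ~2 is only a
rough rule of thumb, not a substitute for a proper test):

    import math
    import statistics
    import timeit

    # Two candidate ways of writing "the same" thing (placeholders).
    a = timeit.repeat("x = list(range(100))", repeat=30, number=10000)
    b = timeit.repeat("x = [*range(100)]", repeat=30, number=10000)

    # Welch's t statistic: is the observed difference large relative
    # to the noise in both samples?
    mean_a, mean_b = statistics.mean(a), statistics.mean(b)
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    t = (mean_a - mean_b) / math.sqrt(var_a / len(a) + var_b / len(b))

    print("means:", mean_a, mean_b)
    print("Welch t:", t)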

Summary: Micro-benchmarking is hard. Victor looks like he's done some
really interesting work on it, but any "easy to use" timeit tool will
typically get used in an over-simplistic way in practice, and so you
probably shouldn't read too much into timing figures quoted in
isolation, no matter what tool was used to generate them.
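
(For reference, the sort of usage Victor's module is aimed at looks
roughly like this - perf is a third-party module from PyPI, and the
exact flags may have changed since his articles:

    # pip install perf
    #
    # perf spawns multiple worker processes, each with a different
    # hash seed, and aggregates the results across them:
    python3 -m perf timeit -s "x = list(range(1000))" "sum(x)"

)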

Paul

