On Mon, Feb 25, 2019 at 05:57, Raymond Hettinger firstname.lastname@example.org wrote:
> I've been running benchmarks that have been stable for a while. But between yesterday and today, there has been an almost across-the-board performance regression.
How do you run your benchmarks? If you use Linux, are you using CPU isolation?
> It's possible that this is a measurement error or something unique to my system (my Mac installed the 10.14.3 release today), so I'm hoping other folks can run checks as well.
Getting reproducible benchmark results for timings smaller than 1 ms is really hard. I wrote some advice on getting more stable results: https://perf.readthedocs.io/en/latest/run_benchmark.html#how-to-get-reproduc...
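As a minimal illustration of why a single number is not enough, you can repeat a microbenchmark and report the mean with its standard deviation using only the standard library (the statement being timed is illustrative, not Raymond's actual harness):

```python
import statistics
import timeit

# Repeat the microbenchmark several times; each run returns the total
# time in seconds for `number` executions of the statement.
runs = timeit.repeat(stmt="y = x", setup="x = 1", repeat=5, number=1_000_000)

# Convert to nanoseconds per iteration.
per_iter_ns = [run / 1_000_000 * 1e9 for run in runs]

mean = statistics.mean(per_iter_ns)
stdev = statistics.stdev(per_iter_ns)
print(f"{mean:.1f} ns +- {stdev:.1f} ns per iteration")
```

The spread across runs is exactly the information a bare "4.0 ns" hides.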
> Variable and attribute read access: 4.0 ns read_local
In my experience, for timings below 100 ns, *everything* impacts the benchmark, and the result is useless without the standard deviation.
On such microbenchmarks, the hash function has a significant impact on performance. So you should run your benchmark in multiple different *processes* to get multiple different hash functions. Some people prefer to use PYTHONHASHSEED=0 (or another fixed value), but I dislike that since it's less representative of performance "in production" (with a randomized hash function). For example, using 20 processes to test 20 randomized hash functions is enough to compute the average cost of the hash function.

My remark was more general; I didn't look at the specific case of var_access_benchmark.py. Maybe benchmarks written in C depend on the hash function.
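The multi-process idea can be sketched with the standard library alone: spawn fresh interpreters (PYTHONHASHSEED deliberately left unset, so each child gets its own randomized hash function) and aggregate their timings. The one-liner benchmark below is a hypothetical example, and the perf module automates all of this properly.

```python
import statistics
import subprocess
import sys

# A tiny dict-lookup benchmark passed to each child via `python -c`.
BENCH = (
    "import timeit; "
    "print(min(timeit.repeat("
    "stmt='d[\"key\"]', setup='d = {\"key\": 1}', "
    "repeat=3, number=100_000)))"
)

timings = []
for _ in range(5):  # the text suggests 20 processes; 5 keeps this fast
    out = subprocess.run(
        [sys.executable, "-c", BENCH],
        capture_output=True, text=True, check=True,
    )
    timings.append(float(out.stdout))

print(f"mean={statistics.mean(timings):.6f}s "
      f"stdev={statistics.stdev(timings):.6f}s")
```

Each process sees a different hash seed, so the mean over processes averages out the cost of any one particular hash function.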
For example, 4.0 ns +/- 10 ns and 4.0 ns +/- 0.1 ns lead to completely different conclusions when deciding whether "5.0 ns" is slower or faster.
The "perf compare" command of my perf module "determines whether two samples differ significantly using a Student’s two-sample, two-tailed t-test with alpha equals to 0.95.": https://en.wikipedia.org/wiki/Student%27s_t-test
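Under the hood, such a check boils down to computing a t statistic from the two samples. Here is a minimal sketch of a pooled-variance Student's two-sample, two-tailed t-test (not the perf module's actual code); the critical value 2.101 is the two-tailed threshold for a 0.05 significance level with 18 degrees of freedom, i.e. two samples of 10 runs:

```python
import math
import statistics

def is_significant(sample_a, sample_b, t_critical=2.101):
    """Student's two-sample, two-tailed t-test with pooled variance."""
    n_a, n_b = len(sample_a), len(sample_b)
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)

    # Pooled variance assumes both samples share the same true variance.
    pooled = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    t = (mean_a - mean_b) / math.sqrt(pooled * (1 / n_a + 1 / n_b))
    return abs(t) > t_critical

# Two timing samples (ns): the gap between means dwarfs the noise.
fast = [4.0, 4.1, 3.9, 4.0, 4.2, 4.0, 3.9, 4.1, 4.0, 4.0]
slow = [5.0, 5.1, 4.9, 5.0, 5.2, 5.0, 4.9, 5.1, 5.0, 5.0]
print(is_significant(fast, slow))  # prints True
```

With the +/- 10 ns spread from the earlier example, the same 1 ns difference in means would not pass this test.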
I don't understand how these things work, I just copied the code from the old Python benchmark suite :-)
See also my articles in my journey to stable benchmarks:
* https://vstinner.github.io/journey-to-stable-benchmark-system.html  # noisy applications / CPU isolation
* https://vstinner.github.io/journey-to-stable-benchmark-deadcode.html  # PGO
* https://vstinner.github.io/journey-to-stable-benchmark-average.html  # randomized hash function
There are likely other parameters which impact benchmarks; that's why the standard deviation, and how the benchmark is run, matter so much.