
Hi,
I released pyperf 2.1.0: the compare_to command now computes the geometric mean of a whole benchmark suite and no longer displays percentages (it displays fewer numbers, to avoid confusing readers).
If the benchmark suites contain more than one benchmark, a geometric mean is computed: each benchmark mean is normalized to the mean of the same benchmark in the reference suite, and the geometric mean of these ratios is reported.
For comparisons, pyperf only displays numbers greater than or equal to 1.0, with "faster" or "slower" added. speed.pypy.org displays the geometric mean *both* ways: "The geometric average of all benchmarks is 0.23 or 4.3 times faster than cpython." I prefer to display a single number, so I picked "4.3x faster" rather than "0.23 (faster)".
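To give an idea of the computation, here is a minimal sketch (my own simplification, not pyperf's actual code; geometric_mean_ratio and format_ratio are hypothetical names), assuming each suite is reduced to a list of benchmark means:

    import math

    def geometric_mean_ratio(ref_means, changed_means):
        # Normalize each benchmark mean to the mean of the reference suite,
        # then take the geometric mean (nth root of the product) of the ratios.
        ratios = [changed / ref for ref, changed in zip(ref_means, changed_means)]
        return math.prod(ratios) ** (1.0 / len(ratios))

    def format_ratio(ratio):
        # Always show a number >= 1.0, with "faster" or "slower"
        # (a ratio below 1.0 means the changed suite is faster, since means are times).
        if ratio < 1.0:
            return "%.2fx faster" % (1.0 / ratio)
        return "%.2fx slower" % ratio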
Example:
$ python3 -m pyperf compare_to --table mult_list_py36.json mult_list_py37.json mult_list_py38.json
+----------------+----------------+-----------------------+-----------------------+
| Benchmark      | mult_list_py36 | mult_list_py37        | mult_list_py38        |
+================+================+=======================+=======================+
| [1]*1000       | 2.13 us        | 2.09 us: 1.02x faster | not significant       |
+----------------+----------------+-----------------------+-----------------------+
| [1,2]*1000     | 3.70 us        | 5.28 us: 1.42x slower | 3.18 us: 1.16x faster |
+----------------+----------------+-----------------------+-----------------------+
| [1,2,3]*1000   | 4.61 us        | 6.05 us: 1.31x slower | 4.17 us: 1.11x faster |
+----------------+----------------+-----------------------+-----------------------+
| Geometric mean | (ref)          | 1.22x slower          | 1.09x faster          |
+----------------+----------------+-----------------------+-----------------------+
Here you can see that Python 3.7 is faster than Python 3.6 on one benchmark, but slower on two benchmarks. What does that mean? Is it slower or faster "overall"? The geometric mean "1.22x slower" shows at a glance that Python 3.7 is slower overall.
Python 3.8 is faster than Python 3.6 on two benchmarks, and the remaining benchmark is not significant. The geometric mean "1.09x faster" confirms that it is faster, and also tells how much faster "overall".
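For example, the "1.22x slower" value for mult_list_py37 can be reproduced (approximately, since the table shows rounded means) with the hypothetical helpers from the sketch above:

    geomean = geometric_mean_ratio([2.13, 3.70, 4.61], [2.09, 5.28, 6.05])
    print(format_ratio(geomean))   # ~1.22 -> "1.22x slower"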
The geometric mean gives the same weight (1.0) to every benchmark. It's up to you to look carefully at the benchmarks you consider most important, and not rely only on the geometric mean.
See also the compare_to documentation for more details: https://pyperf.readthedocs.io/en/latest/cli.html#compare-to-cmd
Victor
Night gathers, and now my watch begins. It shall not end until my death.