Hi,
I released pyperf 2.1.0: the compare_to command now computes the
geometric mean of a whole benchmark suite and no longer displays
percentages (it displays fewer numbers, to avoid confusing readers).
If a benchmark suite contains more than one benchmark, the geometric
mean is computed: each benchmark's mean is normalized to the mean of
the reference results, and the geometric mean of these normalized
values is reported.
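For example, here is a minimal sketch of that computation in Python
(not pyperf's actual code; the mean timings below are made up):

import math

# Made-up mean timings (in microseconds) for three benchmarks:
# one set from the reference interpreter, one from a changed interpreter.
reference_means = [10.0, 20.0, 30.0]
changed_means = [12.0, 18.0, 33.0]

# Normalize each benchmark mean to the reference mean, then take the
# geometric mean of the normalized values.
normalized = [changed / ref for changed, ref in zip(changed_means, reference_means)]
geometric_mean = math.prod(normalized) ** (1.0 / len(normalized))
print(geometric_mean)  # > 1.0 means slower than the reference, < 1.0 means faster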
For comparisons, pyperf only displays numbers greater than or equal to
1.0, and marks each result as "faster" or "slower". speed.pypy.org
displays the geometric mean *both* ways: "The geometric average of all
benchmarks is 0.23 or 4.3 times faster than cpython." I prefer to
display only a single number, so I picked "4.3x faster" rather than
"0.23 (faster)".
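Here is a rough sketch of that display rule (hypothetical code, not
what pyperf actually does internally):

def format_ratio(norm):
    # norm: mean normalized to the reference; > 1.0 means slower.
    if norm >= 1.0:
        return "%.2fx slower" % norm
    return "%.2fx faster" % (1.0 / norm)

print(format_ratio(0.23))  # 4.35x faster (speed.pypy.org would show "0.23")
print(format_ratio(1.22))  # 1.22x slower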
Example:
$ python3 -m pyperf compare_to --table mult_list_py36.json mult_list_py37.json mult_list_py38.json
+----------------+----------------+-----------------------+-----------------------+
| Benchmark      | mult_list_py36 | mult_list_py37        | mult_list_py38        |
+================+================+=======================+=======================+
| [1]*1000       | 2.13 us        | 2.09 us: 1.02x faster | not significant       |
+----------------+----------------+-----------------------+-----------------------+
| [1,2]*1000     | 3.70 us        | 5.28 us: 1.42x slower | 3.18 us: 1.16x faster |
+----------------+----------------+-----------------------+-----------------------+
| [1,2,3]*1000   | 4.61 us        | 6.05 us: 1.31x slower | 4.17 us: 1.11x faster |
+----------------+----------------+-----------------------+-----------------------+
| Geometric mean | (ref)          | 1.22x slower          | 1.09x faster          |
+----------------+----------------+-----------------------+-----------------------+
Here you can see that Python 3.7 is faster than Python 3.6 on one
benchmark, but slower on two benchmarks. What does that mean? Is it
slower or faster "overall"? The geometric mean "1.22x slower" makes it
easy to see at a glance that Python 3.7 is slower overall.
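As a quick sanity check, the "1.22x slower" figure can be reproduced
from the three means in the table (a sketch, not pyperf's exact code):

import math

py36 = [2.13, 3.70, 4.61]  # reference means in us, from the table above
py37 = [2.09, 5.28, 6.05]  # Python 3.7 means in us, from the table above

normalized = [new / ref for new, ref in zip(py37, py36)]
geometric_mean = math.prod(normalized) ** (1.0 / len(normalized))
print("%.2fx slower" % geometric_mean)  # 1.22x slower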
Python 3.8 is faster than Python 3.6 on two benchmarks, and the third
benchmark is not significant. The geometric mean "1.09x faster"
confirms that it is faster overall, and also tells you by how much.
The geometric mean gives the same weight (1.0) to all benchmarks. It's
up to you to carefully check the benchmarks that you consider most
important, rather than relying only on the geometric mean.
See also the compare_to documentation for more details:
https://pyperf.readthedocs.io/en/latest/cli.html#compare-to-cmd
Victor
--
Night gathers, and now my watch begins. It shall not end until my death.