Measure of Python performance for general-purpose code

I'm doing some Python speed testing, measuring the effect of different combinations of compiler flags on a small range of hardware.
So far, for the test load I've mostly been using a specific program I happen to care about.
But I'm thinking of writing up the results for more general interest, so I've been looking at pyperformance.
To get comprehensible results, I think I really need to summarise the speed of a particular build+hardware combination as a single number, representing Python's performance for "general purpose code".
So does anyone have any recommendations on what the best figure to extract from pyperformance results would be?
Is pyperformance's 'default' benchmark group the most suitable for this?
Is there any more sensible way to get a single number than taking the geometric mean of what Benchmark.mean() gives me for each test in the group?
Are pyperformance's other default settings suitable for this purpose?
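(For concreteness, here's a rough sketch of the geometric-mean calculation I have in mind. It assumes pyperf's BenchmarkSuite.load(), get_benchmarks(), and mean() APIs, and a placeholder results file "results.json" written by `pyperformance run -o results.json`:)

    import math

    import pyperf

    # Load a pyperformance results file; "results.json" is a placeholder name.
    suite = pyperf.BenchmarkSuite.load("results.json")

    means = []
    for bench in suite.get_benchmarks():
        mean = bench.mean()  # mean time per iteration, in seconds
        means.append(mean)
        print("%-30s %.6f s" % (bench.get_name(), mean))

    # Geometric mean of the per-benchmark means: exp of the average log time.
    geomean = math.exp(sum(math.log(m) for m in means) / len(means))
    print("geometric mean: %.6f s" % geomean)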
-M-

On 23 April 2018 at 05:00, Matthew Woodcraft <matthew@woodcraft.me.uk> wrote:
> To get comprehensible results, I think I really need to summarise the speed of a particular build+hardware combination as a single number, representing Python's performance for "general purpose code".
> So does anyone have any recommendations on what the best figure to extract from pyperformance results would be?
There's no such number in the general case, since the way different aspects should be weighted differs significantly based on your use case (e.g. a long-running server or GUI application may care very little about startup time, while it's critical for command-line application responsiveness). That's why we have a benchmark suite, rather than just a single benchmark.
https://hackernoon.com/which-is-the-fastest-version-of-python-2ae7c61a6b2b is an example of going through and calling out specific benchmarks based on the kind of code they best represent.
So I don't think you're going to be able to get away from coming up with your own custom scheme that emphasises a particular usage profile. The simplest approach is the one the linked article took (i.e. weight one benchmark at a time at 100% and ignore the others). Beyond that, searching for "combining multiple benchmark results into an aggregate score" returned https://pubsonline.informs.org/doi/pdf/10.1287/ited.2013.0124 as the first link for me, and based on skimming the abstract and introduction, I think it's likely to be quite relevant to your question.
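As a rough illustration of what such a custom scheme might look like (the benchmark names, weights, and example timings here are purely hypothetical placeholders, not a recommendation):

    import math

    # Illustrative weights for a made-up usage profile.
    weights = {
        "python_startup": 0.1,
        "json_loads": 0.2,
        "regex_dna": 0.3,
        "django_template": 0.4,
    }

    def weighted_geomean(mean_times, weights):
        # Normalise the weights so they sum to 1, then use them as exponents
        # in a weighted geometric mean of the per-benchmark mean times.
        total = sum(weights.values())
        return math.exp(sum(
            (w / total) * math.log(mean_times[name])
            for name, w in weights.items()
        ))

    # Example with made-up per-benchmark mean times (seconds); in practice
    # these would come from the pyperformance results file.
    mean_times = {"python_startup": 0.014, "json_loads": 0.025,
                  "regex_dna": 0.17, "django_template": 0.05}
    print(weighted_geomean(mean_times, weights))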
Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia