On Fri, 10 Jun 2016 at 10:11 Steven D'Aprano <steve@pearwood.info> wrote:

> On Fri, Jun 10, 2016 at 05:07:18PM +0200, Victor Stinner wrote:
>> I started to work on visualisation. IMHO it helps to understand the
>> problem.
>> Let's create a large dataset: 500 samples (100 processes x 5 samples):
>> ---
>> $ python3 telco.py --json-file=telco.json -p 100 -n 5
>> ---
>>
>> The attached plot.py script creates a histogram:
>> ---
>> avg: 26.7 ms +- 0.2 ms; min = 26.1 ms
>> 26.1 ms:   1 #
>> 26.2 ms:  12 #####
>> 26.3 ms:  34 ############
>> 26.4 ms:  44 ################
>> 26.5 ms: 109 ######################################
>> 26.6 ms: 117 ########################################
>> 26.7 ms:  86 ##############################
>> 26.8 ms:  50 ##################
>> 26.9 ms:  32 ###########
>> 27.0 ms:  10 ####
>> 27.1 ms:   3 ##
>> 27.2 ms:   1 #
>> 27.3 ms:   1 #
>>
>> minimum 26.1 ms: 0.2% (1) of 500 samples
>> ---
> [...]
>> The distribution looks like a Gaussian curve:
>> https://en.wikipedia.org/wiki/Gaussian_function
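Victor's attached plot.py isn't shown in the thread, but here is a
minimal sketch of how such an ASCII histogram can be rendered from raw
timings. The bucket width (0.1 ms) and bar scaling are my assumptions,
not plot.py's actual logic:

---
import collections

def ascii_histogram(samples_ms, bucket_ms=0.1, max_bar=40):
    # Round each sample down to its 0.1 ms bucket and count them.
    buckets = collections.Counter(
        round(int(s / bucket_ms) * bucket_ms, 1) for s in samples_ms)
    peak = max(buckets.values())
    for value in sorted(buckets):
        count = buckets[value]
        # Scale bars so the modal bucket gets max_bar '#' characters.
        bar = '#' * max(1, count * max_bar // peak)
        print('%.1f ms: %3d %s' % (value, count, bar))
---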
> Lots of distributions look a bit Gaussian, but they can be skewed, or
> truncated, or both. E.g. the life-span of a lightbulb is approximately
> Gaussian with a central peak at some value (let's say 5000 hours), but
> while it is conceivable that you might be really lucky and find a bulb
> that lasts 15000 hours, it isn't possible to find one that lasts
> -10000 hours. The distribution is truncated on the left.
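Steven's lightbulb example is easy to play with numerically. This is my
own sketch, not anything from the thread: scipy.stats.truncnorm models
a Gaussian truncated on the left, and the sample skew comes out
positive, i.e. the right tail is longer:

---
from scipy import stats

# Lifespans centred on 5000 h (sd 2000 h), truncated at 0 h on the
# left; truncnorm's a and b bounds are in standard-deviation units.
dist = stats.truncnorm(a=(0 - 5000) / 2000, b=float('inf'),
                       loc=5000, scale=2000)
samples = dist.rvs(size=100000, random_state=0)
print('min: %.0f h, skew: %.3f' % (samples.min(), stats.skew(samples)))
---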
> To me, your graph looks like the distribution is skewed: the right-hand
> tail (shown at the bottom) is longer than the left-hand tail, seven
> buckets compared to five. There are actual statistical tests for
> detecting deviation from Gaussian curves, but I'd have to look them up.
> But as a really quick and dirty test, we can count the number of
> samples on either side of the central peak (the mode):
>
> left: 109+44+34+12+1 = 200
> centre: 117
> right: 500 - 200 - 117 = 183
> It certainly looks *close* to Gaussian, but with the crude tests we are
> using, we can't be sure. If you took more and more samples, I would
> expect that the right-hand tail would get longer and longer, but the
> left-hand tail would not.
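The "actual statistical tests" Steven alludes to exist in SciPy; this
is my illustration, not something from the thread. Rebuilding
approximate samples from the bucket counts above, scipy.stats.normaltest
(D'Agostino and Pearson) gives a p-value for the null hypothesis that
the data is Gaussian, and scipy.stats.skew quantifies the asymmetry:

---
import numpy as np
from scipy import stats

# Bucket counts copied from Victor's histogram above.
buckets = {26.1: 1, 26.2: 12, 26.3: 34, 26.4: 44, 26.5: 109, 26.6: 117,
           26.7: 86, 26.8: 50, 26.9: 32, 27.0: 10, 27.1: 3, 27.2: 1,
           27.3: 1}
# Approximate the raw data by repeating each bucket's midpoint.
samples = np.repeat(list(buckets), list(buckets.values()))

stat, p = stats.normaltest(samples)
print('normaltest p-value: %.3g' % p)      # small p => reject "Gaussian"
print('skew: %.2f' % stats.skew(samples))  # > 0 => longer right tail
---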
>> The interesting thing is that only 1 sample out of 500 is in the
>> minimum bucket (26.1 ms). If you say that the performance is 26.1 ms,
>> only 0.2% of your users will be able to reproduce this timing.
> Hmmm. Okay, that is a good point. In this case, you're not so much
> reporting your estimate of what the "true speed" of the code snippet
> would be in the absence of all noise, but your estimate of what your
> users should expect to experience "most of the time".
I think the other way to think about why you don't want to use the
minimum is: what if one run just happened to get lucky and ran when
nothing else was running (some random lull on the system), while the
second run didn't get so lucky and missed an equivalent lull? Using the
average helps remove that "luck of the draw" inherent in taking the
minimum. This is why the PyPy folks suggested that Victor use the
average rather than the minimum; the minimum doesn't measure typical
system behaviour.
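A toy simulation makes the "luck of the draw" concrete. The noise model
here (a fixed true cost plus background load, with rare random lulls)
is my assumption, purely for illustration:

---
import random

def one_run(n=50, true_ms=26.0, lull_prob=0.02):
    # Background load adds ~0.5 ms to each sample, except during rare
    # random "lulls" where a sample runs at nearly the true cost.
    samples = []
    for _ in range(n):
        if random.random() < lull_prob:
            noise = random.uniform(0.0, 0.05)   # lucky: system idle
        else:
            noise = random.gauss(0.5, 0.1)      # typical: loaded system
        samples.append(true_ms + noise)
    return samples

for i in range(5):
    s = one_run()
    print('run %d: min=%.2f ms  avg=%.2f ms'
          % (i, min(s), sum(s) / len(s)))
---

Across repeated runs, the minimum jumps around depending on whether any
sample happened to hit a lull, while the average barely moves.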
> Assuming they have exactly the same hardware, operating system, and
> load on their system as you have.
Sure, but that's true of any benchmarking. The only way to get accurate
measurements for one's own system is to run the benchmarks yourself.

-Brett
>> The average and std dev are 26.7 ms +- 0.2 ms, so one standard
>> deviation around the mean covers the range 26.5 ms .. 26.9 ms: we got
>> 109+117+86+50+32 = 394 samples in this range, which gives us
>> 394/500 = 79%.
>>
>> IMHO saying "26.7 ms +- 0.2 ms" (79% of samples) is less of a lie
>> than 26.1 ms (0.2%).
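Victor's 79% figure is easy to verify from the bucket counts. A quick
check (the variable names are mine, not perf's API):

---
buckets = {26.1: 1, 26.2: 12, 26.3: 34, 26.4: 44, 26.5: 109, 26.6: 117,
           26.7: 86, 26.8: 50, 26.9: 32, 27.0: 10, 27.1: 3, 27.2: 1,
           27.3: 1}
avg, std = 26.7, 0.2

# Count the samples whose bucket lies within avg +- std.
inside = sum(count for value, count in buckets.items()
             if avg - std <= value <= avg + std)
total = sum(buckets.values())
print('%d/%d = %.0f%%' % (inside, total, 100 * inside / total))
# -> 394/500 = 79%
---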
> I think I understand the point you are making. I'll have to think
> about it some more to decide if I agree with you.
>
> But either way, I think the work you have done on perf is fantastic
> and I think this will be a great tool. I really love the histogram.
> Can you draw a histogram of two functions side-by-side, for
> comparison?
> --
> Steve
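On Steven's side-by-side question: I can't confirm whether perf
supports this, but a rough sketch of comparing two sets of timings in
one ASCII table, with one bar column per function, could look like
this:

---
import collections

def bucketize(samples_ms, bucket_ms=0.1):
    # Same bucketing idea as the histogram sketch earlier.
    return collections.Counter(
        round(int(s / bucket_ms) * bucket_ms, 1) for s in samples_ms)

def side_by_side(samples_a, samples_b, width=20):
    a, b = bucketize(samples_a), bucketize(samples_b)
    peak = max(max(a.values()), max(b.values()))
    for value in sorted(set(a) | set(b)):
        bar_a = '#' * (a.get(value, 0) * width // peak)
        bar_b = '#' * (b.get(value, 0) * width // peak)
        print('%.1f ms: %-*s | %s' % (value, width, bar_a, bar_b))
---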