[Python-Dev] Python Benchmarks

Andrew Dalke andrewdalke at gmail.com
Fri Jun 2 14:06:23 CEST 2006


M.-A. Lemburg:
> The approach pybench is using is as follows:
 ...
>  The calibration step is run multiple times and is used
>  to calculate an average test overhead time.

One of the changes that occurred during the sprint was to switch this
algorithm to use the best time rather than the average.  Using the
average assumes the timings follow a Gaussian distribution, and they
don't.  There is an absolute best time, but it's rarely reached because
of background noise.  The distribution looks more like a gamma
distribution shifted by the minimum time.
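
As a minimal sketch of the best-time approach (the names here are
mine, not pybench's), assuming a no-argument callable to measure:

import time

def time_best(func, repeats=10):
    # Background noise can only inflate a measurement, never
    # deflate it, so the minimum over several runs is the
    # cleanest estimate of the true cost.
    samples = []
    for _ in range(repeats):
        t0 = time.time()
        func()
        samples.append(time.time() - t0)
    return min(samples)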

To show the distribution is non-Gaussian I ran the following:

import time

def compute():
    # Busy-loop workload: one million additions.
    x = 0
    for i in range(1000):
        for j in range(1000):
            x += 1

def bench():
    # Wall-clock time for a single call to compute().
    t1 = time.time()
    compute()
    t2 = time.time()
    return t2 - t1

times = []
for i in range(1000):
    times.append(bench())

print times

The full distribution is attached as 'plot1.png' and a close-up of the
range 0.45-0.65 as 'plot2.png'.  It's not a clean gamma function, but
gamma is a closer match than an exponential.
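
In case the attachments don't survive, a crude text histogram is
enough to eyeball the shape; this helper is my own quick sketch, not
part of the benchmark:

def text_histogram(samples, bins=20):
    # ASCII histogram: one row per bin, scaled to 60 columns.
    lo, hi = min(samples), max(samples)
    width = (hi - lo) / bins or 1e-12
    counts = [0] * bins
    for s in samples:
        i = min(int((s - lo) / width), bins - 1)
        counts[i] += 1
    peak = max(counts)
    for i, c in enumerate(counts):
        print "%.3f %s" % (lo + i * width, "*" * (60 * c / peak))

text_histogram(times)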

The gamma distribution looks more like a Gaussian when the shape
parameter is large.  That corresponds to a large amount of noise in the
system, where the run time is never close to the best time.  In that
regime the averaging approach works better, but heavy random background
activity is not the usual case when I benchmark.

Averaging samples from a gamma distribution leaves an upward skew: as
the number of samples grows, the sample mean converges not to the
minimum but to the minimum plus the mean of the noise.
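
A quick simulation shows the effect; the floor and noise scale here
are made up for illustration:

import random

TRUE_MIN = 0.4725
# Timings modeled as a fixed floor plus gamma-distributed noise
# with mean shape*scale = 0.1.
samples = [TRUE_MIN + random.gammavariate(2.0, 0.05)
           for i in range(1000)]
print "mean:", sum(samples) / len(samples)  # ~TRUE_MIN + 0.1, biased up
print "min: ", min(samples)                 # hugs TRUE_MIN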

Using the minimum time should be more precise, because there is a
definite lower bound the machine cannot beat and, on a quiet machine,
the best runs come close to it.  The smallest results in my test above
are

0.472838878632
0.473038911819
0.473326921463
0.473494052887
0.473829984665

I'm pretty certain the true best time is 0.4725, or very close to it.
But the average is 0.58330151391 because of the long tail.  Here are
the six largest results in my population of 1000:

1.76353311539
1.79937505722
1.82750201225
2.01710510254
2.44861507416
2.90868496895

Granted, I hit a couple of web pages while doing this, and my spam
filter processed my mailbox in the background...

There's probably some Markov-style model which would look at the
number and distribution of samples so far and, assuming a gamma
distribution, determine how many more samples are needed for a good
estimate of the absolute minimum time.  But min() over a large enough
sample should work fine.
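
A much cruder stopping rule (my own sketch, not the model just
described) is to keep sampling until the running minimum stops
improving:

def min_until_stable(bench, patience=20, max_runs=1000):
    # Stop once `patience` consecutive runs fail to improve on
    # the best time seen so far, or after max_runs runs.
    best = bench()
    runs, stale = 1, 0
    while stale < patience and runs < max_runs:
        t = bench()
        runs += 1
        if t < best:
            best, stale = t, 0
        else:
            stale += 1
    return best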

> If the whole suite runs in 50 seconds, the per-test
> run-times are far too small to be accurate. I usually
> adjust the warp factor so that each *round* takes
> 50 seconds.

The stringbench.py I wrote uses timeit's calibration algorithm, which
dynamically adjusts the iteration count so each test runs for between
0.2 and 2 seconds.
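
That calibration amounts to doubling the loop count until one timed
batch is long enough for the clock's granularity not to matter;
roughly (my sketch of the idea, not stringbench's actual code):

import time

def calibrate(func, min_time=0.2):
    # Double the iteration count until one batch takes at least
    # min_time seconds.
    number = 1
    while True:
        t0 = time.time()
        for i in range(number):
            func()
        if time.time() - t0 >= min_time:
            return number
        number *= 2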

> That's why the timers being used by pybench will become a
> parameter that you can then select to adapt pybench to the
> OS you're running pybench on.

Wasn't that decision a consequence of the problems found during
the sprint?

                                Andrew
                                dalke at dalkescientific.com
Attachments (scrubbed by the archive):
plot1.png (image/png, 2683 bytes)
http://mail.python.org/pipermail/python-dev/attachments/20060602/be8bb51c/attachment.png
plot2.png (image/png, 2916 bytes)
http://mail.python.org/pipermail/python-dev/attachments/20060602/be8bb51c/attachment-0001.png

