[Speed] Getting the project off the ground

Antonio Cuni anto.cuni at gmail.com
Thu Jul 7 16:56:11 CEST 2011


On 06/07/11 21:11, Da_Blitz wrote:

> Looks like a X5680 cpu so yes it has the turbo boost feature 
> (http://en.wikipedia.org/wiki/Nehalem_(microarchitecture)#Server_.2F_Desktop_Processors_2) 
> i am running an i7 for building pypy and have the same turbo boost 
> feature and in practice have not found it to be an issue as long as you 
> are only running one translation at a time. as the workload is single 
> threaded it ramps up nicely. a lock or two should prevent the turbo 
> boost enabling/disabling erratically but it is also under kernel control. 
> 
> i haven't investigated how much control the kernel has over it but i 
> assume if you switch the cpu speed governors from performance over to 
> user mode and manually set the frequency that should not be much of an 
> issue

I ran some benchmarks to understand how much turbo boost and/or the scaling
governors affect performance and, most importantly, whether and how much
they affect the standard deviation.  Since we are talking about benchmarks,
the smaller the standard deviation, the better.

I ran the benchmark on Linux on an Intel i7 920 CPU, which has 4 physical
cores (8 logical ones with hyperthreading, but we do not want to use them).

The benchmark consisted of running richards.py (one of the benchmarks we use
in PyPy) on 1, 2, 3 or 4 cores at the same time.  I used "taskset" to pin
each process to a specific core.
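
As a rough sketch of the setup described above, the pinning could be scripted
like this.  The helper names (pinned_command, run_parallel) and the assumption
that richards.py sits in the current directory are illustrative only, not the
script actually used for these runs; "taskset -c" is the only real tool here.

```python
import subprocess
import sys


def pinned_command(core, script="richards.py"):
    """Build a command line that runs `script` pinned to one logical CPU."""
    # `taskset -c N cmd...` restricts cmd to logical CPU N.  Which logical
    # CPUs share a physical core is machine specific, so pick the numbers
    # with hyperthreading in mind.
    return ["taskset", "-c", str(core), sys.executable, script]


def run_parallel(cores, script="richards.py"):
    """Start one pinned process per core, wait for all, return exit codes."""
    procs = [subprocess.Popen(pinned_command(c, script)) for c in cores]
    return [p.wait() for p in procs]
```

For example, run_parallel([0, 1, 2, 3]) would start four pinned copies at once;
the timing itself is left to the benchmark script.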

For each number of cores, I ran the benchmark 10 times in a row, and then
measured the average time spent and the standard deviation (so, with 4 cores
I had a total of 40 runs).
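
A minimal sketch of how the two figures can be computed from a list of timings.
It assumes the sample standard deviation (n - 1 denominator); the post does not
say which variant was used.  The timings in the example are made up for
illustration and are not the measured data.

```python
import statistics


def summarize(times):
    """Return (average, standard deviation) of timings given in seconds."""
    avg = statistics.mean(times)
    # Sample standard deviation; an assumption, see the note above.
    stdev = statistics.stdev(times)
    return avg, stdev


# Hypothetical timings, for illustration only.
example = [1.0, 2.0, 3.0, 4.0, 5.0]
avg, stdev = summarize(example)
print("AVG:   %.3f seconds" % avg)
print("STDEV: %.3f seconds" % stdev)
```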

If the turbo boost theory is true (the clock is raised less when more cores
are busy), we expect the benchmarks to be slower when we run 4 in parallel.

Here are the results:

1 core:  AVG:   1.939 seconds
         STDEV: 0.016 seconds

2 cores: AVG:   2.020
         STDEV: 0.013

3 cores: AVG:   2.022
         STDEV: 0.016

4 cores: AVG:   2.033
         STDEV: 0.023

We can see that with 4 cores performance drops a bit, but not much (~5%
between 1 core and 4 cores).
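
The relative slowdown can be checked directly from the averages listed above.
The function name slowdown_percent is just for illustration.

```python
# Average times in seconds, copied from the results above.
avg = {1: 1.939, 2: 2.020, 3: 2.022, 4: 2.033}


def slowdown_percent(baseline, value):
    """Relative slowdown of `value` with respect to `baseline`, in percent."""
    return (value - baseline) / baseline * 100.0


for cores in (2, 3, 4):
    print("%d cores: +%.1f%%" % (cores, slowdown_percent(avg[1], avg[cores])))
```

With these numbers the 4-core run comes out a little under 5% slower than the
single-core run, and most of that gap already appears at 2 cores.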

This is using the "ondemand" governor, which is the default on my system.

I also tried running it with "performance", which in theory should give
better performance and a smaller stdev, but in practice it does not (I
honestly don't know why):

1 core, performance governor:
AVG:   1.961 seconds
STDEV: 0.027 seconds

I also tried to manually set the CPU to the lowest possible frequency, but got
even worse results:

1 core, slowest frequency:
AVG:   3.532 seconds
STDEV: 0.042 seconds
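
When comparing governors, it helps to record which governor and frequency each
core is actually using before a run.  A minimal sketch, assuming the standard
Linux cpufreq sysfs layout; the helper names are illustrative, and the
attributes may be missing on systems without cpufreq support, in which case
None is returned.

```python
import os

CPUFREQ_DIR = "/sys/devices/system/cpu/cpu%d/cpufreq"


def cpufreq_path(cpu, name):
    """Path of a cpufreq sysfs attribute for logical CPU `cpu`."""
    return os.path.join(CPUFREQ_DIR % cpu, name)


def read_attr(cpu, name):
    """Return the attribute's contents, or None if it is not available."""
    try:
        with open(cpufreq_path(cpu, name)) as f:
            return f.read().strip()
    except OSError:
        return None


if __name__ == "__main__":
    # Print the governor and current frequency (in kHz) of the first 4 CPUs.
    for cpu in range(4):
        print(cpu, read_attr(cpu, "scaling_governor"),
              read_attr(cpu, "scaling_cur_freq"))
```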

> turbo mode is socket specific so the isolation i talked about in my 
> last post would prevent compiles from affecting the cpu frequency on 
> the benchmark cpus

Although turbo boost does not seem to affect performance much, I agree with
Da_Blitz that it makes sense to reserve 1 socket (i.e. 6 cores) for
benchmarks.  Then we can use the other 6 cores for general usage (e.g.,
running tests or translations).  But before doing this we should check that
load on one socket does not actually affect the performance of the other.
As usual, I don't trust the theory too much :-).

ciao,
Anto

