[CentralOH] Threading benchmarks - Results

Bryan Harris bryan.harris at udri.udayton.edu
Tue Nov 10 21:10:09 CET 2009


Hi all,

I got some good feedback on my multi-threading question and was able to get 
some useful benchmarks. I ran these on a dual-CPU Xeon server, a quad-core 
Phenom, and a single-core Opteron. I have attached the results along with the 
programs, so you should be able to reproduce them if you wish. However, I can 
summarize a few points below:

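1. The threading library does not help CPU-bound applications. It got slower 
for any number of threads > 1. On the 2-CPU Xeon it then stayed roughly 
constant (from 9.1 s to about 11-12 s for 2**16 repeats, and from 35.8 s to 
about 44-48 s for 2**18). On the 4-core Phenom it climbed further, from 18.1 s 
at 1 thread to 46 s at 2 threads and about 62 s at 128-256 threads.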

2. The newer processing library performs the same as or better than the 
threading library in all cases. I was able to get almost full utilization of 
both CPUs on the Xeon system. The Phenom was busy transcoding my wife's soap 
opera, a process I dared not interrupt. However, the Phenom tests confirmed that 
the processing library did reduce the run time despite competition from other 
CPU-hungry applications. On the Xeon, real time dropped by almost half with more 
than one process (from 34.6 s to about 18-19 s for 2**18 repeats) and then 
stayed roughly constant as the number of processes increased.

Interestingly, on the 4-core Phenom real time bottomed out at 4 processes 
(7.6 s, down from 17.8 s with one) and then rose steadily to 12.6 s at 256 
processes. I attribute this to competition from mythcommflag rather than process 
overhead, but will need to verify that later.

3. (An obvious one I should have remembered from my ECE courses.) Printing 
(just as with C's printf) is terribly slow and masked some of the results from 
before. Removing the print calls, as recommended, improved performance.

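4. Perhaps most interestingly, the threading library significantly hurt 
performance for threads > 1 on the multi-CPU and multicore machines (on the 
Phenom, from 18.1 s with 1 thread to 46 s with 2) but not on the single-core 
Opteron (from 6.2 s to at most 6.7 s across 2-64 threads).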
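
Since the attached scripts are no longer available in the archive, here is a 
minimal sketch of the kind of comparison in points 1, 2 and 4, written for 
current Python 3. It is not mt.py or mp.py: the work unit `burn` and the 
helpers `run_threads` / `run_processes` are hypothetical stand-ins. It keeps 
the total work fixed and splits it across N threads or N processes, as the 
benchmark loop does. In CPython the global interpreter lock lets only one 
thread execute Python bytecode at a time, so the threaded version cannot use 
more than one core for pure-Python work; separate processes avoid that. (The 
"processing" package went into the standard library as multiprocessing in 
Python 2.6, which is what the sketch uses.)

```python
import threading
import time
from multiprocessing import Pool


def burn(repeats):
    """Hypothetical CPU-bound work unit: a pure-Python arithmetic loop.

    Deliberately no print() inside the loop; console output would
    dominate the timing (point 3 above).
    """
    total = 0
    for i in range(repeats):
        total += i * i % 7
    return total


def run_threads(workers, repeats):
    """Split the work across `workers` threads in one interpreter."""
    results = [0] * workers

    def target(slot):
        results[slot] = burn(repeats)  # each thread fills its own slot

    threads = [threading.Thread(target=target, args=(n,)) for n in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


def run_processes(workers, repeats):
    """Split the same work across `workers` separate processes."""
    with Pool(processes=workers) as pool:
        return pool.map(burn, [repeats] * workers)


if __name__ == "__main__":
    total = 2 ** 20  # total loop iterations, constant as the worker count grows
    for workers in (1, 2, 4):
        repeats = total // workers
        for label, runner in (("threading", run_threads), ("processing", run_processes)):
            start = time.perf_counter()
            runner(workers, repeats)
            elapsed = time.perf_counter() - start
            print(f"{label:10s} {workers} workers: {elapsed:.2f} s real")
```

On a multicore machine running CPython you would expect the threading times to 
stay flat or grow as workers are added, while the processing times drop until 
the cores run out; the exact numbers will vary by machine.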

Thanks for the feedback,
Bryan

-- 
Bryan Harris
Research Engineer
Structures and Materials Evaluation Group
bryan.harris at udri.udayton.edu
http://www.udri.udayton.edu/


-------------- next part --------------
Run 1: Dual Physical CPU - Intel(R) Xeon(TM) CPU 2.40GHz
total=$(( 2 ** 16 ));threads=1;while [[ $threads -lt 128 ]]; do repeats=$(( total / threads )); echo threading $threads threads $repeats repeats; time ./mt.py $threads $repeats >mt$threads.log; threads=$(( $threads * 2 )); done
threading 1 threads 65536 repeats

real	0m9.119s
user	0m8.405s
sys	0m0.712s
threading 2 threads 32768 repeats

real	0m11.878s
user	0m8.389s
sys	0m3.788s
threading 4 threads 16384 repeats

real	0m11.485s
user	0m9.701s
sys	0m3.484s
threading 8 threads 8192 repeats

real	0m11.045s
user	0m9.165s
sys	0m2.828s
threading 16 threads 4096 repeats

real	0m10.980s
user	0m9.189s
sys	0m2.712s
threading 32 threads 2048 repeats

real	0m11.146s
user	0m9.393s
sys	0m2.788s
threading 64 threads 1024 repeats

real	0m11.230s
user	0m9.321s
sys	0m2.980s



total=$(( 2 ** 16 ));threads=1;while [[ $threads -lt 128 ]]; do repeats=$(( total / threads )); echo -n processing $threads threads $repeats repeats; time ./mp.py $threads $repeats >mp$threads.log; threads=$(( $threads * 2 )); done
processing 1 threads 65536 repeats
real	0m9.088s
user	0m8.553s
sys	0m0.512s
processing 2 threads 32768 repeats
real	0m5.122s
user	0m8.861s
sys	0m0.940s
processing 4 threads 16384 repeats
real	0m5.282s
user	0m9.821s
sys	0m0.648s
processing 8 threads 8192 repeats
real	0m4.571s
user	0m8.357s
sys	0m0.656s
processing 16 threads 4096 repeats
real	0m4.592s
user	0m8.357s
sys	0m0.736s
processing 32 threads 2048 repeats
real	0m5.014s
user	0m9.005s
sys	0m0.892s
processing 64 threads 1024 repeats
real	0m4.639s
user	0m8.289s
sys	0m0.852s

Run 2:

total=$(( 2 ** 18 ));threads=1;while [[ $threads -lt 257 ]]; do repeats=$(( total / threads )); echo -n threading $threads threads $repeats repeats; time ./mt.py $threads $repeats >mt$threads.log; threads=$(( $threads * 2 )); done
threading 1 threads 262144 repeats
real	0m35.760s
user	0m33.738s
sys	0m1.980s
threading 2 threads 131072 repeats
real	0m47.518s
user	0m38.466s
sys	0m15.277s
threading 4 threads 65536 repeats
real	0m46.009s
user	0m36.322s
sys	0m14.201s
threading 8 threads 32768 repeats
real	0m44.785s
user	0m35.706s
sys	0m12.521s
threading 16 threads 16384 repeats
real	0m44.470s
user	0m37.074s
sys	0m11.521s
threading 32 threads 8192 repeats
real	0m44.572s
user	0m36.998s
sys	0m11.225s
threading 64 threads 4096 repeats
real	0m44.349s
user	0m37.518s
sys	0m11.105s
threading 128 threads 2048 repeats
real	0m44.861s
user	0m37.558s
sys	0m11.377s
threading 256 threads 1024 repeats
real	0m45.165s
user	0m38.122s
sys	0m11.381s

total=$(( 2 ** 18 ));threads=1;while [[ $threads -lt 257 ]]; do repeats=$(( total / threads )); echo -n processing $threads threads $repeats repeats; time ./mp.py $threads $repeats >mp$threads.log; threads=$(( $threads * 2 )); done
processing 1 threads 262144 repeats
real	0m34.568s
user	0m32.498s
sys	0m2.064s
processing 2 threads 131072 repeats
real	0m18.715s
user	0m33.770s
sys	0m3.032s
processing 4 threads 65536 repeats
real	0m17.994s
user	0m33.358s
sys	0m2.488s
processing 8 threads 32768 repeats
real	0m17.991s
user	0m33.298s
sys	0m2.536s
processing 16 threads 16384 repeats
real	0m17.993s
user	0m33.002s
sys	0m2.844s
processing 32 threads 8192 repeats
real	0m19.022s
user	0m34.786s
sys	0m3.044s
processing 64 threads 4096 repeats
real	0m18.731s
user	0m34.406s
sys	0m2.892s
processing 128 threads 2048 repeats
real	0m19.346s
user	0m35.202s
sys	0m3.296s
processing 256 threads 1024 repeats
real	0m18.668s
user	0m33.490s
sys	0m3.624s

Run 3: AMD Phenom(tm) 9600 Quad-Core Processor (Competing with 2 mythcommflag processes)

total=$(( 2 ** 18 ));threads=1;while [[ $threads -lt 257 ]]; do repeats=$(( total / threads )); echo threading $threads threads $repeats repeats; time ./mt.py $threads $repeats >mt$threads.log; threads=$(( $threads * 2 )); done
threading 1 threads 262144 repeats

real	0m18.139s
user	0m16.993s
sys	0m0.796s
threading 2 threads 131072 repeats

real	0m46.050s
user	0m36.606s
sys	0m13.785s
threading 4 threads 65536 repeats

real	0m44.709s
user	0m35.174s
sys	0m9.401s
threading 8 threads 32768 repeats

real	0m53.652s
user	0m35.378s
sys	0m8.573s
threading 16 threads 16384 repeats

real	0m55.168s
user	0m35.638s
sys	0m8.041s
threading 32 threads 8192 repeats

real	1m1.495s
user	0m35.430s
sys	0m7.300s
threading 64 threads 4096 repeats

real	0m52.198s
user	0m35.298s
sys	0m7.896s
threading 128 threads 2048 repeats

real	1m1.661s
user	0m35.478s
sys	0m8.301s
threading 256 threads 1024 repeats

real	1m1.717s
user	0m34.670s
sys	0m9.041s

total=$(( 2 ** 18 ));threads=1;while [[ $threads -lt 257 ]]; do repeats=$(( total / threads )); echo processing $threads threads $repeats repeats; time ./mp.py $threads $repeats >mp$threads.log; threads=$(( $threads * 2 )); done
processing 1 threads 262144 repeats

real	0m17.797s
user	0m16.773s
sys	0m0.796s
processing 2 threads 131072 repeats

real	0m10.568s
user	0m16.973s
sys	0m1.072s
processing 4 threads 65536 repeats

real	0m7.582s
user	0m16.565s
sys	0m1.516s
processing 8 threads 32768 repeats

real	0m9.308s
user	0m20.713s
sys	0m2.096s
processing 16 threads 16384 repeats

real	0m9.569s
user	0m19.985s
sys	0m2.432s
processing 32 threads 8192 repeats

real	0m10.586s
user	0m21.785s
sys	0m3.252s
processing 64 threads 4096 repeats

real	0m10.937s
user	0m22.705s
sys	0m3.136s
processing 128 threads 2048 repeats

real	0m11.237s
user	0m23.065s
sys	0m3.700s
processing 256 threads 1024 repeats

real	0m12.592s
user	0m25.434s
sys	0m5.200s


Run 4:  AMD Opteron(tm) Processor 148

total=$(( 2 ** 16 ));threads=1;while [[ $threads -lt 128 ]]; do repeats=$(( total / threads )); echo threading $threads threads $repeats repeats; time ./mt.py $threads $repeats >mt$threads.log; threads=$(( $threads * 2 )); done
threading 1 threads 65536 repeats

real	0m6.207s
user	0m5.784s
sys	0m0.268s
threading 2 threads 32768 repeats

real	0m6.252s
user	0m5.852s
sys	0m0.272s
threading 4 threads 16384 repeats

real	0m6.348s
user	0m5.944s
sys	0m0.276s
threading 8 threads 8192 repeats

real	0m6.430s
user	0m6.040s
sys	0m0.276s
threading 16 threads 4096 repeats

real	0m6.307s
user	0m5.844s
sys	0m0.364s
threading 32 threads 2048 repeats

real	0m6.618s
user	0m5.852s
sys	0m0.368s
threading 64 threads 1024 repeats

real	0m6.678s
user	0m5.816s
sys	0m0.428s

total=$(( 2 ** 16 ));threads=1;while [[ $threads -lt 128 ]]; do repeats=$(( total / threads )); echo processing $threads threads $repeats repeats; time ./mp.py $threads $repeats >mp$threads.log; threads=$(( $threads * 2 )); done
processing 1 threads 65536 repeats

real	0m6.379s
user	0m5.908s
sys	0m0.284s
processing 2 threads 32768 repeats

real	0m6.358s
user	0m5.960s
sys	0m0.276s
processing 4 threads 16384 repeats

real	0m6.039s
user	0m5.612s
sys	0m0.328s
processing 8 threads 8192 repeats

real	0m6.502s
user	0m6.108s
sys	0m0.272s
processing 16 threads 4096 repeats

real	0m6.260s
user	0m5.792s
sys	0m0.352s
processing 32 threads 2048 repeats

real	0m6.693s
user	0m5.964s
sys	0m0.336s
processing 64 threads 1024 repeats

real	0m6.777s
user	0m5.988s
sys	0m0.340s
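
The speedups described in the message can be checked directly from the "real" 
times above using S_N = T_1 / T_N, where T_N is the wall-clock time with N 
workers; S_N above 1 means the extra workers helped. A minimal sketch, using 
only a few entries copied from these logs (the helper names are illustrative):

```python
def to_seconds(stamp):
    """Convert a bash `time` value such as '1m1.495s' into seconds."""
    minutes, seconds = stamp.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)


# "real" times copied from the logs above, keyed by (run, library) -> {workers: time}.
REAL = {
    ("Xeon run 2", "threading"): {1: "0m35.760s", 2: "0m47.518s"},
    ("Xeon run 2", "processing"): {1: "0m34.568s", 2: "0m18.715s"},
    ("Phenom run 3", "threading"): {1: "0m18.139s", 4: "0m44.709s"},
    ("Phenom run 3", "processing"): {1: "0m17.797s", 4: "0m7.582s"},
    ("Opteron run 4", "threading"): {1: "0m6.207s", 64: "0m6.678s"},
}


def speedup(times, workers):
    """S_N = T_1 / T_N for one (run, library) entry."""
    return to_seconds(times[1]) / to_seconds(times[workers])


for (run, library), times in REAL.items():
    for workers in sorted(times):
        if workers > 1:
            print(f"{run:14s} {library:10s} {workers:3d} workers: "
                  f"speedup {speedup(times, workers):.2f}")
```

The processing entries come out at about 1.85 (Xeon, 2 processes) and 2.35 
(Phenom, 4 processes, with mythcommflag competing), while every threaded entry 
is below 1.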
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mt.py
Type: text/x-python
Size: 886 bytes
Desc: not available
URL: <http://mail.python.org/mailman/private/centraloh/attachments/20091110/9391d2a0/attachment.py>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mp.py
Type: text/x-python
Size: 910 bytes
Desc: not available
URL: <http://mail.python.org/mailman/private/centraloh/attachments/20091110/9391d2a0/attachment-0001.py>
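
The bash one-liners in the results keep the total work fixed and double the 
worker count on each pass (repeats = total / threads). For anyone who recovers 
the scripts, a rough Python equivalent of that driver is sketched below. The 
helper names `schedule` and `time_script` are hypothetical, the script path is 
passed on the command line, and only wall-clock ("real") time is measured, not 
user or sys time.

```python
import subprocess
import sys
import time


def schedule(total, limit):
    """Yield (threads, repeats) like the bash loop: threads = 1, 2, 4, ... < limit."""
    threads = 1
    while threads < limit:
        yield threads, total // threads
        threads *= 2


def time_script(script, threads, repeats, log_path):
    """Run `script threads repeats`, send stdout to log_path, return wall-clock seconds."""
    start = time.perf_counter()
    with open(log_path, "w") as log:
        # Runs the script with the current interpreter instead of relying on its shebang.
        subprocess.run([sys.executable, script, str(threads), str(repeats)],
                       stdout=log, check=True)
    return time.perf_counter() - start


if __name__ == "__main__" and len(sys.argv) > 1:
    script = sys.argv[1]  # e.g. ./mp.py (not included here)
    for threads, repeats in schedule(2 ** 16, 128):
        elapsed = time_script(script, threads, repeats, f"run{threads}.log")
        print(f"{script} {threads} threads {repeats} repeats: real {elapsed:.3f}s")
```

`schedule(2 ** 18, 257)` reproduces the 1 to 256 thread sweep used in runs 2 
and 3.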

