[CentralOH] Threading benchmarks - Results
Bryan Harris
bryan.harris at udri.udayton.edu
Tue Nov 10 21:10:09 CET 2009
Hi all,
I got some good feedback on my multi-threading question and was able to get
some useful benchmarks. I ran these on a 2-CPU Xeon server, a quad-core
Phenom, and a single-core Opteron. I have attached the results along with the
programs, so you should be able to reproduce them if you wish. A few points
are summarized below:
1. The threading library does not help for CPU-bound applications. With the
threading library, run time got longer for any number of threads >1 and then
stayed pretty much constant on both the 2-CPU and the 4-core machine.
2. The newer processing library works the same or better than the threading
library in all cases. I was able to get almost full utilization of both CPUs
on the Xeon system. The Phenom was busy transcoding my wife's soap opera, a
process I dared not interrupt. However, the Phenom tests confirmed that the
processing library did reduce the run time despite competition from other
CPU-hungry applications. Real time was reduced by almost half on the Xeon
system with more than one process (labeled "threads" in the logs) and then
stayed roughly constant as the number of processes increased.
Interestingly, real time crept back up as the count went beyond 4 on the
4-core Phenom. I attribute this to competition from mythcommflag and not
process overhead, but will need to verify later.
3. (An obvious one I should have remembered from my ECE courses.) print, just
like printf, is terribly slow and masked some of the earlier results. I
removed it as recommended, and performance improved.
4. Perhaps most interestingly, the threading library significantly hurt
performance for threads > 1 on a multicore machine but not on a single core
machine.
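
For readers without the scrubbed attachments, the comparison behind points 1, 2 and 4 can be sketched roughly as follows. This is not the original mt.py or mp.py: the names burn and run_workers and the busy-loop workload are assumptions made for illustration, and the sketch uses multiprocessing (the standard-library successor to the processing package) on Python 3.

```python
import multiprocessing
import threading
import time


def burn(repeats):
    """CPU-bound busy work: a pure-Python loop with no I/O and no print."""
    total = 0
    for i in range(repeats):
        total += i * i
    return total


def run_workers(worker_cls, workers, repeats):
    """Start `workers` threads or processes, each doing `repeats` iterations,
    and return the elapsed wall-clock time in seconds."""
    jobs = [worker_cls(target=burn, args=(repeats,)) for _ in range(workers)]
    start = time.perf_counter()
    for job in jobs:
        job.start()
    for job in jobs:
        job.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    # Same convention as the shell loops in the results: a fixed amount of
    # total work, split evenly across the workers. The numbers here are
    # illustrative, not the ones used in the original benchmark.
    workers = 2
    total = 2 ** 21
    repeats = total // workers
    for name, cls in (("threading", threading.Thread),
                      ("multiprocessing", multiprocessing.Process)):
        elapsed = run_workers(cls, workers, repeats)
        print("%s: %d workers x %d repeats: %.3fs"
              % (name, workers, repeats, elapsed))
```

On a standard CPython build, the threads in such a sketch take turns holding the global interpreter lock, so they cannot run the loop on two cores at once, while the processes each have their own interpreter. The absolute numbers will differ by machine and Python version.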
Thanks for the feedback,
Bryan
--
Bryan Harris
Research Engineer
Structures and Materials Evaluation Group
bryan.harris at udri.udayton.edu
http://www.udri.udayton.edu/
-------------- next part --------------
Run 1: Dual Physical CPU - Intel(R) Xeon(TM) CPU 2.40GHz
total=$(( 2 ** 16 ));threads=1;while [[ $threads -lt 128 ]]; do repeats=$(( total / threads )); echo threading $threads threads $repeats repeats; time ./mt.py $threads $repeats >mt$threads.log; threads=$(( $threads * 2 )); done
threading 1 threads 65536 repeats
real 0m9.119s
user 0m8.405s
sys 0m0.712s
threading 2 threads 32768 repeats
real 0m11.878s
user 0m8.389s
sys 0m3.788s
threading 4 threads 16384 repeats
real 0m11.485s
user 0m9.701s
sys 0m3.484s
threading 8 threads 8192 repeats
real 0m11.045s
user 0m9.165s
sys 0m2.828s
threading 16 threads 4096 repeats
real 0m10.980s
user 0m9.189s
sys 0m2.712s
threading 32 threads 2048 repeats
real 0m11.146s
user 0m9.393s
sys 0m2.788s
threading 64 threads 1024 repeats
real 0m11.230s
user 0m9.321s
sys 0m2.980s
total=$(( 2 ** 16 ));threads=1;while [[ $threads -lt 128 ]]; do repeats=$(( total / threads )); echo -n processing $threads threads $repeats repeats; time ./mp.py $threads $repeats >mp$threads.log; threads=$(( $threads * 2 )); done
processing 1 threads 65536 repeats
real 0m9.088s
user 0m8.553s
sys 0m0.512s
processing 2 threads 32768 repeats
real 0m5.122s
user 0m8.861s
sys 0m0.940s
processing 4 threads 16384 repeats
real 0m5.282s
user 0m9.821s
sys 0m0.648s
processing 8 threads 8192 repeats
real 0m4.571s
user 0m8.357s
sys 0m0.656s
processing 16 threads 4096 repeats
real 0m4.592s
user 0m8.357s
sys 0m0.736s
processing 32 threads 2048 repeats
real 0m5.014s
user 0m9.005s
sys 0m0.892s
processing 64 threads 1024 repeats
real 0m4.639s
user 0m8.289s
sys 0m0.852s
Run 2:
total=$(( 2 ** 18 ));threads=1;while [[ $threads -lt 257 ]]; do repeats=$(( total / threads )); echo -n threading $threads threads $repeats repeats; time ./mt.py $threads $repeats >mt$threads.log; threads=$(( $threads * 2 )); done
threading 1 threads 262144 repeats
real 0m35.760s
user 0m33.738s
sys 0m1.980s
threading 2 threads 131072 repeats
real 0m47.518s
user 0m38.466s
sys 0m15.277s
threading 4 threads 65536 repeats
real 0m46.009s
user 0m36.322s
sys 0m14.201s
threading 8 threads 32768 repeats
real 0m44.785s
user 0m35.706s
sys 0m12.521s
threading 16 threads 16384 repeats
real 0m44.470s
user 0m37.074s
sys 0m11.521s
threading 32 threads 8192 repeats
real 0m44.572s
user 0m36.998s
sys 0m11.225s
threading 64 threads 4096 repeats
real 0m44.349s
user 0m37.518s
sys 0m11.105s
threading 128 threads 2048 repeats
real 0m44.861s
user 0m37.558s
sys 0m11.377s
threading 256 threads 1024 repeats
real 0m45.165s
user 0m38.122s
sys 0m11.381s
total=$(( 2 ** 18 ));threads=1;while [[ $threads -lt 257 ]]; do repeats=$(( total / threads )); echo -n processing $threads threads $repeats repeats; time ./mp.py $threads $repeats >mp$threads.log; threads=$(( $threads * 2 )); done
processing 1 threads 262144 repeats
real 0m34.568s
user 0m32.498s
sys 0m2.064s
processing 2 threads 131072 repeats
real 0m18.715s
user 0m33.770s
sys 0m3.032s
processing 4 threads 65536 repeats
real 0m17.994s
user 0m33.358s
sys 0m2.488s
processing 8 threads 32768 repeats
real 0m17.991s
user 0m33.298s
sys 0m2.536s
processing 16 threads 16384 repeats
real 0m17.993s
user 0m33.002s
sys 0m2.844s
processing 32 threads 8192 repeats
real 0m19.022s
user 0m34.786s
sys 0m3.044s
processing 64 threads 4096 repeats
real 0m18.731s
user 0m34.406s
sys 0m2.892s
processing 128 threads 2048 repeats
real 0m19.346s
user 0m35.202s
sys 0m3.296s
processing 256 threads 1024 repeats
real 0m18.668s
user 0m33.490s
sys 0m3.624s
Run 3: AMD Phenom(tm) 9600 Quad-Core Processor (Competing with 2 mythcommflag processes)
total=$(( 2 ** 18 ));threads=1;while [[ $threads -lt 257 ]]; do repeats=$(( total / threads )); echo threading $threads threads $repeats repeats; time ./mt.py $threads $repeats >mt$threads.log; threads=$(( $threads * 2 )); done
threading 1 threads 262144 repeats
real 0m18.139s
user 0m16.993s
sys 0m0.796s
threading 2 threads 131072 repeats
real 0m46.050s
user 0m36.606s
sys 0m13.785s
threading 4 threads 65536 repeats
real 0m44.709s
user 0m35.174s
sys 0m9.401s
threading 8 threads 32768 repeats
real 0m53.652s
user 0m35.378s
sys 0m8.573s
threading 16 threads 16384 repeats
real 0m55.168s
user 0m35.638s
sys 0m8.041s
threading 32 threads 8192 repeats
real 1m1.495s
user 0m35.430s
sys 0m7.300s
threading 64 threads 4096 repeats
real 0m52.198s
user 0m35.298s
sys 0m7.896s
threading 128 threads 2048 repeats
real 1m1.661s
user 0m35.478s
sys 0m8.301s
threading 256 threads 1024 repeats
real 1m1.717s
user 0m34.670s
sys 0m9.041s
total=$(( 2 ** 18 ));threads=1;while [[ $threads -lt 257 ]]; do repeats=$(( total / threads )); echo processing $threads threads $repeats repeats; time ./mp.py $threads $repeats >mp$threads.log; threads=$(( $threads * 2 )); done
processing 1 threads 262144 repeats
real 0m17.797s
user 0m16.773s
sys 0m0.796s
processing 2 threads 131072 repeats
real 0m10.568s
user 0m16.973s
sys 0m1.072s
processing 4 threads 65536 repeats
real 0m7.582s
user 0m16.565s
sys 0m1.516s
processing 8 threads 32768 repeats
real 0m9.308s
user 0m20.713s
sys 0m2.096s
processing 16 threads 16384 repeats
real 0m9.569s
user 0m19.985s
sys 0m2.432s
processing 32 threads 8192 repeats
real 0m10.586s
user 0m21.785s
sys 0m3.252s
processing 64 threads 4096 repeats
real 0m10.937s
user 0m22.705s
sys 0m3.136s
processing 128 threads 2048 repeats
real 0m11.237s
user 0m23.065s
sys 0m3.700s
processing 256 threads 1024 repeats
real 0m12.592s
user 0m25.434s
sys 0m5.200s
Run 4: AMD Opteron(tm) Processor 148
total=$(( 2 ** 16 ));threads=1;while [[ $threads -lt 128 ]]; do repeats=$(( total / threads )); echo threading $threads threads $repeats repeats; time ./mt.py $threads $repeats >mt$threads.log; threads=$(( $threads * 2 )); done
threading 1 threads 65536 repeats
real 0m6.207s
user 0m5.784s
sys 0m0.268s
threading 2 threads 32768 repeats
real 0m6.252s
user 0m5.852s
sys 0m0.272s
threading 4 threads 16384 repeats
real 0m6.348s
user 0m5.944s
sys 0m0.276s
threading 8 threads 8192 repeats
real 0m6.430s
user 0m6.040s
sys 0m0.276s
threading 16 threads 4096 repeats
real 0m6.307s
user 0m5.844s
sys 0m0.364s
threading 32 threads 2048 repeats
real 0m6.618s
user 0m5.852s
sys 0m0.368s
threading 64 threads 1024 repeats
real 0m6.678s
user 0m5.816s
sys 0m0.428s
total=$(( 2 ** 16 ));threads=1;while [[ $threads -lt 128 ]]; do repeats=$(( total / threads )); echo processing $threads threads $repeats repeats; time ./mp.py $threads $repeats >mp$threads.log; threads=$(( $threads * 2 )); done
processing 1 threads 65536 repeats
real 0m6.379s
user 0m5.908s
sys 0m0.284s
processing 2 threads 32768 repeats
real 0m6.358s
user 0m5.960s
sys 0m0.276s
processing 4 threads 16384 repeats
real 0m6.039s
user 0m5.612s
sys 0m0.328s
processing 8 threads 8192 repeats
real 0m6.502s
user 0m6.108s
sys 0m0.272s
processing 16 threads 4096 repeats
real 0m6.260s
user 0m5.792s
sys 0m0.352s
processing 32 threads 2048 repeats
real 0m6.693s
user 0m5.964s
sys 0m0.336s
processing 64 threads 1024 repeats
real 0m6.777s
user 0m5.988s
sys 0m0.340s
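
The "almost half" claim in the summary can be checked against the real times in these logs. A small sketch follows, assuming a hypothetical helper to_seconds for the XmY.YYYs values printed by bash's time; the figures are copied from the 1-worker and 2-worker entries of the first Xeon run above.

```python
import re


def to_seconds(stamp):
    """Convert a bash `time` value such as '1m1.495s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if match is None:
        raise ValueError("unrecognised time value: %r" % stamp)
    return int(match.group(1)) * 60 + float(match.group(2))


# Real times from the first Xeon run: (1 worker, 2 workers).
xeon_real = {
    "threading": ("0m9.119s", "0m11.878s"),
    "processing": ("0m9.088s", "0m5.122s"),
}

for library, (one, two) in xeon_real.items():
    # Speedup > 1 means the 2-worker run finished sooner; < 1 means slower.
    speedup = to_seconds(one) / to_seconds(two)
    print("%s: speedup with 2 workers = %.2fx" % (library, speedup))
```

With these inputs the ratios come out to about 0.77x for threading and 1.77x for processing, which matches the summary: threading got slower with a second thread, and processing cut real time by a little under half.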
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mt.py
Type: text/x-python
Size: 886 bytes
Desc: not available
URL: <http://mail.python.org/mailman/private/centraloh/attachments/20091110/9391d2a0/attachment.py>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mp.py
Type: text/x-python
Size: 910 bytes
Desc: not available
URL: <http://mail.python.org/mailman/private/centraloh/attachments/20091110/9391d2a0/attachment-0001.py>