[Numpy-discussion] OT: performance in C extension; OpenMP, or SSE ?
Eric Carlson
ecarlson at eng.ua.edu
Wed Feb 16 20:12:27 EST 2011
Sebastian,
Optimization appears to be important here. I used no optimization in my
previous post, so you could try the -O3 compile option:
gcc -O3 -c my_lib.c -fPIC -fopenmp -ffast-math
For na=329 and nb=340 I get the following (about 7.5x speedup over one thread, reached at 23 threads):
c_threads 1 time 0.00103106021881
c_threads 2 time 0.000528309345245
c_threads 3 time 0.000362541675568
c_threads 4 time 0.00028993844986
c_threads 5 time 0.000287840366364
c_threads 6 time 0.000264899730682
c_threads 7 time 0.000244019031525
c_threads 8 time 0.000242137908936
c_threads 9 time 0.000232398509979
c_threads 10 time 0.000227460861206
c_threads 11 time 0.00021938085556
c_threads 12 time 0.000216970443726
c_threads 13 time 0.000215198993683
c_threads 14 time 0.00021940946579
c_threads 15 time 0.000204219818115
c_threads 16 time 0.000216958522797
c_threads 17 time 0.000219728946686
c_threads 18 time 0.000199990272522
c_threads 19 time 0.000157492160797
c_threads 20 time 0.000171000957489
c_threads 21 time 0.000147500038147
c_threads 22 time 0.000141770839691
c_threads 23 time 0.000137741565704
For na=3290 and nb=3400 (about 11.5x speedup over one thread, reached at 12 threads; the time jumps back up at 13 threads):
c_threads 1 time 0.100258581638
c_threads 2 time 0.0501346611977
c_threads 3 time 0.0335096096992
c_threads 4 time 0.0253720903397
c_threads 5 time 0.0208190107346
c_threads 6 time 0.0173784399033
c_threads 7 time 0.0148811817169
c_threads 8 time 0.0130474209785
c_threads 9 time 0.011598110199
c_threads 10 time 0.0104278612137
c_threads 11 time 0.00950778007507
c_threads 12 time 0.00870131969452
c_threads 13 time 0.015882730484
c_threads 14 time 0.0148504400253
c_threads 15 time 0.0139465212822
c_threads 16 time 0.0130301308632
c_threads 17 time 0.012240819931
c_threads 18 time 0.011567029953
c_threads 19 time 0.0109891605377
c_threads 20 time 0.0104281497002
c_threads 21 time 0.00992572069168
c_threads 22 time 0.00957406997681
c_threads 23 time 0.00936627149582
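The speedup figures quoted for the two runs can be re-derived from the timings themselves. The following is only a sketch: the numbers are copied from the listings above, the names `small`, `large`, and `best_speedup` are made up for illustration, and the times are assumed to be in seconds.

```python
# Timings copied from the two listings above; index i holds the run with i + 1 threads.
# Units are assumed to be seconds.
small = [  # na=329, nb=340
    0.00103106021881, 0.000528309345245, 0.000362541675568, 0.00028993844986,
    0.000287840366364, 0.000264899730682, 0.000244019031525, 0.000242137908936,
    0.000232398509979, 0.000227460861206, 0.00021938085556, 0.000216970443726,
    0.000215198993683, 0.00021940946579, 0.000204219818115, 0.000216958522797,
    0.000219728946686, 0.000199990272522, 0.000157492160797, 0.000171000957489,
    0.000147500038147, 0.000141770839691, 0.000137741565704,
]
large = [  # na=3290, nb=3400
    0.100258581638, 0.0501346611977, 0.0335096096992, 0.0253720903397,
    0.0208190107346, 0.0173784399033, 0.0148811817169, 0.0130474209785,
    0.011598110199, 0.0104278612137, 0.00950778007507, 0.00870131969452,
    0.015882730484, 0.0148504400253, 0.0139465212822, 0.0130301308632,
    0.012240819931, 0.011567029953, 0.0109891605377, 0.0104281497002,
    0.00992572069168, 0.00957406997681, 0.00936627149582,
]


def best_speedup(times):
    """Return (thread_count, speedup) for the fastest run, relative to one thread."""
    best = min(range(len(times)), key=times.__getitem__)
    return best + 1, times[0] / times[best]


for label, times in (("na=329, nb=340", small), ("na=3290, nb=3400", large)):
    threads, speedup = best_speedup(times)
    # Efficiency is speedup divided by thread count (1.0 would be perfect scaling).
    print(f"{label}: fastest at {threads} threads, "
          f"speedup {speedup:.2f}, efficiency {speedup / threads:.2f}")
```

Speedup here is just the one-thread time divided by the fastest time, so it says nothing about why the larger case slows down past 12 threads.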
For na=329 and nb=340, cdist comes in at 0.00111914873123, which is
1.085x slower than the single-threaded C version on my system.
For na=3290 and nb=3400, cdist gives 0.143441538811, about 1.43x slower
than the single-threaded C version.
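For reference, the cdist comparison is plain division of the reported times. A small sketch, with the values copied from this post and the names `cdist_time`, `c_one_thread`, `c_fastest`, and `slowdown` invented for illustration:

```python
# Times copied from this post; keys are (na, nb). Units are assumed to be seconds.
cdist_time = {(329, 340): 0.00111914873123, (3290, 3400): 0.143441538811}
c_one_thread = {(329, 340): 0.00103106021881, (3290, 3400): 0.100258581638}
# Fastest C timing in each listing (23 threads and 12 threads respectively).
c_fastest = {(329, 340): 0.000137741565704, (3290, 3400): 0.00870131969452}


def slowdown(reference, other):
    """How many times slower `other` is than `reference`."""
    return other / reference


for size in cdist_time:
    vs_single = slowdown(c_one_thread[size], cdist_time[size])
    vs_fastest = slowdown(c_fastest[size], cdist_time[size])
    print(f"na={size[0]}, nb={size[1]}: cdist is {vs_single:.3f}x slower than "
          f"C with 1 thread, {vs_fastest:.2f}x slower than the fastest C run")
```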
Cheers,
Eric