On Saturday, 12 February 2011 21:19:39, Eric Carlson wrote:
Hello All, I have been toying with OpenMP through f2py and ctypes. On the whole, the results of my efforts have been very encouraging. That said, some results are a bit perplexing.
I have written an identical routine that I run both directly as a compiled C executable and, through ctypes, as a shared library. I am running the tests on a dual-Xeon Ubuntu system with 12 cores and 24 threads. The C executable is SLIGHTLY faster than the ctypes version at lower thread counts, but the C version eventually reaches a speedup ratio of 12+, while the Python-invoked version caps off at about 7.7, as shown below:
threads   C speedup   Python speedup
   1         1.00          1.00
   2         2.07          1.98
   3         3.10          2.96
   4         4.11          3.93
   5         4.97          4.75
   6         5.94          5.54
   7         6.83          6.53
   8         7.78          7.30
   9         8.68          7.68
  10         9.62          7.42
  11        10.38          7.51
  12        10.44          7.26
  13         7.19          6.04
  14         7.70          5.73
  15         8.27          6.03
  16         8.81          6.29
  17         9.37          6.55
  18         9.90          6.67
  19        10.36          6.90
  20        10.98          7.01
  21        11.45          6.97
  22        11.92          7.10
  23        12.20          7.08
These ratios are quite consistent from 100 KB double arrays up to 100 MB double arrays, so I do not think this reflects Python call overhead. There is no question the routine is memory-bandwidth constrained, and I feel lucky to squeeze out the eventual 12+ ratio, but I am very perplexed as to why the performance of the Python-invoked routine caps off.
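To make the divergence concrete, here is the parallel efficiency (speedup divided by thread count) implied by a few rows of the table above; the speedup numbers are copied from the reported results, and the row selection is arbitrary:

```python
# Parallel efficiency (speedup / threads) for selected rows of the
# reported table: the C build stays reasonably efficient up to 12
# threads, while the ctypes build falls off much earlier.
c_speedup = {1: 1.00, 6: 5.94, 12: 10.44, 23: 12.20}
py_speedup = {1: 1.00, 6: 5.54, 12: 7.26, 23: 7.08}

for n in sorted(c_speedup):
    c_eff = c_speedup[n] / n
    py_eff = py_speedup[n] / n
    print(f"{n:2d} threads: C efficiency {c_eff:.2f}, ctypes efficiency {py_eff:.2f}")
```

At 12 threads the C build is still near 87% efficiency while the ctypes build is around 60%, which is what makes the cap look like more than ordinary bandwidth saturation.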
Does anyone have an explanation for the caps? Am I seeing some effect from ctypes, from the Python interpreter, or something else?
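For reference, the ctypes side of such a test is presumably set up along these lines. The actual routine and library names are not given in the post, so libm stands in for the user-compiled OpenMP library to keep the sketch self-contained; a real build would look something like `gcc -O2 -fopenmp -fPIC -shared routine.c -o libroutine.so`, loaded with `ctypes.CDLL("./libroutine.so")`:

```python
import ctypes
import ctypes.util

# libm stands in here for a user-compiled OpenMP shared library
# (e.g. built with: gcc -O2 -fopenmp -fPIC -shared routine.c -o libroutine.so,
# then loaded with ctypes.CDLL("./libroutine.so")).
libm = ctypes.CDLL(ctypes.util.find_library("m") or "libm.so.6")

# Declaring argument and return types explicitly avoids silent
# int/double mismatches, which would skew a benchmark like this one.
libm.cos.argtypes = [ctypes.c_double]
libm.cos.restype = ctypes.c_double

print(libm.cos(0.0))
```

If the thread count is varied from within Python rather than from the shell, OMP_NUM_THREADS generally has to be set in `os.environ` before the OpenMP runtime initializes (i.e. before the first parallel region runs) for it to take effect.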
It is difficult to tell what could be going on from the timings alone. Can you attach a small, self-contained benchmark? Not that I can offer a definitive answer, but I'm curious about this.

-- 
Francesc Alted