[Numpy-discussion] Python ctypes and OpenMP mystery
Francesc Alted
faltet at pytables.org
Wed Feb 16 10:57:03 EST 2011
On Saturday 12 February 2011 21:19:39, Eric Carlson wrote:
> Hello All,
> I have been toying with OpenMP through f2py and ctypes. On the whole,
> the results of my efforts have been very encouraging. That said, some
> results are a bit perplexing.
>
> I have written identical routines that I run directly as a C-derived
> executable, and through ctypes as a shared library. I am running the
> tests on a dual-Xeon Ubuntu system with 12 cores and 24 threads. The
> C executable is SLIGHTLY faster than the ctypes version at low
> thread counts, but the C version eventually reaches a speedup ratio
> of 12+, while the Python one caps off at 7.7, as shown below:
>
> threads   C-speedup   Python-speedup
>    1         1.00         1.00
>    2         2.07         1.98
>    3         3.10         2.96
>    4         4.11         3.93
>    5         4.97         4.75
>    6         5.94         5.54
>    7         6.83         6.53
>    8         7.78         7.30
>    9         8.68         7.68
>   10         9.62         7.42
>   11        10.38         7.51
>   12        10.44         7.26
>   13         7.19         6.04
>   14         7.70         5.73
>   15         8.27         6.03
>   16         8.81         6.29
>   17         9.37         6.55
>   18         9.90         6.67
>   19        10.36         6.90
>   20        10.98         7.01
>   21        11.45         6.97
>   22        11.92         7.10
>   23        12.20         7.08
>
> These ratios are quite consistent from 100KB double arrays to 100MB
> double arrays, so I do not think it reflects a Python overhead issue.
> There is no question the routine is memory-bandwidth constrained, and
> I feel lucky to squeeze out the eventual 12+ ratio, but I am very
> perplexed as to why the performance of the Python-invoked routine
> seems to cap off.
>
> Does anyone have an explanation for the caps? Am I seeing some effect
> from ctypes, or the Python engine, or what?
It is difficult to tell what is going on from the timings alone. Could
you attach a small, self-contained benchmark? Not that I can offer a
definitive answer, but I'm curious about this.
--
Francesc Alted