[Cython] `cdef inline` and typed memory views
Dimitri Tcaciuc
dtcaciuc at gmail.com
Sat Apr 21 21:17:52 CEST 2012
Hey everyone,
Congratulations on shipping 0.16! I think I found a problem, and it
seems pretty straightforward. Say I want to factor out the inner part of
some N^2 loops over a float array. I write something like
cdef extern from "math.h":
    float sqrtf(float x)

cdef inline float _inner(size_t i, size_t j, float[:] x):
    cdef float d = x[i] - x[j]
    return sqrtf(d * d)
In 0.16 this actually compiles (unlike 0.15 with ndarray), and the
function is declared inline, which is great. However, the memoryview
slice structure is passed by value:
static CYTHON_INLINE float __pyx_f_3foo__inner(size_t __pyx_v_i,
        size_t __pyx_v_j, __Pyx_memviewslice __pyx_v_x) {
    ...
This seems to hinder the compiler's (in my case, GCC 4.3.4) ability to
optimize the inlined code, even though the function does in fact get
inlined. If I manually inline that distance calculation, I get roughly a
4.4x speedup (0.324020147324 vs 1.43209195137 seconds for 10k elements
in my case). When I manually modified the generated .c file to pass the
memoryview slice by pointer, the slowdown disappeared completely.
On a somewhat related note, have you considered enabling the Issues page on GitHub?
Thanks!
Dimitri.