[Cython] `cdef inline` and typed memory views

Dimitri Tcaciuc dtcaciuc at gmail.com
Sun Apr 22 23:45:52 CEST 2012

On Sun, Apr 22, 2012 at 1:20 PM, mark florisson
<markflorisson88 at gmail.com> wrote:
> On 21 April 2012 20:17, Dimitri Tcaciuc <dtcaciuc at gmail.com> wrote:
>> Hey everyone,
>> Congratulations on shipping 0.16! I think I found a problem which
>> seems pretty straightforward. Say I want to factor out the inner part
>> of some N^2 loops over a float array; I write something like
>>  cdef inline float _inner(size_t i, size_t j, float[:] x):
>>     cdef float d = x[i] - x[j]
>>     return sqrtf(d * d)
>> In 0.16, this actually compiles (as opposed to 0.15 with ndarray) and
>> the function is declared as inline, which is great. However, the
>> memoryview structure is passed by value:
>>  static CYTHON_INLINE float __pyx_f_3foo__inner(size_t __pyx_v_i,
>> size_t __pyx_v_j, __Pyx_memviewslice __pyx_v_x) {
>>     ...
>> This seems to hinder the compiler's (in my case, GCC 4.3.4) ability to
>> perform efficient inlining (although the function does in fact get
>> inlined). If I manually inline that distance calculation, I get a 3x
>> speedup (in my case 0.324020147324 vs 1.43209195137 seconds for 10k
>> elements). When I manually modified the generated .c file to pass the
>> memoryview slice by pointer, the slowdown was eliminated completely.
>> On a somewhat related note, have you considered enabling the Issues page on GitHub?
>> Thanks!
>> Dimitri.
>> _______________________________________________
>> cython-devel mailing list
>> cython-devel at python.org
>> http://mail.python.org/mailman/listinfo/cython-devel
> Although it is neither documented nor tested, it works if you just
> take the address of the memoryview. You can then index it using
> memoryview_pointer[0][i]. One should be careful, as taking the pointer
> and passing it around means that the pointer is not acquisition counted,
> and it will point to invalid memory if the memoryview goes out of scope
> (e.g. if it's a local variable, when you return).

Nice, passing by pointer did the trick! As an observation, I tried
using `cython.operator.dereference(x)`, and in this case it is much less
efficient than `x[0]`. Dereferencing actually allocates an empty
memoryview slice and copies the contents of `x` into it, even if the
`dereference(x)` result is never assigned anywhere and is only a
temporary value in the expression.
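
For anyone following along, here is a rough plain-C sketch of the two calling conventions discussed above. It is not Cython's generated code: `slice_t` is a hypothetical, simplified stand-in for `__Pyx_memviewslice`, and the names `item`, `inner_by_value`, `inner_by_pointer` and `pairwise_sum` are made up for illustration. The distance is written as an explicit absolute value, which gives the same result as `sqrtf(d * d)` for ordinary inputs and avoids linking against libm.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a memoryview slice: a data pointer plus
 * per-dimension shape/stride/suboffset arrays (layout is an assumption). */
#define MAX_DIMS 8

typedef struct {
    void *memview;                    /* owning object, unused in this sketch */
    char *data;                       /* start of the buffer */
    ptrdiff_t shape[MAX_DIMS];
    ptrdiff_t strides[MAX_DIMS];      /* in bytes */
    ptrdiff_t suboffsets[MAX_DIMS];
} slice_t;

/* Read element i of a 1-D float slice, with a bounds check. */
static inline float item(const slice_t *x, size_t i) {
    assert(i < (size_t)x->shape[0]);
    return *(const float *)(x->data + (ptrdiff_t)i * x->strides[0]);
}

/* By value: the whole struct is copied into the callee's argument. */
static inline float inner_by_value(size_t i, size_t j, slice_t x) {
    float d = item(&x, i) - item(&x, j);
    return d < 0.0f ? -d : d;         /* |d|, same value as sqrtf(d * d) */
}

/* By pointer: only an address is passed. The caller must keep the
 * slice alive for as long as the pointer is in use. */
static inline float inner_by_pointer(size_t i, size_t j, const slice_t *x) {
    float d = item(x, i) - item(x, j);
    return d < 0.0f ? -d : d;
}

/* N^2 loop over all pairs i < j, using the pointer variant. */
static float pairwise_sum(const slice_t *x) {
    size_t n = (size_t)x->shape[0];
    float total = 0.0f;
    for (size_t i = 0; i < n; i++)
        for (size_t j = i + 1; j < n; j++)
            total += inner_by_pointer(i, j, x);
    return total;
}
```

With 8-byte pointers and an 8-byte `ptrdiff_t`, this stand-in struct is about two hundred bytes, so the by-value variant has a lot more to copy per call than a single pointer. Whether a given compiler optimizes that copy away after inlining will vary.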


> Cython could manually inline functions though, which could greatly
> reduce argument passing and unpacking overhead in some situations
> (like buffers).