[Cython] `cdef inline` and typed memory views

mark florisson markflorisson88 at gmail.com
Mon Apr 23 10:39:53 CEST 2012


On 23 April 2012 07:24, Stefan Behnel <stefan_ml at behnel.de> wrote:
> mark florisson, 22.04.2012 22:20:
>> On 21 April 2012 20:17, Dimitri Tcaciuc wrote:
>>> Say I want to factor out the inner part of
>>> some N^2 loops over a float array. I write something like
>>>
>>>  from libc.math cimport sqrtf
>>>
>>>  cdef inline float _inner(size_t i, size_t j, float[:] x):
>>>     cdef float d = x[i] - x[j]
>>>     return sqrtf(d * d)
>>>
>>> In 0.16, this actually compiles (as opposed to 0.15 with ndarray) and
>>> the function is declared inline, which is great. However, the
>>> memoryview structure is passed by value:
>>>
>>>  static CYTHON_INLINE float __pyx_f_3foo__inner(size_t __pyx_v_i,
>>> size_t __pyx_v_j, __Pyx_memviewslice __pyx_v_x) {
>>>     ...
>>>
>>> This seems to hinder the compiler's (in my case, GCC 4.3.4) ability to
>>> perform efficient inlining (although the function does in fact get
>>> inlined). If I manually inline that distance calculation, I get a
>>> speedup of more than 4x (in my case 0.324020147324 vs 1.43209195137
>>> seconds for 10k elements). When I manually modified the generated .c
>>> file to pass the memoryview slice by pointer, the slowdown was
>>> eliminated completely.
>>
>> Although it is neither documented nor tested, it works if you just
>> take the address of the memoryview. You can then index it using
>> memoryview_pointer[0][i].
>
> Are you advertising this as an actual feature here? I'm just asking because
> supporting hacks can be nasty in the long run. What if we ever want to make
> a change to the internal way memoryviews work that would break this?
>
> Stefan

Yeah, I'm not entirely sure if this is a hack or a feature. It doesn't
really matter how memoryviews are represented, or where they are
stored, as dereferencing the pointer gives you the same situation as
before. The only cases where it would matter are if memoryviews a) were
relocated or b) went out of scope while the pointer is still in use.
If we're ever planning to support garbage collection (and I doubt we
are) or if we're ever going to allocate them on the heap and have a
variable-sized representation, a) could become an issue. As for b),
it's really the same as taking the address of an automatic C variable.
So I suppose I wouldn't be opposed to officially supporting this.
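
To illustrate the comparison with automatic C variables, here is a
minimal C sketch of the two calling conventions discussed in this
thread. Everything in it is an assumption made for illustration:
`slice_t` is a hypothetical stand-in for the generated
`__Pyx_memviewslice` struct (a data pointer plus fixed-size shape,
stride and suboffset arrays, with `MAX_DIMS` chosen arbitrarily), and
`make_slice`, `item`, `inner_by_value` and `inner_by_pointer` are
invented names, not Cython output. The distance is written as an
absolute value, which equals sqrtf(d * d) barring overflow or underflow
in d * d, so the sketch does not need libm.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DIMS 8 /* assumption: fixed upper bound on dimensions */

/* Hypothetical stand-in for the generated slice struct. */
typedef struct {
    char *data;
    ptrdiff_t shape[MAX_DIMS];
    ptrdiff_t strides[MAX_DIMS];
    ptrdiff_t suboffsets[MAX_DIMS];
} slice_t;

/* Wrap a contiguous float buffer as a 1-D slice. */
static slice_t make_slice(float *buf, ptrdiff_t n)
{
    slice_t s = {0};
    assert(buf != NULL && n >= 0);
    s.data = (char *)buf;
    s.shape[0] = n;
    s.strides[0] = (ptrdiff_t)sizeof(float);
    return s;
}

/* Strided element access along the first dimension. */
static inline float item(const slice_t *s, size_t i)
{
    return *(const float *)(s->data + (ptrdiff_t)i * s->strides[0]);
}

/* By value: the whole struct is conceptually copied for each call. */
static inline float inner_by_value(size_t i, size_t j, slice_t x)
{
    float d = item(&x, i) - item(&x, j);
    return d < 0.0f ? -d : d;
}

/* By pointer: only an address is passed. It is valid for as long as
 * the caller's slice_t is in scope, like any automatic variable. */
static inline float inner_by_pointer(size_t i, size_t j, const slice_t *x)
{
    float d = item(x, i) - item(x, j);
    return d < 0.0f ? -d : d;
}
```

Under these assumptions `slice_t` is about 200 bytes on a typical 64-bit
build, whereas the by-pointer variant passes a single address. Whether
the copy is actually elided after inlining is up to the optimizer, which
is consistent with the slowdown reported above for GCC 4.3.4.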

