[Numpy-discussion] Conversion from C-layout to Fortran-layout in Cython

Nov. 10, 2021

      I've asked this in Cython mailing list but probably I should also get some
feedback here too.

I have the following function defined in Cython and using flat memory
pointers to hold n by n array data.

cdef some_C_layout_func(double[:, :, ::1] Am) nogil: # ... cdef double *
work1 = <double*>malloc(n*n*sizeof(double)) cdef double *work2 = <double*>
malloc(n*n*sizeof(double)) # ... # Lots of C-layout operations here # ...
dgetrs(<char*>'T', &n, &n, &work1[0], &n, &ipiv[0], &work2[0], &n, &info )
dcopy(&n2, &work2[0], &int1, &Am[0, 0, 0], &int1) free(...)

Here, I have done everything in C layout with work1 and work2 but I have to
convert work2 into Fortran layout to be able to solve AX = B. A can be
transposed in Lapack internally via the flag 'T' so the only obstacle I
have now is to shuffle work2 which holds B transpose in the eyes of Fortran
since it is still in C layout.

If I go naively and make loops to get one layout to the other that actually
spoils all the speed benefits from this Cythonization due to cache misses.
In fact 60% of the time is spent in that naive loop across the whole
function. Same goes for the copy_fortran() of memoryviews.

I have measured the regular NumPy np.asfortranarray()  and the performance
is quite good enough compared to the actual linear solve. Hence whatever it
is doing underneath I would like to reach out and do the same possibly via
the C-API. But my C knowledge basically failed me around this line
https://github.com/numpy/numpy/blob/8dbd507fb6c854b362c26a0dd056cd04c9c10f25...

I have found the SO post from
https://stackoverflow.com/questions/45143381/making-a-memoryview-c-contiguou...
but I am not sure if that is the canonical way to do it in newer Python
versions.

Can anyone show me how to go about it without interacting with Python
objects?

Best,
ilhan

[Numpy-discussion] Conversion from C-layout to Fortran-layout in Cython

Ilhan Polat