[Numpy-discussion] Re: Conversion from C-layout to Fortran-layout in Cython

Nov. 10, 2021


      Indeed for matrix multiplication and many other L3 BLAS functions, we are
lucky however for linear solve function ?getrs unfortunately no avail.

On Thu, Nov 11, 2021 at 12:31 AM Benjamin Root <ben.v.root@gmail.com> wrote:
...
I have found that a bunch of lapack functions seem to have arguments for
stating whether or not the given arrays are C or F ordered. Then you
wouldn't need to worry about handling the layout yourself. For example, I
have some C++ code like so:
extern "C" {
/**
 * Forward declaration for LAPACK's Fortran dgemm function to allow use in
C/C++ code.
 *
 * This function is used for matrix multiplication between two arrays of
doubles.
 *
 * For complete reference:
http://www.netlib.org/lapack/explore-html/d1/d54/group__double__blas__level3...
 */
void dgemm_(const char* TRANSA, const char* TRANSB, const int* M, const
int* N, const int* K,
    const double* ALPHA, const double* A, const int* LDA, const double* B,
const int* LDB,
    const double* BETA, double* C, const int* LDC);
}
...
dgemm_("C", "C", &nLayers, &N, &nVariables, &alpha, matrices.IW->data(),
&nVariables,
            inputs.data(), &N, &beta, intermediate.data(), &nLayers);
(in this case, I was using boost multiarrays, but the basic idea is the
same). IIRC, a bunch of other lapack functions had similar features.
I hope this is helpful.
Ben Root
On Wed, Nov 10, 2021 at 6:02 PM Ilhan Polat <ilhanpolat@gmail.com> wrote:
...
I've asked this in Cython mailing list but probably I should also get
some feedback here too.
I have the following function defined in Cython and using flat memory
pointers to hold n by n array data.
cdef some_C_layout_func(double[:, :, ::1] Am) nogil: # ... cdef double *
work1 = <double*>malloc(n*n*sizeof(double)) cdef double *work2 = <double
*>malloc(n*n*sizeof(double)) # ... # Lots of C-layout operations here #
... dgetrs(<char*>'T', &n, &n, &work1[0], &n, &ipiv[0], &work2[0], &n, &
info ) dcopy(&n2, &work2[0], &int1, &Am[0, 0, 0], &int1) free(...)
Here, I have done everything in C layout with work1 and work2 but I have
to convert work2 into Fortran layout to be able to solve AX = B. A can be
transposed in Lapack internally via the flag 'T' so the only obstacle I
have now is to shuffle work2 which holds B transpose in the eyes of Fortran
since it is still in C layout.
If I go naively and make loops to get one layout to the other that
actually spoils all the speed benefits from this Cythonization due to cache
misses. In fact 60% of the time is spent in that naive loop across the
whole function. Same goes for the copy_fortran() of memoryviews.
I have measured the regular NumPy np.asfortranarray()  and the
performance is quite good enough compared to the actual linear solve. Hence
whatever it is doing underneath I would like to reach out and do the same
possibly via the C-API. But my C knowledge basically failed me around this
line
https://github.com/numpy/numpy/blob/8dbd507fb6c854b362c26a0dd056cd04c9c10f25...
I have found the SO post from
https://stackoverflow.com/questions/45143381/making-a-memoryview-c-contiguou...
but I am not sure if that is the canonical way to do it in newer Python
versions.
Can anyone show me how to go about it without interacting with Python
objects?
Best,
ilhan
_______________________________________________
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-leave@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: ben.v.root@gmail.com
_______________________________________________
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-leave@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: ilhanpolat@gmail.com