Hi!
Problem: Using Fortran routines from Python C/API is "tricky" when multidimensional arrays are passed in.
Cause: Arrays in Fortran are stored in columnwise order while arrays in C are stored in rowwise order.
Standard solutions: 1) Create a new C array; copy the data from the old one in columnwise order; pass the new array to fortran; copy changed array back to old one in rowwise order; deallocate the array. 2) Change the storage order of an array in place: elementwise swapping; pass the array to fortran; change the storage order back with elementwise swapping
Why standard solutions are not good? 1) Additional memory allocation, that is problem for large arrays; Elementwise copying is time consuming (2 times). 2) It is good as no extra memory is needed but elementwise swapping (2 times) is approx. equivalent with the elementwise copying (4 times).
Proclamation: Introduce a columnwise array to Numeric Python where data is stored in columnwise order that can be used specifically for fortran routines.
Proposal sketch: 1) Introduce a new flag `row_order'to PyArrayObject structure: row_order == 1 > the data is stored in rowwise order (default, as it is now) row_order == 0 > the data is stored in columnwise order Note that now the concept of contiguousness depends on this flag. 2) Introduce new array "constructors" such as PyArray_CW_FromDims, PyArray_CW_FromDimsAndData, PyArray_CW_ContiguousFromObject, PyArray_CW_CopyFromObject, PyArray_CW_FromObject, etc. that all return arrays with row_order=0 and data stored in columnwise order (that is in case of contiguous results, otherwise strides feature is employd). 3) In order to operations between arrays (possibly with different storage order) would work correctly, many internal functions of NumPy C/API need to be modifyied. 4) anything else?
What is the good of this? 1) The fact is that there is a large number of very good scietific tools freely available written in Fortran (Netlib, for instance). And I don't mean only Fortran 77 codes but also Fortran 90/95 codes. 2) Having Numeric Python arrays with data stored in columnwise order, calling Fortran routines from Python becomes really efficient and spacesaving. 3) There should be little performance hit if, say, two arrays with different storage order are multiplied (compared to the operations between noncontiguous arrays in the current implementation). 4) I don't see any reason why older C/API modules would broke because of this change if it is carried out carefully enough. So, backward compability should be there. 5) anything else?
What are against of this? 1) Lots of work but with current experience it should not be a problem. 2) The size of the code will grow. 3) I suppose that most people using Numerical Python will not care of calling Fortran routines from Python. Possible reasons: too "tricky" or no need. In the first case, the answer is that there are tools such as PyFort, f2py that solve this problem. In the later case, there is no problem:) 4) anything else?
I understand that my proposal is quite radical but taking into account that we want to use Python for many years to come, the use would be more pleasing if one cause of (constant) confusion would be less during this time.
Best regards, Pearu
Proclamation: Introduce a columnwise array to Numeric Python where data is stored in columnwise order that can be used specifically for fortran routines.
This is a very interesting proposal that we should consider carefully. I seem to recall reading that Jim Hugunin originally had this idea in mind when he established the concept of contiguousness, etc.
My current thoughts on this issue are that it is of only syntatic value and seems like a lot of extra code has to be written in order to provide this "userfriendliness." I don't see why it is so confusing to recognize that Fortran just references it's arrays "backwards" (or Python references them backwards  whatever your preference). How you index into an array is an arbitrary decision. Numerical Python and Fortran just have opposite conventions. As long as that is clear, I don't see the real trouble. If the Fortran documentation calls for an array of dimension (M,N,L) you pass it a contiguous Python array of shape (L,N,M)  pretty simple.
Perhaps someone could enlighten me as to why this is more than just a aesthetic problem. Right now, I would prefer that the time spent by someone to "fix" this "problem" went to expanding the availability of easytouse processing routines for Numerical Python, or improving the crossplatform plotting capabilities.
I agree that it can be most confusing when you are talking about matrix math since we are so used to thinking of matrix multiplication as A * B = C with a shape analysis of:
M X N * N X L = M X L
If the matrix multiplacation code is in Fortran, then it expects to get an (M,N) array and a (N,L) array and returns an (M,L) array. But from Python you would pass it arrays with shape (N,M) and (L,N) and get back an (L,M) array which can be confusing to our "understanding" of shape analysis in matrix multiplication:
Python matrix multiplication rule if calling a Fortran routine to do the multiplication:
(N,M) (L,N) = (L, M)
I think a Pythononly class could solve this problem much more easily than changing the underlying Ccode. This new Python Fortranarray class would just make the user think that the shapes were (M,N) and (N,L) and the output shape was (M,L).
For future reference, any arrayprocessing codes that somebody writes should take a strides array as an argument, so that it doesn't matter what "order" the array is in.
Travis
On Wed, 26 Jan 2000, Travis Oliphant wrote:
Proclamation: Introduce a columnwise array to Numeric Python where data is stored in columnwise order that can be used specifically for fortran routines.
This is a very interesting proposal that we should consider carefully. I seem to recall reading that Jim Hugunin originally had this idea in mind when he established the concept of contiguousness, etc.
My current thoughts on this issue are that it is of only syntatic value and seems like a lot of extra code has to be written in order to provide this "userfriendliness." I don't see why it is so confusing to recognize that Fortran just references it's arrays "backwards" (or Python references them backwards  whatever your preference). How you index into an array is an arbitrary decision. Numerical Python and Fortran just have opposite conventions. As long as that is clear, I don't see the real trouble. If the Fortran documentation calls for an array of dimension (M,N,L) you pass it a contiguous Python array of shape (L,N,M)  pretty simple.
Perhaps someone could enlighten me as to why this is more than just a aesthetic problem. Right now, I would prefer that the time spent by someone to "fix" this "problem" went to expanding the availability of easytouse processing routines for Numerical Python,
I think that this expansion would be quicker if the Python/Fortran connection would not introduce this additional question to worry about.
or improving the crossplatform plotting capabilities.
Here I agree with you completely.
I can see the following problems when two different conventions are mixed: 1) if your application Python code is larger than "just an example that demonstrates the correct usage of two different conventions" and it can call other C/API modules that do calculations in C convention then you need some kind of book keeping where your matrices need to be transposed and where not, and where to insert additional code for doing transposition. I think this can be done in lower level and more efficiently than most ordinary users would do anyway. 2) Another but minor drawback of having two conventions is that if you have square matrix that is nonsymmetric, then its misuse would be easy and (may be) difficult to discover.
On the other hand, I completely understand why my proposal would not be implemented  it looks like it needs lots of work and in short term the gain would not be visible to most users.
Pearu
participants (2)

Pearu Peterson

Travis Oliphant