This is NOT yet discussed on the Cython list; I wanted to check with more numerical users to see if the issue should even be brought up there.

The idea behind the current syntax was to keep things as close as possible to Python/NumPy, and only provide some "hints" to Cython for optimization. My problem with this now is that a) it's too easy to get non-optimized code without a warning by letting in untyped indices, b) I think the whole thing is a bit too "magic" and that it is too unclear what is going on to newcomers (though I'm guessing there).

My proposal: Introduce an explicit "buffer syntax":

arr = np.zeros(..)
cdef int[:,:] buf = arr # 2D buffer

Here, buf would be something other than arr; it is a separate view of the array for low-level purposes.

This has certain disadvantages; consider:

a1 = np.zeros(...) + 1; a2 = np.zeros(...) + 2
cdef int[:] b1 = a1, b2 = a2

Here, one would need to use b1 and b2 for for-loop arithmetic, but a1 and a2 for vectorized operations and slicing. "b1 + b2" would mean something else and not-NumPy-related (at first disallowed, but see below). "print b1" would likely coerce b1 to a Python memoryview and print "<memoryview ...>" (at least on newer Python versions); one would need to use some function to get b1 back to a NumPy array.

Advantages:

- More explicit.
- Leaves a path open in the syntax for introducing low-level slicing and arithmetic as separate operations in Cython independent of NumPy (think Numexpr compile-time folded into Cython code).
- Possible to have some creative syntax like "int[0:]" for disallowing negative wraparound and perhaps even "int[-3:]" for non-zero-based indexing.

More details: http://wiki.cython.org/enhancements/buffersyntax

(Of course, compatibility with existing code will be taken seriously no matter how this plays out!)

-- Dag Sverre
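To illustrate the memoryview point in plain Python, here is a minimal sketch using the built-in memoryview as an analogy for the proposed buffer view; this is ordinary NumPy/Python code, not the proposed Cython syntax:

```python
import numpy as np

a = np.arange(6, dtype=np.intc).reshape(2, 3)

# A Python-level memoryview over the same buffer -- roughly what printing
# the proposed int[:,:] view would show instead of the array contents.
mv = memoryview(a)
print(mv)          # e.g. <memory at 0x...>

# Getting a NumPy array back from the view (no copy; it shares the buffer).
b = np.asarray(mv)
b[0, 0] = 42
print(a[0, 0])     # 42 -- the view and the array share memory
```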
Dag Sverre Seljebotn wrote:
This is NOT yet discussed on the Cython list; I wanted to check with more numerical users to see if the issue should even be brought up there.
The idea behind the current syntax was to keep things as close as possible to Python/NumPy, and only provide some "hints" to Cython for optimization. My problem with this now is that a) it's too easy to get non-optimized code without a warning by letting in untyped indices, b) I think the whole thing is a bit too "magic" and that it is too unclear what is going on to newcomers (though I'm guessing there).
These may be issues, but I think keeping "cython -a my_module.pyx" in one's development cycle and inspecting the output will lead to great enlightenment on the part of the Cython user. Perhaps this should be advertised more prominently? I always do this with any Cython-generated code, and it works wonders.
My proposal: Introduce an explicit "buffer syntax":
arr = np.zeros(..)
cdef int[:,:] buf = arr # 2D buffer
My initial reaction is that it seems to be a second implementation of buffer interaction in Cython, and therefore yet another thing to keep in mind and it's unclear to me how different it would be from the "traditional" Cython numpy ndarray behavior and how the behavior of the two approaches might differ, perhaps in subtle ways. So that's a disadvantage from my perspective. I agree that some of your ideas are advantages, however. Also, it seems it would allow one to (more easily) interact with buffer objects in sophisticated ways without needing the GIL, which is another advantage.

Could some or all of this be added to the current numpy buffer implementation, or does it really need the new syntax?

Also, is there anything possible with buffer objects that would be limited by the choice of syntax you propose? I imagine this might not work with structured data types (then again, it might...).
Andrew Straw wrote:
Dag Sverre Seljebotn wrote:
This is NOT yet discussed on the Cython list; I wanted to check with more numerical users to see if the issue should even be brought up there.
The idea behind the current syntax was to keep things as close as possible to Python/NumPy, and only provide some "hints" to Cython for optimization. My problem with this now is that a) it's too easy to get non-optimized code without a warning by letting in untyped indices, b) I think the whole thing is a bit too "magic" and that it is too unclear what is going on to newcomers (though I'm guessing there).
These may be issues, but I think keeping "cython -a my_module.pyx" in one's development cycle and inspecting the output will lead to great enlightenment on the part of the Cython user. Perhaps this should be advertised more prominently? I always do this with any Cython-generated code, and it works wonders.
Well, I do so too (or rather just open the generated C file in emacs, but since I'm working on Cython I'm more used to reading that garbage than others are, I suppose :-) ). But it feels like "one extra step" which must be done. A syntax highlighter in emacs highlighting C operations would help as well, though.
My proposal: Introduce an explicit "buffer syntax":
arr = np.zeros(..)
cdef int[:,:] buf = arr # 2D buffer
My initial reaction is that it seems to be a second implementation of buffer interaction in Cython, and therefore yet another thing to keep in
Well, it would use much of the same implementation of course :-) It's more of a change in the parser and a few rules here and there than anything else.
mind and it's unclear to me how different it would be from the "traditional" Cython numpy ndarray behavior and how the behavior of the two approaches might differ, perhaps in subtle ways. So that's a disadvantage from my perspective. I agree that some of your ideas are advantages, however. Also, it seems it would allow one to (more easily) interact with buffer objects in sophisticated ways without needing the GIL, which is another advantage.
Could some or all of this be added to the current numpy buffer implementation, or does it really need the new syntax?
The new features I mention? Most of it could be wedged into the existing syntax, but at the expense of making things even less clear (or at least I feel so) -- for instance, if we decided that Cython was to take care of operations like "a + b" in a NumExpr-like fashion, then it would mean that all declared buffers would get the Python versions of their arithmetic operators hidden and overridden. To make a point, it would mean that the NumPy matrix object would suddenly get componentwise multiplication when assigned to an ndarray[int] variable! (Granted, my feeling is that the matrix class is better avoided anyway, but...)

Regarding passing buffers to C/Fortran, it's just a matter of coming up with a nice syntax.
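To make the operator-semantics point concrete in plain NumPy (illustrative only, not Cython): "*" on a np.matrix is matrix multiplication, while on an ndarray it is elementwise, so silently rebinding operators on a typed buffer view really would change meaning:

```python
import numpy as np

m = np.matrix([[1, 2], [3, 4]])
a = np.asarray(m)          # the same data viewed as a plain ndarray

print(m * m)   # matrix product:      [[ 7 10], [15 22]]
print(a * a)   # elementwise product: [[ 1  4], [ 9 16]]
```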
Also, is there anything possible with buffer objects that would be limited by the choice of syntax you propose? I imagine this might not work with structured data types (then again, it might...).
mystruct[:,:] should just work, if that is what you mean. It's simply a matter of a) adding something to the parser, and b) disabling the current pass-through of "what the buffer cannot handle" to the Python runtime.

-- Dag Sverre
arr = np.zeros(..)
cdef int[:,:] buf = arr # 2D buffer
Here, buf would be something other than arr; it is a separate view of the array for low-level purposes.
I like your proposal. The reason we use Fortran for numerical computing is that Fortran makes it easy to manipulate arrays. C or C++ sucks terribly for anything related to numerical computing, and arrays in particular. Cython is currently not better than C.

The ndarray syntax you added last summer is useless if we need to pass the array or a view/slice to another function. That is almost always the case. While the syntax is there, the overhead is unbearable, and it doesn't even work with cdefs. Thus one is back to working with those pesky C pointers. And they are even more pesky in Cython, because pointer arithmetic is disallowed.

Currently, I think the best way to use Cython with numpy is to call PyArray_AsCArray and use the normal C idiom array[i][j][k]. This works well if we define some array types with C99 restricted pointers:

ctypedef double *array1D_t "double *restrict"
ctypedef double **array2D_t "double *restrict *restrict"
ctypedef double ***array3D_t "double *restrict *restrict *restrict"

Thus, having Cython emit C99 takes away some of the pain, but it's not nearly as nice as Fortran and f2py. Creating a subarray will e.g. be very painful. In Fortran we just slice the array like we do in Python with NumPy, and the compiler takes care of the rest.
- Leaves a path open in the syntax for introducing low-level slicing and arithmetic as separate operations in Cython independent of NumPy (think Numexpr compile-time folded into Cython code).
Fortran 90/95 does this already, which is a major reason for choosing it for numerical computing. If you have something like this working, I believe many scientists would be happy to retire Fortran. It's not that anyone likes it that much. Anyhow, I don't see myself retiring Fortran and f2py any time soon. Sturla Molden
Sturla Molden wrote:
arr = np.zeros(..)
cdef int[:,:] buf = arr # 2D buffer
Here, buf would be something other than arr; it is a separate view of the array for low-level purposes.
I like your proposal. The reason we use Fortran for numerical computing is that Fortran makes it easy to manipulate arrays. C or C++ sucks terribly for anything related to numerical computing, and arrays in particular.
What's your take on Blitz++? Around here when you say C++ and numerical in the same sentence, Blitz++ is what they mean.
Cython is currently not better than C. The ndarray syntax you added last summer is useless if we need to pass the array or a view/slice to another function. That is almost always the case. While the syntax is there, the
This can be fixed with the existing syntax: http://trac.cython.org/cython_trac/ticket/177 Introducing this syntax would actually mean less time to focus on "real usability issues" like that. OTOH, if the syntax I propose is superior, it's better to introduce it early in a long-term perspective.
Fortran 90/95 does this already, which is a major reason for choosing it for numerical computing. If you have something like this working, I believe many scientists would be happy to retire Fortran. It's not that anyone likes it that much. Anyhow, I don't see myself retiring Fortran and f2py any time soon.
That's certainly an interesting perspective. Requires a lot of work though :-) -- Dag Sverre
On 3/5/2009 8:51 AM, Dag Sverre Seljebotn wrote:
What's your take on Blitz++? Around here when you say C++ and numerical in the same sentence, Blitz++ is what they mean.
I have not looked at it for a long time (8 years or so). It is based on profane C++ templates that make debugging impossible. The compiler does not emit meaningful diagnostic messages, and very often the compiler cannot tell on which line the error occurred. It was efficient for small arrays if loops could be completely unrolled by the template metaprogram. For large arrays, it produced intermediate arrays as no C++ compiler could do escape analysis.
Introducing this syntax would actually mean less time to focus on "real usability issues" like that. OTOH, if the syntax I propose is superior, it's better to introduce it early in a long-term perspective.
There is not much difference between

cdef int[:,:] array

and

cdef numpy.ndarray[int, dim=2] array

except that the latter is a Python object. The only minor issue with that is the GIL. On the other hand, the former is not a Python object, which means it is not garbage collected.

S.M.
Sturla Molden wrote:
Introducing this syntax would actually mean less time to focus on "real usability issues" like that. OTOH, if the syntax I propose is superior, it's better to introduce it early in a long-term perspective.
There is not much difference between
cdef int[:,:] array
and
cdef numpy.ndarray[int, dim=2] array
except that the latter is a Python object. The only minor issue with that is the GIL. On the other hand, the former is not a Python object, which means it is not garbage collected.
As with all syntax, the difference is mostly psychological. The former means "now I need fast access and will want to hit the metal, and will no longer look at my array through a NumPy object but through a buffer view", whereas the latter means "let Cython optimize some of the NumPy operations".

About garbage collection: int[:,:] would always be a buffer view onto an underlying object which *would* be garbage collected. I.e. it is *not* stack-allocated memory; so when you do

cdef np.int_t[:,:] arr = np.zeros((10,10), np.int)

then the memory of the array is garbage collected insofar as the result of np.zeros is. "arr" simply adds a reference to the underlying object (and slices add another reference, and so on).

Support for GIL-less programming is on the wanted-list anyway for both syntaxes; Cython can know when one does something illegal and only let through certain uses of the variable, so both syntaxes work for this.

Dag Sverre
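A small plain-Python sketch of the lifetime point, again using the built-in memoryview as a stand-in for the proposed buffer view (illustrative only):

```python
import numpy as np

arr = np.zeros((10, 10))
view = memoryview(arr)    # stand-in for the proposed int[:,:] view

del arr                   # the view still holds a reference to the buffer,
                          # so the memory is not freed yet
back = np.asarray(view)   # no copy; same underlying memory
back[0, 0] = 1.0
print(back.sum())         # 1.0 -- the memory is still alive and valid
```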
Hi Dag

2009/3/5 Dag Sverre Seljebotn <dagss@student.matnat.uio.no>:
More details: http://wiki.cython.org/enhancements/buffersyntax
Interesting proposal! Am I correct in thinking that you'd have to re-implement a lot of NumPy yourself to get this working? Or are you planning to build on NumPy + C-API? Cheers Stéfan
Stéfan van der Walt wrote:
Hi Dag
2009/3/5 Dag Sverre Seljebotn <dagss@student.matnat.uio.no>:
More details: http://wiki.cython.org/enhancements/buffersyntax
Interesting proposal! Am I correct in thinking that you'd have to re-implement a lot of NumPy yourself to get this working? Or are you planning to build on NumPy + C-API?
First off, the proposal now is simply to change the syntax for existing features, which would just disable arithmetic and slicing this time around. Slicing could perhaps happen over summer, but arithmetic would likely not happen for some time. The only point now is that before the syntax and work habits are *too* fixed, one could leave the road more open for it.

But yes, to implement that one would need to reimplement parts of NumPy to get it working. But because code would be generated specifically for the situation inline, I think it would be more like reimplementing Numexpr than reimplementing NumPy. I think one could simply invoke Numexpr as a first implementation (and make it an optional Cython plugin).

The fact that any performance improvements cannot be done incrementally and transparently though is certainly speaking against the syntax/semantics.

-- Dag Sverre
On Thursday 05 March 2009, Dag Sverre Seljebotn wrote:
But yes, to implement that one would need to reimplement parts of NumPy to get it working. But because code would be generated specifically for the situation inline, I think it would be more like reimplementing Numexpr than reimplementing NumPy. I think one could simply invoke Numexpr as a first implementation (and make it an optional Cython plugin).
At first sight, having a kind of Numexpr kernel inside Cython would be great, but provided that you can already call Numexpr from both Python/Cython, I wonder what the advantage of doing so would be. As I see it, it would be better to have:

c = numexpr.evaluate("a + b")

in the middle of Cython code than just:

c = a + b

in the sense that the former would allow the programmer to see whether Numexpr is called explicitly or not.

One should not forget that Numexpr starts to be competitive only when the array operands and result of an expression are around the CPU cache size or larger (unless transcendental functions are used and the local Numexpr has support for Intel VML, in which case this size can be substantially lower). So, getting Numexpr (or the Cython implementation of it) automatically called for *every* expression is not necessarily a Good Thing, IMO.

Cheers,

-- Francesc Alted
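For reference, a minimal example of calling Numexpr explicitly from Python code (this assumes the numexpr package is installed; the array sizes are arbitrary):

```python
import numpy as np
import numexpr

a = np.random.rand(1000000)
b = np.random.rand(1000000)

c_np = a + 2 * b                       # plain NumPy: builds temporaries
c_ne = numexpr.evaluate("a + 2 * b")   # numexpr: evaluated blockwise

print(np.allclose(c_np, c_ne))
```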
Francesc Alted wrote:
On Thursday 05 March 2009, Dag Sverre Seljebotn wrote:
But yes, to implement that one would need to reimplement parts of NumPy to get it working. But because code would be generated specifically for the situation inline, I think it would be more like reimplementing Numexpr than reimplementing NumPy. I think one could simply invoke Numexpr as a first implementation (and make it an optional Cython plugin).
At first sight, having a kind of Numexpr kernel inside Cython would be great, but provided that you can already call Numexpr from both Python/Cython, I wonder what the advantage of doing so would be. As I see it, it would be better to have:
c = numexpr.evaluate("a + b")
in the middle of Cython code than just:
c = a + b
in the sense that the former would allow the programmer to see whether Numexpr is called explicitly or not.
The former would need to invoke the parser etc., which one would *not* need to do when one has the Cython compilation step. When I mention numexpr it is simply because work has already gone into it to optimize these things; that experience could hopefully be kept, while discarding the parser and opcode system.

I know too little about these things, but look: Cython can relatively easily transform things like

cdef int[:,:] a = ..., b = ...
c = a + b * b

into a double for-loop with c[i,j] = a[i,j] + b[i,j] * b[i,j] at its core. A little more work could have it choose the innermost dimension to iterate over dynamically (in strided mode).

If a and b are declared as contiguous arrays and "restrict", I suppose the C compiler could do the most efficient thing in a lot of cases? (I.e. "cdef restrict int[:,:,"c"]" or similar)

However if one has a strided array, numexpr could still give an advantage over such a loop. Or?

But anyway, this is easily one year ahead of us, unless more numerical Cython developers show up.

-- Dag Sverre
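A plain-Python sketch of the loop structure such a transformation would produce (slow and illustrative only; the generated C would of course use typed, unboxed indices):

```python
import numpy as np

def add_square(a, b):
    # Explicit double loop equivalent of c = a + b * b for 2D operands.
    n0, n1 = a.shape
    c = np.empty_like(a)
    for i in range(n0):
        for j in range(n1):
            c[i, j] = a[i, j] + b[i, j] * b[i, j]
    return c

a = np.arange(6.0).reshape(2, 3)
b = np.ones((2, 3))
print(np.allclose(add_square(a, b), a + b * b))
```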
On Thursday 05 March 2009, Dag Sverre Seljebotn wrote:
At first sight, having a kind of Numexpr kernel inside Cython would be great, but provided that you can already call Numexpr from both Python/Cython, I wonder what the advantage of doing so would be. As I see it, it would be better to have:
c = numexpr.evaluate("a + b")
in the middle of Cython code than just:
c = a + b
in the sense that the former would allow the programmer to see whether Numexpr is called explicitly or not.
The former would need to invoke the parser etc., which one would *not* need to do when one has the Cython compilation step.
Ah, yes. That's a good point.
When I mention numexpr it is simply because work has already gone into it to optimize these things; that experience could hopefully be kept, while discarding the parser and opcode system.
I know too little about these things, but look:
Cython can relatively easily transform things like
cdef int[:,:] a = ..., b = ...
c = a + b * b
into a double for-loop with c[i,j] = a[i,j] + b[i,j] * b[i,j] at its core. A little more work could have it choose the innermost dimension to iterate over dynamically (in strided mode).
If a and b are declared as contiguous arrays and "restrict", I suppose the C compiler could do the most efficient thing in a lot of cases? (I.e. "cdef restrict int[:,:,"c"]" or similar)
Agreed.
However if one has a strided array, numexpr could still give an advantage over such a loop. Or?
Well, I suppose that, provided that Cython could perform the for-loop transformation, giving support for strided arrays would be relatively trivial, and the performance would be similar to numexpr's in this case.

The case for unaligned arrays would be a bit different, as the following trick is used: whenever an unaligned array is detected, a new 'copy' opcode is issued so that, for each data block, a copy is made in order to make the data aligned. As the block sizes are chosen to fit easily in the CPU's level-1 cache, this copy operation is very fast and has rather little impact on performance. As I see it, this would be the only situation that would be more complicated to implement natively in Cython, because it requires non-trivial code for both blocking and handling opcodes. However, in most situations my guess is that unaligned array operands do not appear, so perhaps the unaligned-case optimization would not be so important to implement in Cython.
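A rough Python sketch of the blockwise idea (the block size and function name here are made up for illustration; slicing the .flat iterator yields a small contiguous copy of each block, which is essentially the copy trick described above):

```python
import numpy as np

BLOCK = 4096  # elements per block; meant to fit comfortably in L1 cache

def blocked_add(a, b, out):
    # Evaluate out = a + b one cache-sized block at a time.
    n = a.size
    for start in range(0, n, BLOCK):
        stop = min(start + BLOCK, n)
        # .flat[start:stop] returns a small contiguous copy of the block,
        # even when a or b is strided or unaligned.
        out.flat[start:stop] = a.flat[start:stop] + b.flat[start:stop]
    return out

a = np.random.rand(10000)[::2]   # a strided view
b = np.random.rand(5000)
out = np.empty(5000)
print(np.allclose(blocked_add(a, b, out), a + b))
```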
But anyway, this is easily one year ahead of us, unless more numerical Cython developers show up.
Cheers, -- Francesc Alted
On Thursday 05 March 2009, Francesc Alted wrote:
Well, I suppose that, provided that Cython could perform the for-loop transformation, giving support for strided arrays would be relatively trivial, and the performance would be similar to numexpr's in this case.
Mmh, perhaps not so trivial, because that implies that the stride of an array should be known at compilation time, and that would require a new qualifier when declaring the array. Tricky... Cheers, -- Francesc Alted
Francesc Alted wrote:
On Thursday 05 March 2009, Francesc Alted wrote:
Well, I suppose that, provided that Cython could perform the for-loop transformation, giving support for strided arrays would be relatively trivial, and the performance would be similar to numexpr's in this case.
Mmh, perhaps not so trivial, because that implies that the stride of an array should be known at compilation time, and that would require a new qualifier when declaring the array. Tricky...
No, one could do the same thing that NumPy does (I think, never looked into it in detail), i.e:

decide on dimension to do innermost dynamically from strides and sizes
save the stride in that dimension for each array
for loop using n-dimensional iterator with larger per-loop overhead:
    save offsets
    for loop on the innermost dimension with lower per-loop overhead:
        component-wise operation using offsets and innermost strides

Dag Sverre
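An illustrative NumPy sketch of the "decide on the innermost dimension dynamically" step (the helper name is made up): the axis with the smallest stride is the one whose elements lie closest together in memory, so it is the natural innermost loop:

```python
import numpy as np

def innermost_axis(a):
    # Axis with the smallest absolute stride = natural innermost loop.
    return int(np.argmin(np.abs(a.strides)))

c_order = np.zeros((100, 200))        # strides (1600, 8): axis 1 innermost
f_order = np.asfortranarray(c_order)  # strides (8, 800):  axis 0 innermost

print(innermost_axis(c_order))  # 1
print(innermost_axis(f_order))  # 0
```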
On Thursday 05 March 2009, Dag Sverre Seljebotn wrote:
No, one could do the same thing that NumPy does (I think, never looked into it in detail), i.e:
decide on dimension to do innermost dynamically from strides and sizes
save the stride in that dimension for each array
for loop using n-dimensional iterator with larger per-loop overhead:
    save offsets
    for loop on the innermost dimension with lower per-loop overhead:
        component-wise operation using offsets and innermost strides
I see. Yes, it seems definitely doable. However, I don't understand very well when you say that you have to "decide on dimension to do innermost dynamically". For me, this dimension should always be the trailing dimension, in order to maximize the locality of data. Or am I missing something? -- Francesc Alted
Francesc Alted wrote:
On Thursday 05 March 2009, Dag Sverre Seljebotn wrote:
No, one could do the same thing that NumPy does (I think, never looked into it in detail), i.e:
decide on dimension to do innermost dynamically from strides and sizes
save the stride in that dimension for each array
for loop using n-dimensional iterator with larger per-loop overhead:
    save offsets
    for loop on the innermost dimension with lower per-loop overhead:
        component-wise operation using offsets and innermost strides
I see. Yes, it seems definitely doable. However, I don't understand very well when you say that you have to "decide on dimension to do innermost dynamically". For me, this dimension should always be the trailing dimension, in order to maximize the locality of data. Or am I missing something?
For a transposed array (or Fortran-ordered one) it will be the leading. Not sure whether it is possible with other kinds of views (where e.g. a middle dimension varies fastest), but the NumPy model doesn't preclude it and I suppose it would be possible with stride_tricks. Dag Sverre
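For illustration, the strides of a transposed array show the leading axis varying fastest, and np.lib.stride_tricks.as_strided can build a (contrived) view where a middle axis has the smallest stride:

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

a = np.zeros((4, 5))
print(a.strides)      # (40, 8): last axis varies fastest (C order)
print(a.T.strides)    # (8, 40): first axis varies fastest

# A 3-D view of a 1-D buffer where the *middle* axis has the smallest
# stride -- legal under NumPy's strided model.
buf = np.arange(24, dtype=np.float64)
weird = as_strided(buf, shape=(2, 3, 4), strides=(96, 8, 24))
print(weird.strides)  # (96, 8, 24)
```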
On Thursday 05 March 2009, Dag Sverre Seljebotn wrote:
Francesc Alted wrote:
On Thursday 05 March 2009, Dag Sverre Seljebotn wrote:
No, one could do the same thing that NumPy does (I think, never looked into it in detail), i.e:
decide on dimension to do innermost dynamically from strides and sizes
save the stride in that dimension for each array
for loop using n-dimensional iterator with larger per-loop overhead:
    save offsets
    for loop on the innermost dimension with lower per-loop overhead:
        component-wise operation using offsets and innermost strides
I see. Yes, it seems definitely doable. However, I don't understand very well when you say that you have to "decide on dimension to do innermost dynamically". For me, this dimension should always be the trailing dimension, in order to maximize the locality of data. Or am I missing something?
For a transposed array (or Fortran-ordered one) it will be the leading.
Good point. I was not aware of this subtlety. In fact, numexpr does not deal well with transposed views of NumPy arrays. Filed the bug in: http://code.google.com/p/numexpr/issues/detail?id=18
Not sure whether it is possible with other kinds of views (where e.g. a middle dimension varies fastest), but the NumPy model doesn't preclude it and I suppose it would be possible with stride_tricks.
Middle dimensions varying first? Oh my! :) -- Francesc Alted
On Thursday 05 March 2009, Dag Sverre Seljebotn wrote:
Good point. I was not aware of this subtlety. In fact, numexpr does not deal well with transposed views of NumPy arrays. Filed the bug in:
http://code.google.com/p/numexpr/issues/detail?id=18
Not sure whether it is possible with other kinds of views (where e.g. a middle dimension varies fastest), but the NumPy model doesn't preclude it and I suppose it would be possible with stride_tricks.
Middle dimensions varying first? Oh my! :)
I cannot see any obvious justification for letting the middle dimension vary first.

C ordering is natural if we work with a "pointer to an array of pointers" or an "array of arrays", which in both cases will be indexed as array[i][j] in C:

array[i][j] = (array[i])[j] = *(array[i]+j) = *(*(array+i)+j)

While C has arrays and pointers, the difference is almost never visible to the programmer. This has led some to erroneously believe that "C has no arrays, only pointers". However:

double array[512];
double *p = array;

Now sizeof(array) will be sizeof(double)*512, whereas sizeof(p) will be sizeof(long). This is one of very few cases where C arrays and pointers behave differently, but it demonstrates the existence of arrays in C.

The justification for Fortran ordering is in the mathematics. Say we have a set of linear equations

A * X = B

and are going to solve for X, using some magical subroutine 'solve'. The most efficient way to store these arrays becomes the Fortran ordering. That is,

call solve(A, X, B)

will be mathematically equivalent to the loop

do i = 1, n
    call solve(A, X(:,i), B)
end do

All the arrays in the call to solve are still kept contiguous! This would not be the case with C ordering, which is an important reason that C sucks so much for numerical computing. To write efficient linear algebra in C, we have to store matrices in memory transposed to how they appear in mathematical equations. In fact, Matlab uses Fortran ordering because of this.

While C ordering feels natural to computer scientists, who love the beauty of pointer and array symmetries, it is a major obstacle for scientists and engineers from other fields. It is perhaps the most important reason why numerical code written in Fortran tends to be faster: if a matrix is rank n x m in the equation, it should be rank n x m in the program as well, right? Not so with C.

The better performance of Fortran for numerical code is often blamed on pointer aliasing in C. I believe wrong memory layout at the hands of the programmer is actually the more important reason. In fact, whenever I have done comparisons, the C compiler has always produced the faster machine code (gcc vs. gfortran or g77, icc vs. ifort). But to avoid the pitfall, one must be aware of it. And when a programmer's specialization is in another field, this is usually not the case. Most scientists doing some casual C, Java or Python programming fall straight into the trap. That is also why I personally feel it was a bad choice to let C ordering be the default in NumPy.

S.M.
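The contiguity point can be checked directly in NumPy (array sizes here are arbitrary): a column slice of a Fortran-ordered array is contiguous, while the same slice of a C-ordered array is strided:

```python
import numpy as np

A_c = np.zeros((1000, 1000), order='C')
A_f = np.asfortranarray(A_c)

col_c = A_c[:, 3]   # strided: consecutive elements 8000 bytes apart
col_f = A_f[:, 3]   # contiguous: consecutive elements adjacent in memory

print(col_c.strides, col_c.flags['C_CONTIGUOUS'])  # (8000,) False
print(col_f.strides, col_f.flags['C_CONTIGUOUS'])  # (8,) True
```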
Sturla Molden wrote:
The justification for Fortran ordering is in the mathematics. Say we have a set of linear equations
A * X = B
and are going to solve for X, using some magical subroutine 'solve'. The most efficient way to store these arrays becomes the Fortran ordering. That is,
call solve(A, X, B)
will be mathematically equivalent to the loop
do i = 1, n
    call solve(A, X(:,i), B)
end do
All the arrays in the call to solve are still kept contiguous! This would not be the case with C ordering, which is an important reason that C sucks so much for numerical computing. To write efficient linear algebra in C, we have to store matrices in memory transposed to how they appear in mathematical equations. In fact, Matlab uses Fortran ordering because of this.
I don't think that's true: I am pretty sure Matlab follows Fortran ordering because Matlab was born as a "shell" around BLAS, LINPACK and co.

I don't understand your argument about why row vs. column order matters: which one is best depends on your linear algebra equations. You give an example where Fortran is better, but I can find examples where C order will be more appropriate. Most of the time, for anything non-trivial, which one is best depends on the dimensions of the problem (Kalman filtering in high-dimensional spaces, for example), because some parts of the equations are better handled in a row-order fashion, and some other parts in a column-order fashion.

I don't know whether this is true, but I have read several times that the column order in Fortran is historical and due to some specificities of the early IBM machines - but I have of course no idea what the hardware in 1954 looked like from a programming point of view :) This does not prevent it from being a happy accident with regard to performance, though.

cheers,

David
On Thursday 05 March 2009, David Cournapeau wrote:
I don't understand your argument about why row vs. column order matters: which one is best depends on your linear algebra equations. You give an example where Fortran is better, but I can find examples where C order will be more appropriate. Most of the time, for anything non-trivial, which one is best depends on the dimensions of the problem (Kalman filtering in high-dimensional spaces, for example), because some parts of the equations are better handled in a row-order fashion, and some other parts in a column-order fashion.
Yeah. Yet another (simple) example coming from linear algebra: a matrix multiplied by a vector. Given a (matrix):

a = [[0,1,2],
     [3,4,5],
     [6,7,8]]

and b (vector):

b = [[1],
     [2],
     [3]]

the most intuitive way to do the multiplication is to take the 1st row of a and do a dot product against b, repeating the process for the 2nd and 3rd rows of a. C order coincides with this rule, and it is optimal from the point of view of memory access, while Fortran order is not.

-- Francesc Alted
2009/3/5 Francesc Alted <faltet@pytables.org>:
On Thursday 05 March 2009, Francesc Alted wrote:
Well, I suppose that, provided that Cython could perform the for-loop transformation, giving support for strided arrays would be relatively trivial, and the performance would be similar to numexpr's in this case.
Mmh, perhaps not so trivial, because that implies that the stride of an array should be known at compilation time, and that would require a new qualifier when declaring the array. Tricky...
Not necessarily. You can transform a[1,2,3] into *(a.data + 1*a.strides[0] + 2*a.strides[1] + 3*a.strides[2]) without any need for static information beyond that a is 3-dimensional. This would already be valuable, though perhaps you'd want to be able to declare that a particular dimension had stride 1 to simplify things. You could then use this same implementation to add automatic iteration. Anne
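A quick NumPy check of this stride arithmetic (it reads from the raveled array, which works here because a is C-contiguous):

```python
import numpy as np

a = np.arange(2 * 3 * 4, dtype=np.float64).reshape(2, 3, 4)

i, j, k = 1, 2, 3
byte_offset = i * a.strides[0] + j * a.strides[1] + k * a.strides[2]

# The byte offset divided by the itemsize is the flat index into the buffer.
print(a.ravel()[byte_offset // a.itemsize], a[i, j, k])  # same value
```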
On 3/5/2009 10:11 AM, Dag Sverre Seljebotn wrote:
Cython can relatively easily transform things like
cdef int[:,:] a = ..., b = ... c = a + b * b
Now you are wandering far into Fortran territory...
If a and b are declared as contiguous arrays and "restrict", I suppose the C compiler could do the most efficient thing in a lot of cases? (I.e. "cdef restrict int[:,:,"c"]" or similar)
A Fortran compiler can compile a vectorized expression like

a = b*c(i,:) + sin(k)

into

do j = 1, n
    a(j) = b(j)*c(i,j) + sin(k(j))
end do

The compiler can do this because Fortran has strict rules prohibiting aliasing, and because the intrinsic function sin is declared 'elemental'. On the other hand, if the expression contains functions not declared 'elemental' or 'pure', there may be side effects, and temporary copies must be made. The same could happen if the expression contained variables declared 'pointer', in which case it could contain aliases.

I think in the case of numexpr, it assumes that NumPy ufuncs are elemental like Fortran intrinsics. Matlab's JIT compiler works with the assumption that arrays are inherently immutable (everything has copy-on-write semantics). That makes life easier.
However if one has a strided array, numexpr could still give an advantage over such a loop. Or?
Fortran compilers often make copies of strided arrays. The trick is to make sure the working arrays fit in cache. Again, this is safe when the expression only contains 'elemental' or 'pure' functions. Fortran also often does "copy-in copy-out" if a function is called with a strided array as argument.

S.M.
Hi,
The idea behind the current syntax was to keep things as close as possible to Python/NumPy, and only provide some "hints" to Cython for optimization. My problem with this now is that a) it's too easy to get non-optimized code without a warning by letting in untyped indices, b) I think the whole thing is a bit too "magic" and that it is too unclear what is going on to newcomers (though I'm guessing there).
My proposal: Introduce an explicit "buffer syntax":
arr = np.zeros(..)
cdef int[:,:] buf = arr # 2D buffer
I like this proposal a lot; it seems a great deal clearer to me than the earlier syntax; it helps me think of the new Cython thing that I have in a different and more natural way. Best, Matthew
participants (8)

- Andrew Straw
- Anne Archibald
- Dag Sverre Seljebotn
- David Cournapeau
- Francesc Alted
- Matthew Brett
- Sturla Molden
- Stéfan van der Walt