Help speeding up a Runge-Kuta algorithm (Cython, f2py, ...)
I need help speeding up some code I wrote to perform a Runge-Kuta integration. I need to do the integration as part of a real-time control algorithm, so it needs to be fairly fast. scipy.integrate.odeint does too much error checking to be fast enough. My pure Python version was just a little too slow, so I tried coding it up in Cython. I have only used Cython once before, so I don't know if I did it correctly (the .pyx file is attached).

The code runs just fine, but there is almost no speed-up. I think the core issue is that my dxdt_runge_kuta function gets called about 4000 times per second, so most of my overhead is in the function calls (I think). I am running my real-time control algorithm at 500 Hz and I need at least 2 Runge-Kuta integration steps per real-time step for numerical stability. And the Runge-Kuta algorithm needs to evaluate the derivative 4 times per time step. So, 500 Hz * 2 * 4 = 4000 calls per second.

I also tried coding this up in Fortran and using f2py, but I am getting a type mismatch error I don't understand. I have a function that declares its return value as double precision:

double precision function dzdt(x,voltage)

and I declare the variable I want to store the returned value in as double precision too:

double precision F,z,vel,accel,zdot1,zdot2,zdot3,zdot4
zdot1 = dzdt(x_prev,volts)

but somehow it is not happy.

My C skills are pretty weak (the longer I use Python, the more C I forget, and I didn't know that much to start with). I started looking into Boost as well as using f2py on C code, but I got stuck.

Can anyone either make my Cython or Fortran approaches work or point me in a different direction?

Thanks,
Ryan
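For readers following along, here is a minimal sketch in plain Python of where the 4000 derivative calls per second come from. It assumes the classical fourth-order Runge-Kutta scheme with the input (the voltage) held constant over each step; `rk4_step`, `advance_one_control_period`, `derivative_calls_per_second`, and the `decay` test system are hypothetical names, not taken from the attached .pyx file.

```python
def rk4_step(f, x, u, dt):
    """Take one classical fourth-order Runge-Kutta step for dx/dt = f(x, u).

    x is a sequence of floats (the state); u is an input such as the voltage,
    held constant over the step. f is evaluated exactly four times.
    """
    n = len(x)
    k1 = f(x, u)
    k2 = f([x[i] + 0.5 * dt * k1[i] for i in range(n)], u)
    k3 = f([x[i] + 0.5 * dt * k2[i] for i in range(n)], u)
    k4 = f([x[i] + dt * k3[i] for i in range(n)], u)
    # Weighted average of the four slopes; with g_i = dt * k_i this is the
    # x_prev + 1.0/6*(g1 + 2*g2 + 2*g3 + g4) update discussed in the thread.
    return [x[i] + dt / 6.0 * (k1[i] + 2.0 * k2[i] + 2.0 * k3[i] + k4[i])
            for i in range(n)]


def advance_one_control_period(f, x, u, period, substeps=2):
    """Advance the state by one control period using `substeps` RK4 steps."""
    dt = period / substeps
    for _ in range(substeps):
        x = rk4_step(f, x, u, dt)
    return x


def derivative_calls_per_second(control_rate_hz, substeps, stages=4):
    """Derivative evaluations per second of real time: rate * substeps * stages."""
    return control_rate_hz * substeps * stages


if __name__ == "__main__":
    # Hypothetical first-order test system dx/dt = -x + u (not Ryan's model).
    def decay(x, u):
        return [-x[0] + u]

    x = advance_one_control_period(decay, [1.0], 0.0, period=1.0 / 500)
    print(x, derivative_calls_per_second(500, 2))
```

With the numbers above, `derivative_calls_per_second(500, 2)` gives 4000, and each 500 Hz control period leaves 2 ms for all 8 derivative calls plus everything else the controller does.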
On 8/3/2012 11:02 AM, Ryan Krauss wrote:
[clip]
I also tried coding this up in Fortran and using f2py, but I am getting a type mismatch error I don't understand. I have a function that declares its return value as double precision:
double precision function dzdt(x,voltage)
and I declare the variable I want to store the returned value in as double precision too:
double precision F,z,vel,accel,zdot1,zdot2,zdot3,zdot4
zdot1 = dzdt(x_prev,volts)
but somehow it is not happy.
I'm not much of a Fortran programmer and I may misunderstand the above, but have you tried adding dzdt to your double precision declaration?
_______________________________________________ SciPy-User mailing list SciPy-User@scipy.org http://mail.scipy.org/mailman/listinfo/scipy-user
On 03.08.2012 19:02, Ryan Krauss wrote: [clip]
Can anyone either make my Cython or Fortran approaches work or point me in a different direction?
Regarding Cython: run

cython -a runge_kuta.pyx

and check the created HTML file. Slow points are highlighted in yellow.

Regarding this case:

- `cdef`, not `def`, for the dxdt_* function
- from libc.math cimport exp
- Do not use small numpy arrays inside loops. Use C constructs instead.
- Use @cython.cdivision(True), @cython.boundscheck(False)

PS. Runge-Kutta

--
Pauli Virtanen
Thanks for the suggestions.
- Do not use small numpy arrays inside loops. Use C constructs instead.
This is where I ran into trouble with my knowledge of C. I have several 3x1 arrays that I need to pass into the dxdt function, multiply by scalars, and add together. I don't know how to do that cleanly in C. For example:

x_out = x_prev + 1.0/6*(g1 + 2*g2 + 2*g3 + g4)

where x_prev, g1, g2, g3, and g4 are all 3x1.

A little googling led me to valarrays, but I don't know if that is the best approach or how to use them within Cython.

How would you do basic math on small arrays in pure C?

On Fri, Aug 3, 2012 at 1:56 PM, Pauli Virtanen <pav@iki.fi> wrote:
[clip]
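On Ryan's question about small-vector math without NumPy: in C the usual answer is a fixed-size array and an explicit loop over its elements, writing into a buffer that is allocated once. (Note that `valarray` is a C++ standard-library type, not C.) The sketch below shows that pattern in plain Python with `array("d")` buffers of C doubles. Under Cython, the same loop over typed buffers, for example a `double[::1]` memoryview or a C array declared as `cdef double x[3]`, should compile to a plain C loop. `rk4_combine` and `N_STATES` are hypothetical names, and this is an illustration of the pattern, not the code Sturla sent.

```python
from array import array

N_STATES = 3  # length of the 3x1 vectors in the question


def rk4_combine(x_prev, g1, g2, g3, g4, x_out):
    """Compute x_out = x_prev + 1.0/6*(g1 + 2*g2 + 2*g3 + g4) element by element.

    All arguments are length-3 buffers of doubles. The result is written into
    the preallocated x_out, so no temporary arrays are created per call.
    """
    for i in range(N_STATES):
        x_out[i] = x_prev[i] + (g1[i] + 2.0 * g2[i] + 2.0 * g3[i] + g4[i]) / 6.0


# Allocate the buffers once, outside the real-time loop, and reuse them.
x_prev = array("d", [1.0, 2.0, -1.0])
g = array("d", [0.5, 0.25, 1.0])
x_out = array("d", [0.0] * N_STATES)

rk4_combine(x_prev, g, g, g, g, x_out)
print(list(x_out))  # prints [1.5, 2.25, 0.0]
```

Reusing `x_out` (and, in a full integrator, the g1..g4 buffers) is what avoids the per-call allocation cost that small NumPy arrays incur.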
Fortran, so fast, yet so painful. Once I got it working, it was 94 times faster than my pure Python version.

Thanks to Jim and Pauli for helping me find my error. Ironically, I was thinking like a C programmer. Just because a Fortran function declares the data type of its return value doesn't mean that the functions or subroutines calling it will know that type: unless the caller declares it as well, F77's implicit typing rules treat dzdt as REAL there.

I am still open to Cython suggestions. I don't want to bring more F77 code into the world.....

On Fri, Aug 3, 2012 at 2:16 PM, Ryan Krauss <ryanlists@gmail.com> wrote:
[clip]
Hey,

Just to add to what was said previously: isn't float in Cython single precision? I doubt this was intended here, and it should be replaced with DTYPE_t everywhere. Other than that, as was already said, np.zeros/np.exp is bad there...

Regards,

Sebastian

On Fri, 2012-08-03 at 14:41 -0500, Ryan Krauss wrote:
[clip]
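Sebastian's point can be checked without Cython: `cdef float` in Cython is a C `float`, which on common platforms is 32-bit single precision, whereas Python floats and NumPy `float64` are C `double`. The sketch below round-trips one of the constants from the posted code through `ctypes.c_float` and `ctypes.c_double` to show how much precision survives; the variable names are illustrative.

```python
import ctypes

J = 0.0011767297528720126  # constant taken from the posted Cython code

# Store J in a C float (single precision) and a C double, then read it back.
as_float = ctypes.c_float(J).value
as_double = ctypes.c_double(J).value

rel_err_float = abs(as_float - J) / J
rel_err_double = abs(as_double - J) / J

print(f"c_double: {as_double!r}  relative error {rel_err_double:.1e}")
print(f"c_float:  {as_float!r}  relative error {rel_err_float:.1e}")
```

Single precision keeps roughly 7 significant decimal digits versus roughly 16 for double, so a `float` declaration silently rounds every constant and intermediate result it touches.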
On 03.08.2012 19:02, Ryan Krauss wrote: [clip]
zdot1 = dzdt(x_prev,volts)
but somehow it is not happy.
It's Fortran 77. You need to declare

double precision dzdt

I'd suggest writing Fortran 90 --- no need to bring more F77 code into existence ;)

--
Pauli Virtanen
On 03.08.2012 21:05, Pauli Virtanen wrote:
It's Fortran 77. You need to declare
double precision dzdt
I'd suggest writing Fortran 90 --- no need to bring more F77 code into existence ;)
With the new typed memoryviews in Cython, there is no need to bring more Fortran of any sort into existence. ;-)

Sturla
Not tested or debugged, but to me it looks like something like this might be what you want.

Sturla

On 03.08.2012 19:02, Ryan Krauss wrote:
[clip]
Thanks to Sturla for helping me get this working in Cython.

I am trying to compile the code to compare it against Fortran for speed. I have run into two bugs so far (I mentioned that my C skills are weak).

The first has to do with the "const trick":

Error compiling Cython file:
------------------------------------------------------------
...
cdef inline void dxdt_runge_kuta(double *x "const double *",
                                 double voltage "const double",
                                 double *dxdt):
    #cdef double J = 0.0011767297528720126 "const double"
    cdef double J = 0.0011767297528720126
    cdef double alpha0 = 4.1396263800000002 "const double"
                                           ^
------------------------------------------------------------

runge_kuta_v2.pyx:12:44: Syntax error in C variable declaration

I don't know what the problem is here, so for now I just got rid of all the "const double" statements. (In case the formatting doesn't come through, the little error caret ^ points to the space between the last number and the quote.)

After getting rid of all the "const double" expressions (just to see if everything else would compile), I got this:

Error compiling Cython file:
------------------------------------------------------------
...
    dxdt[0] = vel
    dxdt[1] = accel
    dxdt[2] = dzdt

def runge_kuta_one_step(double _x[::1], Py_ssize_t factor, double volts,
                                 ^
------------------------------------------------------------

runge_kuta_v2.pyx:31:34: Expected an identifier or literal

The caret points to the first square bracket.

Thanks,
Ryan

On Sat, Aug 4, 2012 at 6:28 PM, Sturla Molden <sturla@molden.no> wrote:
[clip]
So, I get the same error when I try to compile Sturla's memview.pyx example. I think I have too old a version of Cython:

Cython version 0.15.1

Let me look into that...

On Mon, Aug 6, 2012 at 8:51 AM, Ryan Krauss <ryanlists@gmail.com> wrote:
[clip]
I upgraded to Cython 0.16 and made a bit more progress. I don't know if this is headed in the right direction or not, but based on the memview.pyx example I changed

double _x[::1]

to

np.float64_t[::1] _x

and did the same thing with

cdef double out[::1] = np.zeros(3)

I seem to be closer to compiling successfully, but now have this error:

Error compiling Cython file:
------------------------------------------------------------
...
import numpy as np
cimport numpy as np
from libc.math cimport exp, fabs

cdef inline void dxdt_runge_kuta(double *x "const double *",
                               ^
------------------------------------------------------------

runge_kuta_v2.pyx:8:32: Function argument cannot have C name specification

(The caret points to the last a in runge_kuta.)

Thanks again,
Ryan

On Mon, Aug 6, 2012 at 9:02 AM, Ryan Krauss <ryanlists@gmail.com> wrote:
[clip]
On 06.08.2012 15:51, Ryan Krauss wrote:
Thanks to Sturla for helping me get this working in Cython.
I am trying to compile the code to compare it against Fortran for speed. I have run into two bugs so far (I mentioned that my C skills are weak).
Sorry, I should have debugged :(

This one compiles with

$ python setup.py build_ext

Is this what you wanted?

Sturla
Thanks, Sturla. That code compiles just fine and will go a long way toward helping me understand how to use Cython to write fast code for these kinds of applications.

For many Runge-Kutta steps, your Cython code is 200 times faster than my pure Python version. Fortran is still 1.6 times faster than the Cython version, but the Fortran version is much more work to code up.

Thanks again,
Ryan

On Mon, Aug 6, 2012 at 6:18 PM, Sturla Molden <sturla@molden.no> wrote:
[clip]
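As a quick check of the arithmetic, combining the two ratios Ryan reports for the many-steps benchmark, and reading "X times faster" as a ratio of run times, gives:

```latex
\frac{t_{\mathrm{Python}}}{t_{\mathrm{Cython}}} = 200,\qquad
\frac{t_{\mathrm{Cython}}}{t_{\mathrm{Fortran}}} = 1.6
\quad\Longrightarrow\quad
\frac{t_{\mathrm{Python}}}{t_{\mathrm{Fortran}}} = 200 \times 1.6 = 320,\qquad
\frac{t_{\mathrm{Fortran}}}{t_{\mathrm{Cython}}} = \frac{1}{1.6} = 0.625
```

So, on that benchmark, the Cython version runs at roughly 62% of the Fortran version's speed.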
On 07.08.2012 18:37, Ryan Krauss wrote:
For many Runge-Kutta steps, your Cython code is 200 times faster than my pure Python version. Fortran is still 1.6 times faster than the Cython version, but the Fortran version is much more work to code up.
Don't expect anything to be "faster than Fortran" for certain kinds of numerical work. Cython has a certain overhead (larger than C and Fortran), and since it compiles to ANSI C (C89) rather than C99, we cannot use restrict pointers. But still, ~75% of Fortran performance is often acceptable!

Another thing is that you need to look at "scalability". How much of that extra runtime is constant, due to differences between Cython and f2py? How much is variable, due to the numerical kernel being faster in Fortran? Will differently sized problems give you the same overhead from using Cython? It often helps to plot a graph of the performance (mean and error bars) for various problem sizes, rather than benchmarking at one single point.

Correctness is always more important than speed. That is one thing to consider too. With Cython we can begin with a tested Python prototype and optimize along the way, using the Python profiler to pinpoint where it matters the most. Python, NumPy and Cython will not win the world championship of being "fastest on the CPU" for simple numerical kernels, but that is not the idea either. Implementing complex algorithms in Fortran can be a PITA compared to Python. But Cython helps us in a straightforward way to speed up Python code and/or interface with C or C++. Fortran is only nice for helping us scientists avoid the pointer arithmetic of C, but Cython's memoryviews do that too.

Sturla
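Sturla's suggestion to benchmark over several problem sizes, and to separate constant overhead from per-step cost, could be sketched like this in plain Python. `time_over_sizes`, `split_overhead`, and the stand-in workload are hypothetical; `run(n)` stands for whatever integrates n Runge-Kutta steps through the Cython or f2py wrapper. The straight-line fit assumes run time grows roughly linearly with the number of steps.

```python
import statistics
import time


def time_over_sizes(run, sizes, repeats=5):
    """Time run(n) for each n in sizes; return (n, mean_s, stdev_s) tuples.

    repeats must be at least 2 so that a standard deviation can be computed.
    """
    results = []
    for n in sizes:
        samples = []
        for _ in range(repeats):
            t0 = time.perf_counter()
            run(n)
            samples.append(time.perf_counter() - t0)
        results.append((n, statistics.mean(samples), statistics.stdev(samples)))
    return results


def split_overhead(results):
    """Least-squares fit of mean_time = fixed + per_step * n.

    The intercept estimates constant per-call overhead (e.g. wrapper cost);
    the slope estimates the cost of each integration step.
    """
    ns = [float(n) for n, _, _ in results]
    ts = [t for _, t, _ in results]
    n_bar = statistics.mean(ns)
    t_bar = statistics.mean(ts)
    sxx = sum((n - n_bar) ** 2 for n in ns)
    sxy = sum((n - n_bar) * (t - t_bar) for n, t in zip(ns, ts))
    per_step = sxy / sxx
    fixed = t_bar - per_step * n_bar
    return fixed, per_step


if __name__ == "__main__":
    # Stand-in workload; replace with a call into the compiled integrator.
    def run(n):
        x = 1.0
        for _ in range(n):
            x -= 1e-6 * x
        return x

    results = time_over_sizes(run, [1_000, 10_000, 100_000])
    for n, mean_s, stdev_s in results:
        print(f"n={n:>7}: {mean_s * 1e3:.3f} ms +/- {stdev_s * 1e3:.3f} ms")
    fixed, per_step = split_overhead(results)
    print(f"fixed ~{fixed * 1e6:.1f} us, per step ~{per_step * 1e9:.1f} ns")
```

The (n, mean, stdev) tuples can be plotted with error bars, for example with matplotlib's `errorbar`, to get the kind of graph Sturla describes. With noisy timings the fitted intercept can come out slightly negative, which just means the fixed overhead is below the measurement noise.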
I agree. Thanks again.

On Tue, Aug 7, 2012 at 1:10 PM, Sturla Molden <sturla@molden.no> wrote:
[clip]
Participants (5):

- Jim Vickroy
- Pauli Virtanen
- Ryan Krauss
- Sebastian Berg
- Sturla Molden