Hi all,

I have some very light functions that get called an enormous number of times. I'd like to get them running as fast as possible, even if that makes them ugly. The functions do several integer comparisons but no floating-point computation, so I'd like to port them to C/C++ rather than Fortran.

While weave.inline has been rocking my world for most applications, on my computer the gateway alone seems to take about 1 s per 100k calls, which is quite a bit of overhead for functions this small. Could anyone help me figure out which Python-to-C method (swig, boost::python, etc.) is fastest for tiny functions? I know ahead of time what types all the arguments will be.

Thanks very much as always,
Anand
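The figure of about 1 s per 100k calls works out to 1 / 100,000 = 10 microseconds of overhead per call. A minimal sketch for putting a comparable number on any candidate binding, using only the standard library's timeit module (Python 3.5+ for the `globals` argument): time many calls of the function and divide by the call count. `tiny_compare` is a hypothetical stand-in for one of the small functions; run the same measurement once on a plain Python function and once on the wrapped C version to see what the gateway itself costs.

```python
import timeit


def tiny_compare(a, b):
    """Hypothetical stand-in for one of the small integer-comparison functions."""
    return a < b


def per_call_seconds(func, *args, number=100_000, repeat=5):
    """Return the best observed time, in seconds, for a single call of func(*args)."""
    # Pass the function through globals so no extra wrapper call is timed.
    timer = timeit.Timer(stmt="func(*args)", globals={"func": func, "args": args})
    # Take the minimum over several repeats to reduce noise from other processes.
    best_total = min(timer.repeat(repeat=repeat, number=number))
    return best_total / number


if __name__ == "__main__":
    cost = per_call_seconds(tiny_compare, 3, 7)
    # 1 s per 100k calls would correspond to 10 microseconds per call.
    print(f"{cost * 1e6:.3f} microseconds per call")
```

Taking the minimum of the repeats rather than the mean is the usual timeit advice, since slower runs mostly reflect interference from the rest of the system.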
The fastest interface is going to be hand-written. Other than that, my experience shows that ctypes and pyrex (and weave) are comparable to each other (and not much slower than hand-written at that). All of my experience, however, is on functions that do a reasonably large amount of work.

-Travis
I have the same problem with some recursive algorithms that are not easy to "vectorize" (recursive stochastic approximation). I don't know your case, but I think mine is typical. I have a computationally intensive function that works on numpy arrays instead of looping in Python. The function checks its arguments, the implementation is left to another function, etc. All the checking plus the function calls become really expensive once you send only one sample instead of a whole array (the function itself, in C, would take a few hundred to a few thousand cycles).

My approach so far is:

1. First, implement the algorithm as straightforwardly as possible, and keep it as a reference :)
2. Then profile it thoroughly to be sure that the problem is where you think it is (under Linux, I found hotshot + kcachegrind really useful for this, thanks to the graphical call tree with the cost at each node).
3. Slightly change the API: basically, you have to trade flexibility for speed. Your function won't be reusable in other cases, but it is tiny, so that is OK. I check the arguments in the "root function" and loop in Python over a special-purpose, ugly function without argument checking; then, once I had checked that it worked, I did the same in pure C, still without argument checking.

For step 3, and at the scale you are talking about (100k calls/s; I don't know your computer's CPU power, but on my fairly recent computer this scale definitely shows the limits of the current Python implementation, where function calls are expensive), ctypes is too expensive in my experience when passing at least one numpy array: the from_param call used to convert each argument is itself a Python function call, which is expensive. Also, you should avoid any other superfluous function calls (in Python); in my case, with my special-purpose function, removing all the intermediate layers gave me a 5x speed-up. For example, just by removing calls to functions such as atleast_2d, shape, etc., I already got a 2x speed-up!

When I am done, you may take a look if you want, as the code is part of the pyem toolbox in scipy.sandbox (the change will be documented).

Cheers,
David
participants (3)
- Anand Patil
- David Cournapeau
- Travis Oliphant