[SciPy-user] Fastest python extension for tiny functions?
David Cournapeau
david at ar.media.kyoto-u.ac.jp
Sun Dec 3 22:58:56 EST 2006
Anand Patil wrote:
> Hi all,
>
> I have some very light functions that get called enormous numbers of
> times. I'd like to get them running as fast as possible, even if it
> makes them ugly. The functions do several integer comparisons but no
> floating-point computation, so I'd like to port them to C/C++ rather
> than Fortran.
>
> While weave.inline has been rocking my world for most applications, on
> my computer the gateway alone seems to take about 1s per 100k calls,
> which is quite a bit of overhead for functions this small. Could anyone
> help me figure out which python-to-C method (swig, boost::python, etc)
> is fastest for tiny functions? I know ahead of time what types all the
> arguments will be.
I have the same problem with some recursive algorithms which are not
easy to "vectorize" (recursive stochastic approximation). I don't know
your case, but I think mine is typical: I have a function which is
computationally intensive, but works on numpy arrays instead of
looping in Python. The function checks its arguments, the
implementation is left to another function, etc. All the checking +
function call overhead becomes really expensive once you send only one
sample at a time instead of a whole array (the function itself, in C,
would take a few hundred to a few thousand cycles). My approach so far
is:
    1 First, implement the algorithm as straightforwardly as possible,
and keep it as a reference :)
    2 Then profile it seriously, to be sure that you have a problem
where you think you have a problem (under Linux, I found hotshot +
kcachegrind really useful for this, thanks to the graphical call tree
with the cost at each node; a minimal sketch follows this list).
    3 Slightly change the API: basically, you have to trade
flexibility for speed. You give up flexibility, i.e. your function
won't be reusable in other cases, but it is tiny, so that is OK. I
check the arguments in the "root function", and loop using a
special-purpose, ugly function without argument checking: first in
Python, and once I have checked that it works, in pure C, again
without argument checking (also sketched below).
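    For step 2, a minimal profiling sketch (hotshot ships with the
standard library; profiled_run and the output filename are just
placeholders for your own code):

    import hotshot, hotshot.stats

    def profiled_run():
        # placeholder for the code you want to profile
        pass

    prof = hotshot.Profile("run.prof")
    prof.runcall(profiled_run)
    prof.close()

    # Text summary; for kcachegrind's graphical call tree, convert
    # the log with the hotshot2calltree script first.
    stats = hotshot.stats.load("run.prof")
    stats.sort_stats("cumulative").print_stats(20)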
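    For step 3, here is a sketch of the pattern with hypothetical
names (update, _update_unchecked), not actual pyem code: check the
arguments once in the "root function", then run the tight loop through
a special-purpose helper that does no checking at all.

    import numpy as np

    def _update_unchecked(x, state):
        # Special-purpose inner step: no argument checking, no
        # atleast_2d/shape calls inside the loop (hypothetical body).
        if x > state:
            return state + 1
        return state - 1

    def update(samples, state=0):
        # "Root function": validate the arguments once, up front...
        samples = np.asarray(samples)
        if samples.ndim != 1:
            raise ValueError("expected a 1-d array of samples")
        # ...then loop with the unchecked helper. Once this is known
        # to work, the loop body is the candidate for a pure C port.
        for x in samples:
            state = _update_unchecked(x, state)
        return state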
    For step 3, at the scale you are talking about (100k calls/s; I
don't know your computer's CPU power, but on my fairly recent machine,
this scale definitely shows the limits of the current Python
implementation, where function calls are expensive), ctypes is too
expensive in my experience when passing at least one numpy array: the
from_param call used for each argument conversion is itself a Python
function call, which is expensive. You should also avoid any other
superfluous function calls in Python; in my case, by removing all the
intermediate layers from my special-purpose function, I got a speed-up
of 5x. For example, just by removing calls to functions such as
atleast_2d, shape, etc., I already got a 2x speed-up!
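    To make the ctypes point concrete, a minimal sketch (the shared
library libupdate.so and the C function update_samples in it are
hypothetical): converting the array to a pointer once, outside the
loop, is what avoids paying the Python-level per-argument conversion
on every call.

    import ctypes
    import numpy as np

    lib = ctypes.CDLL("./libupdate.so")   # hypothetical C library
    lib.update_samples.restype = ctypes.c_int
    lib.update_samples.argtypes = [ctypes.POINTER(ctypes.c_double),
                                   ctypes.c_int]

    x = np.zeros(1000, dtype=np.float64)
    # Do the array -> pointer conversion once, outside any loop;
    # letting ctypes convert on every call costs an extra Python
    # function call (from_param) per argument, per call.
    ptr = x.ctypes.data_as(ctypes.POINTER(ctypes.c_double))
    result = lib.update_samples(ptr, len(x))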
    When I am done, since the code is part of the pyem toolbox in
scipy.sandbox, you can take a look if you want (the changes will be
documented),
cheers,
David