[SciPy-user] Fastest python extension for tiny functions?
David Cournapeau
david at ar.media.kyoto-u.ac.jp
Sun Dec 3 22:58:56 EST 2006
Anand Patil wrote:
> Hi all,
>
> I have some very light functions that get called enormous numbers of
> times. I'd like to get them running as fast as possible, even if it
> makes them ugly. The functions do several integer comparisons but no
> floating-point computation, so I'd like to port them to C/C++ rather
> than Fortran.
>
> While weave.inline has been rocking my world for most applications, on
> my computer the gateway alone seems to take about 1s per 100k calls,
> which is quite a bit of overhead for functions this small. Could anyone
> help me figure out which python-to-C method (swig, boost::python, etc)
> is fastest for tiny functions? I know ahead of time what types all the
> arguments will be.
I have the same problem with some recursive algorithms which are not
easy to "vectorize" (recursive stochastic approximation). I don't know
your case, but I think mine is typical: I have a function which is
computationally intensive, but works on numpy arrays instead of
looping in Python. The function checks its arguments, the
implementation is left to another function, etc. All the checking +
function call overhead becomes really expensive once you send only one
sample at a time instead of a whole array (the function itself, in C,
would take a few hundred to a few thousand cycles). My approach so far
is:
    1 First, implement the algorithm as straightforwardly as possible,
and keep it as a reference :)
    2 Then profile it seriously, to be sure that you have a problem
where you think you have a problem (under Linux, I found hotshot +
kcachegrind really useful for this, thanks to the graphical call tree
with the cost at each node; a minimal sketch follows this list).
    3 Slightly change the API: basically, you have to trade
flexibility for speed. You give up flexibility, i.e. your function
won't be reusable in other cases, but it is tiny, so that is OK. I
check the arguments in the "root function", and loop using a
special-purpose, ugly function without argument checking: first in
Python, and once I have checked that it works, in pure C, again
without argument checking (also sketched below).
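    For step 2, a minimal profiling sketch (hotshot ships with the
standard library; profiled_run and the output filename are just
placeholders for your own code):

    import hotshot, hotshot.stats

    def profiled_run():
        # placeholder for the code you want to profile
        pass

    prof = hotshot.Profile("run.prof")
    prof.runcall(profiled_run)
    prof.close()

    # Text summary; for kcachegrind's graphical call tree, convert
    # the log with the hotshot2calltree script first.
    stats = hotshot.stats.load("run.prof")
    stats.sort_stats("cumulative").print_stats(20)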
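    For step 3, here is a sketch of the pattern with hypothetical
names (update, _update_unchecked), not actual pyem code: check the
arguments once in the "root function", then run the tight loop through
a special-purpose helper that does no checking at all.

    import numpy as np

    def _update_unchecked(x, state):
        # Special-purpose inner step: no argument checking, no
        # atleast_2d/shape calls inside the loop (hypothetical body).
        if x > state:
            return state + 1
        return state - 1

    def update(samples, state=0):
        # "Root function": validate the arguments once, up front...
        samples = np.asarray(samples)
        if samples.ndim != 1:
            raise ValueError("expected a 1-d array of samples")
        # ...then loop with the unchecked helper. Once this is known
        # to work, the loop body is the candidate for a pure C port.
        for x in samples:
            state = _update_unchecked(x, state)
        return state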
    For step 3, at the scale you are talking about (100k calls/s; I
don't know your computer's CPU power, but on my fairly recent machine,
this scale definitely shows the limits of the current Python
implementation, where function calls are expensive), ctypes is too
expensive in my experience when passing at least one numpy array: the
from_param call used for each argument conversion is itself a Python
function call, which is expensive. You should also avoid any other
superfluous function calls in Python; in my case, by removing all the
intermediate layers from my special-purpose function, I got a speed-up
of 5x. For example, just by removing calls to functions such as
atleast_2d, shape, etc., I already got a 2x speed-up!
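    To make the ctypes point concrete, a minimal sketch (the shared
library libupdate.so and the C function update_samples in it are
hypothetical): converting the array to a pointer once, outside the
loop, is what avoids paying the Python-level per-argument conversion
on every call.

    import ctypes
    import numpy as np

    lib = ctypes.CDLL("./libupdate.so")   # hypothetical C library
    lib.update_samples.restype = ctypes.c_int
    lib.update_samples.argtypes = [ctypes.POINTER(ctypes.c_double),
                                   ctypes.c_int]

    x = np.zeros(1000, dtype=np.float64)
    # Do the array -> pointer conversion once, outside any loop;
    # letting ctypes convert on every call costs an extra Python
    # function call (from_param) per argument, per call.
    ptr = x.ctypes.data_as(ctypes.POINTER(ctypes.c_double))
    result = lib.update_samples(ptr, len(x))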
    When I am done, since the code is part of the pyem toolbox in
scipy.sandbox, you can take a look if you want (the changes will be
documented),
cheers,
David