[Tutor] how to optimize this code?

Stefan Behnel stefan_ml at behnel.de
Mon Mar 28 07:43:16 CEST 2011


Albert-Jan Roskam, 27.03.2011 21:57:
> I made a program that reads SPSS data files. I ran cProfile to see if I can
> optimize things (see #1 below).

First thing to note here: sort the output by "time", which refers to the 
"tottime" column (the time spent inside each function itself, excluding the 
functions it calls). That will make it more obvious where most time is really 
spent.
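As a minimal sketch of that advice (not code from the original thread), the standard-library cProfile and pstats modules can produce a report sorted this way. `slow_sum` and `profile_sorted_by_tottime` are hypothetical stand-ins for your own code. From the command line, `python -m cProfile -s time yourscript.py` should give the same ordering.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Hypothetical workload standing in for the SPSS-reading code."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def profile_sorted_by_tottime(func, *args):
    """Profile func(*args); return its result and a report sorted by tottime."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = func(*args)
    profiler.disable()

    stream = io.StringIO()
    # "time" is the sort key for internal time, i.e. the "tottime" column.
    stats = pstats.Stats(profiler, stream=stream).sort_stats("time")
    stats.print_stats(10)  # only show the 10 most expensive entries
    return result, stream.getvalue()


if __name__ == "__main__":
    _, report = profile_sorted_by_tottime(slow_sum, 100_000)
    print(report)
```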


> It seems that the function getValueNumeric is a pain point (see #2
> below). This function calls a C function in a DLL for each numerical
> cell value. On the basis of this limited amount of info, what could I do
> to further optimize the code? I heard about Psyco, but I didn't think
> such tricks would be necessary, as the function spssGetValueNumeric is
> implemented in C already (which should be fast).

The problem is that you are using ctypes to call it. ctypes is useful for 
simple things, but each call pays for converting the Python arguments to C 
and back, so it's not suited to performance-critical work such as calling a 
C function ten million times, as in your example. Since you're saying "dll", 
is this under Windows? Setting up Cython is a bit trickier on that platform 
than on pretty much all others, since you additionally need to install a C 
compiler, but if you want to go that route, it will reward you with a much 
faster way to call your C code and will also let you speed up the code that 
makes the calls.

That being said, see below.


> ## most time consuming function
>
>   def getValueNumeric(fh, spssio, varHandle):
>      numValue = ctypes.c_double()
>      numValuePtr = ctypes.byref(numValue)
>      retcode = spssio.spssGetValueNumeric(fh,
>                                 ctypes.c_double(varHandle),
>                                 numValuePtr)

You may still be able to make this code a tad faster by avoiding the 
attribute lookups on both the ctypes module and "spssio" on every call, and 
by reusing a single constant pointer to numValue instead of creating a new 
one each time (if you're not using threads). That may not make enough of a 
difference, but it should at least be a little faster.
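A minimal sketch of those two micro-optimizations, assuming `spssio` is the ctypes-loaded DLL and `fh` the file handle from the quoted code. The factory name `make_get_value_numeric` is hypothetical, and returning `(retcode, value)` is an assumption, since the quoted excerpt stops before any return statement. Because every call writes into the same buffer, the returned function must not be called from several threads at once.

```python
import ctypes


def make_get_value_numeric(fh, spssio):
    """Build a reader for numeric cells that avoids per-call lookups.

    Hypothetical helper. Assumes, as in the quoted code, that
    spssio.spssGetValueNumeric(fh, varHandle, &value) writes a double
    into `value` and returns an integer return code.
    """
    # Look up the C function once instead of on every call.
    get_numeric = spssio.spssGetValueNumeric
    # Bind the ctypes constructor locally to skip the module attribute lookup.
    c_double = ctypes.c_double
    # One reusable output buffer and one constant pointer to it.
    # Not thread-safe: all calls share this buffer.
    num_value = c_double()
    num_value_ptr = ctypes.byref(num_value)

    def get_value_numeric(var_handle):
        retcode = get_numeric(fh, c_double(var_handle), num_value_ptr)
        # Assumption: the caller wants both the return code and the value.
        return retcode, num_value.value

    return get_value_numeric
```

You would create the reader once per file, e.g. `get_value = make_get_value_numeric(fh, spssio)`, and call `get_value(varHandle)` inside the loop. Expect a modest gain at best, since the per-call ctypes conversion overhead is still there.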

Stefan


