[Tutor] how to optimize this code?
stefan_ml at behnel.de
Mon Mar 28 07:43:16 CEST 2011
Albert-Jan Roskam, 27.03.2011 21:57:
> I made a program that reads spss data files. I ran cProfile to see if I can
> optimize things (see #1 below).
First thing to note here: sort the output by "time", which refers to the
"tottime" column. That will make it more obvious where most time is really
spent.
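For illustration, here is a minimal sketch of sorting profiler output by "time" using the standard cProfile and pstats modules. The function busy_work is a made-up stand-in for the real program, not code from the original post.

```python
import cProfile
import io
import pstats


def busy_work(n):
    # Hypothetical workload, standing in for the real file reader.
    total = 0
    for i in range(n):
        total += i * i
    return total


profiler = cProfile.Profile()
profiler.enable()
busy_work(100000)
profiler.disable()

# Collect the report in a string so it can be inspected or printed.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)

# "time" sorts by the tottime column, i.e. time spent inside each
# function itself, excluding the functions it calls.
stats.sort_stats("time").print_stats(10)

report = stream.getvalue()
print(report)
```

With this ordering, the functions that burn the most time in their own body appear at the top of the listing, rather than the outer functions that merely call them.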
> It seems that the function getValueNumeric is a pain spot (see #2
> below). This function calls a C function in a dll for each numerical
> cell value. On the basis of this limited amount of info, what could I do
> to further optimize the code? I heard about psyco, but I didn't think
> such tricks would be necessary as the function spssGetValueNumeric is
> implemented in C already (which should be fast).
The problem is that you are using ctypes to call it. It's useful for simple
things, but it's not usable for performance-critical things, such as
calling a C function ten million times in your example. Since you're saying
"dll", is this under Windows? It's a bit more tricky to set up Cython on
that platform than on pretty much all others, since you additionally need
to install a C compiler, but if you want to go that route, it will reward
you with a much faster way to call your C code, and will allow you to also
speed up the code that does the calls.
That being said, see below.
> ## most time consuming function
> def getValueNumeric(fh, spssio, varHandle):
> numValue = ctypes.c_double()
> numValuePtr = ctypes.byref(numValue)
> retcode = spssio.spssGetValueNumeric(fh,
You may still be able to make this code a tad faster, by avoiding the
function name lookups on both the ctypes module and "spssio", and by using
a constant pointer for numValue (if you're not using threads). That may not
make enough of a difference, but it should at least be a little faster.
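A rough sketch of that idea follows, under some loud assumptions. The real DLL is not available here, so fake_spssGetValueNumeric is a hypothetical stand-in built with ctypes.CFUNCTYPE. The argument order (fh, varHandle, pointer) and the integer handle types are guesses, since the quoted call is cut off. The helper name make_numeric_getter is made up as well.

```python
import ctypes

# Hypothetical stand-in for spssio.spssGetValueNumeric. The signature
# (int fh, int varHandle, double *out) is an assumption, not taken from
# the real library.
PROTO = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.c_int, ctypes.c_int,
                         ctypes.POINTER(ctypes.c_double))


def _fake_impl(fh, var_handle, out_ptr):
    out_ptr[0] = var_handle * 1.5  # write a dummy value through the pointer
    return 0                       # pretend 0 means success


fake_spssGetValueNumeric = PROTO(_fake_impl)


def make_numeric_getter(fh, c_func):
    # Allocate the double and its reference once, not on every call.
    # The buffer is shared between calls, so this is not thread-safe.
    num_value = ctypes.c_double()
    num_value_ptr = ctypes.byref(num_value)

    def get_value_numeric(var_handle):
        # c_func and num_value_ptr are closure variables here, so no
        # attribute lookups on ctypes or spssio happen per call.
        retcode = c_func(fh, var_handle, num_value_ptr)
        return retcode, num_value.value

    return get_value_numeric


get_value = make_numeric_getter(1, fake_spssGetValueNumeric)
print(get_value(4))
```

With the real library, you would pass spssio.spssGetValueNumeric as c_func once, outside the loop over cells, and then call the returned function for each value.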