[Tutor] how to optimize this code?

Albert-Jan Roskam fomcl at yahoo.com
Mon Mar 28 21:37:32 CEST 2011


Hi Stefan,

Thanks for your advice. I seriously thought ctypes was the module to use. That 
was before I found out that evaluating all 10**9 values of my test data set is 
glacially slow (several hours). You're right, the dll implies the program is 
running on windows. I've also been trying to make it work under Linux but I 
wanted to get the basic algorithm right first. Also, it was quite a PIA to get 
all the dependencies of the (old) .so files.

Your speed tip reminded me of: 
http://wiki.python.org/moin/PythonSpeed/PerformanceTips#Avoiding_dots...
Does this mean that "from ctypes import *" gives slightly faster code than 
"import ctypes"? If so: wow! I've always avoided the first notation like the 
plague.
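
A minimal sketch of the "avoiding dots" idea from that wiki page, assuming 
nothing about the SPSS library: the function names with_dots and with_local 
are made up for illustration. It compares looking up ctypes.c_double through 
the module on every iteration with binding it to a local name once. Any 
speed difference depends on the interpreter and machine, so no timings are 
claimed here.

```python
import ctypes
import timeit


def with_dots(n):
    """Look up c_double via the module attribute on every iteration."""
    total = 0.0
    for i in range(n):
        total += ctypes.c_double(i).value
    return total


def with_local(n):
    """Bind c_double to a local name once, then reuse it in the loop."""
    c_double = ctypes.c_double
    total = 0.0
    for i in range(n):
        total += c_double(i).value
    return total


if __name__ == "__main__":
    # Print raw timings only; which variant wins, and by how much, varies.
    for func in (with_dots, with_local):
        elapsed = timeit.timeit(lambda: func(100000), number=5)
        print(func.__name__, elapsed)
```

Note that "from ctypes import *" only replaces an attribute lookup with a 
module-level name lookup; binding the name locally inside the function, as 
in with_local, avoids both.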

What do you mean by '... using a constant pointer for numValue'? Is this the 
byref/pointer object distinction? I replaced the pointer object with a byref 
object, which reduced processing time by about 10%.

Cython might be interesting as a hobby project, but I'm afraid I'll never get 
the ICT droids in my office to install that.
 

 Cheers!!
Albert-Jan


~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
All right, but apart from the sanitation, the medicine, education, wine, public 
order, irrigation, roads, a fresh water system, and public health, what have the 
Romans ever done for us?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~




________________________________
From: Stefan Behnel <stefan_ml at behnel.de>
To: tutor at python.org
Sent: Mon, March 28, 2011 7:43:16 AM
Subject: Re: [Tutor] how to optimize this code?

Albert-Jan Roskam, 27.03.2011 21:57:
> I made a program that reads spss data files. I ran cProfile to see if I can
> optimize things (see #1 below).

First thing to note here: sort the output by "time", which refers to the 
"tottime" column. That will make it more obvious where most time is really 
spent.
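
A minimal sketch of sorting profiler output that way, using the standard 
cProfile and pstats modules. The functions busy_work and 
profile_sorted_by_time are hypothetical stand-ins, not part of the original 
program.

```python
import cProfile
import io
import pstats


def busy_work(n):
    """Hypothetical stand-in for the code being profiled."""
    return sum(i * i for i in range(n))


def profile_sorted_by_time(func, *args):
    """Profile func(*args) and return a report sorted by internal time."""
    profiler = cProfile.Profile()
    profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # "time" sorts by the tottime column (time spent in the function itself).
    stats.sort_stats("time").print_stats(10)
    return buffer.getvalue()


if __name__ == "__main__":
    print(profile_sorted_by_time(busy_work, 200000))
```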


> It seems that the function getValueNumeric is a pain spot (see #2
> below). This function calls a C function in a dll for each numerical
> cell value. On the basis of this limited amount of info, what could I do
> to further optimize the code? I heard about psyco, but I didn't think
> such tricks would be necessary as the function spssGetValueNumeric is
> implemented in C already (which should be fast).

The problem is that you are using ctypes to call it. It's useful for simple 
things, but it's not usable for performance critical things, such as calling a C 
function ten million times in your example. Since you're saying "dll", is this 
under Windows? It's a bit more tricky to set up Cython on that platform than on 
pretty much all others, since you additionally need to install a C compiler, but 
if you want to go that route, it will reward you with a much faster way to call 
your C code, and will allow you to also speed up the code that does the calls.

That being said, see below.


> ## most time consuming function
> 
>   def getValueNumeric(fh, spssio, varHandle):
>      numValue = ctypes.c_double()
>      numValuePtr = ctypes.byref(numValue)
>      retcode = spssio.spssGetValueNumeric(fh,
>                                 ctypes.c_double(varHandle),
>                                 numValuePtr)

You may still be able to make this code a tad faster, by avoiding the function 
name lookups on both the ctypes module and "spssio", and by using a constant 
pointer for numValue (if you're not using threads). That may not make enough of 
a difference, but it should at least be a little faster.
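
A rough sketch of both ideas, under stated assumptions: the factory 
make_get_value_numeric and the stand-in fake_get_value_numeric are 
hypothetical names, and the stand-in only mimics the argument order used in 
the quoted code (file handle, variable handle, output pointer). It is not 
the real spssGetValueNumeric. The output c_double and its byref object are 
created once and reused on every call, which is only safe without threads.

```python
import ctypes


def make_get_value_numeric(c_func, fh):
    """Return a fast getter that reuses one output buffer (not thread-safe)."""
    # Do the name lookups once, outside the per-cell call.
    c_double = ctypes.c_double
    num_value = c_double()                   # reused output buffer
    num_value_ptr = ctypes.byref(num_value)  # "constant pointer" to it

    def get_value_numeric(var_handle):
        retcode = c_func(fh, c_double(var_handle), num_value_ptr)
        return retcode, num_value.value

    return get_value_numeric


# Hypothetical stand-in so the sketch runs without the SPSS library.
PROTO = ctypes.CFUNCTYPE(
    ctypes.c_int, ctypes.c_int, ctypes.c_double, ctypes.POINTER(ctypes.c_double)
)


@PROTO
def fake_get_value_numeric(fh, var_handle, out):
    out[0] = var_handle * 2.0  # write the "cell value" through the pointer
    return 0                   # pretend return code for success


if __name__ == "__main__":
    get_value = make_get_value_numeric(fake_get_value_numeric, 1)
    print(get_value(3.0))
```

With the real library, the first argument would be something like 
spssio.spssGetValueNumeric, looked up once when the getter is created 
rather than on every call.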

Stefan

_______________________________________________
Tutor maillist  -  Tutor at python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor