[Tutor] how to optimize this code?
Albert-Jan Roskam
fomcl at yahoo.com
Mon Mar 28 21:37:32 CEST 2011
Hi Stefan,
Thanks for your advice. I seriously thought ctypes was the module to use. That
was before I found out that evaluating all 10**9 values of my test data set is
glacially slow (several hours). You're right, the dll implies the program is
running on Windows. I've also been trying to make it work under Linux but I
wanted to get the basic algorithm right first. Also, it was quite a PIA to get
all the dependencies of the (old) .so files.
Your speed tip reminded me of:
http://wiki.python.org/moin/PythonSpeed/PerformanceTips#Avoiding_dots...
Does this mean that "from ctypes import *" gives slightly faster code than
"import ctypes"? If so: wow! I've always avoided the first notation like the
plague.
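One way to check this on your own machine is a small timeit comparison. This is only a sketch: the repeat count is arbitrary, and it imports just c_double by name rather than using "from ctypes import *", since the point is the saved attribute lookup and not the star import itself. Expect any difference to be small.

```python
import timeit

# Dotted form: every call looks up the attribute c_double on the module.
dotted = timeit.timeit("ctypes.c_double()",
                       setup="import ctypes",
                       number=200000)

# Bare form: the name is bound directly, so no attribute lookup is needed.
bare = timeit.timeit("c_double()",
                     setup="from ctypes import c_double",
                     number=200000)

print(f"ctypes.c_double(): {dotted:.4f} s")
print(f"c_double():        {bare:.4f} s")
```

Importing only the names you need gives the same saving without the namespace pollution of the star import.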
What do you mean by '... using a constant pointer for numValue'? Is this the
byref/pointer object distinction? I replaced the pointer object with a byref
object, which reduced processing time by about 10%.
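For reference, the byref/pointer difference can be timed in isolation. The ctypes documentation notes that pointer() constructs a real pointer instance while byref() only builds a lightweight argument object. The sketch below uses an arbitrary value and repeat count and does not call any C function, so it only measures object creation:

```python
import ctypes
import timeit

value = ctypes.c_double(3.5)

# pointer() builds a full pointer object that can be dereferenced in Python.
ptr = ctypes.pointer(value)
print(ptr.contents.value)

# byref() builds a lighter object that is only useful as a call argument.
ref = ctypes.byref(value)

pointer_time = timeit.timeit(lambda: ctypes.pointer(value), number=100000)
byref_time = timeit.timeit(lambda: ctypes.byref(value), number=100000)

print(f"pointer(): {pointer_time:.4f} s")
print(f"byref():   {byref_time:.4f} s")
```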
Cython might be interesting as a hobby project, but I'm afraid I'll never get
the ICT droids in my office to install that.
Cheers!!
Albert-Jan
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
All right, but apart from the sanitation, the medicine, education, wine, public
order, irrigation, roads, a fresh water system, and public health, what have the
Romans ever done for us?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
________________________________
From: Stefan Behnel <stefan_ml at behnel.de>
To: tutor at python.org
Sent: Mon, March 28, 2011 7:43:16 AM
Subject: Re: [Tutor] how to optimize this code?
Albert-Jan Roskam, 27.03.2011 21:57:
> I made a program that reads spss data files. I ran cProfile to see if I can
> optimize things (see #1 below).
First thing to note here: sort the output by "time", which refers to the
"tottime" column. That will make it more obvious where most time is really
spent.
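A minimal sketch of sorting profiler output that way with the standard pstats module. The work() function is a made-up stand-in for the real reader code, and the limit of 10 rows is arbitrary:

```python
import cProfile
import io
import pstats


def work():
    # Hypothetical workload standing in for the SPSS reading loop.
    return sum(i * i for i in range(100000))


profiler = cProfile.Profile()
profiler.enable()
work()
profiler.disable()

# Sort by "time" (the tottime column) and show the ten most expensive entries.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("time").print_stats(10)
report = stream.getvalue()
print(report)
```

From the command line, "python -m cProfile -s time yourscript.py" gives the same ordering.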
> It seems that the function getValueNumeric is a pain spot (see #2
> below). This function calls a C function in a dll for each numerical
> cell value. On the basis of this limited amount of info, what could I do
> to further optimize the code? I heard about psyco, but I didn't think
> such tricks would be necessary as the function spssGetValueNumeric is
> implemented in C already (which should be fast).
The problem is that you are using ctypes to call it. It's useful for simple
things, but it's not suited to performance-critical things, such as calling a C
function ten million times in your example. Since you're saying "dll", is this
under Windows? It's a bit more tricky to set up Cython on that platform than on
pretty much all others, since you additionally need to install a C compiler, but
if you want to go that route, it will reward you with a much faster way to call
your C code, and will allow you to also speed up the code that does the calls.
That being said, see below.
> ## most time consuming function
>
> def getValueNumeric(fh, spssio, varHandle):
>     numValue = ctypes.c_double()
>     numValuePtr = ctypes.byref(numValue)
>     retcode = spssio.spssGetValueNumeric(fh,
>                                          ctypes.c_double(varHandle),
>                                          numValuePtr)
You may still be able to make this code a tad faster, by avoiding the function
name lookups on both the ctypes module and "spssio", and by using a constant
pointer for numValue (if you're not using threads). That may not make enough of
a difference, but it should at least be a little faster.
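A rough sketch of what that could look like, under some assumptions. The factory make_value_reader and the names inside it are hypothetical, and since the real SPSS library is not available here, a ctypes callback with a guessed signature stands in for spssGetValueNumeric so that the example runs. The shared buffer means this is not safe to use from several threads.

```python
import ctypes
from types import SimpleNamespace


def make_value_reader(fh, spssio):
    """Return a reader that reuses one c_double buffer for every call."""
    get_value = spssio.spssGetValueNumeric    # look the function up once
    c_double = ctypes.c_double                # avoid the module lookup per call
    num_value = c_double()                    # single reusable output buffer
    num_value_ref = ctypes.byref(num_value)   # constant reference to that buffer

    def get_value_numeric(var_handle):
        retcode = get_value(fh, c_double(var_handle), num_value_ref)
        return retcode, num_value.value

    return get_value_numeric


# Stand-in for the real library function (assumed signature, illustration only).
_PROTO = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.c_int, ctypes.c_double,
                          ctypes.POINTER(ctypes.c_double))


def _fake_get_value(fh, var_handle, out):
    out[0] = var_handle * 2.0   # write a made-up value through the pointer
    return 0                    # pretend the call succeeded


fake_spssio = SimpleNamespace(spssGetValueNumeric=_PROTO(_fake_get_value))

read_value = make_value_reader(1, fake_spssio)
print(read_value(21.0))
```

With the real library, the loaded dll object would be passed in place of fake_spssio and the reader created once, outside the loop over cells.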
Stefan