[Tutor] Cython question

Albert-Jan Roskam fomcl at yahoo.com
Sat Jul 2 14:47:01 CEST 2011


Hi Stefan, Alan, Matt,

Thanks for your replies. 

I used cProfile to find the bottlenecks: the two Python functions getValueChar and getValueNum. These two functions simply call two equivalent C functions in a .dll (using ctypes). The problem is that they are called once for every VALUE in a file (e.g. 1000 records * 100 columns = 100000 function calls). So if I understand you correctly, this is not CPU bound and, therefore, alas, Cython won't improve the execution time. Correct?
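For reference, a minimal sketch of this kind of profiling run. The names get_value, read_all and profile_read are hypothetical stand-ins, not the code from the recipe; the real getValueNum/getValueChar call into the .dll through ctypes, which this sketch does not attempt to reproduce. It only shows how a per-value call pattern shows up in the cProfile report.

```python
import cProfile
import io
import pstats


def get_value(record, column):
    # Hypothetical stand-in for getValueNum/getValueChar.
    # The real functions forward each call to a C function via ctypes.
    return record * column


def read_all(n_records=1000, n_columns=100):
    # One call per value: 1000 records * 100 columns = 100000 calls.
    return [get_value(r, c) for r in range(n_records) for c in range(n_columns)]


def profile_read():
    # Profile a full read and return the value count plus the text report.
    profiler = cProfile.Profile()
    profiler.enable()
    values = read_all()
    profiler.disable()

    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(5)
    return len(values), stream.getvalue()


if __name__ == "__main__":
    n_values, report = profile_read()
    print(n_values)
    print(report)
```

In the report, the ncalls column for get_value shows how often it was called, and tottime shows how much time was spent inside it, which is how a cheap function that is called very often stands out.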

That .dll contains many more functions, for example to extract certain header information (a list of all the SPSS variables, etc.). This kind of information is retrieved only once per SPSS file. So, to answer your question, Stefan, I'd like this part of the code to remain the same, i.e. with ctypes. There is not much to gain anyway, with just one function call per data file.

Cython might be useful when the program is converting SPSS date/times (seconds since the Gregorian epoch) to ISO date/times. If I understand it correctly, this is certainly CPU bound.
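A minimal pure-Python sketch of that conversion, as a baseline to profile before trying Cython. It assumes the SPSS epoch is midnight on 14 October 1582 (the start of the Gregorian calendar); the names SPSS_EPOCH, spss_to_iso and datetime_to_spss are made up for illustration and are not part of the recipe.

```python
from datetime import datetime, timedelta

# Assumption: SPSS stores date/times as seconds since
# midnight, 14 October 1582 (start of the Gregorian calendar).
SPSS_EPOCH = datetime(1582, 10, 14)


def spss_to_iso(seconds):
    # Convert an SPSS date/time value (seconds since the epoch)
    # to an ISO 8601 string such as '2011-07-02T14:47:01'.
    return (SPSS_EPOCH + timedelta(seconds=seconds)).isoformat()


def datetime_to_spss(moment):
    # Inverse helper: seconds between the epoch and a datetime.
    return (moment - SPSS_EPOCH).total_seconds()


if __name__ == "__main__":
    value = datetime_to_spss(datetime(2011, 7, 2, 14, 47, 1))
    print(spss_to_iso(value))
```

This does one datetime addition and one string formatting step per value, so for a file with many date columns it is pure computation with no I/O involved.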

Btw, Matt, I indeed used psyco already, although I never precisely quantified the improvement in speed.
 
Cheers!!

Albert-Jan



~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

All right, but apart from the sanitation, the medicine, education, wine, public order, irrigation, roads, a fresh water system, and public health, what have the Romans ever done for us?

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

--- On Sat, 7/2/11, Stefan Behnel <stefan_ml at behnel.de> wrote:

From: Stefan Behnel <stefan_ml at behnel.de>
Subject: Re: [Tutor] Cython question
To: tutor at python.org
Date: Saturday, July 2, 2011, 1:29 PM

Albert-Jan Roskam, 02.07.2011 11:49:
> Some time ago I finished a sav reader for SPSS .sav data files (also with the
> help of some of you!):
> http://code.activestate.com/recipes/577650-python-reader-for-spss-sav-files/
> 
> It works fine, but it is not fast with big files. I am thinking of implementing
> two of the functions in Cython (getValueChar and getValueNum).
> As far as I understood it requires the functions to be re-written in a
> Python-like language

"rewritten" only in the sense that you may want to apply optimisations or provide type hints. Cython is Python, but with language extensions that allow the compiler to apply static optimisations to your code.


> , 'minus the memory manager'.

Erm, not sure what you mean here. Cython uses the same memory management as CPython.


> That little piece of code is
> converted to C and subsequently compiled to a .dll or .so file. The original
> program listens and talks to that .dll file. A couple of questions:
> -is this a correct representation of things?

More or less. Instead of "listens and talks", I'd rather say "uses". What you get is just another Python extension module which you can use like any other Python module.


> -will the speed improvement be worthwhile? (pros)

Depends. If your code is I/O bound, then likely not. If the above two functions are true CPU bottlenecks that do some kind of calculation or data transformation, it's likely going to be faster in Cython.


> -are there reasons not to try this? (cons)

If your performance problem is not CPU related, it may not be worth it.


> -is it 'sane' to mix ctypes and cython for nonintensive and intensive
> operations, respectively?

Why would you want to use ctypes if you can use Cython?

Stefan
