[Tutor] Cython question

Albert-Jan Roskam fomcl at yahoo.com
Sun Jul 3 11:15:46 CEST 2011


Hi Stefan, Alan,

Thanks for your useful advice. The first thing I will try is to take the call to the spssio DLL out of the Python method (i.e., 'unwrap' it) and put it directly inside the loop. I didn't think wrapping the code in a method/function would create so much overhead.
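
Roughly what I have in mind, as a sketch only -- the DLL name, the spssGetValueNumeric signature, and the handle arguments below are placeholders that would need to match the real spssio API:

    import ctypes

    # Hypothetical DLL and function names -- adapt to the real spssio API.
    spssio = ctypes.cdll.LoadLibrary("spssio32.dll")

    def read_column(handle, var_handle, n_records):
        # Hoist all lookups out of the loop: the ctypes function object,
        # one reusable output buffer, and the bound list.append method.
        get_num = spssio.spssGetValueNumeric
        value = ctypes.c_double()
        byref_value = ctypes.byref(value)
        out = []
        append = out.append
        for _ in range(n_records):
            get_num(handle, var_handle, byref_value)  # direct call, no wrapper method
            append(value.value)
        return out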

Cheers!!

Albert-Jan



~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

All right, but apart from the sanitation, the medicine, education, wine, public order, irrigation, roads, a fresh water system, and public health, what have the Romans ever done for us?

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

--- On Sat, 7/2/11, Stefan Behnel <stefan_ml at behnel.de> wrote:

From: Stefan Behnel <stefan_ml at behnel.de>
Subject: Re: [Tutor] Cython question
To: tutor at python.org
Date: Saturday, July 2, 2011, 4:52 PM

Alan Gauld, 02.07.2011 15:28:
> "Albert-Jan Roskam" wrote
>> I used cProfile to find the bottlenecks, the two Python functions
>> getValueChar and getValueNum. These two Python functions simply call two
>> equivalent C functions in a .dll (using ctypes).

The code is currently declared as Windows-only, and I don't know of any good C-level profiling tools for that platform. Under Linux, once I'm sure I have a CPU-bound problem below the Python level, I'd use valgrind and KCacheGrind to analyse the performance. That will include all C function calls (and even CPU instructions, if you want) in the call trace. It makes it a bit harder to see what Python is doing, but gives much more detailed results at the C level.

It's also worth keeping in mind that all profiling attempts *always* interfere with the normal program execution. The results you get during a profiling run may not be what you'd get with profiling disabled. So, profiling is nice, but it doesn't replace proper benchmarking.
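
For example, the stdlib's timeit gives you wall-clock numbers with no profiler attached. A minimal sketch, where read_record stands in for whatever per-record conversion function is actually under test:

    import timeit

    def read_record():
        pass  # placeholder for the real per-record conversion code

    # repeat() returns one total time per run; the minimum is usually
    # the least-disturbed measurement.
    times = timeit.repeat(read_record, repeat=5, number=100000)
    print("best of 5: %.6f s per 100000 calls" % min(times))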


> In that case Cython will speed up the calling loops but it can't do
> anything to speed up the DLL calls, you have effectively already optimised
> those functions by calling the DLL.
> 
>> The problem is that these functions are called as many times as there are
>> VALUES in a file
> 
> It might be worth a try if you have very big data sets
> because a C loop is faster than a Python loop. But don't expect
> order-of-magnitude improvements.

Looking at the code now, it's actually worse than that. The C function call not only goes through ctypes, it is additionally wrapped in a method call. So the OP is paying the call overhead twice for each field, plus the method lookup and some other operations. These things can add up quite easily.

So, iff the conversion code is really a CPU bottleneck, and depending on how much work the C functions actually do, the current call overhead, 100 times per record, may be a substantial part of the game. It's worth seeing if it can be cut at the Python level by removing the method lookup and call levels (i.e. by inlining the method), but if that's not enough, Cython may still be worth it. For one thing, Cython's call overhead is lower than that of ctypes, and if the call into compiled code is made only once and the loop itself is moved entirely into Cython (i.e. C), the overhead will drop substantially.
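
Sketched in Cython, moving the loop below the Python level might look like this -- the header name and C signature here are guesses and must match the real spssio declarations:

    # fastread.pyx -- compile with Cython; names below are hypothetical
    cdef extern from "spssdio.h":
        int spssGetValueNumeric(int handle, double varHandle, double *value)

    def read_column(int handle, double var_handle, int n_records):
        cdef double value
        cdef int i
        result = []
        for i in range(n_records):
            # One direct C call per record: no ctypes, no Python wrapper.
            spssGetValueNumeric(handle, var_handle, &value)
            result.append(value)
        return result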

It might also be worth running the code in PyPy instead of CPython. PyPy will optimise away a lot of the overhead that this code contains.


>> So if I understand you correctly, this is not CPU bound

I don't have enough information to comment on that.


> It may still be CPU bound in that the CPU is doing all the work, but if
> the CPU time is in the DLL functions rather than in the loop, Cython
> won't help much.
> 
> CPU bound refers to the type of processing - is it lots of logic, math,
> control flow, etc.? Or is it I/O bound - reading network, disk, or user
> input? Or it might be memory bound - creating lots of in-memory objects
> (especially if that results in paging to disk, when it becomes I/O
> bound too!)
> 
> Knowing what is causing the bottleneck will determine how to improve
> things. Use tools like TaskManager in Windows or top in *nix to see
> where the time is going and what resources are being consumed. Fast code
> is not always the answer.

That is very good advice. As a rule of thumb, a process monitor like top will tell you how much time is spent in I/O and CPU. If, during a test run (with profiling disabled, as that eats time, too!), your CPU usage stays close to 100%, your program is CPU bound. If, however, it stays lower, and the monitor reports a high I/O waiting time, it's I/O bound. In this case, I/O bound is what you want to achieve, because it means that your code is running faster than your hard drive can deliver the data.
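
You can also get a rough version of the same check from inside Python by comparing CPU time against wall-clock time. A sketch (time.perf_counter() and time.process_time() need Python 3.3+, newer than what this thread used):

    import time

    def classify(workload):
        wall0, cpu0 = time.perf_counter(), time.process_time()
        workload()
        wall = time.perf_counter() - wall0
        cpu = time.process_time() - cpu0
        # CPU time close to wall time -> CPU bound;
        # CPU time far below wall time -> mostly waiting on I/O.
        print("wall %.3fs, cpu %.3fs (%.0f%% CPU)"
              % (wall, cpu, 100.0 * cpu / max(wall, 1e-9)))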

Stefan

_______________________________________________
Tutor maillist  -  Tutor at python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor