affects on extended modules

Thu Dec 6 06:45:41 EST 2001

"Curtis Jensen" <cjensen at bioeng.ucsd.edu> wrote:

> Kragen Sitaker wrote:
>> 
>> Curtis Jensen <cjensen at bioeng.ucsd.edu> writes:
>> > We have created a python interface to some core libraries of our own
>> > making.  We also have a C interface to these same libraries. However,
>> > the Python interface seems to affect the speed of the extended
>> > libraries, i.e. some library routines have their own benchmark code,
>> > and the time of execution from the start of the library routine to the
>> > end of the library routine (not including any Python code execution)
>> > is longer than for its C counterpart.
>> 
>> In the Python version, the code is in a Python extension module, right?
>>  A .so or .dll file?  Is it also in the C counterpart?  (If that's not
>> it, can you provide more details on how you compiled and linked the
>> two?)
>> 
>> In general, referring to dynamically loaded things through symbols ---
>> even from within the same file --- tends to be slower than referring to
>> things that aren't dynamically loaded.
>> 
>> What architecture are you on?  If you're on the x86, maybe Numeric is
>> being stupid and allocating things that aren't maximally aligned.  But
>> you'd probably notice a pretty drastic difference in that case.
>> 
>> ... or maybe Numeric is being stupid and allocating things in a way
>> that causes cache-line contention.
>> 
>> Hope this helps.
> 
> Thanks for the response.  The C counterpart is directly linked together
> into one large binary (yes, the Python version is using a dynamically
> linked object file, a .so).  So that might be the source of the problem.
> I can try to make a dynamically linked version of the C counterpart and
> see how that affects the speed.  We are running on IRIX 6.5 machines
> (mips).
> Thanks.
> 

I don't know if this helps, but I had a similar problem on Linux.

The context: a Python script was calling an external program and
parsing its output (with popen) many times. I decided to optimize this
by turning the external program into a dynamically linked library with
Python bindings. I expected to save the system calls needed to fork and
start a new process each time, but this solution turned out to be slower.
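
To check whether process startup is really the dominant cost before
rewriting anything, the popen pattern described above can be timed
directly. This is only a minimal sketch in modern Python 3 (the
subprocess module postdates this thread); run_and_parse,
average_call_time and DEMO_COMMAND are hypothetical names, and the demo
command is a stand-in for the real external program.

```python
import subprocess
import sys
import time


def run_and_parse(command):
    """Run an external command and return its stdout as a list of lines."""
    completed = subprocess.run(
        command, capture_output=True, text=True, check=True
    )
    return completed.stdout.splitlines()


def average_call_time(command, repeats=20):
    """Average wall-clock seconds per call, including process startup."""
    start = time.perf_counter()
    for _ in range(repeats):
        run_and_parse(command)
    return (time.perf_counter() - start) / repeats


# Hypothetical stand-in for the real external program: a tiny Python
# child process that prints two lines.
DEMO_COMMAND = [sys.executable, "-c", "print('alpha'); print('beta')"]
```

Calling average_call_time(DEMO_COMMAND) gives a rough per-call cost to
compare against the time the program spends doing its actual work.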

The cause was multithreading. When I used the library straight from a
C program, I didn't link against the multithreaded libraries, so the
C library's I/O calls weren't protected (they didn't need to lock and
unlock their resources).

Unfortunately, the library was reading files with fgetc (character by
character :( ). Since the Python version I used was compiled with
multithreading enabled, the fgetc used in that case locked and unlocked
the stream on every call, which caused the extra time.

To find this, I compiled my library with profiling (I think I needed to
use some system call to activate profiling from the library, since I
couldn't rebuild Python).

OT: in the end I fixed the library (replacing fgetc with fgets), and I
didn't gain anything by turning the external program into a Python
extension. Since the Linux disk cache seemed to be good, I removed the
Python extension, keeping a pure Python program, and implemented a
cache for the results of the external program. This was much simpler
and more efficient in this case.
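
A result cache of the kind described above could look roughly like
this. It is a sketch under assumptions, not the code actually used:
cached_output and query are hypothetical names, the child command is a
stand-in for the real external program, and it assumes the program's
output depends only on its arguments.

```python
import functools
import subprocess
import sys


@functools.lru_cache(maxsize=None)
def cached_output(command):
    """Run `command` (a tuple of strings) once and remember its stdout.

    The argument must be a tuple, not a list, so that it is hashable
    and can serve as the cache key.
    """
    completed = subprocess.run(
        command, capture_output=True, text=True, check=True
    )
    return completed.stdout


def query(*args):
    """Ask the (hypothetical) external program about `args`."""
    # Stand-in for the real program: upper-cases its first argument.
    command = (
        sys.executable,
        "-c",
        "import sys; print(sys.argv[1].upper())",
    ) + args
    return cached_output(command).strip()
```

Only the first query("hello") starts a process; repeated calls with the
same arguments are answered from memory. If the program's output can
change between runs (for example because it reads files that are
modified), the cache would need an invalidation rule.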

-- 

Pedro