effects on extended modules

Curtis Jensen cjensen at bioeng.ucsd.edu
Thu Dec 27 20:05:10 EST 2001


Pedro <pedro_rodriguez at club-internet.fr> wrote in message news:<pan.2001.12.06.12.45.41.172197.2456 at club-internet.fr>...
> "Curtis Jensen" <cjensen at bioeng.ucsd.edu> wrote:
> 
> > Kragen Sitaker wrote:
> >> 
> >> Curtis Jensen <cjensen at bioeng.ucsd.edu> writes:
> >> > We have created a Python interface to some core libraries of our own
> >> > making.  We also have a C interface to these same libraries.  However,
> >> > the Python interface seems to affect the speed of the extended
> >> > libraries.  I.e., some library routines have their own benchmark code,
> >> > and the time of execution from the start of the library routine to the
> >> > end of the library routine (not including any Python code execution)
> >> > takes longer than its C counterpart.
> >> 
> >> In the Python version, the code is in a Python extension module, right?
> >>  A .so or .dll file?  Is it also in the C counterpart?  (If that's not
> >> it, can you provide more details on how you compiled and linked the
> >> two?)
> >> 
> >> In general, referring to dynamically loaded things through symbols ---
> >> even from within the same file --- tends to be slower than referring to
> >> things that aren't dynamically loaded.
> >> 
> >> What architecture are you on?  If you're on the x86, maybe Numeric is
> >> being stupid and allocating things that aren't maximally aligned.  But
> >> you'd probably notice a pretty drastic difference in that case.
> >> 
> >> ... or maybe Numeric is being stupid and allocating things in a way
> >> that causes cache-line contention.
> >> 
> >> Hope this helps.
> > 
> > Thanks for the response.  The C counterpart is directly linked together
> > into one large binary (yes, the Python version is using a dynamically
> > linked object file, a .so).  So that might be the source of the problem.
> > I can try making a dynamically linked version of the C counterpart and
> > see how that affects the speed.  We are running on IRIX 6.5 machines
> > (MIPS).
> > Thanks.
> > 
> 
> Don't know if this helps, but I had a similar problem on Linux.
> 
> The context was: a Python script was calling an external program and
> parsing its output (with popen) many times.  I decided to optimize this
> by turning the external program into a dynamically linked library with
> Python bindings.  I expected to save the system-call overhead of forking
> and starting a new process each time, but it turned out that this
> solution was slower.
> 
> The problem was caused by multithreading.  When using the library
> straight from a C program, I didn't link with the multithreaded
> libraries, so none of the C library calls were protected (they don't
> need to lock and unlock their resources).
> 
> Unfortunately, the library was reading files with fgetc (character by
> character :( ).  Since the Python version I used was compiled with
> multithreading enabled, the fgetc function in this case locked and
> unlocked the stream on every call, which caused the extra overhead.
> 
> To find this, I compiled my library with profiling (I think I needed to
> use some system call to activate profiling from the library, since I
> couldn't rebuild Python).
> 
> OT: in the end I fixed the library (fgetc replaced by fgets) and didn't
> gain anything by turning the external program into a Python extension.
> Since the Linux disk cache seemed to be doing its job well, I removed
> the Python extension, keeping a pure Python program, and implemented a
> cache for the results of the external program.  This was much simpler
> and more efficient in this case.
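
To make the fgetc-versus-fgets difference concrete, here is a minimal C
sketch (the file names and input file are hypothetical, not Pedro's actual
code): in a thread-enabled libc, fgetc can lock and unlock the stream's
mutex on every character, while fgets pays that cost once per buffer.
POSIX also provides getc_unlocked for the strictly single-threaded case.

/* read_bench.c -- a sketch of the fgetc-vs-fgets effect described
   above, not Pedro's actual library.  In a thread-enabled libc each
   fgetc() may lock/unlock the FILE's mutex; fgets() pays that cost
   once per buffer instead of once per character. */
#include <stdio.h>
#include <sys/time.h>

static double seconds(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec + tv.tv_usec / 1e6;
}

/* One potential lock/unlock per character. */
static long count_lines_fgetc(FILE *fp)
{
    long lines = 0;
    int c;
    while ((c = fgetc(fp)) != EOF)
        if (c == '\n')
            lines++;
    return lines;
}

/* One potential lock/unlock per 8 KB buffer. */
static long count_lines_fgets(FILE *fp)
{
    char buf[8192];
    long lines = 0;
    while (fgets(buf, sizeof buf, fp) != NULL) {
        char *p;
        for (p = buf; *p != '\0'; p++)
            if (*p == '\n')
                lines++;
    }
    return lines;
}

int main(int argc, char **argv)
{
    const char *path = (argc > 1) ? argv[1] : "data.txt";  /* hypothetical input */
    FILE *fp = fopen(path, "r");
    double t0;
    long n;

    if (fp == NULL) { perror(path); return 1; }

    t0 = seconds();
    n = count_lines_fgetc(fp);
    printf("fgetc: %ld lines, %.6f s\n", n, seconds() - t0);

    rewind(fp);
    t0 = seconds();
    n = count_lines_fgets(fp);
    printf("fgets: %ld lines, %.6f s\n", n, seconds() - t0);

    fclose(fp);
    return 0;
}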


Is this a problem with I/O only?  The code sections that we benchmarked
have no I/O in them.
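
If the benchmarked sections really do no I/O, the stdio locking described
above shouldn't touch them, which leaves the static-versus-dynamic linking
question Kragen raised.  One way to test it directly is a sketch along
these lines (compute_kernel, core.c, and libcore.so are hypothetical
stand-ins for one of the library routines, not the project's actual code):
call the same routine once from a statically linked binary and once
through dlopen/dlsym, timing it both ways.  Both paths call through a
function pointer, so any difference comes from the shared object itself
(position-independent code, GOT indirection, loader effects) rather than
from the call syntax.

/* bench_dl.c -- a sketch, not the project's actual harness.
   Static build:   cc -O2 bench_dl.c core.c -o bench_static
   Dynamic build:  cc -O2 -shared core.c -o libcore.so
                   cc -O2 -DUSE_DLOPEN bench_dl.c -o bench_dl
   (exact flags vary by platform; Linux needs -ldl, IRIX does not) */
#include <stdio.h>
#include <sys/time.h>
#ifdef USE_DLOPEN
#include <dlfcn.h>
#else
double compute_kernel(long n);        /* resolved at static link time */
#endif

static double seconds(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec + tv.tv_usec / 1e6;
}

int main(void)
{
    double (*fn)(long);
    double t0, result;

#ifdef USE_DLOPEN
    void *handle = dlopen("./libcore.so", RTLD_NOW);
    if (handle == NULL) { fprintf(stderr, "%s\n", dlerror()); return 1; }
    fn = (double (*)(long)) dlsym(handle, "compute_kernel");
    if (fn == NULL) { fprintf(stderr, "%s\n", dlerror()); return 1; }
#else
    fn = compute_kernel;
#endif

    t0 = seconds();
    result = fn(10000000L);
    printf("result %g, elapsed %.6f s\n", result, seconds() - t0);
    return 0;
}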
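
And the compute-only routine being timed (the hypothetical core.c above)
can carry its own internal benchmark, mirroring the "library routines have
their own benchmark code" setup, so that the timed region contains no I/O
at all:

/* core.c -- hypothetical compute-only routine with its own internal
   benchmark; the timed region does no I/O.  Compile it both into the
   static binary and into libcore.so for the comparison above. */
#include <stdio.h>
#include <sys/time.h>

static double walltime(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec + tv.tv_usec / 1e6;
}

double compute_kernel(long n)
{
    double acc = 0.0;
    double t0, elapsed;
    long i;

    t0 = walltime();
    for (i = 1; i <= n; i++)        /* pure arithmetic, no I/O */
        acc += 1.0 / (double) i;
    elapsed = walltime() - t0;

    fprintf(stderr, "compute_kernel: %.6f s inside the routine\n", elapsed);
    return acc;
}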

--
Curtis Jensen


