Python Performance vs. C++ in a Complex System

Andrew Markebo flognat at flognat.myip.org
Sun Apr 22 06:14:41 EDT 2001


Hmm.. A thought... you need to fetch a bunch of URLs in parallel.
Instead of using one thread per URL, have you considered writing a
handler of some kind that sets up all the sockets fetching pages and
runs select() on them?

        /Andy
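
A minimal sketch of that select()-style handler in present-day Python. It uses the standard `selectors` module as the wrapper around select() (or its platform equivalents) and assumes plain HTTP/1.0 only: no HTTPS, redirects, keep-alive or chunked encoding. The function name `fetch_all`, its list of (host, port, path) tuples as input, and the 10-second default timeout are illustrative assumptions, not anything from the original thread.

```python
import errno
import selectors
import socket
import time


def fetch_all(targets, timeout=10.0):
    """Fetch several plain-HTTP/1.0 resources concurrently in one thread.

    targets: list of (host, port, path) tuples (an assumed input format).
    Returns {target: raw response bytes (status line + headers + body)},
    with None for targets that failed or did not finish before `timeout`.
    """
    sel = selectors.DefaultSelector()
    results = {}
    for target in targets:
        host, port, path = target
        results[target] = None
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.setblocking(False)
        # Non-blocking connect (a name lookup, if any, still blocks here);
        # completion shows up later as the socket becoming writable.
        err = sock.connect_ex((host, port))
        if err not in (0, errno.EINPROGRESS, errno.EWOULDBLOCK):
            sock.close()
            continue
        request = f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n".encode("ascii")
        state = {"target": target, "out": request, "chunks": []}
        sel.register(sock, selectors.EVENT_WRITE, state)

    deadline = time.monotonic() + timeout
    while sel.get_map():
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        # One select() call covers every socket that is still in flight.
        for key, events in sel.select(remaining):
            sock, state = key.fileobj, key.data
            try:
                if events & selectors.EVENT_WRITE:
                    sent = sock.send(state["out"])
                    state["out"] = state["out"][sent:]
                    if not state["out"]:
                        # Request fully sent: switch to waiting for the reply.
                        sel.modify(sock, selectors.EVENT_READ, state)
                elif events & selectors.EVENT_READ:
                    data = sock.recv(65536)
                    if data:
                        state["chunks"].append(data)
                    else:
                        # EOF: an HTTP/1.0 server closes after the response.
                        results[state["target"]] = b"".join(state["chunks"])
                        sel.unregister(sock)
                        sock.close()
            except OSError:
                # Refused or reset connection: the result stays None.
                sel.unregister(sock)
                sock.close()

    # Anything still registered has timed out; close it.
    for key in list(sel.get_map().values()):
        sel.unregister(key.fileobj)
        key.fileobj.close()
    sel.close()
    return results
```

All sockets live in one thread, so there is no shared state to lock; the trade-off is that each connection becomes an explicit little state machine (sending, then reading until EOF) instead of straight-line code in its own thread.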

/ Gabriel Ambuehl <gabriel_ambuehl at buz.ch> wrote:
| Hello Courageous,
| 
| Sunday, April 15, 2001, 10:26:23 PM, you wrote:
| > Having completed both cores, and with the C++ core HIGHLY OPTIMIZED,
| > I was finally able to perform a performance test of the C++ system
| > versus the Python system. To my surprise, the C++ core only beat
| > Python by about 30%. Given the obvious inequities in coding time in
| > both efforts, plus whatever future coding time inequities I might
| > project onto users of either core by implication of the programming
| > language, I was quite surprised by these results.
| 
| This is very interesting. I've got to implement a server resource
| monitoring system and had a shot at it in my beloved Python. While
| Python's threading obviously works (something I can't really say
| about C++, where the whole thing doesn't appear to be very well
| thought out), I found it to be very slow. I'm now thinking about
| whether I should try to reimplement the whole URL stuff in C (being a
| C/C++ novice) to see whether this would speed up the whole process
| (or is there any C implementation of an httplib for Python that works
| with its threading?). The major PITA I keep stumbling across is the
| fact that I need concurrent service checks, so a single-threaded app
| with a large queue as its scheduling mechanism isn't of much use.
| I've been thinking about a fork()-based solution (AFAIK this is what
| NetSaint is doing), but reporting the results isn't doable in any
| halfway reliable or elegant way, and it obviously requires far more
| resources than a threaded app. The original idea was to have a
| constantly running thread for every resource to monitor, which then
| schedules itself using sleep(). (This can get kinda problematic
| RAM-wise in very big networks, but that isn't my problem just now, as
| I can throw up to 1GB of RAM at this even for a small number of
| hosts [2].) This appears to work perfectly but slowly in Python, and
| not at all in C/C++ (due to libcurl-related crashes [3]).
| 
| Ideally, I'd want to implement the whole thing in C++ (or probably
| some wild mix of C and C++, which generally works pretty OK) with
| existing libraries, but obviously nobody thought about giving the
| threading stuff some flag that would take care of the data (so that
| pointers can't get clobbered by non-thread-safe libs while something
| else is executing), and I clearly lack the programming experience to
| do such a complicated task myself (I think it would be possible, but
| I have some worries about the performance penalties this could
| cause).
| 
| But your report encourages me to try it again in Python with an
| httplib implemented in C (as I said, any pointers to such a beast
| would be appreciated).
| 
| Given that I might decide to use libcurl (http://curl.haxx.se) as a
| starting point (which doesn't appear to be thread-safe at all to me,
| even if some other people state it is for their apps [1]), what does
| Python do with non-thread-safe modules in a threaded app? Crash? Do
| some magic to get the data consistent before switching threads? Not
| defined? Never tested? ANY comment (preferably from people who know)
| on this topic, as well as on the stability of the threading stuff (I
| sometimes had strange crashes during the loading of the program, but
| once it was running, it kept running), would be greatly appreciated.
| 
| Best regards,
|  Gabriel
| 
| [1] Everything is fine as long as I don't try to do concurrent fetches
| which I desperately need.
| 
| [2] Python handled some two hundred concurrent threads with about
| 30 MB of RAM on FreeBSD, which would be very nice if only I could get
| the CPU utilization way down.
| 
| [3] Pointers to any thread-safe HTTP (or, even better, HTTP and
| HTTPS) libs are very welcome. Preferably code that isn't GPL'd, so I
| can use it in a closed-source project (but I'd be willing to deal
| with the author of a good lib to get a license for this).


