Python Performance vs. C++ in a Complex System

David Jeske jeske at chat.net
Sun Apr 22 13:52:52 EDT 2001


--- In python-list at y..., Gabriel Ambuehl <gabriel_ambuehl at b...> wrote:
> Sunday, April 15, 2001, 10:26:23 PM, you wrote:
>> Having completed both cores, and with the C++ core HIGHLY
>> OPTIMIZED, I was finally able to perform a performance test of
>> the C++ system versus the Python system. To my surprise, the C++
>> core only beat Python by about 30%. Given the obvious inequities in
>> coding time in both efforts, plus whatever future coding time
>> inequities I might project onto users of either core by implication
>> of the programming language, I was quite surprised by these
>> results.

Programs which are I/O bound, either because they talk to other slow
programs, or because they do very little processing themselves, will
likely perform similarly in C and Python. The Python version will use
more memory. For example, at eGroups.com, Sam Rushing wrote an
outgoing mail sender called "Newman", entirely in Python on top of
his Medusa async I/O framework. It performs exceedingly well, at only
about 8000 lines of Python. If rewritten in C, it would use less
memory, and probably perform slightly better.

> This is very interesting. I've got to implement a server resource
> monitoring system and had a shot at it in my beloved Python. While
> Python's threading obviously works (something I can't really say
> about C++, as the whole threading stuff appears to be not very
> well thought out), I found it to be very slow.

You can see some performance comparisons for operations common to
scripting languages at my ScriptPerf page:

http://www.chat.net/~jeske/Projects/ScriptPerf/

I'd say that threaded C++ will perform much better than threaded
Python, but in C++ you have to be the smart one doing all the locking,
whereas Python helps you out a little bit.

> I'm now thinking about whether I should try to reimplement the whole
> URL stuff in C (being a C/C++ novice) to see whether this would
> speed up the whole process (or is there any C implementation of an
> httplib for Python that works with its threading?).

Last time I used httplib, it was terribly slow for two reasons. First,
it was calling write() on the socket for each piece of the HTTP
header. I changed it to build the whole header in memory and do a
single write(), which produced a major speed increase. Second, it does DNS
lookups each time you call it. Adding a small DNS cache will get you
another big speed win.
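A rough sketch of both fixes, using plain sockets rather than httplib's
actual internals (the function names here are illustrative, not part of
any library):

    import socket

    _dns_cache = {}

    def resolve(host):
        # Memoize DNS lookups; a production cache would also want expiry.
        if host not in _dns_cache:
            _dns_cache[host] = socket.gethostbyname(host)
        return _dns_cache[host]

    def send_request(host, port, path):
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.connect((resolve(host), port))
        # Build the entire request header in memory and push it out in
        # one write, instead of one write() per header line.
        header = ("GET %s HTTP/1.0\r\n"
                  "Host: %s\r\n"
                  "\r\n") % (path, host)
        s.sendall(header.encode("ascii"))
        return s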

> The major PITA I keep stumbling across is the fact that I
> need to have concurrent service checks, so a single threaded app
> with a large queue as scheduling mechanism isn't of much use.

Python threading has never performed very well for me. Often this is
because it's built on pthreads, and you may be running a user-space
pthreads implementation. You can usually get around single points of
contention by handing out your units of work in larger blocks.

I recommend building a non-threaded test harness and running the
Python profiler on it (after you fix httplib).
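Something along these lines, assuming your per-host check is factored
out into a function you can call directly (check_host() is a stand-in
name, not anything from your code):

    import profile, pstats

    def check_host(host):
        # Stand-in for the real service check (e.g. the httplib fetch).
        pass

    def run_checks_once():
        # One synchronous pass over the hosts, no threads, so the
        # profiler output isn't muddied by thread scheduling.
        for host in ("hostA", "hostB"):
            check_host(host)

    profile.run("run_checks_once()", "checks.prof")
    pstats.Stats("checks.prof").sort_stats("time").print_stats(20)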

> I've been thinking about a fork() based solution (AFAIK this is what
> NetSaint is doing) but the reporting of the results isn't doable in
> any halfway reliable or elegant way and it obviously requires way
> more resources than a threaded app.

Sure, you can report results. Just open pipes back to the main
process, and when a child dies, read its results off the pipe. If you
have lots of results, you might need to make the main process
async/non-blocking and read results continuously. You can even use
Python's marshal module to hand back complex data types.
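A minimal sketch of that pattern, with one blocking read per child (the
async variant would select() on the read ends instead):

    import os, marshal

    def run_check_in_child(check_fn):
        # Fork a child to run one check; the child marshals its result
        # down a pipe, and the parent reads it back when the child exits.
        r, w = os.pipe()
        pid = os.fork()
        if pid == 0:                    # child
            os.close(r)
            wf = os.fdopen(w, "wb")
            marshal.dump(check_fn(), wf)
            wf.close()                  # flush; os._exit() skips cleanup
            os._exit(0)
        os.close(w)                     # parent
        rf = os.fdopen(r, "rb")
        result = marshal.load(rf)
        rf.close()
        os.waitpid(pid, 0)
        return result

marshal handles the built-in types (numbers, strings, tuples, lists,
dicts), which is usually all a status check needs to report.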

Going multi-process does not have to mean using lots more
resources. In Linux, a thread is pretty close to a process. If you
load up all the code and then fork(), you'll have something which is
pretty damn close to the efficiency of threading, without the locking
overhead.

> The original idea was to have a constantly running thread for every
> resource to monitor (which can get kind of problematic, RAM-usage-wise,
> in very big networks, but that isn't my problem just now as I can
> throw up to 1 GB of RAM at this even for a small number of
> hosts[2]). Each thread then schedules itself using sleep(). This
> appears to work perfectly, but slowly, in Python, and not at all
> (due to libcurl[3]-related crashes) in C/C++.

Sounds like you should look at the coroutine-based version of the
Medusa async I/O library. It's basically select()-based cooperative
multitasking. If you go the next step and use Stackless Python, you
can really cut down on your memory usage.

Generally I wouldn't suggest running hundreds of concurrent threads,
even if you were writing your software in C. Just use async I/O with
a few worker threads.
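To give the flavor of it, here is a bare select() loop that tests TCP
reachability of several hosts at once with no threads at all. It's a
toy stand-in for a real check; Medusa wraps this same pattern up
properly:

    import select, socket

    def poll_hosts(hosts, port=80, timeout=5.0):
        # Start non-blocking connects to every host, then select() on
        # the whole batch instead of devoting a thread to each one.
        pending = {}
        for host in hosts:
            s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            s.setblocking(0)
            s.connect_ex((host, port))      # returns without waiting
            pending[s.fileno()] = (s, host)
        results = {}
        while pending:
            socks = [sock for sock, _ in pending.values()]
            _, writable, _ = select.select([], socks, [], timeout)
            if not writable:                # nothing progressed in time
                break
            for s in writable:
                _, host = pending.pop(s.fileno())
                err = s.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
                results[host] = (err == 0)  # 0 means connect succeeded
                s.close()
        return results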
 
> Ideally, I'd want to implement the whole thing in C++ (or probably
> some wild mix of C and C++, which generally works pretty well) with
> existing libraries, but obviously nobody thought about giving the
> threading stuff some flag that would take care of the data (so that
> pointers can't get corrupted by non-thread-safe libs while something
> else is executing), and I clearly lack the programming experience to
> do such a complicated task myself (I think it would be possible, but
> I have some worries about the performance penalties this could cause).

You certainly should learn how to keep data safe in a threaded
environment before you do threaded programming in C/C++.

> [2] Python did some two hundred concurrent threads with about 30 MB
> of RAM usage on FreeBSD, which would be very nice if only I could
> get CPU utilization way down.

Try:

1) Optimize httplib as described above.

2) Don't spawn hundreds of threads; build an async-I/O select loop
   (possibly with Medusa) and use a small number of worker threads to
   handle the data.

3) Run the Python profiler over your code in a single-threaded test
   harness.

-- 
David Jeske (N9LCA) + http://www.chat.net/~jeske/ + jeske at chat.net



