Python Performance vs. C++ in a Complex System

Sun Apr 15 16:26:23 EDT 2001

I've recently been working on an agent-based simulation system that is
intended to run with massive numbers of threads, in the 15,000-30,000 range.

In my initial prototype of the system, I selected Python with Continuations
as a base, and crafted the simulation core in about two man-weeks. This
core performed well enough at the time, but certain that I could do better,
I decided to use the Python core as a proof-of-concept, and rewrote the
core in C++. This took approximately eight man-weeks of my time and was
quite labor-intensive.

Having completed both cores, and with the C++ core HIGHLY OPTIMIZED,
I was finally able to run a performance test of the C++ system
versus the Python system. To my surprise, the C++ core only beat Python
by about 30%. Given the obvious disparity in coding time between the two
efforts, plus whatever future coding-time disparity I might impose on users
of either core through the choice of programming language, I had expected
a much larger gap.

In retrospect, the lack of a true continuation model in C++ is to blame.
The C++ implementation carries a heavy context-switching burden: it must
frequently swap out all the registers and manage a separate stack for every
thread, whereas Python-with-Continuations does not.
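
To illustrate the difference, here is a minimal sketch of cooperative
scheduling without OS threads. It uses plain Python generators as a stand-in
for continuations, which is an assumption made for illustration; it is not
the code from either core, and the names agent and run are hypothetical.
Each agent suspends itself at a yield, so switching between agents is an
ordinary function return and resume rather than a register and stack swap.

```python
from collections import deque

def agent(name, steps, log):
    """Hypothetical agent: does one unit of work, then gives up control."""
    for i in range(steps):
        log.append((name, i))
        yield  # suspension point: no OS thread or kernel context switch

def run(tasks):
    """Round-robin scheduler over suspended generators."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)          # resume the agent until its next yield
        except StopIteration:
            continue            # agent finished; drop it
        ready.append(task)      # otherwise requeue it at the back

log = []
run([agent(n, 2, log) for n in range(3)])
```

The same run function accepts tens of thousands of generators, since each
suspended agent costs only a small heap-allocated frame.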

This was an interesting outcome, which I'm still exploring.

It may very well be that the C++ system shows better gains when the
performance tests involve non-trivial code in the threads.

However, it's still remarkable that a Python implementation of a relatively
complex system like this one (a massively parallel agent simulator)
could come so close to C++ so easily.

One caveat, for those who know simulation engines: the priority heap
at the heart of the Python system was an optimized C extension.
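
For readers less familiar with simulation engines, the sketch below shows
the role such a priority heap plays: it hands back pending events in
timestamp order. It uses the standard-library heapq module as a stand-in for
the custom C extension, which is an assumption; the class EventQueue and its
methods are hypothetical names, not the original interface.

```python
import heapq
import itertools

class EventQueue:
    """Hypothetical scheduler queue: pops events in timestamp order."""

    def __init__(self):
        self._heap = []
        # Sequence number breaks ties so equal timestamps pop in insertion order.
        self._seq = itertools.count()

    def schedule(self, time, agent_id):
        heapq.heappush(self._heap, (time, next(self._seq), agent_id))

    def pop_next(self):
        time, _, agent_id = heapq.heappop(self._heap)
        return time, agent_id

    def __len__(self):
        return len(self._heap)

queue = EventQueue()
queue.schedule(5.0, "agent-b")
queue.schedule(1.0, "agent-a")
queue.schedule(5.0, "agent-c")
order = [queue.pop_next() for _ in range(len(queue))]
```

Because every simulated event passes through this structure, its push and
pop cost sits on the hot path of the whole engine.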

Joe Kraska
BBN Technologies
San Diego CA