Performance of Python programs

Mon Feb 18 16:46:49 EST 2002

Christian Tanzer wrote:
> In my experience, a Python solution might be anywhere between 100
> times slower to 1000 times faster than a C/C++ solution.

I used to work at a company that sent between 25K and 200K distinct
emails per day through Microsoft Exchange Server (it wasn't spam; people
had to explicitly give us a lot of personal information just to get the
emails). The application that composed the emails from information in a
database and handed them off to Exchange (essentially a mail-merge
engine) was originally written in C++. It took about 2 months to
develop, and we spent a lot of time optimizing it over the years.

Eventually, maintaining it and adding new features became too
burdensome, so I translated it to Python. It took all of a weekend to do
the translation. Of course, I was only testing with one mail note at a
time while I translated it. When I got back to the office and tried it
on 1000 notes, it finished in about 1/4 the time the C++ version would
normally take. I naturally assumed it was broken and didn't send the
mails. But sure enough, 1000 emails showed up in my inbox. I was very
confused and asked my buddy to help me review/compare the code.

It turned out that I'd implemented an optimization in the C++ code a few
years earlier and made a mistake in the implementation such that the
optimization wasn't as effective as it should've been. In translating to
Python, I "accidentally" fixed it and enjoyed the full benefit. It took
a great deal of comparison between the C++ and Python code to even
realize that the C++ code was broken (I never realized it during the
translation). This C++ code had been through several code reviews over
the years that never spotted the problem.

The moral is that, in Python, it's often easy to do the "right thing" by
accident, whereas in C++ accidents are rarely so fruitful. Within a
week we had implemented two new features in the mailer that we'd been
putting off for months because of the complexity of integrating them
into the mailer. And we still had significantly better performance than
before. And we had <400 lines of Python code to maintain in place of the
3,000+ lines of C++ code we'd had before. And it became a lot more fun
to work with. And I was pissed at myself for not making the change 3
years earlier when I first started using Python at work!

It can't be said often enough that artificial benchmarks measure
"theoretical speed," which often has almost nothing to do with "practical
speed." An overarching Python theme seems to be that the practical is
much more important than the theoretical when they don't coincide
nicely.

Jimmy