Experiences converting Python to C++

Alex Martelli aleaxit at yahoo.com
Fri Aug 17 12:17:41 EDT 2001


"Henrik Ekelund" <henrik_ekelund at yahoo.com> wrote in message
news:f583a35.0108170452.1b3eca85 at posting.google.com...
> We have to convert a large system (10000 lines) written in Python to
> C++. Does anybody have experience with doing an actual conversion of a
> working Python prototype to C++? We will probably have to offer the
> work at a fixed price, so how many hours does it take to convert X
> lines of Python code to C++? Are there any pitfalls?

I have some such experience, but never on such a scale (I think
the 10K lines of Python will translate to roughly 80K-100K lines of
C++, assuming you use *decent* C++ compilers and libraries,
such as gcc 3.0 and Boost, cf. www.boost.org).  First, ensure all
the underlying libraries you're using (GUI toolkit, network stuff,
databases, etc, etc) are available, OK for your use, etc -- say
there's no trouble there (e.g., if you have wxPython GUIs on
the Python side, I think you can expect no trouble moving to the
wxWindows equivalents on the C++ side).  Remember that you
will often want to translate a plain Python function or class to
a C++ template, which is reasonably straightforward -- this
will greatly ease many translations of signature-based polymorphism,
which is crucial in Python and also underlies C++ templates.  So
if your C++ compiler has trouble with templates, the task is MUCH
harder -- consider using a better front-end in this case, such as
Comeau C++, which is cheap and does templates right (make sure
you get Dinkumware's Standard C++ library for Comeau C++, as
Comeau's own std library isn't mature and production-quality yet).

Cross-platform is always hard, but much harder in C++ than in
Python, so, if you need to deliver *portable* C++, that is most
definitely something to keep in mind.  If this is the case, try at
least to make sure you're using the SAME C++ compiler (and
underlying platforms, e.g. for GUI and database -- wxWindows IS
good...) on all the platforms -- gcc 3.0, or maybe Comeau.

Presumably, your 10,000 lines of working Python code are
accompanied by another 10K or so lines of test harness and
unit-tests.  This is the best aspect of your predicament, as
you don't need (unless you work on a very strange contract)
to translate and deliver the C++ equivalent of the *test* code
too, but you still get the benefit of the test code to ease your
porting to C++!

Specifically, you can work a component (package, related
family of modules, etc) at a time, translating it to C++
templates (and sometimes non-templated classes and
functions); then you dress the translated component up
in Boost.Python, put a tiny Python wrapper on it if need
be to ensure it has the same interface as the starting
Python code, and run all of the test-suite.  When that
component passes its unit-tests AND the system passes
acceptance tests equally well with that component in the
original Python version OR the Python-wrapped C++
version, you move on to the next component.  Any time
you find any problem, be sure to add the test causing that
problem to the suite you run and re-run every time!

If you've successfully finessed all the other issues I mentioned,
this process will make the translation go very quickly -- this is
what I've observed on smaller systems (the 'decency' of the
target compiler being iffy, as it's MSVC++, but, using it with
Boost, ATL, &c, it's not
too bad after all).  Producing 5000 lines of solid, working,
tested and documented C++ from scratch is typically a
job of about 20 to 30 ideal-engineering-hours for a well
matched 2-person team (pair programming is good
for this task, btw).  Translating about 500 to 700 lines
of Python into those 5000 lines of C++, with the same
adjectives applying, should be about 8 to 10 ideal-
engineering-hours.  I *think* (but, I have no first-hand
experience!) that the translation job will scale MUCH
closer to linearly than the writing-from-scratch would,
so your 10,000 lines of Python, while they'll take more than
just 20 times as long as 500 lines would, should, I believe,
take _not much more_ than 20 times longer (again in terms of
ideal engineering hours -- I hope you're familiar with
this concept and the related one of 'velocity'). All this
is assuming you successfully finesse the issues related
above, plus one key extra one.  Hmm, make that two.

The key extra issue is, of course, memory management.
You're unlikely to have rigid ownership of objects in your
Python code, and assigning such ownership is going to
be a bear.  I would suggest you implement reference
counting as part of your porting-infrastructure, and
have everybody access the C++ objects through a smart
pointer (or smart-proxy) dealing with the reference counts.
That's not going to give you top-notch performance on
the first C++ version, but you can tune that later if
need be -- main thing is to get it working right, and
you can't afford leaks or crashes due to double frees,
use-after-free, etc... any such thing, and your time
budget goes to hash.
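
A minimal sketch of what "everybody goes through a counted handle"
can look like.  It uses std::shared_ptr, which entered the standard
library well after this message and descends from Boost's shared_ptr;
the names Node, NodeRef, Registry and make_node are hypothetical,
chosen only for illustration:

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical translated class.  In the Python original, instances
// would be shared freely between containers with no explicit owner.
struct Node {
    explicit Node(const std::string& n) : name(n) {}
    std::string name;
};

// All client code refers to Node objects through this counted handle,
// never through a raw pointer.
typedef std::shared_ptr<Node> NodeRef;

// Two containers that may hold the very same object, as two Python
// lists could.
struct Registry {
    std::vector<NodeRef> by_insertion;
    std::vector<NodeRef> favourites;
};

// Single creation point, so no Node ever exists outside a handle.
inline NodeRef make_node(const std::string& name) {
    return std::make_shared<Node>(name);
}
```

A Node is destroyed when the last NodeRef to it goes away, whichever
container or local variable that happens to be.  One caveat: plain
reference counting does not reclaim cycles, so structures in which
objects point back at each other need a non-owning link (for
instance std::weak_ptr) in one direction.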

The OTHER key issue is, peopleware.  You need at least
one super-duper-C++-guru in any project that's
going to produce tens of thousands of lines of good
C++ code, AND good to very good C++ competence
in every participating programmer.  If they start from
scratch, budget *MONTHS* for that -- and make sure
you do retain one C++ guru consultant for _at least_
10 hours a week throughout the project if you don't
have one in-house.  How to evaluate if a consultant
IS a C++ guru -- well -- word of mouth is always most
reliable, but I'd suggest trying to gather several good
indicators.  E.g., have him/her estimate this job: if the
estimate is within 20% of what I've given, good, but
if s/he estimates half the time I've suggested, s/he's a
quack (s/he's never done industrial-strength, industrial-
size C++ projects yet s/he's trying to pass him/herself
off as an expert in them...:-).


Best of luck!

Alex (Brainbench MVP for C++)





