program surgery vs. type safety

Sat Nov 15 01:04:41 EST 2003

aaron at reportlab.com wrote:
> I'm doing a heart/lung bypass procedure on a largish Python
> program at the moment and it prompted the thought that the
> methodology I'm using would be absolutely impossible with a
> more "type safe" environment like C++, C#, java, ML etcetera.

Python is so wonderful for this that in a few cases, I have
actually converted code *to* Python for the express purpose
of refactoring it, even when it has to be converted _back_
to use in the final system!  (Most of the code I have done
this for is control code which runs on a DSP inside a modem.)

Most people who have done serious refactoring would probably
agree that the number of interrelated items a human can consider
at a single time is quite small (probably because the number of
relationships between the items goes up roughly as the square of
the number of items).  For this reason, I have found that for
complicated systems, it can be extremely useful to take very
tiny, iterative steps when refactoring (especially when full,
robust unit tests are not available on the original code).

Tiny iterative steps can be examined and reasoned about in
isolation very successfully, in cases where the sum total
of the changes is beyond this human's comprehension capacity.

However, in some cases (as in perhaps the heart/lung scenario
discussed by Aaron), code requires a fundamental shift in its
structure that is impossible (or at least impractical) to
capture with small iterative steps.

Even when faced with this scenario, I try to design my _process_
for refactoring _this particular piece of code_ in such a
fashion that the scope of this fundamental shift is as small
as possible, e.g. by taking lots of small steps before making
the fundamental shift, and lots of small steps after making it.

So (if you're still with me :) the most interesting thing about
the process is:  The actual conversion of source code to and
from Python can be among the tiniest of iterative steps!

Treating a code conversion to Python as a tiny step in a
refactoring process allows all the hard work of the fundamental
shift to be done _in Python_, which gives you access to all
the wonderful facilities of the language for designing and
testing your new code.  The first runs of your unit tests will
basically ensure that you have successfully captured the essence
of the original code during the conversion process.

Python is so malleable that I have very successfully used
it to "look like" C and a few different kinds of assembly
language.  It particularly shines (e.g. in comparison to
C) for modelling assembly language.  Have a function which
returns stuff in registers AX and BX?  No problem:

    def myfunc():
        ...
        return ax,bx

    ...

    ax,bx = myfunc()

Some preexisting code will not convert as nicely as other
code to Python, but this is not a huge problem because, as
described above, you can immediately write Python unit tests
to verify that you have accurately captured the existing code.
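
As a minimal sketch of that idea (the routine, its name div16, and
the 16-bit register width are all made up for illustration, not taken
from any real modem code), here is a Python model of an assembly
routine that hands back two results in AX and BX, plus the sort of
quick test that checks the conversion:

```python
MASK16 = 0xFFFF  # assumption: the modelled registers are 16 bits wide


def div16(ax, bx):
    """Hypothetical routine: divide AX by BX by repeated subtraction.

    Returns the quotient "in AX" and the remainder "in BX", mirroring
    an assembly routine that leaves its results in two registers.
    """
    ax &= MASK16
    bx &= MASK16
    assert bx != 0, "divide by zero"
    quotient = 0
    while ax >= bx:
        ax -= bx
        quotient += 1
    return quotient, ax


def test_div16():
    # Values worked out by hand from the (hypothetical) original routine.
    assert div16(17, 5) == (3, 2)
    assert div16(4, 5) == (0, 4)
    # Inputs wider than 16 bits are truncated, as a register would be.
    assert div16(0x10011, 5) == (3, 2)


test_div16()
```

The tuple return keeps the call site (ax, bx = div16(ax, bx)) looking
much like a call to the original routine.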

Conversion back to the target system can be slightly more
problematic in that it may be impossible to unit-test the
software in its native environment.  The good news here is
that it is almost always possible (in my experience) to make
Python code look arbitrarily close to the new assembly
language I am authoring.
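
To give a rough idea of what "arbitrarily close" can mean, here is a
sketch of a checksum loop written one operation per line, with
register-style variable names.  The routine and the mnemonics in the
comments are generic placeholders, not any particular DSP's
instruction set:

```python
def checksum(buf):
    """Sum the words in buf modulo 2**16, written in an assembly-like style."""
    ax = 0                  #       clr  ax        ; accumulator
    si = 0                  #       clr  si        ; index into buf
    cx = len(buf)           #       mov  cx, count ; loop counter
    while cx != 0:          # loop:
        ax = ax + buf[si]   #       add  ax, [si]
        ax = ax & 0xFFFF    #       (16-bit wraparound, implicit in hardware)
        si = si + 1         #       inc  si
        cx = cx - 1         #       dec  cx
                            #       jnz  loop
    return ax               #       ret            ; result in ax


print(checksum([1, 2, 3]))      # 6
print(checksum([0xFFFF, 2]))    # 1, the sum wraps at 16 bits
```

Nobody would call this good Python, but each line maps onto one
instruction, which is the point.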

In fact, for the conversion back to assembly language, I tend
to iterate on both the Python and assembly versions simultaneously.
I'll start coding the assembly language to look like the Python,
then realize that I have a construct which doesn't flow very
well in assembler, go back and iterate on (and unit test!) the
Python again to make it look more like the final result, and then
recapture those changes in assembler.

At the end of the process, I will have a fully tested Python version
(with a unit test for subsequent changes) and some assembler which
almost any programmer would agree looks _just like_ the Python (which
admittedly doesn't look like very good Python any more :)

In some cases I just slap the assembly language back into the
system and run system tests on it;  in other cases I have used
the Python unit tests to generate test data which can be fed
to a test harness for the assembly language version in a
simulator.  (In either case, I will have finished more quickly and
have more faith in the resultant code than if I had just tried
to refactor in the original language, using the available tools.)
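
Generating that test data can be as simple as running the Python
model over a list of inputs and writing out input/expected-output
pairs.  The sketch below assumes a made-up text format (hex words,
inputs and outputs separated by "->") and a made-up model function
add_and_keep; a real simulator harness would dictate its own format:

```python
MASK16 = 0xFFFF  # assumption: 16-bit registers


def add_and_keep(ax, bx):
    """Hypothetical model: AX := AX + BX (wrapping), BX := old AX."""
    return (ax + bx) & MASK16, ax & MASK16


def make_vectors(model, inputs):
    """Run the model over each input tuple and format one line per case."""
    lines = []
    for args in inputs:
        outputs = model(*args)
        in_hex = " ".join(f"{a & MASK16:04X}" for a in args)
        out_hex = " ".join(f"{r & MASK16:04X}" for r in outputs)
        lines.append(f"{in_hex} -> {out_hex}")
    return lines


for line in make_vectors(add_and_keep, [(1, 2), (0xFFFF, 1)]):
    print(line)
# 0001 0002 -> 0003 0001
# FFFF 0001 -> 0000 FFFF
```

The same input list can drive the Python unit tests, so both versions
are checked against identical cases.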

In the Python version (which doesn't run in a real system at speed),
I am prone to inserting the sort of assertions which, as Alex asserts
(heh -- got you for yesterday's "reduce"), a real design by contract
system would easily enforce, e.g.

    assert x > y, "The frobowitz fritzed out!"
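
A slightly fuller sketch of assertions used as poor-man's contracts
in a Python model follows.  The function gain_delta and its 16-bit
range are invented for illustration; the only thing taken from above
is the style of the check:

```python
def gain_delta(x, y):
    """Hypothetical model routine: return x - y as a 16-bit quantity."""
    # Precondition: the caller must guarantee x > y.
    assert x > y, "The frobowitz fritzed out!"
    result = (x - y) & 0xFFFF
    # Postcondition: the masked result is a nonzero 16-bit value.
    # This fails if x - y is an exact multiple of 0x10000.
    assert 0 < result <= 0xFFFF, "result out of range"
    return result


print(gain_delta(5, 3))   # 2

try:
    gain_delta(3, 5)      # violates the precondition
except AssertionError as exc:
    print("caught:", exc)  # caught: The frobowitz fritzed out!
```

Note that Python strips assert statements when run with -O, which is
fine here since the model never has to run at speed.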

Because assembly language is basically untyped, and because I can
make the corresponding Python arbitrarily similar to the assembly
language while fully testing and instrumenting it, I could argue
that, for my purposes, the _lack_ of static typing in Python
_contributes heavily_ to its viability as a code refactoring tool,
which seems to parallel your experience.

Regards,
Pat