Does Python really follow its philosophy of "Readability counts"?

Wed Jan 14 14:16:23 EST 2009

"Russ P." <Russ.Paielli at gmail.com> writes:
> I know some researchers in software engineering who believe that the
> ultimate solution to software reliability is automatic code
> generation. They don't really care much which language is used, because
> it would only be an intermediate form that humans don't interact with
> directly. In that scenario, humans would essentially use a "higher
> level" language such as UML or some such thing.
> 
> I personally have a hard time seeing how that could work, but that may
> just be due to my own lack of understanding or vision.

The usual idea is that you would write a specification and a
constructive mathematical proof that a certain value meets that
specification.  The compiler then verifies the proof and turns it into
code.  Coq (http://coq.inria.fr) is an example of a language that
works like that.  There is a family of jokes that go:

   Q. How many $LANGUAGE programmers does it take to change a lightbulb?
   A. [funny response that illustrates some point about $LANGUAGE].

The instantiation for Coq goes:

   Q. How many Coq programmers does it take to change a lightbulb?
   A. Are you kidding?  It took two postdocs six months just to prove
      that the bulb and socket are threaded in the same direction.

Despite this, a compiler for a fairly substantial C subset has been
written mostly in Coq (http://compcert.inria.fr/doc/index.html).  But
this stuff is far, far away from Python.

I face a situation almost every day where I have some gigabytes of
data that I want to slice and dice somehow and get some numbers out
of.  I spend 15 minutes writing a one-off Python program and then
several hours waiting for it to run.  If I used C instead, I'd spend
several hours writing the one-off program and then 15 minutes waiting
for it to run, which is not exactly better.  (Or I could spend several
hours writing a parallel version of the Python program and running it
on multiple machines, also not an improvement.)  Often the Python
program crashes halfway through, even though I tested it on a few
megabytes of data before starting the full multi-gigabyte run, because
it hit some unexpected condition in the data.  More compile-time
checking could have prevented that by making sure the structures the
one-off script expects match the ones in the program that generated
the input data.

I would be ecstatic with a version of Python where I might have to
spend 20 minutes instead of 15 minutes writing the program, but then
it runs in half an hour instead of several hours and doesn't crash.  I
think the Python community should be aiming towards this.