Python vs C for a mail server

Alex Martelli aleax at mail.comcast.net
Sun Jan 29 11:34:24 EST 2006


Jens Theisen <jth01 at arcor.de> wrote:
   ...
> Indeed, especially Eckel's article shed some light on testing as an
> alternative to static typing. I still can't quite understand why you can't
> do both. Clearly unit tests should be part of any software, not only
> Python software.

Clearly.  Given this, adding static typing as well still has all the
costs in terms of productivity, but much diminished benefits.  So it's
not a question of "can't", but of "not worth the cost".

> Test failures, however, don't tell you anything about the current usage of
> your program - just about the intended usage at the point where the test
> was written. Clearly you can't test _everything_? And clearly you can never
> be sure that all your colleagues did so as well? This is not only about type
> safety, but simply name safety.

So you are now arguing against any kind of introspection abilities?  Do
dlopen and dlsym sap the power you attribute to C++, then?

> What do you do when you want to know if a certain method or function is
> actually used from somewhere, say "foobar", in a language which allows
> (and even encourages) that it could be called by:
> 
> getattr(obj, "foo" + "bar")()
> 
> ?

"Encourages"?  What a silly assertion.  Python makes introspection
easier than Java's Reflection, C#'s similar capabilities, and C/C++'s
primitive dlopen/dlsym, but the existence of similar dynamic name
resolution abilities in each of these languages reflects similar
underlying real needs.  The functionality is there for those cases in
which it's needed, but it's silly to use it when not needed.
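To make the parallel concrete, here is a minimal sketch of that dynamic name
resolution; the MailHandler class and its foobar method are invented purely
for illustration, with getattr playing the role dlsym plays in C:

```python
class MailHandler:
    """Hypothetical class, invented only to illustrate dynamic dispatch."""
    def foobar(self):
        return "handled"

obj = MailHandler()

# Resolve the method name at runtime, much as dlsym resolves a symbol;
# the string could just as well come from a config file or the network.
method_name = "foo" + "bar"
result = getattr(obj, method_name)()
print(result)  # prints "handled"
```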


> There is no systematic way to find this call.
> 
> In C++, just comment out the definition and the compiler will tell you.

You're too optimistic re C++: given its complex rules, the compiler
might well find a different 'foobar' once you comment out that one.

But more: if that function is called via dlopen and dlsym, or similar
functionalities on Windows etc, what can the compiler tell you?  It will
tell you nothing, and if you take that as meaning the function is not
called, you're in for a surprise.

So, in C++ like in any of the other languages offering handier
introspection facilities, Python included, you need integration tests
(harder than unit tests) or heuristics based on source analysis (which
won't catch the cases in which the namestrings are constructed
dynamically).
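A tiny sketch of why textual source analysis falls short here (the source
string and the Obj class are invented for illustration): the callee's name
never appears literally in the source, yet the call succeeds at runtime:

```python
# A hypothetical line of source text, for illustration only.
source = 'getattr(obj, "foo" + "bar")()'

# A naive textual search for the callee's name finds nothing...
found = "foobar" in source   # False: the name never appears literally

# ...yet at runtime the very same name is constructed and looked up.
class Obj:
    def foobar(self):
        return "called"

obj = Obj()
result = eval(source)  # the call happens anyway
print(found, result)
```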

If you're arguing against introspection and dynamic libraries, then
you're not arguing against Python per se, but rather against the whole
grain of modern component-oriented development; since C++, as much as
Java, Python and C#, is most often used for such kind of development,
you should take the debate to a more general forum.


> I'm pretty sure I read a PEP about static type safety in Python at some
> point. It was even thinking about generics, I think.

You're probably thinking of Guido's musings in his blog, never actually
formalized to the point of being a PEP.  Of course, he soon realized
that there was nothing "static" about the typechecking functionality he
was (and is) thinking of introducing in Python 3000 -- it's entirely
dynamic, just like a Java cast or a C++ dynamic_cast.  Just like
decorators are just syntax sugar over existing higher-order-function
possibilities, so the typing checks will be syntax sugar over either
existing or proposed (my PEP 246) dynamic adaptation ones.

E.g., you may (in a few years when Python3000 comes) write:

def f(x: Foo): ...

just like you may write (in fewer years if and when PEP246 is accepted):

def f(x):
   x = adapt(x, Foo)
   ...

or perhaps (if PEP246 is rejected) the semantics will be

def f(x):
   assert isinstance(x, Foo)
   ...
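Whatever surface syntax wins, the check itself can already be spelled as a
decorator today, which makes its runtime nature plain.  A minimal sketch
(the typechecked decorator and the Foo class are invented names, not the
actual PEP 246 or Python 3000 machinery):

```python
def typechecked(cls):
    """Decorator sketch: check the first argument's type at call time.

    This illustrates the *dynamic* nature of the proposed checks; it is
    not the real adaptation machinery.
    """
    def decorator(func):
        def wrapper(x, *args, **kwargs):
            if not isinstance(x, cls):
                raise TypeError("expected %s, got %s"
                                % (cls.__name__, type(x).__name__))
            return func(x, *args, **kwargs)
        return wrapper
    return decorator

class Foo:
    pass

@typechecked(Foo)
def f(x):
    return "ok"

print(f(Foo()))   # "ok"
# f(42) raises TypeError at *runtime* -- nothing static about it.
```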

Syntax sugar matters: people will be vastly more likely to use
adaptation or typechecking if a nice syntax is provided, just like they
now use HOFs more freely thanks to decorator syntax.  But it does not in
any way change the abilities of the language: there's nothing static
about it, and it doesn't really address any of the criticisms such as
yours.  It may reduce the volume of complaints, if it takes a good
understanding of the language to realize the functionality is indeed
dynamic -- but that's just like saying that Java's goofy:

  zap = (Zapper) gup;

is better than C++'s more explicit:

  zap = dynamic_cast<Zapper>(gup);

because the latter immediately ADMITS it's dynamic, while the former
LOOKS (confusingly;-) more static (even though its semantics isn't).

I prefer truth in advertising: call dynamic what's dynamic, let the
critics criticize (just like they criticize dynamic_cast), and ignore
them (except when you're looking for excuses not to work, just like I'm
doing now since I should be slaving over the 2nd edition of my Nutshell
book;-).


> > The "but without declaration it can't be self-documenting" issue is a
> > red herring.  Reading, e.g.:
> 
> > int zappolop(int frep) { ...
> 
> > gives me no _useful_ "self-documenting" information
> 
> That's true. If the programmer wants to obfuscate his intention, I'm sure
> neither Python nor C++ can stop him. The question is how much more work it
> is to write comprehensible code in one language or the other. I'm a bit
> afraid about Python on that matter.

Well then, look at some Python codebase vs C++ codebase with fair
assessment, rather than theorize.  The standard libraries of the
languages are both vast and available in source form, so they're a good
place to start -- Python's is vaster, because it addresses issues quite
different from C++'s (since Python's library need not strive to provide
containers and iterators), but you can find open-source C++ libraries
for direct comparison (a wonderful example is Boost).


> Python provides easy ways to add literal documentation. But I'd really like to
> have a way of indicating what I'm talking about in a way that's ensured to
> be in-sync with the code. Tests are not where the code is. I have

Actually, tests ARE code, which is part of what makes them better than
comments -- they don't go out of sync like comments may.
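doctest is the minimal illustration of this point: the usage example lives
in the docstring but is *executed*, so it fails loudly instead of silently
going stale.  A sketch with an invented frobnicate function:

```python
import doctest

def frobnicate(n):
    """Double a small integer (hypothetical example function).

    The example below is executable documentation: doctest runs it, so
    if the behaviour changes, this "comment" breaks noisily instead of
    silently drifting out of sync.

    >>> frobnicate(21)
    42
    """
    return n * 2

# Run the docstring examples as tests.
failures, attempted = doctest.testmod(verbose=False)
print(failures)  # 0: documentation and code agree
```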

> difficulties remembering the type of a lot of symbols, and looking at
> testing code to brush up on usage is more difficult than just jumping to the
> definition (which the development environment is likely to be able to).

It's actually harder to find all definitions in C++ given name
overloading; the development environment, just like the programmer, may
get confused as to whether you mean 'int foo(float)' or rather 'float
foo(int)'.  Python doesn't have name overloading, so you just find 'def
foo(', which is an easier task.  I believe IDEs such as Wing and Komodo
offer this.  Personally, being a dinosaur, my 'IDE' is vim+commandline
in either C++ or Python, so I care more about the abilities of
"exuberant ctags", which vim supports nicely; developers who prefer
fancier tools (emacs, Eclipse, or the various IDEs) surely have more
features at their fingertips, not fewer.


> > At Google, we collectively have rather a lot of experience in these
> > issues, since we use three general-purpose languages: Python, Java, C++.
> 
> I have no doubt that Google knows what they're doing, and if you're working
> there then you're likely to know what you're talking about.

Well, Google reached all the way across the Atlantic to hire me as uber
technical lead, making it worth my while to drop my freelance
professional practice and move 9 timezones west; and no doubt my
authorship of "Python in a Nutshell", widely used at Google, helped me
get attention, but I had to prove my excellence in C++ and many other
subjects as well during the extremely strict selection process (no
sweat: before discovering Python, I made my living mostly as a guru in
C++ [and many other technologies, of course], with nice pieces of 'MVP'
parchment &c, though I never wrote books on the subject).

The funny thing is that I found myself starting at Google at about the
same time as Matt Austern, whose many claims at guruhood are in good
part about C++ and generic programming -- we hadn't met before, but
since we already loved each other's books and works, and share many
"side" technical interests (such as functional programming and strict
type theory), it was fortunate that we soon found ourselves cooperating.

I've often noticed that many REAL C++ gurus tend to understand and
appreciate the tradeoffs involved in programming language design, and,
whether they end up _preferring_ dynamic languages (like [he said
immodestly;-)] me, Tim Peters, Robert Martin, Bruce Eckel), or still
prefer to stick mostly with mostly-static ones (like Matt Austern,
Andrew Koenig, Scott Meyers) don't exhibit the tendency to assume that
one set of choices is somehow inherently inferior.  There are, of
course, exceptions (just like there are Python gurus who intensely
detest C++...;-), and you'd seem to be one such exception.


> I found it especially astonishing what you had to say against the use of
> smart pointers.

If using a properly garbage-collected language was not an option, I'd be
suffering with "smart" pointers too (I did for many years, and that was
before Boost came to the world with its excellent libraries).  But once
you realize that the main difference of C++ vs Java, Python and all the
rest is exactly that C++ doesn't DO gc, trying to shoehorn gc in it
anyway starts to make little sense.  Performance issues related to CPU
cycles are pretty minor these days, particularly on server-side tasks
where having a few more servers is no big deal and bottlenecks come up
in database access and network operations anyway.  Java's JITs have
matured, and the typical performance of (e.g.) Java SciMark vs the ANSI
C version can show you that CPU cycles aren't really a big deal any
more; Python's JITs are not as mature as Java's, but psyco has been
around a while to show that the possibilities are extremely similar.

But memory - ah, memory!  A 32-bit CPU may let you use 2GB or maybe 3GB
of RAM for your tasks; and depending on how your servers are designed,
you might have four dual-core 32-bit CPUs sharing the same meager 3GB...
pretty obviously, if 4 or 8 cores are splitting that little RAM, then
RAM is the scarce resource, CPU cycles are throw-away.  The situation
may get better with 64-bit CPUs _if_ RAM costs (including ones related
to supplying power and dissipating heat;-) finally crash once again, but
they've been relatively steep for a while, even while CPU-cycle costs
kept dropping; and 64-bit CPUs are still, after years, in the "almost
there but not quite" stage -- consider that Apple has been shipping for
a month a computer where a 64-bit CPU ("G5" PPC) has been replaced with
a dual-core 32-bit one (intel "Yonah" aka Core Duo), and the latter is
faster than the former (it even holds its own, in some cases, where it
has to _emulate_ the former's instruction set, to run some of the many
programs that haven't yet been ported to the new ISA... btw, of course,
Python and Java programs don't have such porting problems, the issue is
with the stuff in C, Objective-C, C++...;-).

As long as memory is THE scarce resource, it makes sense to be able to
be entirely in control of what happens to your memory, even though that
carries the cost of risking dangling pointers, memory leaks, buffer
overflows -- a heavy cost, but (for speed-critical applications) worth
paying to ease the bottlenecks.  Java's particularly bad in this regard:
JVMs suck up memory like there's no tomorrow, and the more they JIT, the
more memory-hungry they are (Python's not quite as memory-greedy, but
still, particularly with psyco JITting away, it's no picnic either).

But if you introduce "too smart for their own good" pointers in the mix,
then suddenly it's not clear any more if you ARE still in total and
exact control of your memory consumption -- you're back into gc land.
And if you must gc, why not use a language with gc built right in?  The
gc designed as an integral part of the system will perform better and
avoid (e.g.) embarrassments with reference cycles causing memory leaks.
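The reference-cycle embarrassment is easy to demonstrate in a few lines
(the Node class is invented for illustration): pure reference counting,
which is all a naive "smart" pointer does, can never reclaim a cycle,
while a collector integrated with the runtime finds it:

```python
import gc
import weakref

class Node:
    """Hypothetical node type, used only to build a reference cycle."""
    def __init__(self):
        self.partner = None

# Build a two-node cycle: each node keeps the other's refcount above zero.
a, b = Node(), Node()
a.partner, b.partner = b, a
probe = weakref.ref(a)  # lets us observe whether the node was freed

# Drop our references; reference counting alone (the strategy of a naive
# "smart" pointer) would leak both nodes here, since each still holds the
# other.
del a, b

# Python's cycle collector, integral to the runtime, reclaims them.
gc.collect()
print(probe() is None)  # True: the cycle was found and freed
```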

So, I would advise: where you can afford to spend a bit more memory than
strictly necessary, go for a gc language; where you can't, C or C++ are
the best choice today.  Of course, there is no requirement to use just
one language for any given program: if you build reusable components,
you may build them in several languages, and use them from applications
coded in other languages yet.  SWIG helps (or Boost.Python, if you
specifically care only about C++ and Python).  Java doesn't really play
well in this space -- you can use Python with it, but interoperating it
well with C++ basically requires separate processes and RPC or the like.
But Python and C++ do work very well together.

Ah well, back to the writing -- apologies in advance if I prove unable
to further continue this thread, but I don't want my editor to come
after me with a cudgel, considering how late the 2nd edition of "Python
in a Nutshell" is already;-).


Alex
 


