Real Problems with Python
neelk at cswcasa.com
Wed Feb 9 13:56:11 EST 2000
Python isn't perfect, and I enjoy thoughtful criticism. But I am sick
to death of the -stupid- whitespace flamewar, since it's a) not a
problem, and b) not thoughtful.
So in an effort to throttle that monster, here's a list of half-a-dozen
*real* problems with Python, along with what the current workarounds
are, and my subjective assessment of the prospects of them being
solved in the long term. Enjoy, argue, whatever.
1. Reference counting memory management
If you need to do anything with any sort of sophisticated cyclic
data structures, then reference-counting is deeply annoying. This
is not something that someone in a high-level language should have
to worry about!
In the long run, the solution is to use a conservative garbage
collection algorithm (such as the Boehm collector), and to use
reference counts to make sure that finalizers are called in
topological order. (Cyclic references with finalizers have no good
solution, period, so it's not worth worrying about them imo.)
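The kind of structure that defeats pure reference counting takes only a few lines to build. The sketch below uses modern CPython, which later grew a cycle detector (the gc module) on top of refcounting, so the leak that a purely refcounted interpreter would suffer is made visible instead; the Node class is a hypothetical example, not from the original post:

```python
import gc

class Node:
    """A tree node whose parent back-reference creates a cycle."""
    def __init__(self, name):
        self.name = name
        self.children = []
        self.parent = None

    def add(self, child):
        child.parent = self          # child -> parent -> child: a cycle
        self.children.append(child)

root = Node("root")
root.add(Node("leaf"))
del root                             # refcounts never reach zero; pure
                                     # refcounting alone would leak both nodes

# gc.collect() returns the number of unreachable objects found,
# which here includes both nodes of the cycle.
collected = gc.collect()
print(collected >= 2)                # True
```

With only reference counts, `del root` would free nothing, which is exactly why a tracing collector (Boehm or otherwise) is needed for cyclic data.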
Workarounds:
o Use JPython. It hijacks Java's garbage collection, so there are
no problems with cyclic data structures. It doesn't ever call
__del__ methods, though I don't think this is a problem in
practice.
o Use John Max Skaller's Viper
Viper is an interpreter written in OCaml, so like JPython it
uses the host language's garbage collection. It's still an
alpha, though worth looking at if only to learn OCaml. :)
o Use Neil Schemenauer's gc patch.
Neil Schemenauer has added support for using the Boehm garbage
collector in Python. It uses reference counts in addition to the
gc for handling finalization, so things like file objects get
closed at the expected times, while still collecting cyclic
objects.
Prognosis:
Excellent. It looks quite likely that Python 3K will have
gc, and in the meantime Neil Schemenauer's patch is entirely
usable.
2. Lack of lexical scoping
Tim Peters disagrees, but I miss it a lot, even after using Python
for years. It makes writing callbacks harder, especially when
dealing with Tkinter and re.sub, os.path.walk, and basically every
time a higher-order function is the natural solution to a problem.
(Classes with __call__ methods are too cumbersome, and lambdas too
weak, when you need a small function closure that's basically an if
statement.)
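The workaround of that era was the default-argument trick; the closure patch mentioned below eventually landed as nested scopes (standard since Python 2.2), so both forms now work. A sketch of the before and after, with make_adder as an illustrative name:

```python
# Without lexical scoping, an inner function could not see the enclosing
# frame's locals, so pre-2.1 code smuggled them in as default arguments:
def make_adder_old(n):
    return lambda x, n=n: x + n   # the classic default-argument trick

# With lexical scoping, the closure captures n directly:
def make_adder(n):
    return lambda x: x + n

print(make_adder_old(3)(1))   # 4
print(make_adder(3)(1))       # 4
```

The default-argument version works, but it pollutes the function's signature and silently breaks if a caller passes the extra argument, which is the kind of cumbersomeness the post is complaining about.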
There's another, more subtle problem -- much of the work that's
been done on optimizing and analyzing highly dynamic languages has
come from the Scheme/ML/Lisp world, and almost all of that work
assumes lexically-scoped environments. So using well-understood
optimization techniques is made harder by the difference.
Workarounds:
o Greg Ewing has a closure patch that makes functions and
lambdas work without creating too much cyclic garbage.
Prognosis:
Pretty good. It looks like the consensus is gelling around adding
it eventually, if only to shut up complainers like me. :)
3. Multiple inheritance is unsound
By 'unsound' I mean that a method call to a method inherited by a
subclass of two other classes can fail, even if that method call
would work on an instance of the base class. So inheritance isn't a
type-safe operation in Python. This is the result of using
depth-first search for name resolution in MI. (Other languages,
like Dylan, do support sound versions of MI at the price of
more involved rules for name resolution.)
This makes formal analysis of program properties (for example,
value flow analyses for IDEs) somewhere between 'harder' and
'impossible'.
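The failure mode is the classic diamond. The sketch below runs under modern Python, whose C3 linearization (adopted in 2.3) fixes exactly this case; the comments note what the old depth-first rule would have done. The class names are illustrative:

```python
# Diamond inheritance: D inherits from B and C, both subclasses of A.
# C overrides A.whoami; B does not.
class A:
    def whoami(self):
        return "A"

class B(A):
    pass

class C(A):
    def whoami(self):
        return "C"

class D(B, C):
    pass

# Classic (pre-2.2) depth-first lookup searched D, B, A, C -- so it found
# A.whoami before ever reaching C's override.  Modern Python's C3 MRO
# puts C before A, so the override is respected:
print([cls.__name__ for cls in D.__mro__])  # ['D', 'B', 'C', 'A', 'object']
print(D().whoami())                         # 'C'
```

Under depth-first lookup, D silently gets A's behavior even though C deliberately replaced it, which is the unsoundness described above.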
Workarounds:
o Don't use multiple inheritance in Python.
Prognosis:
I don't think there's any hope of this ever being fixed. It's just
too much a part of Python, and there isn't much pressure to fix
it. Just avoid using MI in most circumstances.
4. Lack of support for highly declarative styles of programming
It's often claimed that Python reads like pseudocode, but
very often when people are describing algorithms the natural
description is "find the solution that satisfies the following
conditions" rather than "nest for loops five deep".
If you've done any Prolog (or even SQL) programming you know how
powerful the constraint-solving mindset is. When a problem can be
represented as a set of constraints, then writing out the solution
manually feels very tedious, compared to telling the computer to
solve the problem for you. :)
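List comprehensions, which did eventually land in Python 2.0, show what this constraint-style reads like. A sketch, stating "find all Pythagorean triples up to 20" as conditions rather than as hand-rolled nested loops:

```python
# "Find all (x, y, z) with x <= y <= z <= 20 and x^2 + y^2 == z^2",
# written as a set of constraints rather than five nested for loops:
n = 20
triples = [(x, y, z)
           for x in range(1, n + 1)
           for y in range(x, n + 1)
           for z in range(y, n + 1)
           if x * x + y * y == z * z]
print(triples)
# [(3, 4, 5), (5, 12, 13), (6, 8, 10), (8, 15, 17), (9, 12, 15), (12, 16, 20)]
```

The loops are still there underneath, but the reader sees the specification of the answer, not the search procedure.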
Workarounds:
o Greg Ewing has implemented list comprehensions for Python. This
is a big increase in expressive power, since it starts to permit
the use of a constraint-solving style.
o Christian Tismer's Stackless Python patch enables the use of
first-class continuations in regular Python code. This lets
people easily and in pure Python create and experiment with all
sorts of funky control structures, like coroutines, generators,
and nondeterministic evaluation.
Prognosis:
Pretty good, if Stackless Python makes it in, along with list
comprehensions. (Personally, I suspect that this is the single most
important improvement possible to Python, since it opens up a
whole new category of expressiveness to the language.)
5. The type/class dichotomy, and the lack of a metaobject system.
C types and Python classes are both types, but they don't really
understand each other. C types are (kind of) primitive objects, in
that they can't be subclassed, and this is of course frustrating
for the usual over-discussed reasons. Also, if everything is an
object, then subclassing Class becomes possible, which is basically
all you need for a fully functional metaobject protocol. This
allows for some extremely neat things.
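The "subclass Class" idea eventually became real: in modern Python, a metaclass is just a subclass of type that intercepts class creation, which is the core of a metaobject protocol. A minimal sketch, with Registered and Plugin as hypothetical names:

```python
# A metaclass that records every class created with it -- the kind of
# "extremely neat thing" a metaobject protocol enables (e.g. plugin
# registries, interface checking, automatic instrumentation).
class Registered(type):
    registry = []

    def __new__(mcls, name, bases, namespace):
        cls = super().__new__(mcls, name, bases, namespace)
        Registered.registry.append(name)   # runs at class-definition time
        return cls

class Plugin(metaclass=Registered):
    pass

class JSONPlugin(Plugin):   # inherits the metaclass, so it registers too
    pass

print(Registered.registry)  # ['Plugin', 'JSONPlugin']
```

At the time of the post, the Don Beaudry hook was the only way to approximate this; the type/class unification in Python 2.2 made it the sketch above.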
Workarounds:
o Use Jim Fulton's ExtensionClasses to write C extensions that
can be subclassed in Python code.
o Read GvR's paper "Metaclasses in Python 1.5" to write metaclasses
using the Don Beaudry hook. It's not as nice as just subclassing
Class, though.
Prognosis:
Good. The current workarounds are messy, but workable, and
long-term there's a real determination to solve the problem once
and for all.
6. The iteration protocol
The iteration protocol is kind of hacky. This is partly a function
of the interface, and partly due to the way the type-class
dichotomy prevents arranging the collection classes into a nice
hierarchy.
The design of the iteration protocol also makes looping over
recursive data structures (like trees) either slow, if done in the
obvious and comprehensible way, or clumsy and weird-looking, if
you try to define iterator objects.
An example: Try writing a linked list class that you can iterate
over using a for loop. Then try to write a tree class that you
can iterate over in preorder, inorder, and postorder.
Then make them efficient, taking no more than O(n) time and O(1)
space to iterate over all elements. Then try to reuse duplicated
code. Is the result beautiful?
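The "better way" Guido was working on arrived as the iterator protocol and generators (Python 2.2 onward), which dissolve most of this exercise. A sketch of the tree part, with Tree as an illustrative class; note that recursive generators use O(depth) space, not O(1), so even this does not fully meet the challenge:

```python
class Tree:
    """A binary tree whose traversals are plain generator methods."""
    def __init__(self, value, left=None, right=None):
        self.value, self.left, self.right = value, left, right

    def preorder(self):
        yield self.value                      # node first...
        for child in (self.left, self.right):
            if child is not None:
                yield from child.preorder()   # ...then each subtree

    def inorder(self):
        if self.left is not None:
            yield from self.left.inorder()
        yield self.value                      # node between its subtrees
        if self.right is not None:
            yield from self.right.inorder()

t = Tree(2, Tree(1), Tree(3))
print(list(t.preorder()))  # [2, 1, 3]
print(list(t.inorder()))   # [1, 2, 3]
```

Each traversal is a direct transcription of its recursive definition, duplicated code and explicit iterator objects both gone, which is roughly the outcome the post was hoping for.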
Workarounds:
o Implement iterator objects and just eat the ugliness of the
implementation for the sake of a clean interface.
o Don't use trees or graphs, or at least don't expect to use
them with the standard loop constructs. :)
o Wait for Guido to invent a better way: Tim Peters has said on the
newsgroup that GvR is working on a better design.
Prognosis:
Good. There aren't any good short-term fixes, but the long term
outlook is probably fine. We aren't likely to get Smalltalk-style
blocks, though.
Neel