Is there a "Large Scale Python Software Design" ?

Alex Martelli aleaxit at
Fri Oct 22 22:46:57 CEST 2004

GerritM <gmuller at> wrote:
> > > these reasons that Python is a big plus during system integration of
> > > large
> >
> > Not as much as one might hope, in my experience.  Protocol Adaptation
> > _would_ help (see PEP 246), but it would need to be widely deployed.
> >
> I think that I understand how PEP 246 might be an improvement over the
> current situation. However, I think that Python 2.3 capabilities already
> result in smaller programs and presumably less modules than their equivalent

If you are aiming at a given fixed amount of functionality, yes: smaller
programs, and fewer modules (not in proportion, because each module
tends to be smaller).  Modules aren't really the problem in _system
integration_, though; the unit that's developed together, tested
together, released together, is something a bit less definite that is
sometimes called a "component".  It could be a module, more likely it
will be a small number of modules, perhaps grouped into a package.

One of my ideas is that a component needs to be cohesive and coherent.
I'm not alone in thinking that, at any rate.  Therefore, the number of
components in a system with a given number of function points (FP) is
only weakly affected by the level of the chosen implementation
language[s], because each component cannot (or shouldn't) really have
more than X function points, even if using a very high level language
means each component is reasonably small.  To get concrete, in my
previous post I already gave some numbers (indicative ones, of course):
200-300 FP per component, meaning about 2k-3k SLOC in Python
(functional _application_ code, net of tests, instrumentation, docs,
comments, etc.; about 6k-7k lines as wc counts them might be a
reasonable rule of thumb, about half of them being tests).

So, if you're building a 5000-FP system, you're going to end up with
about 20 components to integrate, even though in Python that means 50k
lines of application code, while in Java or C++ it might well be 200k or
more.  The design problem (partitioning) and the system integration may
end up needing effort of the same order of magnitude in either language,
or the Python benefit might be 20%, 30% tops: nothing like the 4:1 or
5:1 advantage you get in the coding and testing of each single
component.
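To make that arithmetic explicit, here is a back-of-envelope sketch. It
assumes roughly 10 Python SLOC per FP (derived from the 2k-3k SLOC per
200-300 FP above) and the low-end 4:1 coding ratio; the names are made up
for illustration, and the figures are only the indicative ones already
given, not measurements.

```python
# Back-of-envelope model using only the indicative figures from the post.
PY_SLOC_PER_FP = 10    # assumption: 200-300 FP per component ~ 2k-3k Python SLOC
CODING_RATIO = 4       # assumption: low end of the "4:1 or 5:1" coding advantage


def estimate(total_fp, fp_per_component=250):
    """Return (components, python_sloc, java_or_cpp_sloc) for a system."""
    # The component count depends on functionality, not on language level:
    # each component is capped at roughly fp_per_component function points.
    components = total_fp / fp_per_component
    python_sloc = total_fp * PY_SLOC_PER_FP
    other_sloc = python_sloc * CODING_RATIO
    return components, python_sloc, other_sloc


print(estimate(5000))       # (20.0, 50000, 200000)
print(estimate(5000, 500))  # (10.0, 50000, 200000)
```

Doubling the component size (second call) halves the component count, but
the count is the same whichever language you code in, so the integration
workload does not shrink by the coding ratio.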

My numbers may well be off (I'm trying to be concrete because, in this
field, it's too easy to handwave too much away;-), but even if you double
component size, and thus halve the number of components, in each
language, the relative ratio remains the same.  Python may gain some
advantage by making components that are a bit richer than the optimal
size for Java- or C++-coded ones, but in my experience it's still not a
many-times-to-one ratio as it is for the pure issue of coding.

> I haven't touched Java for centuries, ehh years). My assumption is that
> integration problems are at least proportional with the implementation size
> (in kloc). So my unproven hypothesis is that since Python programs tend to
> be smaller than their Java equivalent that the integration problems are
> smaller, purely due to the size.

This is the crux of our disagreement.  For a solid component built by
TDD, how big it is internally is a second-order issue from the POV of
integrating it with the other components it must interact with in the
overall system.  The first-order issue is how rich the functionality is
that the component supplies to other components, consumes from them, and
implements internally.  Integrating two components with the same amount
of functionality and equivalent interfaces between them, assuming both
are developed solidly wrt the specs embodied in each component's unit
tests, depends only weakly on the level of their implementation
languages.
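A minimal sketch of that claim, assuming a hypothetical currency-rate
component with two implementations of quite different internal size
behind one interface.  It uses typing.Protocol, which only exists in
modern Python (3.8+), to spell the interface out; the consuming code and
the integration check are identical for both implementations.

```python
from typing import Protocol


class RateSource(Protocol):
    """Hypothetical interface that other components integrate against."""

    def rate(self, currency: str) -> float: ...


class TableRates:
    # Compact implementation: a hard-coded table.
    def __init__(self):
        self._table = {"EUR": 1.0, "USD": 1.25}

    def rate(self, currency: str) -> float:
        return self._table[currency]


class ParsedRates:
    # Bigger implementation: parses and validates its data from text.
    RAW = "EUR=1.0\nUSD=1.25\n"

    def __init__(self):
        self._table = {}
        for line in self.RAW.splitlines():
            code, value = line.split("=")
            if len(code) != 3:
                raise ValueError(f"bad currency code: {code!r}")
            self._table[code] = float(value)

    def rate(self, currency: str) -> float:
        return self._table[currency]


def convert(amount: float, src: str, dst: str, rates: RateSource) -> float:
    # The consuming component sees only the interface, never the internals.
    return amount * rates.rate(dst) / rates.rate(src)


# One integration check serves both implementations unchanged.
for impl in (TableRates(), ParsedRates()):
    assert convert(100.0, "EUR", "USD", impl) == 125.0
```

The amount of code inside each implementation differs, but the
integration surface (one method, one check) is the same size.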

Maybe I'm taking for granted a design approach that requires system
functionality to be well partitioned among components interacting through
defined interfaces.  But that's not a Python-specific issue: that's what
we were already doing, albeit without a fully developed "ideology" to
support it, when in the second half of the '90s Lakos' milestone book,
"Large-Scale C++ Software Design" (whose title is echoed in this thread's
subject), arrived to confirm and guide our thinking and practice on the
subject.  I'm sure _survivable_ large systems must be developed along
these lines (with many degrees of variation possible, of course) in any
language.

> In the same system mentioned above we built our own instrumentation. The
> main part was based on inserting a small piece of administrative code at every
> object creation and deletion. This Object Instantiation Tracing proved to be
> The investment in the tools mentioned above was relatively small. However,
> this works only if the entire system is based on the same architectural
> rules.

Well, this last sentence might be the killer: it looks like such a rule
would in turn kill the project's ability to reuse the huge amount of good
code that's out there for the taking.  If you have to invasively modify
the code you're reusing, reuse benefits drop and might disappear.
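To illustrate why such a rule gets in the way of reuse, here is a sketch
of source-level object instantiation tracing in Python, assuming a
hypothetical Traced base class that every application class must
inherit.  Reused library classes (fractions.Fraction here) never pass
through the hook, so they stay invisible.  The decrement in __del__
relies on CPython's reference counting to run promptly.

```python
import collections
from fractions import Fraction

live = collections.Counter()  # hypothetical registry: class name -> live count


class Traced:
    """Hypothetical base class every application class must inherit."""

    def __new__(cls, *args, **kwargs):
        live[cls.__name__] += 1  # "administrative code" at creation
        return super().__new__(cls)

    def __del__(self):
        live[type(self).__name__] -= 1  # ...and at deletion


class Order(Traced):  # follows the architectural rule, so it is traced
    def __init__(self, n):
        self.n = n


orders = [Order(i) for i in range(3)]
prices = [Fraction(1, 3) for _ in range(3)]  # reused code: not traced
print(dict(live))  # {'Order': 3}: the Fraction objects are invisible
del orders
print(dict(live))  # {'Order': 0} on CPython
```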

So I want instrumentation that need not be in the Python sources of
application, library and framework components (multiframework reuse is
also a crux for PEP 246), much as I already have for coverage or
profiling.  If all it takes is hacking on the Python internals to provide
a mode (perhaps a separate compilation) that calls some sys.newhook at
every creation, sys.delhook at every deletion, etc., then that would IMHO
be a quite reasonable price to pay, for example.
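Note that sys.newhook and sys.delhook are hypothetical names; no such
hooks exist in sys.  As a sketch of the same non-invasive spirit, modern
CPython (3.4 and later, so long after this discussion) ships the
tracemalloc module, which attributes memory allocations to source lines
without any change to the traced code.  It records allocations, not
per-object creation and deletion events, so it is at best a partial
substitute; Order and run_workload are made-up stand-ins.

```python
import tracemalloc


class Order:
    # Ordinary application class: it contains no instrumentation at all.
    def __init__(self, n):
        self.items = list(range(n))


def run_workload():
    return [Order(100) for _ in range(1000)]


tracemalloc.start()
before = tracemalloc.take_snapshot()
orders = run_workload()
after = tracemalloc.take_snapshot()
tracemalloc.stop()

# Attribute the memory growth to the source lines that allocated it.
stats = after.compare_to(before, "lineno")
for stat in stats[:3]:
    print(stat)
```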

> The additional challenge of Python relative to Objective-C is its garbage
> collection. This provides indeed a poorly predictable memory behavior.

Obj-C uses mark-and-sweep, right?  Like Java?  I'm not sure why
(reference-counting bugs in badly tested extensions apart) Python's mix
of reference counting normally plus mark-and-sweep occasionally should
be a handicap here.
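A small CPython sketch of that mix: an acyclic object is reclaimed as
soon as its last reference disappears, while objects in a reference
cycle wait for the cycle collector.  The Node class is hypothetical;
automatic collection is disabled around the cycle only to make the
timing observable, and implementations without reference counting (e.g.
PyPy, Jython) do not reclaim objects this eagerly.

```python
import gc
import weakref


class Node:
    def __init__(self, name):
        self.name = name
        self.other = None


freed = []


def track(obj):
    # weakref.finalize calls freed.append(name) when obj is reclaimed.
    weakref.finalize(obj, freed.append, obj.name)


# Acyclic object: reference counting frees it at once on CPython.
a = Node("acyclic")
track(a)
del a
print(freed)  # ['acyclic']

# Reference cycle: the refcounts never drop to zero on their own.
gc.disable()
b, c = Node("b"), Node("c")
b.other, c.other = c, b
track(b)
track(c)
del b, c
print(freed)  # still ['acyclic']: the cycle is not reclaimed yet
gc.collect()  # the cycle collector finds and frees it
print(sorted(freed))  # ['acyclic', 'b', 'c']
gc.enable()
```

Only cyclic garbage has its reclamation deferred; everything else is
freed deterministically.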

> Some of the design aspects mentioned here are described in this chapter of
> my PhD thesis:
> df

Tx, I'll be happy to study this.

