[Distutils-sig] distutils charter and interscript

Greg Ward gward@cnri.reston.va.us
Tue, 1 Dec 1998 11:43:27 -0500


On Tue, Dec 01, 1998 at 09:22:15AM +1000, John Skaller wrote:
> Interscript is designed to do all this. Except, it has a much wider
> scope: it isn't limited to building Python, and it includes testing
> and documentation. (And a few other things -- some of which are implemented
> already, and some of which are not [such as version control])
> 
> There's a generic description of requirements in the documentation at
> 
>         http://www.triode.net.au/~skaller/interscript

Cool!  I'm looking through the Interscript docs now.  I have long
thought that Python and Perl would both be eminently suitable for
literate programming because of their nifty embedded documentation
features.  I've never really figured out how you would resolve the
conflict between the various target audiences implicit in the
conventional ways those embedded documentation standards are used.  For
instance, in the Perl world, pod documentation is generally targeted at
the user of the module, and the better pods provide examples and plenty
of explanatory text in addition to the nuts and bolts of "here are the
subroutines/methods provided, and here are the parameters they take".

My impression of the use of docstrings in the Python world is that
because they wind up in the runtime VM code, people tend to make them a
lot terser, and only give nuts 'n bolts descriptions of modules,
classes, and subroutines.  Thus building a useful document for most
modules simply by gluing docstrings together would be a dubious
prospect.  But still, Python docstrings are targeted at the module's
users.
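
To make the dubiousness concrete: the naive docstring "weave" is just
introspection and string-gluing, something like the sketch below
(illustrative only, not a proposal):

    import inspect

    def glue_docstrings(module):
        # Naive "weave": concatenate whatever docstrings the module's
        # public functions and classes happen to carry.  The result is
        # a bare reference card, not a document anyone would enjoy.
        pieces = [module.__doc__ or ""]
        for name in dir(module):
            obj = getattr(module, name)
            if name.startswith("_"):
                continue
            if (inspect.isfunction(obj) or inspect.isclass(obj)) \
               and obj.__doc__:
                pieces.append("%s:\n    %s" % (name, obj.__doc__))
        return "\n\n".join(pieces)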

The third target audience, and a much smaller one, is the people who
really want to understand the implementation.  It has always been my
impression that this was the goal of literate programming: to provide
explanations of the data structures and algorithms embodied in the code
as a high-tech replacement for poorly-maintained or non-existent
comments.  The "complete documentation" approach of POD, or the
"barebones nuts 'n bolts documentation" of Python docstrings both seem
at odds with this.

Anyways, this is way off topic.  I've always been intrigued by the idea
of literate programming, but never really got much past poking around
the TeX source and looking (briefly) at CWeb once.  I had heard of
Interscript from several of the real Python gurus (who I happen to work
with), but nobody mentioned that it includes an extension building
system!

>         Almost done: interscript creates a source tree in the doco.
> It doesn't yet provide the ability to 'tar' up those files,
> but that is fairly trivial.

I assume this is part of tangling: extract source code to make source
files, and then you know what goes into a source distribution.  Of
course, documentation and test suites also belong in the source
distributions, so I guess weaving comes into it as well.  Hmmm...

> >   install  - install a built library on the local machine
> 
> This is MUCH harder. If you read my documentation, and examine the
> sections on installation control (site-frame, platform-frame, user-frame)
> you will see I have made fairly advanced provision for installation control.
> I'm not using any of this yet.
> 
> My system discriminates the status of the package, and may install
> it in different places depending on the status. For example,
> when I download a new version of a package, I might put it into
> a test directory and test it, before I install it into a more
> widely accessible place. Furthermore, the install point is conditioned
> by the author's evaluation: is it alpha, beta, or production software?

I don't think installation has to be that hard.  If you go back and look
at the summary of the "Extension Building" Developer's Day session,
you'll find a bit about the "blib" directory approach, which I want to
steal from Perl's MakeMaker.  Basically, ./blib ('build library') is a
mock installation tree that looks quite a lot like a subset of
/usr/local/lib/perl5/site_perl (or, in the distutils case, will look
quite a lot like a subset of /usr/local/lib/python1.x).  (Note that
neither path is fixed: Perl lets you install its library anywhere, and
that information is available via the standard Config module.  Python
likewise lets you install its library anywhere, and the same information
*should* be available through some standard module -- which I refer to
below as 'sys.config'.)

When you build a Perl module distribution, C extensions are compiled and
the .so files wind up in ./blib/arch; pure-Perl modules (.pm files) are
simply copied into ./blib/lib; and documentation (POD) is converted to
*roff format in ./blib/man.
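
To make that concrete, the staging step is just a dispatch on file type
into the mock tree.  A rough Python sketch -- the helper and the
extension-to-directory mapping are my own invention, not a settled
distutils interface:

    import os, shutil

    # Stage built files into a blib-style mock installation tree:
    # compiled extensions into blib/arch, pure-Python modules into
    # blib/lib.  Hypothetical helper, for illustration only.
    DESTS = {".so": "arch", ".pyd": "arch", ".py": "lib"}

    def stage(built_files, blib="blib"):
        for src in built_files:
            ext = os.path.splitext(src)[1]
            destdir = os.path.join(blib, DESTS.get(ext, "lib"))
            if not os.path.isdir(destdir):
                os.makedirs(destdir)
            shutil.copy2(src, destdir)

    # e.g. stage(["mymodule.py", "_myext.so"])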

The advantage of this is (at least) two-fold: first, running the test
suites in a "realistic" environment is trivial: just prepend ./blib/lib
and ./blib/arch to Perl's library search path, and run the test
programs.  Second, installation is trivial: just do recursive copies of
./blib/{lib,arch,man} to the appropriate places under
/usr/local/lib/perl5/site_perl.  (Actually, it's a bit smarter than
that: it only copies files that are actually different from
corresponding files in the "official" library directory.)

MakeMaker also allows users to specify a different installation base
than /usr/local/lib/perl5/site_perl, so non-superusers (or superusers
who are just messing around) can install things to their home directory, 
to a temp directory, etc.
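
In Python terms both halves, plus the alternate installation base, are
only a few lines each.  A sketch -- the function names and arguments are
mine, and the change-detection is a guess at what MakeMaker does:

    import filecmp, os, shutil, sys

    # Testing against the staged tree: put blib first on the path.
    sys.path[:0] = [os.path.abspath("blib/lib"),
                    os.path.abspath("blib/arch")]

    # Installing: walk the staged tree and copy only files that differ
    # from what is already installed.  'target' doubles as the
    # user-specified installation base (home dir, temp dir, etc.).
    def install(blib_lib, target):
        for dirpath, dirnames, filenames in os.walk(blib_lib):
            rel = os.path.relpath(dirpath, blib_lib)
            for name in filenames:
                src = os.path.join(dirpath, name)
                dst = os.path.join(target, rel, name)
                if os.path.exists(dst) and filecmp.cmp(src, dst,
                                                       shallow=False):
                    continue  # identical to the installed copy; skip
                destdir = os.path.dirname(dst)
                if not os.path.isdir(destdir):
                    os.makedirs(destdir)
                shutil.copy2(src, dst)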

My idea for the distutils is to blatantly rip off as many of these good
ideas as possible, while making them more cross-platform (ie. no
requirement for Makefiles) and a bit more object-oriented.  However,
most of the ideas carry over quite cleanly from Perl to Python.

> >   gen_make - generate a Makefile to do some of the above tasks
> >              (mainly 'build', for developer convenience and efficiency)
> 
> Please don't! Please generate PYTHON script if it is necessary.
> Python runs on all platforms, and it has -- or should have --
> access to all the configuration information.

Yech!  I can't imagine why you'd want to generate a Python script --
Python is such a dynamic, module-oriented, introspective language that
it seems crazy to have to generate Python code.  Just write enough
modules, and make them flexible and well-documented, so that every
needed build/package/install task is trivial.  Should be doable.

Please note: ***NO ONE WILL HAVE TO USE MAKE FOR ANY OF THIS*** The
reason I propose a 'gen_make' option is largely for the convenience of
developers writing collections of C extensions under Unix.  It won't be
needed for people writing single-.py-file-distributions, it won't be
needed (but it might be nice) for people writing single-.c-file-
distributions, and it most certainly will not be needed for people
just installing the stuff.

However, the people who write collections of C extensions under Unix are
a small but crucial segment of the Python world.  If we can convince
these people to use distutils, we win.  It would be a great convenience
for these folks if the distutils could generate a Makefile that does
everything (or almost everything) they need.  MakeMaker can do it -- so
why can't we?

Anyways, I completely agree with your statements about Make being
unreliable, unportable, flaky, and a bear to debug.  I also agree that
we have something better available; that's why the whole proposal is
built around something called 'setup.py'.  The generation of makefiles
is just an optional feature for a small but important segment of the
population.

There appears to be growing support for writing a next-generation 'make'
in Python.  Great idea, but I don't think this is the place for that; if
such a beast does come into existence, then we should add a 'gen_ngmake'
command to distutils, but examining time-based dependencies amongst files
is not really germane to most of this discussion.  It's certainly
something that people writing collections of C extensions have to worry
about, and those of them using Unix have a solution -- just not a very
satisfactory (or portable) one.
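
For the record, the timestamp check at make's core is tiny -- it is
everything around it (rule syntax, shell portability, recursion) that is
hard.  A sketch:

    import os

    def out_of_date(target, sources):
        # make's one real question: is the target missing, or older
        # than any of the files it is built from?
        if not os.path.exists(target):
            return True
        t = os.path.getmtime(target)
        for src in sources:
            if os.path.getmtime(src) > t:
                return True
        return False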

>         For example, the compilers module I provide happens to 
> use gnu gcc/g++. So it won't work on NT. Or a Solaris system
> where the client wants to use the native cc. Etc etc. 

OK, that's a problem.  But as you said in your first message, we should
look more at your interface than your implementation.  Do you think your
implementation could be easily extended to work with more compilers?
(Most importantly, it should work with the compiler and compiler flags
used to build Python.  If you don't have access to that information and
don't use it, then you can't properly build extensions to be
dynamically loaded by Python.  That, incidentally, is why I think a
patch might be needed for 1.5.2 -- it would probably be a patch to the
configure/build stuff, and the addition of whatever code is needed to
make a builtin 'sys.config' module which provides access to everything
Python's configure/build process knows.  The intention is that this
stuff would be standard in 1.6.)
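
To give a feel for what I mean, here is the sort of thing a 'sys.config'
module might expose.  Every name and value below is hypothetical -- the
point is only that they would be recorded by Python's own configure/build
process:

    # Hypothetical 'sys.config': settings captured when Python itself
    # was configured and built.  Illustrative values only.
    CC = "gcc"
    OPT = "-O2 -fPIC"
    LDSHARED = "gcc -shared"
    INCLUDEPY = "/usr/local/include/python1.5"

    def compile_command(source):
        # Compile an extension source with the same compiler and flags
        # that were used to build Python.
        return "%s %s -I%s -c %s" % (CC, OPT, INCLUDEPY, source)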

>         Yes. Although interscript allows a slightly different approach:
> test code is embedded directly in the source and is executed during
> the build process. The test code isn't installed anywhere; it's
> written to a temporary file for testing and then deleted afterwards.

Cool!  That's a neat idea -- it had never occurred to me.  Not sure if it
belongs in the "standard Python way of doing things", though.  Do you
find that having test code intermingled with production code greatly
increases the size of things?  I've released a couple of fair-sized
collections of Perl modules complete with test suites, and I wrote
roughly as much test code as production code.  I'm not sure if I'd want
to wade through the test code at the same time as the production code,
but I can certainly see the ease-of-access benefits: add a feature to
the production code, and immediately add a test for it (in addition to
documenting it).
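
Mechanically, the extract-run-delete cycle is easy enough to picture.  A
sketch, not Interscript's actual code:

    import os, sys, tempfile

    def run_embedded_test(test_source):
        # Write the extracted test code to a temporary file, run it
        # with the current interpreter, and delete it afterwards.
        fd, path = tempfile.mkstemp(suffix=".py")
        try:
            f = os.fdopen(fd, "w")
            f.write(test_source)
            f.close()
            status = os.system('%s "%s"' % (sys.executable, path))
            return status == 0
        finally:
            os.remove(path)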

> >  * a standard for representing and comparing version numbers
> 
>         I think this is very hard. It isn't enough to just use numbers.
> RCS/CVS provides a tree. With labelled cut points. AFAIK, it cannot
> handle the important thing: dependencies.

RCS/CVS (and SCCS, for that matter) address the developer; version
numbers should address the user.  And they should be in an obvious
linear sequence, or your users will get terribly confused.  There are a
whole bunch of possible ways to do version numbering; I'm inclined
towards the GNU style (1.2.3) with optional alpha/beta tags
(eg. "1.5.2a2").  Coincentally, this seems to be Guido's version
numbering system for Python... 

The only two other widely-known models I can think of offhand are Linux
(basically GNU, but with the stable/development version wrinkle) and
Perl (where version numbers can be compared as floating point numbers --
which leads to the madness of the current Perl version being 5.00502!).
I don't think either of these is appropriate.
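
For what it's worth, GNU-style numbers with alpha/beta tags are easy to
turn into something comparable.  A sketch of the idea, not a proposed
standard:

    import re

    # Parse "major.minor[.patch][a|b N]" into a comparable tuple, so
    # that 1.5.2a2 < 1.5.2b1 < 1.5.2 < 1.6.
    _version_re = re.compile(r"^(\d+)\.(\d+)(?:\.(\d+))?(?:([ab])(\d+))?$")

    def parse_version(s):
        m = _version_re.match(s)
        if not m:
            raise ValueError("bad version string: %r" % s)
        major, minor, patch, tag, tagnum = m.groups()
        # a final release (no tag) sorts after its alphas and betas
        tag_rank = {"a": 0, "b": 1, None: 2}[tag]
        return (int(major), int(minor), int(patch or 0),
                tag_rank, int(tagnum or 0))

    assert parse_version("1.5.2a2") < parse_version("1.5.2b1")
    assert parse_version("1.5.2b1") < parse_version("1.5.2")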

OK, enough for now.  I have to spend some more time digesting the
Interscript docs, and then I'll try to read the rest of your email in
detail. 

Thanks for picking apart my proposal -- I was getting worried that
everyone would agree with everything I proposed, and I'd be stuck with
writing all the code.  ;-)

        Greg
-- 
Greg Ward - software developer                    gward@cnri.reston.va.us
Corporation for National Research Initiatives    
1895 Preston White Drive                      voice: +1-703-620-8990 x287
Reston, Virginia, USA  20191-5434               fax: +1-703-620-0913