[Distutils] working on a build system

Andrew Dalke dalke@bioreason.com
Sat, 06 Mar 1999 19:00:20 -0800


My current task here at Bioreason is to set up a build/distrib
system for our projects.  We're developing pure Python modules,
Python extensions in C/C++, and stand-alone C++ executables.
I'm trying to get a basic system that lets us work with these
in a standard framework and supports automatic regression tests
with the results emailed or sent to a status web page.

Also, in order to keep the web pages up to date with the source code,
I need to be able to construct some of the pages automatically from
the source and README.

This is not quite what the distutils-sig does, but I figured
there's a reasonably close fit, so I can both contribute and
get some ideas.

The overview of our system (still in the design phase) is
something like this:

  setup --  sets up the original directory structure for a project
This is an interactive/console Python script which asks for the
following information (with defaults as needed):
  project name -- used for the directory and module name
  product name -- can be different since this is the marketing name
  alternate product name -- I've found that having a variant of
      the product name is useful, eg, the abbrev. or acronym.
  version number -- in the form \d+(\.\d+)*
  status -- a string of letters, like "alpha", "p", "RC"
  cvs root -- we use CVS for our version control, but using "none"
      specifies not to put this into CVS
  contact name -- who's in charge of this thing?
  contact email -- how to contact that person (and where regression
      test results will be sent)
  contact url -- where to find more information about this project
  language -- python, python extension, c++ (eventually more?)

(this seems to be the core set of information I need, but the
framework will be set up so more can be added as needed.)

After this information is gathered, the script (will) create a
subdirectory with the given project name and add the following files:
  info.py  -- contains the configuration information
  configure -- used to generate the Makefile and maybe other files;
     like a "make.dat" file which is include'd by all Makefiles.
  buildno -- a number starting at 1 which is incremented for every
     new build.  In this case, a build is not the C level build
     like Python's C code, but the number passed to QA/testing.
  test/ -- a subdirectory for the regression tests
  README -- a pod (?) file containing the template for the standard
     README.
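For concreteness, the generated info.py might look something like
this (a sketch; the exact variable names and sample values are my
own guesses at what the setup script would emit, reusing the
"daylight" example from further down):

```python
# info.py -- written by the setup script (a sketch; the variable
# names and values here are illustrative guesses, not a spec).
PROJECT_NAME     = "daylight"
PRODUCT_NAME     = "Daylight Analyzer"    # the marketing name
ALT_PRODUCT_NAME = "DA"                   # abbrev. or acronym
VERSION          = "1.0"                  # form: \d+(\.\d+)*
STATUS           = "alpha"                # "" for a final release
CVS_ROOT         = "/usr/local/cvsroot"   # or "none"
CONTACT_NAME     = "Andrew Dalke"
CONTACT_EMAIL    = "dalke@bioreason.com"
CONTACT_URL      = "http://www.bioreason.com/"
LANGUAGE         = "python"               # python | python extension | c++
```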

Let me explain that last one some more.  It seems that Perl's "pod"
format is the easiest to learn and use.  I want to put one level
on top of that.  The hardest thing in my ASCII documentation is to
keep version numbers, URLs, etc. in sync with the main code base,
since I have to change those by hand.  Instead, I propose that during
the process of making a source distribution or doing an install,
the pod files are read into a Python string and %'ed with the info.py
data.
  Thus, I want a file which looks like:

######

=head1 NAME
                         %(PRODUCT_NAME)s
=head1 SYNOPSIS
  This software does something.

=head1 INFORMATION
   For more information about %(ALT_PRODUCT_NAME)s see:
        %(CONTACT_URL)s
######

and which will be filled out with the appropriate information as
needed for an install.
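The substitution step itself can be a few lines of Python: collect
the ALL-CAPS names from the info module into a dictionary and % it
into the template (a sketch; fill_template is a name I made up):

```python
def fill_template(template_path, info_module):
    # Gather the ALL-CAPS settings from info.py into a dict...
    info = {}
    for name in dir(info_module):
        if name == name.upper():
            info[name] = getattr(info_module, name)
    # ...and %-substitute them into the pod template.
    f = open(template_path)
    text = f.read()
    f.close()
    return text % info
```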

Is there a Python way to read/parse pod files?  If not, using perl for 
this is fine with us.

BTW, to better support names, I've also derived a few other
variables from those given in info.py, like:
  FULL_NAME = PROJECT_NAME + "-" + VERSION
  if STATUS:
     FULL_NAME = FULL_NAME + "." + STATUS + str(BUILDNO)

so we can have names like:
  daylight-1.0.alpha9
  daylight-1.0.beta3
  daylight-1.0.rc1    (for us, "rc" == "release candidate")
  daylight-1.0        -- build number not included in final release
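Wrapped up as a function, that naming scheme is (a sketch, assuming
BUILDNO is an integer):

```python
def full_name(project, version, status="", buildno=0):
    # Produces daylight-1.0.alpha9 style names; the status and
    # build number are dropped for a final release (empty status).
    name = project + "-" + version
    if status:
        name = name + "." + status + str(buildno)
    return name
```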


Okay, so once those files are made, if CVS is being used, the whole
subdirectory is added to CVS, the directory renamed, and the
project pulled out from CVS.

Then cd into the directory and run "configure".  This produces a
Makefile.  The Makefile is not put into CVS because supposedly it
can always be made by running configure; so edit "configure" instead.

The system is now ready for development.  I'll assume it is using
straight Python scripts with no submodules.  In that case, the
Makefile supports the following targets:

buildversion:
    increment the "buildno" file by one

buildtag:
    tag everything in CVS with the FULL_NAME

build: buildversion buildtag
    $(MAKE) src.dist

tests:
    cd test; ./testall

clean:
    probably remove the .pyc and .pyo files

veryclean: clean
    probably remove emacs *~ files

install:
    do the "standard" install

and probably a few more.  Could someone tell me some other standard
target names; eg, those expected from a Python or GNU project?

I'm playing around with Makefile suffix rules.  What I want to do is
make the ".dist" targets get forwarded to Python.  Under GNU make I
can do something like:

%.dist:
        $(PYTHON) -c "import build; build.distrib('$*')"


so "src.dist" becomes:
  /usr/local/bin/python -c "import build; build.distrib('src')"


but I don't know how to do this sort of trick under non-GNU make.  Any
advice?
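For what it's worth, one portable fallback is to enumerate the .dist
targets explicitly in a single rule; $@ is defined for explicit rules
under any POSIX make, and basename strips the suffix (a sketch, using
the same hypothetical build module):

```make
src.dist bin.dist rpm-src.dist:
	$(PYTHON) -c "import build; build.distrib('`basename $@ .dist`')"
```

The target list has to be maintained by hand, but configure could
generate it.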


I mentioned the phrase "standard" install.  That's a tricky one.  What
I envision is that the build module reads the "info.py" file to
determine the project's language settings, then imports the module
which handles installs for it.  This will probably find all the files with
the extension ".py" and pass those to the routine which does the
actual installs (eg, copy the files and generate .pyc and .pyo files).
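That routine might be sketched like this (install_python_files is my
own name for it; generating .pyo files would take a second pass under
python -O, which I've left out):

```python
import os, shutil, py_compile

def install_python_files(src_dir, dest_dir):
    # Copy each .py file into dest_dir, then byte-compile it
    # there so the .pyc sits next to the installed source.
    for name in os.listdir(src_dir):
        if name.endswith(".py"):
            dest = os.path.join(dest_dir, name)
            shutil.copy(os.path.join(src_dir, name), dest)
            py_compile.compile(dest, cfile=dest + "c")
```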

However, this must also be able to import project level extensions,
for example, to also install some needed data files into the install
directory.  I'm not sure about the right way to go about doing this.
Probably the "build.install.python" will try to import some code from
the package, and this will return a list of file names mapped to
install methods, like:

   ('__init__.py', build.install.python_code),  # normal python installer
   ('file.dat', build.install.data_file),    # just copies the file
   ('special.zz', localbuild.install_zz),  # something package specific

then iterate over the list and apply each function to its file.  (And
yes, these should likely be class instances and not 2-tuples.)
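The iteration itself is then a tiny dispatch loop (a sketch;
run_installers and the (filename, dest_dir) installer signature are
my own invention, not an existing API):

```python
def run_installers(manifest, dest_dir):
    # manifest is the package-supplied list of
    # (filename, install_function) pairs; each function
    # decides how its own file gets installed.
    for filename, install in manifest:
        install(filename, dest_dir)
```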

Again, haven't figured this out.

Then there's the question of how to deal with submodules.  Most likely
the configure script will check each subdirectory for an __init__.py
file and make the Makefile accordingly.  Of course, it will have to
ignore certain "well known" ones, like whichever directory contains
the project specific build information.
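The submodule scan might look like this (a sketch; the set of "well
known" directories to skip is a guess):

```python
import os

def find_submodules(project_dir, ignore=("test", "doc")):
    # A directory counts as a submodule if it holds an
    # __init__.py and is not one of the well-known non-module
    # directories (test/, doc/, the build info package, ...).
    subs = []
    for name in os.listdir(project_dir):
        if name in ignore:
            continue
        path = os.path.join(project_dir, name)
        if os.path.isdir(path) and \
           os.path.exists(os.path.join(path, "__init__.py")):
            subs.append(name)
    subs.sort()
    return subs
```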

The new Makefile will change a few things, like making some targets
recurse, as with "clean":

clean:
    probably remove the .pyc and .pyo files
    cd submodule1; $(MAKE) clean
    cd submodule2; $(MAKE) clean

and add the appropriate Makefiles to those subdirectories.

My, this is getting more complicated than I thought it would.


Okay, so the final step is the "src.dist" (and "bin.dist" and
"rpm-src.dist" and ... targets).

I figure that the raw source can always be made available from a "cvs
export" followed by a manual tar/gz, so that doesn't need to be
automated.  What does need to be done is the ability to make a source
distribution "for others."  For example, at one place I worked we
stripped out all the RCS log comments when we did a source build.  Or
perhaps some of the modules cannot be distributed (eg, they are
proprietary).

So a basic source distribution must be able to take the existing
files, apply any transformations as needed (eg, convert from README in
pod form to README in straight text) and tar/gzip the result.  We'll
be saving the resulting tarball for archival purposes, and our
installs will likely be done from this distribution, which means it
needs its own Makefile.  In all likelihood, there will be no
difference between this Makefile and the normal one.  If there is, I
guess it would be generated from the configure script, though with
some special command line option.
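A sketch of that src.dist step, assuming a dict mapping filenames to
transform functions (all the names here are hypothetical):

```python
import os, shutil, tarfile

def make_source_dist(project_dir, full_name, transforms, exclude=()):
    # Stage a copy of the tree as <full_name>/, applying per-file
    # transforms (eg pod README -> text README) and skipping
    # excluded names, then tar/gzip the staged tree.
    staging = os.path.join(os.path.dirname(project_dir) or ".", full_name)
    if os.path.exists(staging):
        shutil.rmtree(staging)
    os.mkdir(staging)
    for name in os.listdir(project_dir):
        if name in exclude:
            continue
        src = os.path.join(project_dir, name)
        dest = os.path.join(staging, name)
        if name in transforms:
            transforms[name](src, dest)
        elif os.path.isdir(src):
            shutil.copytree(src, dest)
        else:
            shutil.copy(src, dest)
    tarball = staging + ".tar.gz"
    tf = tarfile.open(tarball, "w:gz")
    tf.add(staging, arcname=full_name)
    tf.close()
    return tarball
```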


Then there's questions of how to handle documentation (eg, some of my
documentation is in LaTeX).  For now, I'll just put stuff under "doc/"
and let it lie.  Though the configure script should be able to build a
top-level Makefile which includes targets for building the
documentation and converting into an appropriate form, such as HTML
pages for automated updates to our internal web servers.  Eh, probably
something which forwards certain targets to "doc/", like:

docs:
  cd doc; $(MAKE) doc

or even do the make suffix forwarding trick, so I can have targets
like:
  user.doc
  prog.doc
  install.doc
  
Ugh, to get even more customizable I suppose you would need to
tell LaTeX that certain sections should/should not be used when
making a distribution, in order to reflect the code.  I suppose
the configure script could be made to generate a tex file for
inclusion, but again, I'm not going to worry about that for now.

Most likely the configure script will have the ability for someone to
add:

  make = make + build.makefile.latex_doc( <some set of options?> )

(Thinking about it some, it would be best if "make" were a list of
terms, like:

  make = ( MakeData("tests", "", ("cd test; ./testall",)),
           MakeData(target, dependencies, (list, of, actions)),
	 )

then the conversion to Makefile routine could double check that there
are no duplicate targets, and the package author can fiddle around
with the list before emitting the file.)
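A sketch of that representation and the duplicate-target check, with
MakeData as a small class rather than a tuple:

```python
class MakeData:
    # One Makefile rule: a target, its dependency string, and
    # a sequence of shell actions.
    def __init__(self, target, dependencies, actions):
        self.target = target
        self.dependencies = dependencies
        self.actions = actions

def emit_makefile(make_data, out):
    # Refuse duplicate targets, then write each rule in
    # Makefile syntax (actions must be tab-indented).
    seen = {}
    for md in make_data:
        if md.target in seen:
            raise ValueError("duplicate target: " + md.target)
        seen[md.target] = md
        out.write("%s: %s\n" % (md.target, md.dependencies))
        for action in md.actions:
            out.write("\t%s\n" % action)
        out.write("\n")
```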

Of course, all of this is unix centric.  I know nothing about how to
build these types of systems for MS Windows platforms.  But then, we
don't develop on those platforms, though we will likely distribute
python applications for them.  That's where the ability to have
special ".dist" targets comes in handy.


I've considered less what's needed for a shared library Python
extension or for pure C++ code.  I envision similar mechanisms, but
there will need to be support for things like generating dependencies
for make, which isn't needed for straight Python scripts.  Of course,
this also isn't needed for the distutils-sig so I'll not go into it
here.
  I also don't know much about GNU's configure system, which would be
more useful in this sort of environment.  Thus, any solution I give for
that problem will likely only be useful for our environment.


As I said, this is still in the planning stages for us, so I would
like input on these ideas.  Of course, I plan to make this framework
available, and I think part of it -- at least some of the ideas -- will
be useful for distutils.

						Andrew
						dalke@bioreason.com