[Distutils] Design proposal

Greg Ward gward@cnri.reston.va.us
Sun, 10 Jan 1999 23:12:04 -0500


Well, things have been a little quiet on this list.  Time to shake it up 
a bit, methinks.  So I've written a design proposal -- here it is.

(NB. #1: to avoid sounding overly wafflish, I've written this in a fairly
imperious way -- "X will be done this way, Y is like that".  Don't take
it *too* seriously; this is open for discussion, up for grabs, welcomes
your comments and criticisms, etc.  Of course, I wouldn't have designed
it this way if I didn't think it was a good idea, so expect a reaction if
you tell me I'm full of baloney... ;-)

(NB. #2: this file is in the Distutils CVS archive as
'text/design-proposal.txt'.  Once it is better firmed up through the
discussion that I'm sure is forthcoming, I'll HTMLify it, put it in the
SIG's web space, and start coding!)

$Id: design-proposal.txt,v 1.1 1999/01/11 04:06:01 gward Exp $

RECAP: User Interface
=====================

Recall from the proposed user interface (posted at
http://www.python.org/sigs/distutils-sig/interface.html) that the
Distutils will operate via a (usually) trivial Python script,
conventionally called setup.py.  setup.py has the following syntax:

   setup.py [global_options] cmd [cmd_options]

(XXX no provision for multiple commands here! shouldn't be hard, though,
as long as we're firm about what a command is and what an option is)

The Distutils will define a standard set of global command-line options;
each Distutils command will define a set of command options.  The module
developer (the person who wrote the setup.py for his module
distribution) may define a set of distribution-specific command-line
options, which will be mixed in with the global options (and extracted
from them before they can cause any harm).
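
To make the syntax above concrete, here is a minimal sketch of how the
command line might be carved up.  It is written for a current Python
interpreter, and it assumes (my simplification, not part of the
proposal) that every option starts with '-' and carries its value in the
same word, so the first word without a leading '-' is the command.  The
names 'split_command_line' and 'extract_client_options' are hypothetical.

```python
def split_command_line(argv):
    """Split an argument list into (global_options, command, command_options).

    Assumption: options start with '-' and never take a separate value
    word, so the first word that does not start with '-' is the command.
    """
    for i, arg in enumerate(argv):
        if not arg.startswith("-"):
            return argv[:i], arg, argv[i + 1:]
    # No command word at all: everything is treated as a global option.
    return list(argv), None, []


def extract_client_options(global_opts, client_names):
    """Pull distribution-specific options out of the global options.

    'client_names' is the set of option names the module developer
    declared; returns (client_options, standard_options).
    """
    client, standard = [], []
    for opt in global_opts:
        name = opt.split("=", 1)[0]   # '--with-foo=1' -> '--with-foo'
        if name in client_names:
            client.append(opt)
        else:
            standard.append(opt)
    return client, standard


global_opts, cmd, cmd_opts = split_command_line(
    ["--verbose", "--with-foo=1", "build", "--debug"])
client_opts, standard_opts = extract_client_options(global_opts, {"--with-foo"})
```

Supporting multiple commands would mean deciding where one command's
options end and the next command begins, which this sketch does not
attempt.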

The first order of business is to decide on the set of standard Distutils
commands.  Again, from the proposed interface, here is my initial list:

      make_blib   - create mockup installation tree ("build library")
      build_py    - copy/compile .py files (pure Python modules)
      build_ext   - compile .c files, link to .so in blib
      build_doc   - process documentation (targets: html, info, man, ...?)
      build       - build_py, build_ext, build_doc
      dist        - create source distribution
      bdist       - create built distribution for current platform
      test        - run test suite
      install     - install on local machine
      
Please see the above URL for details on these.


REAL STUFF: Proposed Design
===========================

Part 1: from the Distutils' point of view
-----------------------------------------
setup.py only has to import one module, 'distutils.core'.  This module
is responsible for parsing all command-line arguments to setup.py (even
though the interpretation of options is distributed across the various
Distutils commands, and possibly the client setup.py).  It also takes
care of receiving control from setup.py, and passing it as appropriate
to Distutils commands.  Most importantly, 'distutils.core' defines the
'Distribution' class, which is the heart and soul of the Distutils.  The
client (setup.py) exists mainly to provide attributes for a
'Distribution' instance, and all the Distutils commands operate on
that instance.

Speaking of Distutils commands: each one is implemented as a Python
module, e.g. the 'build' command is implemented by the 'distutils.build'
module.  Each command module is required to define one function, also
named for the command -- e.g. 'distutils.build.build'.  This function
takes a 'Distribution' instance as its only required argument, and then
"does its thing" (eg. build extensions, install everything, etc.).  All
information needed to "do its thing" is contained in the 'Distribution'
instance.

(XXX this isn't very OO.  As long as we can fully parameterize Distutils
commands, that's fine: the client (or user) just provides attributes to
go into a 'Distribution' instance, and the command modules use those
attributes to know what to do.  Inevitably, though, someone somewhere
will have to override actual code, and this scheme will break down --
either that or we'll have to put a lot of command-specific stuff into
the 'Distribution' class, which is bad [makes it harder to add new
commands].  A possible alternate formulation: have each command module
define a specially-named class, which is then used to "do its thing".
If a client setup.py needs to override, say, installation behaviour,
then it can create a subclass of distutils.install.Install.  It would
then need to tell the 'Distribution' class to use its derived class --
not an impossible task!)
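
A rough sketch of that alternate formulation follows, for a current
Python interpreter.  Everything in it is an assumption made for
illustration: the stand-in 'Distribution' and 'Install' classes, the
'run' method, the 'command_class' mapping used to register an override,
and the 'run_command' helper.  None of it is settled design.

```python
class Distribution:
    """Stand-in for the proposed class: it only holds attributes."""

    def __init__(self, **attrs):
        # Hypothetical mapping: command name -> class overriding the default.
        self.command_class = {}
        for key, value in attrs.items():
            setattr(self, key, value)


class Install:
    """Hypothetical default implementation of the 'install' command."""

    def __init__(self, dist):
        self.dist = dist

    def run(self):
        return "installing %s the standard way" % self.dist.name


class MyInstall(Install):
    """What a client setup.py might define to override behaviour."""

    def run(self):
        return Install.run(self) + ", then running a custom step"


# Hypothetical table of the command classes Distutils itself provides.
DEFAULT_COMMANDS = {"install": Install}


def run_command(dist, name):
    """Use the client's class for this command if it registered one."""
    cls = dist.command_class.get(name, DEFAULT_COMMANDS[name])
    return cls(dist).run()


plain = Distribution(name="mymod")
custom = Distribution(name="mymod", command_class={"install": MyInstall})
```

The point of the sketch is only that overriding code, not just data,
stays local to the client: the 'Distribution' class needs no
install-specific knowledge.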


Part 2: from the client's point of view
---------------------------------------

As I said above, the client (setup.py) only has to import
'distutils.core' -- everything else Distutils-ish is taken care of by
this core module.  However, the client needs a way to communicate its
particular options into the Distutils core (and out to the command
modules).

I have two possible schemes for this: one short and convenient (but not
too extensible), and the other a bit verbose and clunky (but more OO and 
extensible).  There's no reason we can't have our cake and eat it too;
the convenient interface could just be a wrapper for the full-blown
interface for the many module distributions that don't need a lot of
fancy customization.

First, here's an example of the simple interface, used for a module
distribution with a single "pure Python" module (mymod.py).

------------------------------------------------------------------------
from distutils.core import setup
setup (name = "mymod",
       version = "1.2",
       author = "Greg Ward <gward@cnri.reston.va.us>",
       description = "A very simple, one-module distribution")
------------------------------------------------------------------------

Note that we don't explicitly list "mymod.py" anywhere: Distutils
assumes that this is a one-horse distribution named after its sole
module ('mymod').
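
That default could be as small as the following sketch (current Python
syntax).  The helper name 'default_py_modules' is made up for
illustration, and the '.py' suffix follows the way module files are
listed in the larger example in this proposal.

```python
def default_py_modules(name, py_modules=None):
    """Return the explicit module list, or guess a single module
    named after the distribution when none was given."""
    if py_modules:
        return list(py_modules)
    return [name + ".py"]


modules = default_py_modules("mymod")
```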

Those who enjoy defining subclasses might prefer to phrase this
differently:

------------------------------------------------------------------------
from distutils.core import Distribution, setup

class MyDistribution (Distribution):
    name = "mymod"
    version = "1.2"
    author = "Greg Ward <gward@cnri.reston.va.us>"
    description = "A very simple, one-module distribution"

setup (distclass = MyDistribution)
------------------------------------------------------------------------

This is overkill for a small distribution: we're defining a new class
solely to provide attribute values, when the 'distutils.core.setup'
function exists mainly to let us do this anyway.  Nevertheless, OO purists will
like this -- and undoubtedly there will be times when the client *will*
have to override behaviour -- not just data -- and the OO interface will
be necessary.

And more complex module distributions, with lots of attributes to
customize, might be easier to read/maintain with things broken up like
this.  Consider a distribution with two pure Python modules ('mymod' and
'my_othermod') and a C extension ('myext'); the C extension must be
linked with two ancillary C files and a C library.  Oh yeah, this
distribution requires Python 1.5 and any version of the 're' module:

------------------------------------------------------------------------
from distutils.core import Distribution, setup

class MyDistribution (Distribution):
    name = "mydist"
    version = "1.3.4"
    author = "Greg Ward <gward@cnri.reston.va.us>"
    description = \
"""This is an example module distribution.  It provides no useful code,
but is an interesting example of the Distutils in action."""

    # Dependencies
    requires = { 'python': '1.5',  # I like class-based exceptions
                 're': '',         # and I love Perl-style regexps! ;-)
               }                   # (and yes, I *know* that "Python 1.5" 
                                   # implies 're'...)

    # Actual files that need to be processed and installed in some form
    py_modules = ['mymod.py', 'my_othermod.py']
    ext_modules = {'myext.c': 
                    {'other_c': ['extra1.c', 'extra2.c'],
                     'c_libraries': ['mylib']}
                  }

setup (distclass = MyDistribution)
------------------------------------------------------------------------

A couple of things to note:
  * I'm not afraid to use deeply nested data structures; if you're
    writing and distributing Python modules, this shouldn't be a problem!
  * every attribute has a particular type (string, list, dictionary, ...)
  * the attributes with complex types (especially dictionaries) will
    have a well-known and well-documented internal structure: eg. 

    """ext_modules is a hash mapping names of C source files (each
    containing a Python extension module) to a nested hash of
    information about how to build that module.  The allowed keys to
    this nested hash are: 
      - other_c: other C files that must be compiled and linked with 
                 the main C file to create the module
      - c_libraries: C libraries that must be included in the link
      ...
   """

    No doubt the 'ext_modules' nested hashes would have more options,
    and no doubt other Distribution attributes would have complex,
    documented structure.
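
One benefit of a documented structure is that it can be checked
mechanically.  Here is a minimal sketch for 'ext_modules', in current
Python syntax.  It assumes that the two keys named above are the only
legal ones and that each holds a list of strings; the real list of keys
would surely be longer, and the function name 'check_ext_modules' is
hypothetical.

```python
# Only the keys named in the proposal so far; more would be added.
ALLOWED_EXT_KEYS = {"other_c", "c_libraries"}


def check_ext_modules(ext_modules):
    """Return True if 'ext_modules' has the documented shape,
    otherwise raise TypeError or ValueError describing the problem."""
    if not isinstance(ext_modules, dict):
        raise TypeError("ext_modules must be a dictionary")
    for source, info in ext_modules.items():
        if not isinstance(source, str):
            raise TypeError("ext_modules keys must be C source file names")
        if not isinstance(info, dict):
            raise TypeError("build info for %s must be a dictionary" % source)
        unknown = set(info) - ALLOWED_EXT_KEYS
        if unknown:
            raise ValueError("unknown key(s) for %s: %s"
                             % (source, ", ".join(sorted(unknown))))
        for key, value in info.items():
            if not (isinstance(value, list)
                    and all(isinstance(item, str) for item in value)):
                raise TypeError("%s[%r] must be a list of strings"
                                % (source, key))
    return True


example = {"myext.c": {"other_c": ["extra1.c", "extra2.c"],
                       "c_libraries": ["mylib"]}}
ok = check_ext_modules(example)
```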

Finally, the list of all Distribution attributes must be well-known and
well-documented!  These seem to fall into a couple of broad categories.
Here's an initial attempt at a list:

  Distribution meta-data
    name
    version
    author
    description

  Dependencies
    requires

  Files to be processed and installed
    py_modules
    ext_modules
    doc_files [eg. SGML source - or whatever std. we get for documentation]

  Build directories [all under "./blib" by default]
    build_lib      - where to put platform-independent library files
    build_platlib  - where to put platform-dependent library files
    build_exe      - where to put executable programs (ie. scripts)
    build_html     - where to put processed documentation (HTML)
    (etc... more documentation formats, at least)

  Installation directories [under sysconfig.LIBDEST]
    install_lib
    install_platlib
    install_exe
    install_html

  C compilation
    cc
    ccshared
    cflags
    ldflags

...well, that's a start.  I still don't know how to make all those
Unixish C compilation variables more cross-platform.


Part 3: revisiting the Distutils' point of view
-----------------------------------------------

To sum up, let's go through what happens when the user runs 'setup.py'.
Whether setup.py is written in the simple (call-a-function) or general
(define-a-subclass) form doesn't matter too much, so I won't split
things up into two streams.

  * setup.py imports distutils.core
  * distutils.core startup code parses command-line arguments: processes 
    global options that it knows about, and saves the rest for the
    client (setup.py) to deal with; saves the command, and saves the
    command-specific options for passing to the command module
  * setup.py calls distutils.core.setup (possibly with a 'distclass'
    argument specifying a subclass of Distribution)
  * distutils.core.setup instantiates Distribution (or the subclass
    supplied by the client), and uses its arguments (apart from
    'distclass') to override attributes of this instance
  * distutils.core.setup loads the command module (eg. 'distutils.build')
  * distutils.core.setup calls the command module's interface function
    (eg. 'distutils.build.build'), passing it the Distribution instance
    and any command-specific options from the setup.py command-line
  * [alternate formulation: distutils.core.setup instantiates the
    command module's interface class (eg. 'distutils.build.Build', or
    an alternate supplied by the client using the as-yet-unmentioned 
    'command_class' attribute).  The Distribution instance and all 
    command-specific options are supplied to the command class so it
    can "do its thing"]

(XXX again, no provision for multiple commands, although it shouldn't be 
too hard.  And the OO approach to writing command modules needs to be
better fleshed out.)


Part 4: Unresolved issues
-------------------------

* Where do we take care of platform dependencies?  Somewhere, sometime,
  we'll need a class or function or attribute named 'foo_posix',
  'foo_win32', 'foo_mac', etc.  We might also need 'foo_linux_i86', 
  'foo_solaris2', 'foo_irix5', 'foo_winnt', 'foo_win98', 'foo_macos8',
  etc.

* ...And that's just within Distutils itself.  What about client code -- 
  what if I have a module that sets itself up differently for different
  distributions; how do I specify that?

* Recursive setup: what if my distribution has subdirectories containing
  other module distributions, with their own setup.py's?  Will this
  ever be needed?  (Undoubtedly.)  How does the client specify them, and
  how does the Distutils run them?
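
On the first of these issues, one possible (and entirely hypothetical)
approach is to look names up by suffix, trying the most specific
platform name first and falling back to more general ones.  The sketch
below is in current Python; 'pick_variant' and the toy 'foo_*'
functions are invented for illustration, and using 'sys.platform'
followed by 'os.name' as the default search order is my assumption.

```python
import os
import sys


def foo_posix():
    return "generic Unix behaviour"


def foo_win32():
    return "Windows behaviour"


def pick_variant(prefix, namespace, platforms=None):
    """Return the first '<prefix>_<platform>' entry found in 'namespace',
    trying each platform name in order (most specific first)."""
    if platforms is None:
        platforms = [sys.platform, os.name]
    for plat in platforms:
        func = namespace.get("%s_%s" % (prefix, plat))
        if func is not None:
            return func
    raise NotImplementedError(
        "no %s variant for any of %s" % (prefix, ", ".join(platforms)))


variants = {"foo_posix": foo_posix, "foo_win32": foo_win32}
# An explicit search order: no 'foo_linux_i86' exists, so 'foo_posix' wins.
chosen = pick_variant("foo", variants, ["linux_i86", "posix"])
```

This says nothing about how client code would express its own platform
differences, which remains open.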




-- 
Greg Ward - software developer                    gward@cnri.reston.va.us
Corporation for National Research Initiatives    
1895 Preston White Drive                      voice: +1-703-620-8990 x287
Reston, Virginia, USA  20191-5434               fax: +1-703-620-0913