[Distutils] Compiler abstraction model

Greg Ward gward@cnri.reston.va.us
Mon, 29 Mar 1999 21:35:13 -0500

Hi all --

I've finally done some thinking and scribbling on how to build
extensions -- well, C/C++ extensions for CPython.  Java extensions for
JPython will have to wait, but they are definitely looming on the
horizon as something Distutils will have to handle.

Anyways, here are the conclusions I've arrived at.

  * Stick with C/C++ for now; don't worry about other languages (yet).
    That way we can be smart about C/C++ things like preprocessor
    tokens and macros, include directories, shared vs static
    libraries, source and object files, etc.

  * At the highest level, we should just be able to say "I know nothing,
    just give me a compiler object".  This implies a factory function
    returning instances of concrete classes derived from an abstract
    CCompiler class.  These compiler objects must know how to:
      - compile .c -> .o (or local equivalent)
      - compile multiple .c's to matching .o's
      - be able to define/undefine preprocessor macros/tokens
      - be able to supply preprocessor search directories
      - link multiple .o's to static library (libfoo.a, or local equiv.)
      - link multiple .o's to shared library (libfoo.so, or local equiv.)
      - link multiple .o's to shared object (foo.so, or local equiv.)
      - for all link steps:
          + be able to supply explicit libraries (/foo/bar/libbaz.a)
          + be able to supply implicit libraries (-lbaz)
          + be able to supply search directories for implicit libraries
      - do all this with timestamp-based dependency analysis
        (non-trivial because it requires analyzing header dependencies!)

    Linking to static/shared libraries and dependency analysis are
    optional for now; everything else is required to build C/C++
    extensions for Python.  (At least that's my impression!)

    "Local equivalent" is meant to encompass different filenames for C++ 
    (eg. .C -> .o) and different operating systems/compilers (eg. .c ->
    .obj, multiple .obj's to foo.dll or foo.lib).
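
A minimal sketch of the factory idea, to make the shape concrete.  The
names new_compiler(), UnixCCompiler, MSVCCompiler and object_filename()
are assumptions made up for illustration; only the abstract CCompiler
class and the .c -> .o / .obj mapping come from the text above.

```python
import os


class CCompiler:
    """Abstract base: concrete subclasses fill in platform details."""

    obj_extension = None  # e.g. ".o" or ".obj"; set by subclasses

    def object_filename(self, source):
        # Map a source filename to the local equivalent of a .o file.
        base, _ext = os.path.splitext(source)
        return base + self.obj_extension


class UnixCCompiler(CCompiler):      # hypothetical concrete class
    obj_extension = ".o"


class MSVCCompiler(CCompiler):       # hypothetical concrete class
    obj_extension = ".obj"


def new_compiler(platform=None):
    """Hypothetical factory: 'I know nothing, just give me a compiler object'."""
    if platform is None:
        platform = os.name           # "posix", "nt", ...
    classes = {"posix": UnixCCompiler, "nt": MSVCCompiler}
    try:
        return classes[platform]()
    except KeyError:
        raise ValueError("no compiler class known for platform %r" % platform)
```

The caller never names a concrete class; it asks the factory and then
talks only to the CCompiler interface.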

BIG QUESTION: I know this will work on Unix, and from my distant
recollections of past work on other systems, it should work on MS-DOS
and VMS too.  I gather that Windows is pretty derivative of MS-DOS, so
will this model work for Windows compilers too?  Do we have to worry
about Windows compilers other than VC++?  But I have *no clue* about
Macintosh compilers -- presumably somebody "out there" (not necessarily
on this SIG, but I hope so!) knows how to compile Python on the Mac, so
hopefully it's possible to compile Python extensions on the Mac.  But
will this compiler abstraction model work there?

Brushing that moment of self-doubt aside, here's a proposed interface
for CCompiler and derived classes.

  define_macro (name [, value])
    define a preprocessor macro or token; this will affect all
    invocations of the 'compile()' method
  undefine_macro (name)
    undefine a preprocessor macro or token

  add_include_dir (dir)
    add 'dir' to the list of directories that will be searched by
    the preprocessor for header files
  set_include_dirs ([dirs])
    reset the list of preprocessor search directories; 'dirs' should
    be a list or tuple of directory names; if not supplied, the list
    is cleared

  compile (source, define=macro_list, undef=names, include_dirs=dirs)
    compile source file(s).  'source' may be a sequence of source
    filenames, all of which will be compiled, or a single filename to
    compile.  The optional 'define', 'undef', and 'include_dirs'
    named parameters all augment the lists set up by the above four
    methods.  'macro_list' is a list of either 2-tuples
    (macro_name, value) or bare macro names.  'names' is a list of
    macro names, and 'dirs' a list of directories.
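
To illustrate how the per-call parameters could augment the stored
lists, here is a rough sketch.  The class name SketchCompiler and the
helper preprocessor_args() are hypothetical, and the -D/-U/-I spellings
assume a Unix-style cc; a real compile() would go on to run the
compiler with these arguments.

```python
class SketchCompiler:
    """Illustrative only: tracks macros and include dirs, builds cc-style flags."""

    def __init__(self):
        # (name, value) means "define"; a 1-tuple (name,) means "undefine"
        self.macros = []
        self.include_dirs = []

    def define_macro(self, name, value=None):
        self.macros.append((name, value))

    def undefine_macro(self, name):
        self.macros.append((name,))

    def add_include_dir(self, dir):
        self.include_dirs.append(dir)

    def preprocessor_args(self, define=(), undef=(), include_dirs=()):
        """Hypothetical helper: merge per-call options with the stored lists."""
        macros = list(self.macros)
        for item in define:
            # 'define' entries are (name, value) 2-tuples or bare names
            if isinstance(item, tuple):
                macros.append(item)
            else:
                macros.append((item, None))
        macros.extend((name,) for name in undef)

        args = []
        for macro in macros:
            if len(macro) == 1:
                args.append("-U" + macro[0])
            elif macro[1] is None:
                args.append("-D" + macro[0])
            else:
                args.append("-D%s=%s" % macro)
        for d in self.include_dirs + list(include_dirs):
            args.append("-I" + d)
        return args
```

Note that the per-call options are appended after the stored ones and
never modify them, which is one reading of "augment".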

  add_lib (libname)
    add a library name to the list of implicit libraries ("-lfoo")
    to link with
  set_libs ([libnames])
    reset the list of implicit libraries (or clear if 'libnames'
    not supplied)

  add_lib_dir (dir)
    add a directory to the list of library search directories
    ("-L/foo/bar/baz") used when we link
  set_lib_dirs ([dirs])
    reset (or clear) the list of library search directories

  link_shared_object (objects, shared_object,
                      libs=libnames, lib_dirs=dirs)
    link a set of object files together to create a shared object file.
    The optional 'libs' and 'lib_dirs' parameters only augment the
    lists set up by the previous four methods.
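
As a sketch of what link_shared_object() might boil down to on a
Unix-y system: the function name shared_object_command(), the
'default_libs' and 'default_lib_dirs' parameters (standing in for the
lists kept on the compiler object), and the "cc -shared" invocation are
all assumptions for illustration, not part of the proposal.

```python
def shared_object_command(objects, shared_object, libs=(), lib_dirs=(),
                          default_libs=(), default_lib_dirs=()):
    """Build (but do not run) a Unix-style link command line.

    'default_libs' and 'default_lib_dirs' stand in for the lists that
    add_lib()/set_libs() and add_lib_dir()/set_lib_dirs() would maintain.
    """
    # Per-call values augment the stored lists rather than replace them.
    all_dirs = list(default_lib_dirs) + list(lib_dirs)
    all_libs = list(default_libs) + list(libs)

    cmd = ["cc", "-shared"] + list(objects)   # assumed compiler driver and flag
    cmd += ["-L" + d for d in all_dirs]
    cmd += ["-l" + lib for lib in all_libs]
    cmd += ["-o", shared_object + ".so"]      # 'foo' -> 'foo.so' on Unix-y systems
    return cmd
```

Returning the argument list instead of running it keeps the sketch
testable without a compiler installed.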

Things to think about: should there be explicit support for "explicit
libraries" (eg. where you put "/foo/bar/libbaz.a" on the command line
instead of trusting "-lbaz" to figure it out)?  I don't think we can
expect the caller to put them in the 'objects' list, because the
filenames are too system-dependent.  My inclination, as you could
probably guess, would be to add methods 'add_explicit_lib()' and
'set_explicit_libs()', and a named parameter 'explicit_libs' to

Also, there would have to be methods to support creating static and
shared libraries: I would call them 'link_static_lib()' and
'link_shared_lib()'.  They would have the same interface as
'link_shared_object()', except the output filename would of course have
to be handled differently.  (To illustrate: on Unix-y systems,
passing shared_object='foo' to 'link_shared_object()' would result in an 
output file 'foo.so'.  But passing output_lib='foo' to
'link_shared_lib()' would result in 'libfoo.so', and passing it to
'link_static_lib()' would result in 'libfoo.a'.)
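
The Unix-y naming rules in that illustration can be written down as a
small table.  The function output_filename() and its 'kind' labels are
hypothetical; other platforms would supply their own patterns.

```python
# Unix-y output naming only; the 'kind' labels are made up for this sketch.
UNIX_PATTERNS = {
    "shared_object": "%s.so",     # link_shared_object(): foo -> foo.so
    "shared_lib": "lib%s.so",     # link_shared_lib():    foo -> libfoo.so
    "static_lib": "lib%s.a",      # link_static_lib():    foo -> libfoo.a
}


def output_filename(name, kind, patterns=UNIX_PATTERNS):
    """Turn a bare name like 'foo' into the platform's output filename."""
    return patterns[kind] % name
```

A concrete compiler class for another platform could pass in (or hold)
a different pattern table while callers keep supplying the bare name.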

So, to all the Windows and Mac experts out there: will this cover it?
Can the variations in filename conventions and compilation/link schemes
all be shoved under this umbrella?  Or is it back to the drawing board?

Thanks for your comments!

Greg Ward - software developer                    gward@cnri.reston.va.us
Corporation for National Research Initiatives    
1895 Preston White Drive                      voice: +1-703-620-8990 x287
Reston, Virginia, USA  20191-5434               fax: +1-703-620-0913