Compiler abstraction model
Hi all -- I've finally done some thinking and scribbling on how to build extensions -- well, C/C++ extensions for CPython. Java extensions for JPython will have to wait, but they are definitely looming on the horizon as something Distutils will have to handle. Anyways, here are the conclusions I've arrived at.

* Stick with C/C++ for now; don't worry about other languages (yet). That way we can be smart about C/C++ things like preprocessor tokens and macros, include directories, shared vs static libraries, source and object files, etc.

* At the highest level, we should just be able to say "I know nothing, just give me a compiler object". This implies a factory function returning instances of concrete classes derived from an abstract CCompiler class. These compiler objects must know how to:
  - compile .c -> .o (or local equivalent)
  - compile multiple .c's to matching .o's
  - be able to define/undefine preprocessor macros/tokens
  - be able to supply preprocessor search directories
  - link multiple .o's to static library (libfoo.a, or local equiv.)
  - link multiple .o's to shared library (libfoo.so, or local equiv.)
  - link multiple .o's to shared object (foo.so, or local equiv.)
  - for all link steps:
    + be able to supply explicit libraries (/foo/bar/libbaz.a)
    + be able to supply implicit libraries (-lbaz)
    + be able to supply search directories for implicit libraries
  - do all this with timestamp-based dependency analysis (non-trivial because it requires analyzing header dependencies!)

Linking to static/shared libraries and dependency analysis are optional for now; everything else is required to build C/C++ extensions for Python. (At least that's my impression!) "Local equivalent" is meant to encompass different filenames for C++ (eg. .C -> .o) and different operating systems/compilers (eg. .c -> .obj, multiple .obj's to foo.dll or foo.lib).

BIG QUESTION: I know this will work on Unix, and from my distant recollections of past work on other systems, it should work on MS-DOS and VMS too. I gather that Windows is pretty derivative of MS-DOS, so will this model work for Windows compilers too? Do we have to worry about Windows compilers other than VC++? But I have *no clue* about Macintosh compilers -- presumably somebody "out there" (not necessarily on this SIG, but I hope so!) knows how to compile Python on the Mac, so hopefully it's possible to compile Python extensions on the Mac. But will this compiler abstraction model work there?

Brushing that moment of self-doubt aside, here's a proposed interface for CCompiler and derived classes.

define_macro (name [, value])
    define a preprocessor macro or token; this will affect all invocations of the 'compile()' method

undefine_macro (name)
    undefine a preprocessor macro or token

add_include_dir (dir)
    add 'dir' to the list of directories that will be searched by the preprocessor for header files

set_include_dirs ([dirs])
    reset the list of preprocessor search directories; 'dirs' should be a list or tuple of directory names; if not supplied, the list is cleared

compile (source, define=macro_list, undef=names, include_dirs=dirs)
    compile source file(s). 'source' may be a sequence of source filenames, all of which will be compiled, or a single filename to compile. The optional 'define', 'undef', and 'include_dirs' named parameters all augment the lists set up by the above four methods. 'macro_list' is a list of either 2-tuples (macro_name, value) or bare macro names. 'names' is a list of macro names, and 'dirs' a list of directories.
add_lib (libname)
    add a library name to the list of implicit libraries ("-lfoo") to link with

set_libs ([libnames])
    reset the list of implicit libraries (or clear if 'libnames' not supplied)

add_lib_dir (dir)
    add a directory to the list of library search directories ("-L/foo/bar/baz") used when we link

set_lib_dirs ([dirs])
    reset (or clear) the list of library search directories

link_shared_object (objects, shared_object, libs=libnames, lib_dirs=dirs)
    link a set of object files together to create a shared object file. The optional 'libs' and 'lib_dirs' parameters only augment the lists set up by the previous four methods.

Things to think about: should there be explicit support for "explicit libraries" (eg. where you put "/foo/bar/libbaz.a" on the command line instead of trusting "-lbaz" to figure it out)? I don't think we can expect the caller to put them in the 'objects' list, because the filenames are too system-dependent. My inclination, as you could probably guess, would be to add methods 'add_explicit_lib()' and 'set_explicit_libs()', and a named parameter 'explicit_libs' to 'link_shared_object()'.

Also, there would have to be methods to support creating static and shared libraries: I would call them 'link_static_lib()' and 'link_shared_lib()'. They would have the same interface as 'link_shared_object()', except the output filename would of course have to be handled differently. (To illustrate: on Unix-y systems, passing shared_object='foo' to 'link_shared_object()' would result in an output file 'foo.so'. But passing output_lib='foo' to 'link_shared_lib()' would result in 'libfoo.so', and passing it to 'link_static_lib()' would result in 'libfoo.a'.)

So, to all the Windows and Mac experts out there: will this cover it? Can the variations in filename conventions and compilation/link schemes all be shoved under this umbrella? Or is it back to the drawing board?

Thanks for your comments!

Greg

--
Greg Ward - software developer                    gward@cnri.reston.va.us
Corporation for National Research Initiatives
1895 Preston White Drive                          voice: +1-703-620-8990 x287
Reston, Virginia, USA 20191-5434                  fax: +1-703-620-0913
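For concreteness, here is a minimal Python sketch of the interface proposed above. The method names follow the post; the 'new_compiler' factory, the skeletal UnixCCompiler, and the exact way command lines get built are assumptions about how the concrete side might look, not settled API.

    class CCompiler:
        """Abstract base class: platform-independent bookkeeping only."""

        def __init__(self):
            self.macros = []        # list of (name, value) pairs; value may be None
            self.include_dirs = []
            self.libs = []
            self.lib_dirs = []

        def define_macro(self, name, value=None):
            self.macros.append((name, value))

        def undefine_macro(self, name):
            self.macros = [m for m in self.macros if m[0] != name]

        def add_include_dir(self, dir):
            self.include_dirs.append(dir)

        def set_include_dirs(self, dirs=None):
            self.include_dirs = list(dirs) if dirs else []

        def add_lib(self, libname):
            self.libs.append(libname)

        def set_libs(self, libnames=None):
            self.libs = list(libnames) if libnames else []

        def add_lib_dir(self, dir):
            self.lib_dirs.append(dir)

        def set_lib_dirs(self, dirs=None):
            self.lib_dirs = list(dirs) if dirs else []

        # Concrete subclasses supply the platform-specific command lines.
        def compile(self, sources, define=None, undef=None, include_dirs=None):
            raise NotImplementedError

        def link_shared_object(self, objects, shared_object,
                               libs=None, lib_dirs=None):
            raise NotImplementedError


    class UnixCCompiler(CCompiler):
        """Example concrete class: builds (but does not run) a 'cc' command line."""

        def compile(self, sources, define=None, undef=None, include_dirs=None):
            if isinstance(sources, str):
                sources = [sources]
            cmd = ['cc', '-c']
            for item in self.macros + list(define or []):
                if isinstance(item, tuple):                 # (name, value) pair
                    name, value = item
                    cmd.append('-D%s' % name if value is None
                               else '-D%s=%s' % (name, value))
                else:                                       # bare macro name
                    cmd.append('-D%s' % item)
            for name in (undef or []):
                cmd.append('-U%s' % name)
            for d in list(include_dirs or []) + self.include_dirs:
                cmd.append('-I%s' % d)
            return cmd + list(sources)


    def new_compiler(plat=None):
        """Factory: hand back a concrete compiler object for this platform."""
        import sys
        plat = plat or sys.platform
        # Only the Unix flavour is sketched here; MSVC, Mac, etc. would slot
        # in the same way once someone writes them.
        return UnixCCompiler()

A setup script would then do something like: compiler = new_compiler(); compiler.add_include_dir('/usr/local/include'); compiler.compile('foomodule.c').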
Hey Greg,

I can see some problems with this compilation model even for unix machines:

C++ template instantiation
    For at least some compilers, the compilation flags must be passed to the linker, because the linker instantiates templated code during a "pre-link" step so needs to know the right compilation options.

compiler flags
    Where do I stick "-g" or "-O" in the "compile" function? (Or "-ansi", or for our SGIs, "-o32"?) Or will you extract these from the Python compilation info? In which case, getting the values for a CPPCompiler would be tricky.

include directories
    As clarification, the "add_include_dir" is for the list of directories used for *all* compilations, while the compile(... include_dirs=dirs) is the list needed for just the given source file? I take it the compile() include dirs will be listed first, or is it used instead? Is there any way to get the list of include files (eg, the initial/default list)?

lib information
    Repeat some of the comments from "include directories".
passing shared_object='foo' to 'link_shared_object()' would result in an output file 'foo.so'
As I recall, not all unix-y machines have .so for their shared library extensions. Eg, looking at the Python "makesetup" script, it seems some machines use ".sl". I don't think Python exports this information.

I believe at times the order of the -l and -L terms can be important, but I'm not sure. Eg, I think the following

    -L/home/usa/lib -lfootball -L/home/everyone_else/lib -lfootball

lets me do both (American) football -- as in Superbowl -- and soccer (football) -- as in World Cup. Whereas

    -L/home/usa/lib -L/home/everyone_else/lib -lfootball -lfootball

means I link with the same library twice. I think.

Andrew
dalke@bioreason.com
Quoth Andrew Dalke, on 29 March 1999:
C++ template instantiation For at least some compilers, the compilation flags must be passed to the linker, because the linker instantiates templated code during a "pre-link" step so needs to know the right compilation options.
Ouch! I *knew* there was a reason I disliked C++, I just couldn't put my finger on it... ;-) Maybe we should cop out and only handle C compilation *for now*? Just how many C++ Python extensions are out there now, anyways?
compiler flags Where do I stick "-g" or "-O" in the "compile" function? (Or "-ansi", or for our SGIs, "-o32" ?) Or will you extract these from the Python compilation info? In which case, getting the values for a CPPCompiler would be tricky.
Generally, those things must be done when Python is compiled. Err, let me reiterate that with emphasis:

    ** COMPILER FLAGS ARE THE RESPONSIBILITY OF THE PYTHON BUILDER **

and Distutils will slurp them out of Python's Makefile (using Fred's distutils.sysconfig module) and use those to build extensions. Andrew, you use SGIs, so you can probably guess what kind of chaos would result if your sysadmin built Python with -o32 and you started building extension modules with -n32. And that's only the most obvious example of what can go wrong when you use compiler flags on a dynamically loaded object inconsistent with the binary that will be loading it. For that and other reasons, I'm quite leery of letting individual extension modules supply things like -ansi or -o32 -- those options should all be stolen straight from Python's Makefile.

However, there probably should be a way to set debugging/optimization flags -- again, the default should definitely be to take them from Python's build, but I don't think inconsistent -g/-O will cause problems. (Anyone have evidence to the contrary?) However, this should not be in the CCompiler interface -- I was thinking it belongs in UnixCCompiler instead, because Unix C compilers are fairly consistent about allowing -g, -O, etc. Anything at the CCompiler level should be applicable to all compilers:

    compiler.debug = 1          # implies "cc -g" on Unix, something else on
                                # other platforms
    compiler.optimize = 'none'  # or 'medium' or 'high'
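As a rough sketch, here is one way a UnixCCompiler-style class might translate those platform-neutral attributes into actual switches. The 'debug' and 'optimize' attribute names come from the paragraph above; the flag table and helper function are assumptions for illustration only.

    # Hypothetical mapping from the abstract 'optimize' levels to Unix cc switches.
    _OPTIMIZE_FLAGS = {'none': [], 'medium': ['-O'], 'high': ['-O2']}

    def debug_optimize_flags(compiler):
        """Return the cc flags implied by compiler.debug / compiler.optimize."""
        flags = []
        if getattr(compiler, 'debug', 0):
            flags.append('-g')
        flags = flags + _OPTIMIZE_FLAGS.get(getattr(compiler, 'optimize', 'medium'), [])
        return flags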
As clarification, the "add_include_dir" is for the list of directories used for *all* compilations while the compile(... include_dirs=dirs) is the list needed for just the given source file?
Yes; any directories supplied to 'add_include_dir()' and 'set_include_dirs()' would affect *all* compilations. Directories supplied to 'compile()' through the 'include_dirs' named parameter would be *added* to the standard list for that compilation step only. Ditto for macros, libraries, library directories, etc.
I take it the compile() include dirs will be listed first, or is it used instead?
Good point. "added" should be "prepended" above, for maximum clarity.
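In other words, the intended merging rule looks something like the following two-liner (names here are hypothetical): the per-call directories go in front of the instance-wide list, for that compilation step only.

    def effective_include_dirs(compiler, include_dirs=None):
        # per-call dirs first, then those set via add_include_dir()/set_include_dirs()
        return list(include_dirs or []) + list(compiler.include_dirs)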
Is there any way to get the list of include files (eg, the initial/default list)?
Oh, probably. I just haven't documented it. ;-) I think David Ascher's idea of exposing the actual list might be nicer overall -- I'll reply to his post separately.
As I recall, not all unix-y machines have .so for their shared library extensions. Eg, looking at the Python "makesetup" script, it seems some machines use ".sl". I don't think Python exports this information.
I was just using '.so' as an illustration. I'll have to spend some time grovelling through Python's Makefiles and configure stuff to verify your last statement... I certainly hope that information is available, though!
I believe at times the order of the -l and -L terms can be important, but I'm not sure. Eg, I think the following
-L/home/usa/lib -lfootball -L/home/everyone_else/lib -lfootball
lets me do both (American) football -- as in Superbowl -- and soccer (football) -- as in World Cup. Whereas
-L/home/usa/lib -L/home/everyone_else/lib -lfootball -lfootball
means I link with the same library twice.
Auuugghhh!!! This seems like a "feature" to avoid like the plague, and probably one that's not consistent across platforms. Can anyone back up Andrew's claim? I've certainly never seen this behaviour before, but then I haven't exactly gone looking for such perversion.

Greg

--
Greg Ward - software developer                    gward@cnri.reston.va.us
Corporation for National Research Initiatives
1895 Preston White Drive                          voice: +1-703-620-8990 x287
Reston, Virginia, USA 20191-5434                  fax: +1-703-620-0913
Quoth David Ascher, on 29 March 1999:
On windows, it is sometimes needed to specify other files which aren't .c files, but .def files (possibly all can be done with command line options, but might as well build this in). I don't know how these should fit in...
Are they just listed in the command line like .c files? Or are they specified by a command-line option? Would you use these in code that's meant to be portable to other platforms?
[my add_include_dir()/set_include_dirs() bureaucratic silliness]
Why not expose a list object and have the user modify it?
Duh, you're quite right. I've been doing too much Java lately. Mmmm, bondage...
Yes. In general, I think it's not a bad idea to give control over the command line -- there are too many weird compilers out there with strange options, syntaxes, etc.
But, as I said emphatically in my last post, those sorts of things must be supplied when Python itself is built. I'm already allowing control over include directories and macros -- which are essential -- so I'm willing to throw in -g/-O stuff too. But if we allow access to arbitrary compiler flags, you can kiss portability goodbye!

Greg

--
Greg Ward - software developer                    gward@cnri.reston.va.us
Corporation for National Research Initiatives
1895 Preston White Drive                          voice: +1-703-620-8990 x287
Reston, Virginia, USA 20191-5434                  fax: +1-703-620-0913
Andrew Dalke, on 29 March 1999, wrote:
As I recall, not all unix-y machines have .so for their shared library extensions. Eg, looking at the Python "makesetup" script, it seems some machines use ".sl". I don't think Python exports
Greg Ward writes:
I was just using '.so' as an illustration. I'll have to spend some time grovelling through Python's Makefiles and configure stuff to verify your last statement... I certainly hope that information is available,
Use the SO variable pulled in from the Makefile; it will be .so or .sl as appropriate.

-Fred

--
Fred L. Drake, Jr. <fdrake@acm.org>
Corporation for National Research Initiatives
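A short sketch of what Fred is describing, using the get_config_var() accessor that later versions of distutils.sysconfig provide; whether the module exposed exactly this call at the time of the thread is an assumption.

    from distutils import sysconfig

    # 'SO' comes straight out of Python's Makefile: '.so', '.sl', '.pyd', ...
    so_ext = sysconfig.get_config_var('SO') or '.so'
    print("shared objects on this platform end in %s" % so_ext)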
Quoth David Ascher, on 29 March 1999:
On windows, it is sometimes needed to specify other files which aren't .c files, but .def files (possibly all can be done with command line options, but might as well build this in). I don't know how these should fit in...
Are they just listed in the command line like .c files? Or are they specified by a command-line option? Would you use these in code that's meant to be portable to other platforms?
Yes, no, and yes. =)
E.g. Python extensions need to declare the 'exported' entry point. This can be done either by modifying the source code (bad for portable code, requires #ifdef's etc.), by specifying a command-line option, or by including a .DEF file.
Just to follow up on this, many obscure options may need to be passed to the Windows linker, but they do require their own option - they are not passed as normal files. Examples are /DEF: - the .def file David mentions, /NOD:lib - to prevent a default library from being linked, /implib:lib to name the built .lib file, etc. It wasn't clear from David's post that these need their own option, and are not passed as a simple filename param to the linker....

Building from what Greg said, I agree that _certain_ command-line params can be mandated - they are designed not to be configurable, as messing with them will likely break the build. But there also needs to be a secondary class that is virtually unconstrained.

[Sorry - as usual, speaking having kept only a quick eye on the posts, and not having seen the latest code drop...]

Mark.
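To make the Windows side concrete, here is a sketch of how a hypothetical MSVC-flavoured compiler class might fold the switches Mark mentions into the link step. The function and parameter names are invented for illustration and are not part of the proposed API.

    def msvc_link_args(objects, output_dll, def_file=None, implib=None,
                       no_default_libs=()):
        # Build (but don't run) a link.exe command line for a DLL.
        args = ['link', '/DLL', '/OUT:%s' % output_dll] + list(objects)
        if def_file:
            args.append('/DEF:%s' % def_file)     # exported entry points
        if implib:
            args.append('/IMPLIB:%s' % implib)    # name of the generated import library
        for lib in no_default_libs:
            args.append('/NOD:%s' % lib)          # suppress a default library
        return args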
Greg Ward wrote:
Quoth Andrew Dalke, on 29 March 1999:
C++ template instantiation For at least some compilers, the compilation flags must be passed to the linker, because the linker instantiates templated code during a "pre-link" step so needs to know the right compilation options.
Ouch! I *knew* there was a reason I disliked C++, I just couldn't put my finger on it... ;-) Maybe we should cop out and only handle C compilation *for now*? Just how many C++ Python extensions are out there now, anyways?
Enough that you can't simply punt it. For example, most of the win32 extensions are actually C++ stuff. LLNL also uses C++, I believe.
...
Is there any way to get the list of include files (eg, the initial/default list)?
Oh, probably. I just haven't documented it. ;-) I think David Ascher's idea of exposing the actual list might be nicer overall -- I'll reply to his post separately.
This is V1. Keep it dirt simple. Don't create a bazillion APIs. Expose the stuff, let people fill it in, and go.

Even better: rather than doing the configuration thru code, do it declaratively where you can -- e.g. a file that can be read by ConfigParser.py.

Also: in your original email, you talked about "factories" and "abstract classes" and crap like that. What are you building? Who needs a factory? Just instantiate some class and go. Python is easy to change and to rewrite. I really dislike seeing people get all wrapped up in a huge design session to create the ultimate API when they'd be better served just writing some code and running with it. Change it later when it becomes necessary -- change is cheap in Python.
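A sketch of the declarative route Greg suggests: a config file read with ConfigParser (spelled 'configparser' in current Python). The section and option names below are invented for illustration.

    import configparser

    sample = """\
    [build_ext]
    include_dirs = /usr/local/include /opt/foo/include
    libraries    = foo bar
    define       = WITH_THREAD
    """

    cfg = configparser.ConfigParser()
    cfg.read_string(sample)               # in practice: cfg.read('setup.cfg')
    include_dirs = cfg.get('build_ext', 'include_dirs').split()
    libraries    = cfg.get('build_ext', 'libraries').split()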
...
I believe at times the order of the -l and -L terms can be important, but I'm not sure. Eg, I think the following
-L/home/usa/lib -lfootball -L/home/everyone_else/lib -lfootball
lets me do both (American) football -- as in Superbowl -- and soccer (football) -- as in World Cup. Whereas
-L/home/usa/lib -L/home/everyone_else/lib -lfootball -lfootball
means I link with the same library twice.
Auuugghhh!!! This seems like a "feature" to avoid like the plague, and probably one that's not consistent across platforms. Can anyone back up Andrew's claim? I've certainly never seen this behaviour before, but then I haven't exactly gone looking for such perversion.
Try linking against Oracle sometime. You're *required* to list a library multiple times. It's really nasty -- they've created all kinds of inter-dependencies between their libraries.

Cheers,
-g

--
Greg Stein, http://www.lyra.org/
David Ascher wrote:
...
But, as I said emphatically in my last post, those sorts of things must be supplied when Python itself is built. I'm already allowing control over include directories and macros -- which are essential -- so I'm willing to throw in -g/-O stuff too. But if we allow access to arbitrary compiler flags, you can kiss portability goodbye!
Not really -- you simply need to make the consequences of messing with certain objects clear to the user, so that if s/he wants portable, s/he does X, Y and Z, but if s/he wants to distribute the code to a specific machine but with all the other machineries that distutils provides, then s/he can do so.
IMHO, portable packaging will come by folks first using it to package their non-portable code because it's easier than doing it the old way.
yes! speak it, brother!

Seriously: a number of things should have defaults, but there shouldn't be a reason to *force* developers/users into a particular model. As I've said in the past: if you try to do this, then they just won't use it. Developers are a finicky breed :-) It is especially true with Python: reinventing the wheel is cheap, so it happens a lot.

Cheers,
-g

--
Greg Stein, http://www.lyra.org/
Greg Stein writes:
Trying linking against Oracle sometime. You're *required* to list a library multiple times. It's really nasty -- they've created all kinds of inter-dependencies between their libraries.
And reading their example Makefile is enough to give even the most diehard Unix hacker hernias; I wonder how many developers they hospitalized to develop it!

-Fred

--
Fred L. Drake, Jr. <fdrake@acm.org>
Corporation for National Research Initiatives
Quoth Mark Hammond, on 31 March 1999:
Building from what Greg said, I agree that _certain_ command-line params can be mandated - they are designed not to be configurable, as messing them will likely break the build. But there also needs to be a secondary class that are virtually unconstrained.
I think there will have to be some way to accommodate platform dependencies in a Distutils build. Eg. Win32 extensions are allowed to compile only on Win32, and from what I'm hearing it sounds as though lots of Windows-specific compiler options might need to be snuck in to build a given extension. Or I might want my extension to be built with -O2 as long as gcc is the compiler, no matter how Python was built.

Ultimately, this means that you will be free to stick "-n32" into your compiler command line if you're on an SGI and using SGI's compiler, even though this will most likely break things. Partly this is a documentation/social engineering thing, but it can also be addressed by making the compiler options that can (and sometimes must) be set in a portable way part of the compiler abstraction model -- hence the idea of a "compiler.debug" flag, which will control the "-g" switch on Unix C compilers and the local equivalent elsewhere.
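A sketch of the kind of escape hatch being described: a build script adds extra switches only once it can see which compiler it actually got back from the factory. The 'compiler_type' attribute and 'extra_flags' list are assumptions, not part of the proposed interface.

    def add_platform_flags(compiler):
        # Only add -O2 when we know the compiler is gcc; anything riskier
        # (-n32, -o32, ...) is on the packager's head.
        if getattr(compiler, 'compiler_type', '') == 'gcc':
            compiler.extra_flags = getattr(compiler, 'extra_flags', []) + ['-O2']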
[Sorry - as usual, speaking having kept only a quick eye on the posts, and not having seen the latest code drop...]
That's quite all right -- just scatter enough droplets of knowledge, I'll filter through the ones that aren't relevant to the code.

Greg

--
Greg Ward - software developer                    gward@cnri.reston.va.us
Corporation for National Research Initiatives
1895 Preston White Drive                          voice: +1-703-620-8990 x287
Reston, Virginia, USA 20191-5434                  fax: +1-703-620-0913
On Mon, 29 Mar 1999, Greg Ward wrote:
* At the highest level, we should just be able to say "I know nothing, just give me a compiler object". This implies a factory function returning instances of concrete classes derived from an abstract CCompiler class. These compiler objects must know how to:
  - compile .c -> .o (or local equivalent)
  - compile multiple .c's to matching .o's
  - be able to define/undefine preprocessor macros/tokens
  - be able to supply preprocessor search directories
  - link multiple .o's to static library (libfoo.a, or local equiv.)
  - link multiple .o's to shared library (libfoo.so, or local equiv.)
  - link multiple .o's to shared object (foo.so, or local equiv.)
  - for all link steps:
    + be able to supply explicit libraries (/foo/bar/libbaz.a)
    + be able to supply implicit libraries (-lbaz)
    + be able to supply search directories for implicit libraries
  - do all this with timestamp-based dependency analysis (non-trivial because it requires analyzing header dependencies!)
On windows, it is sometimes needed to specify other files which aren't .c files, but .def files (possibly all can be done with command line options, but might as well build this in). I don't know how these should fit in...
add_include_dir (dir)
    add 'dir' to the list of directories that will be searched by the preprocessor for header files

set_include_dirs ([dirs])
    reset the list of preprocessor search directories; 'dirs' should be a list or tuple of directory names; if not supplied, the list is cleared
Why not expose a list object and have the user modify it?

    obj.includes.append(dir)
    obj.includes.insert(3, dir)
    obj.includes.extend([dir1, dir2])
add_lib (libname)
    add a library name to the list of implicit libraries ("-lfoo") to link with

set_libs ([libnames])
    reset the list of implicit libraries (or clear if 'libnames' not supplied)

add_lib_dir (dir)
    add a directory to the list of library search directories ("-L/foo/bar/baz") used when we link

set_lib_dirs ([dirs])
    reset (or clear) the list of library search directories
Idem.
Things to think about: should there be explicit support for "explicit libraries" (eg. where you put "/foo/bar/libbaz.a" on the command line instead of trusting "-lbaz" to figure it out)?
Yes. In general, I think it's not a bad idea to give control over the command line -- there are too many weird compilers out there with strange options, syntaxes, etc. --david
On Tue, 30 Mar 1999, Greg Ward wrote:
Quoth David Ascher, on 29 March 1999:
On windows, it is sometimes needed to specify other files which aren't .c files, but .def files (possibly all can be done with command line options, but might as well build this in). I don't know how these should fit in...
Are they just listed in the command line like .c files? Or are they specified by a command-line option? Would you use these in code that's meant to be portable to other platforms?
Yes, no, and yes. =) E.g. Python extensions need to declare the 'exported' entry point. This can be done either by modifying the source code (bad for portable code, requires #ifdef's etc.), by specifying a command-line option, or by including a .DEF file.
But, as I said emphatically in my last post, those sorts of things must be supplied when Python itself is built. I'm already allowing control over include directories and macros -- which are essential -- so I'm willing to throw in -g/-O stuff too. But if we allow access to arbitrary compiler flags, you can kiss portability goodbye!
Not really -- you simply need to make the consequences of messing with certain objects clear to the user, so that if s/he wants portable, s/he does X, Y and Z, but if s/he wants to distribute the code to a specific machine but with all the other machineries that distutils provides, then s/he can do so. IMHO, portable packaging will come by folks first using it to package their non-portable code because it's easier than doing it the old way. --david