[Distutils-sig] distutils charter and interscript

John Skaller skaller@maxtal.com.au
Wed, 02 Dec 1998 18:18:10 +1000


At 11:43 1/12/98 -0500, Greg Ward wrote:
>On Tue, Dec 01, 1998 at 09:22:15AM +1000, John Skaller wrote:
>> Interscript is designed to do all this. Except, it has a much wider
>> scope: it isn't limited to building Python, and it includes testing
>> and documentation. (And a few other things -- some of which are implemented
>> already, and some of which are not [such as version control])
>> 
>> There's a generic description of requirements in the documentation at
>> 
>>         http://www.triode.net.au/~skaller/interscript
>
>Cool!  I'm looking through the Interscript docs now.  

        I'm going to grab Marc-Andre's mxDistUtils just as soon as
I can, to see what can be integrated. (I can't seem to get the URL at the
moment.)

>I have long
>thought that Python and Perl would both be eminently suitable for
>literate programming because of their nifty embedded documentation
>features.  I've never really figured out how you would resolve the
>conflict between the various target audiences implicit in the
>conventional ways those embedded documentation standards are used.  

        I haven't really figured it out either. I don't think
what interscript does now is a solution so much as a toolkit,
with a couple of tools that seem useful. I guess the idea is to add
more, and _then_ see how to integrate them.

        One of the key integrating tools, a parser, is sitting there,
unused so far.

>For
>instance, in the Perl world, pod documentation is generally targeted at
>the user of the module, and the better pods provide examples and plenty
>of explanatory text in addition to the nuts and bolts of "here are the
>subroutines/methods provided, and here are the parameters they take".

        Yeah. There's a little program I have somewhere that typesets
the whole Perl module library. I used that to test the POD features of the
Perl tangler. Hmmm. It isn't clear I put it in the distribution,
since it has the location of the library (on my machine) hard-coded into
it.

>My impression of the use of docstrings in the Python world is that
>because they wind up in the runtime VM code, people tend to make them a
>lot terser, and only give nuts 'n bolts descriptions of modules,
>classes, and subroutines.  Thus building a useful document for most
>modules simply by gluing docstrings together would be a dubious
>prospect.  But still, Python docstrings are targeted at the module's
>users.

        Some docstrings are available at run time. This is important:
they're a kind of introspection utility. I'm considering using them
as links to their interscript web documentation.
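
        For instance (a sketch only; DOC_ROOT and the URL scheme are my
assumptions, not interscript code), a tool could walk a module at run
time and map each docstring to its woven page:

        import types

        DOC_ROOT = 'http://www.triode.net.au/~skaller/interscript/doc'

        def doc_links(module):
            # Map each documented function in a module to a guessed URL
            # for its woven documentation.
            links = {}
            for name in dir(module):
                obj = getattr(module, name)
                if isinstance(obj, types.FunctionType) and obj.__doc__:
                    links[name] = '%s/%s.html#%s' % (
                        DOC_ROOT, module.__name__, name)
            return links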

>The third target audience, and a much smaller one, is the people who
>really want to understand the implementation.  It has always been my
>impression that this was the goal of literate programming: to provide
>explanations of the data structures and algorithms embodied in the code
>as a high-tech replacement for poorly-maintained or non-existent
>comments.  The "complete documentation" approach of POD, or the
>"barebones nuts 'n bolts documentation" of Python docstrings both seem
>at odds with this.

        My idea is that documentation at all the 'levels' has to be linked.
For example, I like the idea that regression tests belong in the tutorial.
That way the tutorial has lots of examples, and the documentation describing
the client interface can be 'verified' reasonably easily, since the description
is lexically close to the examples that test it.
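
        To sketch the idea (the @example/@end markers are hypothetical,
not interscript syntax): pull each example out of the tutorial source
and run it, so the examples double as regression tests.

        import re

        def run_examples(filename):
            # If an embedded example raises, the documentation and the
            # code have drifted apart.
            src = open(filename).read()
            for block in re.findall(r'@example\n(.*?)@end', src, re.S):
                exec(block, {})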

        Similarly, there must be 'links' between requirements and design
documents, and actual implementation etc etc.

>Anyways, this is way off topic.  

        How can your comments about literate programming be considered
off topic in a discussion of literate programming? <grin>

>I've always been intrigued by the idea
>of literate programming, but never really got much past poking around
>the TeX source and looking (briefly) at CWeb once.  I had heard of
>Interscript from several of the real Python gurus (who I happen to work
>with), but nobody mentioned that it includes an extension building
>system!

        It doesn't. Yet. It includes a _plan_ for one. At least,
a notice of intent :-)

>>         Almost done: interscript creates a source tree in the doco.
>> It doesn't yet provide the ability to 'tar' up those files,
>> but that is fairly trivial.
>
>I assume this is part of tangling: extract source code to make source
>files, and then you know what goes into a source distribution.  

        From the interscript point of view, generated source code
is a transient output, not source. The sources are the _interscript_
sources (called 'original sources').

        But it depends on what you are building, for whom, and when,
etc etc. That is, it seems simple to package up files for distribution,
but more complex to decide which ones, and to _tell_ interscript.

        I plan to use categories for this. That's one reason the 'felix' stuff
is in the workspace. ['Felix' is named after a certain cat who had a black bag
of tricks.]
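
        In code, the selection might look something like this (all the
file names and tags are invented for illustration):

        # Each generated file carries category tags; a distribution is
        # just a choice of tags.
        files = [
            ('interscript/tanglers/c.py', ['code', 'core']),
            ('doc/tutorial.html',         ['doc']),
            ('tests/test_tangler.py',     ['test']),
        ]

        def select(files, wanted):
            # A file is included if it carries at least one wanted tag.
            picked = []
            for name, cats in files:
                for c in cats:
                    if c in wanted:
                        picked.append(name)
                        break
            return picked

        # e.g. select(files, ['code']) for a minimal source distribution.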

>Of
>course, documentation and test suites also belong in the source
>distributions, so I guess weaving comes into it as well.  Hmmm...

        In the end, the distinction between weavers and tanglers
isn't justified. That is, the distinction between documentation
and code isn't justified. One is just a more 'formal' version of the
other .. or something :-)

>> >   install  - install a built library on the local machine
>> 
>> This is MUCH harder. 

>I don't think installation has to be that hard.  If you go back and look
>at the summary of the "Extension Building" Developer's Day session,
>you'll find a bit about the "blib" directory approach, which I want to
>steal from Perl's MakeMaker.  

        .. OK. Remember, though, that I have a slightly different problem:
I have to cater for 'installation' of systems written in Java, C++,
and MyFavouriteLanguage <TM>, on machines with one user, networks
of workstations in a research lab, commercial production houses,
compliance testing laboratories ....

        I firmly believe that a specialised Python installation
mechanism is best designed as a 'subclass' <there, I can say it too :->
of a more general one. Else you get too many non-orthogonal quirks.
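
        Something like this, perhaps (a sketch with hypothetical names
and target paths):

        import os, sys, shutil

        class Installer:
            # Generic: knows how to put files somewhere, nothing more.
            def install(self, files, target):
                for src in files:
                    shutil.copy(src, target)

        class PythonInstaller(Installer):
            # Python-specific: the default target comes from the
            # Python installation itself.
            def install(self, files, target=None):
                if target is None:
                    target = os.path.join(sys.prefix,
                                          'lib', 'site-packages')
                Installer.install(self, files, target)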

>My idea for the distutils is to blatantly rip off as many of these good
>ideas as possible, while making them more cross-platform (ie. no
>requirement for Makefiles) and a bit more object-oriented.  However,
>most of the ideas carry over quite cleanly from Perl to Python.

        I guess what I want is a toolkit for you to build your
installation model _with_. That is, I want to support multiple
models .. in a consistent way.

>> >   gen_make - generate a Makefile to do some of the above tasks
>> >              (mainly 'build', for developer convenience and efficiency)
>> 
>> Please don't! Please generate PYTHON script if it is necessary.
>> Python runs on all platforms, and it has -- or should have --
>> access to all the configuration information.
>
>Yech!  I can't imagine why you'd want to generate a Python script --
>Python is such a dynamic, module-oriented, introspective language that
>it seems crazy to have to generate Python code.  Just write enough
>modules, and make them flexible and well-documented, so that every
>needed build/package/install task is trivial.  Should be doable.

        Hang on! It is clear that the best way to generate Python
script is to have a library available so the generated script
is small. But you still need to generate the script that ties
it all together.

        In other words, I agree with you. But it still entails
generating script: a library cannot 'do' anything unless something
calls it :-)
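
        For instance, the generated script might be no more than this
(the 'buildlib' module and its whole interface are hypothetical):

        import buildlib

        build = buildlib.Build()
        build.add_extension('spam', sources=['spammodule.c'],
                            libraries=['m'])
        build.run()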

>Please note: ***NO ONE WILL HAVE TO USE MAKE FOR ANY OF THIS*** The
>reason I propose a 'gen_make' option is largely for the convenience of
>developers writing collections of C extensions under Unix.  

        The point of the interscript 'compilers' module is that
often you don't need 'make' for compiling C extensions.

        Interscript contains a couple of small optimisations
written in C. If you build interscript .. they get compiled
automatically. As long as you have gcc running on a Linux box
configured like mine. If not, you can edit the compilers module
to suit. All it does is call

        os.system('gcc ' + flags + ' ' + modulename)

[more or less]. Why do we need make?
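
        Spelt out a little more fully (the flags and include path are
assumptions for a gcc/Linux box, not a portable implementation):

        import os, sys

        def build_extension(name, flags='-shared -O2 -fPIC'):
            # Compile one C source file into a dynamically loadable
            # module, by shelling out to the compiler.
            include = '-I%s/include/python1.5' % sys.prefix
            cmd = 'gcc %s %s %s.c -o %smodule.so' % (
                flags, include, name, name)
            if os.system(cmd) != 0:
                raise SystemError('compilation failed: ' + cmd)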

>However, the people who write collections of C extensions under Unix are
>a small but crucial segment of the Python world.  If we can convince
>these people to use distutils, we win.  

        Some way of specifying external libraries is required.
Make isn't needed at all. It can all be done with os.system()
calls to external tools, plus other Python features.

        Python is quite powerful enough for this!!!!!
Why use make?

>It would be of great convenience
>for these folks if the distutils can generate a Makefile that does
>everything (or almost everything) they need.  MakeMaker can do it -- so
>why can't we?

        We can. But why not generate Python script instead of Makefiles?

>There appears to be growing support for writing a next-generation 'make'
>in Python.  Great idea, but I don't think this is the place for that; if
>such a beast does come into existence, 

        I'll try again. Interscript IS that tool.
Interscript already builds both Python and C. Make is not required.
But more 'tools' certainly are, such as a way to read and write
tar and zip files. [Planned]

>>         For example, the compilers module I provide happens to 
>> use gnu gcc/g++. So it won't work on NT. Or a Solaris system
>> where the client wants to use the native cc. Etc etc. 
>
>OK, that's a problem.  But as you said in your first message, we should
>look more at your interface than your implementation.  Do you think your
>implementation could be easily extended to work with more compilers?

        It was designed to. But it was designed by a person without
the experience that a whole group of users has. So it will probably
need work, and a lot of input and discussion, to turn the
prototype into a platform-independent standard.

>(Most importantly, it should work with the compiler and compiler flags
>used to build Python.  

        I agree. That information should be available.
A distutils package to provide it would be superb.

        Alternatively, or as well as this, one could think about the
platform, site, user, and global frames in interscript, which are designed
to carry that information -- but don't. 

>If you don't have access to that information and
>don't use it, then you can't properly build extensions to be
>dynamically loaded by Python.  

        Yes. At the moment, the compilers module just has to be
rewritten by the client, for example. It would be better if all the
required information were available to make a single platform-independent
compilers module.

        There must be a way to _extend_ the build information for extensions,
i.e. to add new libraries and header files to some kind of database,
with a platform-independent key that allows a platform-independent build
script to be written (in Python).
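
        A sketch of such a database (the keys and entries are invented
for illustration):

        import sys

        # A logical key maps to per-platform build information; a build
        # script asks for 'zlib' and never mentions the platform.
        LIBRARIES = {
            'zlib': {
                'linux2': {'libs': ['z'],
                           'include_dirs': ['/usr/include']},
                'win32':  {'libs': ['zlib'],
                           'include_dirs': ['C:/zlib/include']},
            },
        }

        def lookup(key):
            return LIBRARIES[key][sys.platform]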

>That, incidentally, is why I think a
>patch might be needed for 1.5.2 -- it would probably be a patch to the
>configure/build stuff, and the addition of whatever code is needed to
>make a builtin 'sys.config' module which provides access to everything
>Python's configure/build process knows.  The intention is that this
>stuff would be standard in 1.6.)

        Ah, OK! I agree!! I think there is not enough information.
Shall we make a list of all the information we think could be needed?
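
        As a starting point (every name below is a guess at what such a
module might expose, not an existing interface):

        # What a 'sys.config' module might need to carry, at minimum:
        CC        = 'gcc'          # compiler used to build Python
        CFLAGS    = '-O2 -g'       # and the flags it was built with
        LDSHARED  = 'gcc -shared'  # how to link a loadable module
        SO        = '.so'          # suffix for extension modules
        INCLUDEPY = '/usr/local/include/python1.5'
        LIBPL     = '/usr/local/lib/python1.5/config'
        VERSION   = '1.5.2'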

>Cool!  That's a neat idea -- had never occurred to me.  Not sure if it
>belongs in the "standard Python way of doing things", though.  

        I'm more interested in a 'standard' way of doing things
that covers a wider scope than just Python. After all, Python
has to interface to other things, so you need them too,
whatever they are.

>Do you
>find that having test code intermingled with production code greatly
>increases the size of things?  

        Not so much as having the documentation in there as well.
But then, human-written source code is limited in size by the
number of contributors and their typing speed. :-)

        The point is to avoid too much repetitive code, by writing
code to generate it. In Python, I think you would use this
ability less than in, say, C++, where there's a fair bit of ugly
repetition (declarations in two places, for example).

>I've released a couple of fair-sized
>collections of Perl modules complete with test suites, and I wrote
>roughly as much test code as production code.  

        Then you probably didn't write enough. :-)

        I think it should be more like 2/1,
and for doco, more like 5/1. In other words (one part code, two parts
tests, five parts documentation makes eight parts in all), I expect the
production code in publication-quality software to be about 10% of the
total.

        PS: I'm talking about 'original sources'.

>I'm not sure if I'd want
>to wade through the test code at the same time as the production code,

        This is a very important point. I do not agree.
I do not disagree. That is the point! It is necessary to
be able to have multiple views of a system, and ideally the views
would be dynamically configurable. In the current batch-oriented
version of interscript this is not the case, so eventually it will
need a GUI.

        The 'web' output is a rough test bed/prototype for the kind
of thing the client will need to configure as a view.

>but I can certainly see the ease-of-access benefits: add a feature to
>the production code, and immediately add a test for it (in addition to
>documenting it).

        or .. at least, have a reminder that you have not yet added a test.
I think the key has to be a flexible system: different programming
languages, systems, and people ...

>> >  * a standard for representing and comparing version numbers
>> 
>>         I think this is very hard. It isn't enough to just use numbers.
>> RCS/CVS provides a tree. With labelled cut points. AFAIK, it cannot
>> handle the important thing: dependencies.
>
>RCS/CVS (and SCCS, for that matter) address the developer; version
>numbers should address the user.  

        I think one could make the following argument:

        1. There are version controls for the developer.
        2. There are version controls for the client.
        3. They are not the same.
        4. They are related.

and that the problem is how to relate them. Since I do not know
how, I plan to use categories to express the relation .. because that
allows any model to be built.

        If you cannot fix a specific solution, fix an abstract one:
or, 'the solution to all problems in computing is one more level
of indirection' :-)

>And they should be in an obvious
>linear sequence, or your users will get terribly confused.  

        No, that isn't enough. Consider:

        Mac version
        Unix version
        NT version

Now consider:

        Free version
        Commercial version

and then:

        Barebones version
        Deluxe version
        Everything-including-the-kitchen-sink version

and then there is:

        Informix version
        Sybase version

etc etc etc.... There cannot be a 'linear' solution to the actual
problem.
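
        In code, the point is that a release is a point in several
independent dimensions, of which the linear number is only one (the
axis names here are invented):

        version = {
            'release':  (1, 2, 3),     # the only linearly ordered axis
            'platform': 'NT',
            'licence':  'commercial',
            'features': 'deluxe',
            'database': 'Sybase',
        }

        def comparable(a, b):
            # Releases are ordered only if they agree on every other axis.
            for axis in ('platform', 'licence', 'features', 'database'):
                if a[axis] != b[axis]:
                    return 0
            return 1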

>There are a
>whole bunch of possible ways to do version numbering; I'm inclined
>towards the GNU style (1.2.3) with optional alpha/beta tags
>(eg. "1.5.2a2").  Coincentally, this seems to be Guido's version
>numbering system for Python... 

        Yes, I plan the same kind of numbering for interscript;
at least until I have something significantly better.
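
        For the linear part, representing and comparing is easy enough
(a sketch; the convention that '1.5.2a2' precedes '1.5.2' follows your
example):

        import re

        def parse_version(s):
            # '1.5.2a2' -> a tuple that sorts correctly:
            # alpha < beta < final release.
            m = re.match(r'^(\d+)\.(\d+)(?:\.(\d+))?(?:([ab])(\d+))?$', s)
            if not m:
                raise ValueError('bad version string: ' + s)
            major, minor, patch, tag, tagnum = m.groups()
            key = [int(major), int(minor), int(patch or 0)]
            if tag:
                key.append(ord(tag) - ord('z'))  # negative: before final
                key.append(int(tagnum))
            else:
                key.append(0)                    # final release sorts last
            return tuple(key)

        # parse_version('1.5.2a2') < parse_version('1.5.2') is true.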

>Thanks for picking apart my proposal -- I was getting worried that
>everyone would agree with everything I proposed, and I'd be stuck with
>writing all the code.  ;-)

        No, I'll do everything I can to help.
The first thing to do is look at Marc-Andre's stuff. No point wasting
good code!


-------------------------------------------------------
John Skaller    email: skaller@maxtal.com.au
		http://www.maxtal.com.au/~skaller
		phone: 61-2-96600850
		snail: 10/1 Toxteth Rd, Glebe NSW 2037, Australia