[Distutils] Revisiting the "pkginfo" patch

Michael Muller mmuller@enduden.com
Fri, 21 Apr 2000 19:38:29 -0400

Hello again, one moment whilst I swap in some pages...

Greg Ward wrote:
>   * I'm not convinced a separate PackageInfo class is needed -- the
>     Distribution stuff is the home for package meta-data, and if it
>     gets a bit more complex (eg. dependencies list), I think that's
>     OK.  I definitely don't like having two classes (Distribution
>     and PackageInfo) with largely the same info, though.

I disagree.  Distribution contains package meta-info, but it also contains a
lot of information that is relevant to a source distribution: packages,
modules, source files.  PackageInfo includes a subset of that information
(package name, version, author...) and it also includes the final set of
installed files, which appears to me to be the product of the install
commands, not of the Distribution.  [correct me if I'm missing something here,
obviously you know your code better than I do, particularly since I haven't
looked at it in several months].

Furthermore, package information deserves to be separated out for purposes of
modularity: if people want to create alternate forms of the module (based,
perhaps, on RPM or DBM files), they should be able to plug their replacement
right into the system as long as they conform to a very simple, specific
interface.  Likewise, programmers using alternate build/distribution
technologies should be able to define package information without having to
use distutils.

>   * I'm leery of doing the fancy stuff, namely required packages
>     and compatible versions.  While your data model might well be
>     the Right Thing, it might not, and I don't think this stuff
>     has been sufficiently discussed on the SIG.  And I'm also
>     not sure that adding slots for the data without having code
>     to back them up is right, either.  On the one hand, it's good
>     to get people in the habit of listing requirements/dependencies,
>     but I don't want to raise false expectations that the Distutils
>     will actually *do* anything with that information.  (It will
>     someday, but post-Distutils 1.0/Python 1.6.)

Yeah, I thought about that when I wrote it: I added it anyway in hopes that
the issue of what "The Right Thing" is would be thrashed out on the SIG. 
Under the circumstances, I agree that it should be removed (for now, at
least).

>   * I find your type-checking machinery in pkginfo.py intriguing, but
>     again I'm not sure if it's appropriate.  It's a neat approach to a
>     common problem, but strikes me as over-engineered for this one
>     module.  If I'm going to do really thorough type-checking on the
>     attributes of one class, I'd rather do it everywhere.

It's funny: as I looked at the code again now I was having one of those "what
the &^$@ was I thinking???" moments; but then it all came flooding back to me
like some twisted repressed memory...

The problem is not so much the type checking: it's the persistence.  In order
to read and write the package information, we need to either have code to
write and read each field individually, or have some sort of generalized way
of writing and reading different kinds of content.  The former approach is
difficult to extend and maintain, particularly when you start dealing with
complex nested structures.

The persistence problem is complicated further by the fact that ConfigParser
files favor readability over unambiguous expression.  For example, a string
with no trailing or leading whitespace and no embedded newlines can (and
should) be easily expressed as "header: value of the string".  Strings with
special needs require special escapes so that these characters are preserved.
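
To illustrate the ambiguity, here is a minimal sketch using the modern
Python 3 `configparser` module (an assumption: it stands in for the
ConfigParser module discussed in this message, and the section and option
names are made up for the example).  A plain string survives a round trip
through the file, while a string with leading and trailing whitespace does
not.

```python
import configparser
import io


def round_trip(value):
    """Write one value to an INI-style file and read it back."""
    writer = configparser.ConfigParser()
    # "package" and "description" are hypothetical names for this sketch.
    writer["package"] = {"description": value}
    buffer = io.StringIO()
    writer.write(buffer)

    reader = configparser.ConfigParser()
    reader.read_string(buffer.getvalue())
    return reader["package"]["description"]


plain = "value of the string"
padded = "  value of the string  "

# The plain string comes back unchanged.
print(round_trip(plain) == plain)
# The padded string does not: the parser strips surrounding whitespace.
print(round_trip(padded) == padded)
```

Preserving the padded form would need some quoting or escaping convention
layered on top of the file format.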

In order to simplify things and try to keep the package info file syntax as
clear as possible, I decided to do the following:

1) Create classes that know how to read and write certain kinds of data.
2) Map the attributes of the PackageInfo object to instances of these classes
so that reading and writing is just a matter of iterating over the attributes
and calling the associated 'write()' method.
3) Verify that the attribute types are correct in the PackageInfo constructor
to keep an error from occurring at the point where the information is written.

This approach also allows us to do context-sensitive parsing, which does a
great deal to clean up the syntax of the package info file.
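
The three steps above might be sketched roughly as follows.  This is written
in modern Python and is not the code from pkginfo.py: the class names
`StringField` and `StringListField`, the `FIELDS` table, and the
"name: value" line format are all assumptions made for illustration.  The
attribute name selects the class that parses the rest of the line, which is
the sense in which the parsing is context sensitive.

```python
import io


class StringField:
    """Reads and writes a plain one-line string (hypothetical helper)."""

    def check(self, value):
        if not isinstance(value, str) or "\n" in value or value != value.strip():
            raise TypeError(f"expected a plain one-line string, got {value!r}")

    def write(self, name, value, out):
        out.write(f"{name}: {value}\n")

    def parse(self, text):
        return text


class StringListField:
    """Reads and writes a list of strings as comma-separated items.

    Items that themselves contain commas are not handled in this sketch.
    """

    def check(self, value):
        if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
            raise TypeError(f"expected a list of strings, got {value!r}")

    def write(self, name, value, out):
        out.write(f"{name}: {', '.join(value)}\n")

    def parse(self, text):
        return [item.strip() for item in text.split(",") if item.strip()]


class PackageInfo:
    # Step 2: map each attribute to an object that can read and write it.
    FIELDS = {
        "name": StringField(),
        "version": StringField(),
        "files": StringListField(),
    }

    def __init__(self, **attrs):
        # Step 3: verify types up front so that write() cannot fail later.
        for name, field in self.FIELDS.items():
            if name not in attrs:
                raise TypeError(f"missing attribute {name!r}")
            field.check(attrs[name])
            setattr(self, name, attrs[name])

    def write(self, out):
        for name, field in self.FIELDS.items():
            field.write(name, getattr(self, name), out)

    @classmethod
    def read(cls, source):
        attrs = {}
        for line in source:
            if not line.strip():
                continue
            name, _, text = line.partition(":")
            name = name.strip()
            # The attribute name decides how the rest of the line is parsed.
            attrs[name] = cls.FIELDS[name].parse(text.strip())
        return cls(**attrs)


info = PackageInfo(name="example", version="1.0",
                   files=["example/__init__.py", "example/core.py"])
buffer = io.StringIO()
info.write(buffer)
copy = PackageInfo.read(io.StringIO(buffer.getvalue()))
```

Adding a new kind of content then means writing one more field class and one
more entry in the table, rather than touching the read and write loops.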

In the absence of dependency and compatibility information, all of this isn't
as important: however, at some point I'm sure it will be desirable to add more
complicated information to this object.

If it wasn't for the fact that I'd like it to be readable, I'd say we should
just pickle the object.  I liked the original pprint/execfile approach because
it seemed to be the best of both worlds.
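
For comparison, a minimal sketch of the pprint idea in modern Python.  Two
assumptions: `execfile` no longer exists in Python 3, so `ast.literal_eval`
stands in for it here (it accepts only literals, not arbitrary code), and the
dictionary layout is invented for the example rather than taken from the
original patch.

```python
import ast
import pprint

# Hypothetical package information, held as plain Python data.
info = {
    "name": "example",
    "version": "1.0",
    "files": ["example/__init__.py", "example/core.py"],
}


def dump(data):
    """Render the data as readable Python source text."""
    return pprint.pformat(data) + "\n"


def load(text):
    """Parse the text back into Python data without executing any code."""
    return ast.literal_eval(text)


text = dump(info)
restored = load(text)
```

The file stays human-readable and nested structures need no special
escaping, since Python's own literal syntax does the quoting.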

michaelMuller = mmuller@enduden.com | http://www.cloud9.net/~proteus
Those who do not understand Unix are condemned to reinvent it, poorly.
                -- Henry Spencer