[Distutils] Package Meta-information Patch

Greg Ward gward@cnri.reston.va.us
Mon, 17 Jan 2000 12:15:58 -0500


On 28 December 1999, Michael Muller said:
> I've enclosed a patch which addresses a particular concern of mine: package
> meta-information.  At the end of the install command, it creates a package
> information file in <install_py.install_dir>/_pkginfo named after the package
> (it also creates the "_pkginfo" directory, if necessary).  The file contains
> python variable definitions for the package name, version number, list of
> files installed, dependencies, and compatible versions (although the latter
> two are always empty at this time).
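To make the quoted description concrete, here is a minimal Python sketch of writing such an info file. The function name `write_pkginfo` and the variable names it writes (`name`, `version`, `files`, `dependencies`, `compatible_versions`) are hypothetical. The message does not say what the patch actually calls them or how it formats the file.

```python
import os


def write_pkginfo(install_dir, name, version, files):
    """Write <install_dir>/_pkginfo/<name> as Python variable definitions.

    Hypothetical sketch: the variable names are assumptions, not the
    patch's actual format.
    """
    info_dir = os.path.join(install_dir, "_pkginfo")
    os.makedirs(info_dir, exist_ok=True)  # create "_pkginfo" if necessary
    path = os.path.join(info_dir, name)   # file is named after the package
    with open(path, "w") as f:
        f.write("name = %r\n" % name)
        f.write("version = %r\n" % version)
        f.write("files = %r\n" % list(files))
        # Always empty for now, as in the patch described above.
        f.write("dependencies = []\n")
        f.write("compatible_versions = []\n")
    return path
```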

Hmmm... interesting idea.  I mean, we've known all along that some sort
of "package metainfo database" (metadatabase? ugh) is going to be needed
for exactly the reasons you listed (uninstall, dependency analysis, and
system cataloging).  I have not spent a lot of time thinking about it,
but I think I was stuck in a "database must be one big file" rut, with
all the attendant problems of performance, concurrent access, etc.
About as far as I got was thinking "text files suck for size and
performance, and DB or dbm files might not be portable enough".

But I think I like your approach -- at least part of it.  Specifically,
I think I like the notion of spreading the "metainfo database" across
many files in many directories.  To find information about all
module distributions installed, you troll sys.path, looking for a
"_pkginfo" subdirectory in each entry, and then look at the files
installed there.  At least, that's the understanding I get from reading
your message and a cursory scan of the patch -- am I right?

This pretty much solves the practical side of "what to do about
concurrent access" -- in practice, it's not going to happen much, so
don't get too worried about it.  It doesn't sound very good for
performance (anything beyond a package's name means opening and reading
its info file), unless all you want is a list of installed packages -- that
should be pretty fast (you can get everything you need from a succession
of os.listdir() calls).
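A minimal sketch of the "list what's installed" query described above, assuming one info file per distribution in a "_pkginfo" subdirectory of each sys.path entry. The function name `list_installed` is hypothetical. Only directory listings are read and no info file is opened, which is why this particular query should stay cheap.

```python
import os
import sys


def list_installed(path=None):
    """Return sorted distribution names found in "_pkginfo" directories.

    Hypothetical sketch: scans each entry of `path` (default sys.path) and
    reads only directory listings; no metadata file is opened.
    """
    if path is None:
        path = sys.path
    names = set()
    for entry in path:
        # An empty sys.path entry means the current directory.
        info_dir = os.path.join(entry or os.curdir, "_pkginfo")
        try:
            names.update(os.listdir(info_dir))
        except OSError:
            # No "_pkginfo" directory here (or not readable): skip it.
            continue
    return sorted(names)
```

Each name here is just a filename. Getting a version or file list still means reading the corresponding info file, which is where the performance cost comes in.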

What I'm a little leery about is using Python code as a data format.
It's attractive because we all know the syntax and don't have to write a
parser.  But using a general-purpose language for *such* a specific,
tightly-targeted task seems ... I dunno ... overkill-ish.  And I wonder
if there are security holes lurking in the concept of using code for
system catalog data.
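To illustrate the security concern: reading a Python-code info file with `exec` runs whatever code it contains. One possible mitigation, sketched below, is to parse the file and accept only `name = <literal>` assignments, evaluating each right-hand side with `ast.literal_eval`. This is a technique from today's Python standard library swapped in for illustration. It is not part of the patch, and `read_pkginfo_safely` is a hypothetical name.

```python
import ast


def read_pkginfo_safely(source):
    """Parse "name = <literal>" lines without executing any code.

    Hypothetical sketch: only simple assignments whose right-hand side is a
    Python literal are accepted; anything else raises ValueError.
    """
    info = {}
    tree = ast.parse(source)
    for node in tree.body:
        if not (isinstance(node, ast.Assign)
                and len(node.targets) == 1
                and isinstance(node.targets[0], ast.Name)):
            # Rejects imports, function definitions, bare expressions, etc.
            raise ValueError("only simple assignments are allowed")
        # literal_eval accepts an AST node and raises ValueError for
        # non-literal expressions such as function calls.
        info[node.targets[0].id] = ast.literal_eval(node.value)
    return info
```

This keeps the familiar syntax while treating the file strictly as data, though it arguably concedes Greg's point that a general-purpose language is more than the task needs.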

Does anyone else share my reservations (which are vague, ill-defined,
and more superstitious than anything else)?  Conversely, does anyone
think that Python code is absolutely the right way to store module
distribution metadata?

Thanks again for the patch -- I think it should find its way into
Distutils 0.2 after the SIG has thrashed through some of the issues it
raises.

        Greg
-- 
Greg Ward - software developer                    gward@cnri.reston.va.us
Corporation for National Research Initiatives    
1895 Preston White Drive                           voice: +1-703-620-8990
Reston, Virginia, USA  20191-5434                    fax: +1-703-620-0913