[Distutils] Package Meta-information Patch
Greg Ward
gward@cnri.reston.va.us
Mon, 17 Jan 2000 12:15:58 -0500
On 28 December 1999, Michael Muller said:
> I've enclosed a patch which addresses a particular concern of mine: package
> meta-information. At the end of the install command, it creates a package
> information file in <install_py.install_dir>/_pkginfo named after the package
> (it also creates the "_pkginfo" directory, if necessary). The file contains
> python variable definitions for the package name, version number, list of
> files installed, dependencies, and compatible versions (although the latter
> two are always empty at this time).
Hmmm... interesting idea. I mean, we've known all along that some sort
of "package metainfo database" (metadatabase? ugh) is going to be needed
for exactly the reasons you listed (uninstall, dependency analysis, and
system cataloging). I have not spent a lot of time thinking about it,
but I think I was stuck in a "database must be one big file" rut, with
all the attendant problems of performance, concurrent access, etc.
About as far as I got was thinking "text files suck for size and
performance, and DB or dbm files might not be portable enough".
But I think I like your approach -- at least part of it. Specifically,
I think I like the notion of spreading the "metainfo database" across
many files in many directories. To find information about all
module distributions installed, you troll sys.path, looking for a
"_pkginfo" subdirectory in each entry, and then look at the files
installed there. At least, that's the understanding I get from reading
your message and a cursory scan of the patch -- am I right?
This pretty much solves the practical side of "what to do about
concurrent access" -- with one small file per distribution, two
processes will rarely touch the same file, so there's not much to worry
about. It doesn't sound very good for performance if you need the
contents of every file, but if all you want is a list of packages
installed, that should be pretty fast (you can get everything you need
from a succession of os.listdir() calls).
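For what it's worth, the listing step can be sketched in a few lines.
This is only an illustration in present-day Python: the function name
list_installed is made up here, and it assumes each file in a "_pkginfo"
directory is named after the distribution it describes, as in the patch
description quoted above. It also assumes the first match on the path
should win, the way imports do.

```python
import os
import sys


def list_installed(search_path=None, dirname="_pkginfo"):
    """Map distribution name -> info file path (hypothetical helper)."""
    found = {}
    for entry in (sys.path if search_path is None else search_path):
        # An empty sys.path entry means the current directory.
        info_dir = os.path.join(entry or os.curdir, dirname)
        try:
            names = os.listdir(info_dir)
        except OSError:
            # No "_pkginfo" here, or the entry is not a directory at all.
            continue
        for name in sorted(names):
            # Assumption: the first entry on the path wins, as with imports.
            found.setdefault(name, os.path.join(info_dir, name))
    return found
```

Calling list_installed() with no arguments scans sys.path itself. Only
directory listings are read; none of the info files is opened.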
What I'm a little leery about is using Python code as a data format.
It's attractive because we all know the syntax and don't have to write a
parser. But using a general-purpose language for *such* a specific,
tightly-targeted task seems ... I dunno ... overkill-ish. And I wonder
if there are security holes lurking in the concept of using code for
system catalog data.
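One way to keep the familiar syntax without running anything is to parse
the file and accept only literal assignments. A rough sketch, assuming
present-day Python's standard ast module; the function name read_pkginfo
and the variable names in SAMPLE are hypothetical, since the patch's
actual names aren't quoted here:

```python
import ast

# Hypothetical file contents; the real patch may use different names.
SAMPLE = 'name = "foo"\nversion = "1.0"\nfiles = ["foo.py"]\ndependencies = []\n'


def read_pkginfo(source):
    """Parse 'name = literal' lines; reject anything else without running it."""
    info = {}
    for node in ast.parse(source).body:
        if not (isinstance(node, ast.Assign)
                and len(node.targets) == 1
                and isinstance(node.targets[0], ast.Name)):
            raise ValueError("only simple 'name = value' assignments are allowed")
        # literal_eval accepts strings, numbers, tuples, lists, dicts and the
        # like, and raises ValueError for calls, attribute access, and so on.
        info[node.targets[0].id] = ast.literal_eval(node.value)
    return info
```

Here read_pkginfo(SAMPLE) returns a plain dictionary, while a file
containing an import statement or a function call is rejected with
ValueError instead of being executed. Whether that restriction is
acceptable for the metadata format is exactly the kind of thing the SIG
would need to decide.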
Does anyone else share my reservations (which are vague, ill-defined,
and more superstitious than anything else)? Conversely, does anyone
think that Python code is absolutely the right way to store module
distribution metadata?
Thanks again for the patch -- I think it should find its way into
Distutils 0.2 after the SIG has thrashed through some of the issues it
raises.
Greg
--
Greg Ward - software developer gward@cnri.reston.va.us
Corporation for National Research Initiatives
1895 Preston White Drive voice: +1-703-620-8990
Reston, Virginia, USA 20191-5434 fax: +1-703-620-0913